numF += 1 doesn't return correctly

TMS
119 100+
Another assignment, and another question. I am writing a concordance, and just to begin I'm trying to get it to find a word and count it correctly. Here is my code:

    import sys

    #filename = sys.argv[1]        #at some point this will be used, not now

    def concordance(f, unique= True):
        temp = open('someFile.txt')
        for line in temp:
            lineList = line.split(" ")
            numF = 0
            for word in lineList:
                num = word.find(f)
                if num == 0:
                    print "num: ", num  #I want to see how its working
                    numF +=1
                    print "numF: ", numF  #I want to see how its adding
        return f, numF


    print concordance("and")

The file I'm using is just gibberish:

NewFile and, more new file.
Now for some more myths, enjoy.
But what happen's
when I add an apostrophy?
Well, it seems to work, so (that) is good news!
The parrot is deceased.

The parrot is deceased.
The and some more and and some more and

I was using it for the pig latin test. Anyway, this is the result I get:

    num:  0            #if it finds it shouldn't it be a 1??
    numF:  1          #see, here is the 1st and
    num:  0
    numF:  1          #here is the next and... this should be 2
    num:  0
    numF:  2          #now it starts to count
    num:  0
    numF:  3
    num:  0
    numF:  4
    ('and', 4)

It should be ('and', 5), but you can see that I am not coding it correctly so that it adds up right: numF is 1 twice, then starts to add.

Your help would be appreciated. :)

TMS
Jan 26 '07 #1
bartonc
6,596 Expert 4TB
OK, a couple of things:

    num = word.find(f)

is asking for the position of f (say, "and") in word. What you want to do is increment a counter whenever the word matches:

    if word == f:
        numF += 1

unless you really want substrings to count as words.
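To make the difference concrete, here is a small sketch (written for Python 3, unlike the Python 2 code in this thread; the sample words are made up for illustration). str.find() returns the index at which the substring starts, or -1 when it is absent, so a result of 0 means "found at the very start of the word", not "found once":

```python
# str.find() returns an index, not a count or a truth value.
words = ["and", "and,", "android", "band", "the"]

for word in words:
    # Show the find() result next to the exact-match test for each word.
    print(word, word.find("and"), word == "and")

# Words for which find() returns 0 merely *start with* "and".
starts_with_and = [w for w in words if w.find("and") == 0]

# Words that are exactly "and".
exactly_and = [w for w in words if w == "and"]
```

Note that the exact comparison rejects "and," as well, which is why punctuation has to be dealt with separately.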
If you want to keep track of which line/word you are at, I like Python's enumerate function. Note that numF is now initialised once, before the loops; in your version it is reset to 0 on every line, so only the last line's count is returned:

    import sys

    #filename = sys.argv[1]        #at some point this will be used, not now

    def concordance(f, unique= True):
        numF = 0
        temp = open('someFile.txt')
        for lineNum, line in enumerate(temp):
            for wordNum, word in enumerate(line.split()): # split() defaults to any whitespace
                if word == f:
                    numF += 1
                    print lineNum, wordNum, word
        return numF

Jan 27 '07 #2
bvdet
2,851 Expert Mod 2GB
There is a better way to get a word count:
  • Use 'f.readlines()' to read the file into a list
  • Create a word list - wordList = "".join(fList).split()
  • Use the list method wordList.count('word') to count the occurrences of the word.
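
The three steps above can be sketched as follows. This is a Python 3 illustration, not code from the thread: the io.StringIO object stands in for an open file and its text is made up.

```python
import io

# Stand-in for an open text file; the content is made up for illustration.
f = io.StringIO("The cat and the dog\nand the bird and, finally, the fish\n")

fList = f.readlines()                 # list of lines, newlines included
wordList = "".join(fList).split()     # one flat list of whitespace-separated words
count = wordList.count("and")         # counts exact matches only

print(count)
```

Because list.count() compares whole items, a token such as "and," (with the comma attached) is not counted as "and".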

HTH :)
BV
Jan 27 '07 #3
bartonc
6,596 Expert 4TB
Nice, BV. I have to admit ignorance of list.count(). It shows that we can all read the docs more, or do things like

    help(list)

more often.
But a concordance really needs to know where each occurrence is located.
Jan 27 '07 #4
bvdet
2,851 Expert Mod 2GB
I overlooked punctuation; otherwise a word could be counted like this:

    open(data_file, 'r').read().split().count('and')

Punctuation could be stripped from each word:

    wordList = []
    for word in wordList1:
        wordList.append(word.strip(",.!?:()[]/\\"))
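
A quick way to see what that strip() call does, as a Python 3 sketch (the constant name PUNCTUATION and the sample sentence are only for illustration):

```python
PUNCTUATION = ",.!?:()[]/\\"   # the same characters passed to strip() above

raw_words = "NewFile and, more (new) file.".split()

# strip() removes any of the listed characters from both ends of each word.
clean_words = [word.strip(PUNCTUATION) for word in raw_words]
print(clean_words)

# Only the ends are trimmed, so an apostrophe inside a word is kept.
inner = "happen's".strip(PUNCTUATION)
print(inner)
```
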
I had to look up 'concordance'!
Jan 27 '07 #5
TMS
119 100+
I just tried the enumerate code and I get this:

8 1 and
8 4 and
8 5 and
8 8 and

which isn't correct. It is saying that there are 8 lines and that "and" appears 8 times in the text. "and" appears only 5 times, so it's still the same problem. I do need to keep track of lines, and of which line the word appears on. First I was working on just counting the times "and" appears, and my code had it off by 1.

I will work on the other ideas now. Thank you
Jan 27 '07 #6
TMS
119 100+
ok, so, I reduced the amount of code by quite a bit, but I'm having the same problem. The count is off by 1.

    import sys

    #filename = sys.argv[1]

    def concordance(f, unique= True):
        temp = open('someFile.txt')
        line = temp.readlines()
        numF = 0
        wordList = "".join(line).split()
        numF = wordList.count(f)
        return f, numF

    print concordance("and")

Result:

>>>
('and', 4)
>>>


Why in the world won't it count all of them?
TMS
Jan 27 '07 #7
bvdet
2,851 Expert Mod 2GB
Here are a couple of options:

    def concordance(f, unique= True):
        lines = open('someFile.txt').read()
        numF = 0
        wordList = []
        wordList1 = "".join(lines).split()
        for word in wordList1:
            wordList.append(word.strip(",.!?:()[]/\\"))
        numF = wordList.count(f)
        return f, numF

    >>> word, quan = concordance('and')
    >>> print "The word '%s' occurred %s times." % (word, quan)
    The word 'and' occurred 5 times.

Borrowing from one of Barton's posts:

    def concordance(f, unique= True):
        numF = 0
        temp = open(data_file, 'r')
        for lineNum, line in enumerate(temp):
            for wordNum, word in enumerate(line.split()): # split() defaults to any whitespace
                if word.strip(",.!?:()[]/\\") == f:
                    numF += 1
                    print lineNum, wordNum, word
        return f, numF  # return the word too, so the call below can unpack two values

    >>> word, quan = concordance('and')
    0 1 and,
    2 3 and.
    8 1 and
    8 4 and
    8 5 and
    >>> print "The word '%s' occurred %s times." % (word, quan)
    The word 'and' occurred 5 times.

Jan 27 '07 #8
TMS
119 100+
Cool. That works, thank you.

Now I have to work on only counting one instance per line, and what line it(they) appear on. That is why the unique = True is part of the function.

I will think on that for a bit, and see if I can find a way to make that work. If not... I will be back.....

TMS
Jan 27 '07 #9
bartonc
6,596 Expert 4TB
I had to look up 'concordance'!
It's a biblical thing (the reason I know the def).
Jan 27 '07 #10
bartonc
6,596 Expert 4TB
    def concordance(f, unique= True):
        numF = 0
        temp = open(data_file, 'r')
        for lineNum, line in enumerate(temp):
            for wordNum, word in enumerate(line.split()): # split() defaults to any whitespace
                if word.strip(",.!?:()[]/\\") == f:
                    numF += 1
                    print lineNum, wordNum, word
                    break
        return numF
Jan 27 '07 #11
TMS
119 100+
You figured it out! What was it?

I was thinking that the count started on 0, that might cause it to be 1 off, but on other words it seemed to work. I'm looking at the code, and I just tested it and it works, so I'm trying to understand what you changed other than adding the print and break statement.
Jan 27 '07 #12
TMS
119 100+
Please disregard my previous reply, as it was premature.
At this point, I'm more concerned with keeping track of the lines that the word shows up on. After re-reading the instructions, I only need to show the word, the number of lines it shows up on, and which lines.

I think he wants us to do this with a dictionary, but it seems to me that I will have to start with a list, then convert the list to a dictionary using zip or something.

Well, I'm still working on it. I hate to admit I dreamed about it last night, woke up several times going over it in my mind. I think I'm going crazy.

tms

:(
Jan 27 '07 #13
bartonc
6,596 Expert 4TB
That's the way it goes... Not crazy, maybe addicted. The problem is a good one. The cool thing about using a dictionary is that it could grow into a true concordance. For now it'll just have one word in it, so it may seem a bit overboard. It seems to me that you may want a dictionary whose keys are the words that your function has processed and whose values are two-item lists: a list of the line numbers where the word occurs, and the total number of qualifying lines:

    def concordance(f, unique= True):
        concDict = {f:[[],0]}  # eventually, create this dict outside the function
        temp = open(data_file, 'r')
        for lineNum, line in enumerate(temp):
            for word in line.split(): # split() defaults to any whitespace
                if word.strip(",.!?:()[]/\\") == f:
                    concDict[f][0].append(lineNum)
                    concDict[f][1] += 1
                    break
        return concDict

    for key, value in concordance("and").items():
        print key, value
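
For readers on Python 3, the same structure might be sketched like this. Several details are assumptions rather than part of the code above: the lines are passed in as a parameter instead of being read from data_file, words are lowercased before comparing, line numbers start at 1, and the unique flag is interpreted as "record each line at most once".

```python
PUNCTUATION = ",.!?:()[]/\\"

def concordance(word, lines, unique=True):
    """Return {word: [line_numbers, count]} for a single word.

    `lines` is any iterable of strings (an open file works too).
    With unique=True a line is recorded once, however often the word occurs on it.
    """
    line_numbers = []
    for line_num, line in enumerate(lines, start=1):   # 1-based line numbers
        for token in line.split():
            if token.strip(PUNCTUATION).lower() == word:
                line_numbers.append(line_num)
                if unique:
                    break          # one hit per line is enough
    return {word: [line_numbers, len(line_numbers)]}

# Made-up sample text for illustration.
sample = [
    "This is the first line and more.",
    "No match here.",
    "And this one has and twice.",
]
print(concordance("and", sample))
print(concordance("and", sample, unique=False))
```
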
Jan 27 '07 #14
TMS
119 100+
ok, how about this?

    def concordance(f, unique= True):
        lineNumber = []
        temp = open("someFile.txt", 'r')
        for lineNum, line in enumerate(temp):
            for wordNum, word in enumerate(line.split()): # split() defaults to any whitespace
                if word.strip(",.!?:()[]/\\") == f:
                    lineNumber.append(lineNum)
                    break
        return f, lineNumber

    print concordance("and")

My text is this:

This is the first line and I don't know how many more there will be.
This is the second line and I assure you there will be more, and more.
This is the third line.
This is the fourth and so on.
This is the fifth, and there are 2 and words in this line.

and it returns this:
('and', [1, 2, 4, 5])

which is what I want because then I can put the words into a list, and the line numbers are already in a list. I can use Zip or Map (I need to investigate both) to combine them in a dictionary that would look like this:

dict1 = {'and':['1','2','4','5'], 'this':['1','2','3','4','5']} etc. Then, the second part of the assignment requires that I write a script, not a module (?), called concord that compiles concordances for every file specified on its command line, merges them into one big one, then prints them out in alphabetical order of the keys, like this:

and (4):
someFile.txt: 1,2,4,5
this (5):
someFile.txt: 1,2,3,4,5
zoo (2):
someOtherfile.txt: 24
andAnotherText.txt: 36
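
One possible shape for that merge-and-print step, as a rough Python 3 sketch. The function names merge and format_concordance and the per-file input layout are assumptions made for illustration; the file names and line numbers are the ones from the example above. The count in parentheses is taken to be the total number of recorded lines across all files.

```python
def merge(per_file):
    """Merge {filename: {word: [line numbers]}} into {word: {filename: [line numbers]}}."""
    merged = {}
    for filename, conc in per_file.items():
        for word, line_numbers in conc.items():
            merged.setdefault(word, {})[filename] = line_numbers
    return merged

def format_concordance(merged):
    """Build the report text, one block per word, words in alphabetical order."""
    out = []
    for word in sorted(merged):
        files = merged[word]
        total = sum(len(nums) for nums in files.values())
        out.append("%s (%d):" % (word, total))
        # Files are listed in the order they were processed (dicts keep
        # insertion order in Python 3.7 and later).
        for filename, nums in files.items():
            out.append("%s: %s" % (filename, ",".join(str(n) for n in nums)))
    return "\n".join(out)

per_file = {
    "someFile.txt": {"and": [1, 2, 4, 5], "this": [1, 2, 3, 4, 5]},
    "someOtherfile.txt": {"zoo": [24]},
    "andAnotherText.txt": {"zoo": [36]},
}
report = format_concordance(merge(per_file))
print(report)
```
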

There are idiosyncracies (wow, did I spell that right?) that I still need to work out and understand, but this first part is done, so I can start on the second part and ask some questions in class on Monday.

Thanks!!!
Jan 28 '07 #15
bartonc
6,596 Expert 4TB
You're welcome. Whichever way makes the most sense to you is the "best" way for you to do this. This looks great. Keep it up.
Jan 28 '07 #16
TMS
119 100+
OK, I'm stuck again.

It seems that my concordance won't work the way it is written because it requires the whole file to be run at once. In other words, the function will read in the file (it does that) and make a cross-listing keyed by every word in the file. The value associated with each key is a sequence whose elements are the name of the file and the line within that file on which the name appears. In other words, only one entry in the sequence for each instance of the word.

My problem is that if I bring the entire file in as a list, I would need to convert it to a dictionary ('keys'). Then each key has the line number associated with the key (the word) and if it appears more than once on a line, that line number has the value of how many times the word appears on the line.

Right now my function takes one word at a time. It should be printed on the previous post, but I'll put it here again, since I've changed it a bit:

    import sys

    filename= sys.argv[1]

    def concordance(f, unique= True):
        lineNumber = []
        temp = open(filename, 'r')
        for lineNum, line in enumerate(temp):
            a = line.count(f)
            for wordNum, word in enumerate(line.split()):
                word = word.lower()
                if word.strip(",.!?:()[]/\\") == f:
                    lineNumber.append(lineNum+1)    #list index starts with 0, chng to start with 1
                    if a > 1:
                        words = {f: (a, [lineNumber])}
                    else:
                        words = {f: [lineNumber]}
                    break
        return words


    print concordance("and")

So, my question is this: how do I iterate through the file and add each word to the dictionary? When I try, I end up with the last word only. Also, this function gives a count if the word appears on a line more than once, but it's not really effective because it doesn't assign the count to the line, it just sort of counts it. Really what I want in its place is to count how many times the word shows up in the whole file. Later I will work on the individual lines (if I ever get that far).

I thought I was so close. Now I think I have to start over :(
Jan 30 '07 #17
TMS
119 100+
I think I'm closer, but I'm still having trouble iterating through the file. I decided I could use the code I have, IFF (the mathematical if and only if) I can iterate the file separately. Treat it like a state machine or something. Here is my code:

  1. import sys
  2.  
  3. filename= sys.argv[1]
  4.  
  5. conDict = {}
  6. def concordance(f, unique= True):
  7.     lineNumber = []
  8.     temp = open(filename, 'r')
  9.     for lineNum, line in enumerate(temp):
  10.         a = line.count(f)
  11.         for wordNum, word in enumerate(line.split()):
  12.             word = word.lower()
  13.             if word.strip(",.!?:()[]/\\") == f:
  14.                 lineNumber.append(lineNum+1)    #list index starts with 0, change to 1
  15.                 if a > 1:
  16.                     words = {f: (a, [lineNumber])}
  17.                 else:
  18.                     words = {f: [lineNumber]}
  19.                 break
  20.     conDict.update(words)
  21.     return conDict
  22.  
  23.  
  24. for word in enumerate(filename):
  25.     concordance(word)
  26.     word += 1
  27. print conDict
  28.  
  29.  
but I get the following error:

C:\Python25>python concordance2.py someFile.txt
Traceback (most recent call last):
  File "concordance2.py", line 25, in <module>
    concordance(word)
  File "concordance2.py", line 10, in concordance
    a = line.count(f)
TypeError: expected a character buffer object

So, any ideas? I would really like to hand in this project on Wednesday, and this is only the first part! The second part is making a script that compiles a whole bunch of files into one concordance... using this module, of course.

Thank you for your help!
Jan 30 '07 #18
bvdet
2,851 Expert Mod 2GB
I would go about it something like this:

    conDict = {}
    d, lines = wordList(data_file)

    for item in d.items():
        conDict.update(concordance({item[0]: item[1]}, lines))

where d is a dictionary of unique words from the text file and lines is a list of lines from the text file. The function 'wordList' opens the file, reads it with 'readlines()', closes the file, compiles a dictionary of unique words (already stripped and lowercased), and returns the word dictionary and the line list. Each item in the dictionary would look like this: {'word': [[], 0]}. I have written the function 'wordList' and modified 'concordance', but I thought you might want to write them yourself. Here's my output:

    {'and': [[1, 3, 9, 9, 9], 5], 'enjoy': [[2], 1], 'latin': [[11], 1], 'anyway': [[], 0], 'get': [[11], 1], 'when': [[4], 1], 'is': [[5, 6, 8, 11], 4], 'some': [[2, 9, 9], 3], 'it': [[5, 11], 2], 'but': [[], 0], 'an': [[4], 1], 'this': [[], 0], 'good': [[5], 1], 'result': [[11], 1], 'file': [[1], 1], 'news': [[5], 1], 'using': [[11], 1], 'pig': [[11], 1], 'work': [[5], 1], 'newfile': [[], 0], 'well': [[], 0], 'what': [[3], 1], 'now': [[], 0], "happen's": [[3], 1], 'for': [[2, 11], 2], 'i': [[], 0], 'that': [[5], 1], 'seems': [[5], 1], 'apostrophy': [[4], 1], 'myths': [[2], 1], 'to': [[5], 1], 'add': [[4], 1], 'so': [[5], 1], 'test': [[11], 1], 'new': [[1], 1], 'the': [[11, 11], 2], 'was': [[11], 1], 'parrot': [[6, 8], 2], 'deceased': [[6, 8], 2], 'more': [[1, 2, 9, 9], 4]}
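
Since the 'wordList' function itself was not posted, here is one hypothetical way it could look in Python 3. This is a guess based only on the description above, not the actual implementation; in particular it takes an already-open stream rather than a file name so the sketch runs without a file on disk, and the sample text is made up.

```python
import io

PUNCTUATION = ",.!?:()[]/\\"

def word_list(stream):
    """Return ({word: [[], 0]}, lines) for an open text stream.

    Words are stripped of surrounding punctuation and lowercased, so each
    distinct word appears once as a key with an empty entry to fill in later.
    """
    lines = stream.readlines()
    words = {}
    for line in lines:
        for token in line.split():
            word = token.strip(PUNCTUATION).lower()
            if word:                       # skip tokens that were pure punctuation
                words.setdefault(word, [[], 0])
    return words, lines

# Made-up sample text standing in for the data file.
sample = io.StringIO("The parrot is deceased.\nThe and some more and\n")
d, lines = word_list(sample)
print(sorted(d))
```
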
Jan 30 '07 #19
bvdet
2,851 Expert Mod 2GB
I forgot to check for lower case in concordance() so some of the words were skipped.
    >>> listCopy = d.keys()
    >>> listCopy.sort()
    >>> for w in listCopy:
    ...     print w, d[w]
    ...
    add [[4], 1]
    an [[4], 1]
    and [[1, 3, 9, 9, 9], 5]
    anyway [[11], 1]
    apostrophy [[4], 1]
    but [[3], 1]
    deceased [[6, 8], 2]
    enjoy [[2], 1]
    file [[1], 1]
    for [[2, 11], 2]
    get [[11], 1]
    good [[5], 1]
    happen's [[3], 1]
    i [[4, 11, 11], 3]
    is [[5, 6, 8, 11], 4]
    it [[5, 11], 2]
    latin [[11], 1]
    more [[1, 2, 9, 9], 4]
    myths [[2], 1]
    new [[1], 1]
    newfile [[1], 1]
    news [[5], 1]
    now [[2], 1]
    parrot [[6, 8], 2]
    pig [[11], 1]
    result [[11], 1]
    seems [[5], 1]
    so [[5], 1]
    some [[2, 9, 9], 3]
    test [[11], 1]
    that [[5], 1]
    the [[6, 8, 9, 11, 11], 5]
    this [[11], 1]
    to [[5], 1]
    using [[11], 1]
    was [[11], 1]
    well [[5], 1]
    what [[3], 1]
    when [[4], 1]
    work [[5], 1]
    >>>
Jan 30 '07 #20
TMS
119 100+
very nice. I actually figured another way, but I like yours better. I did this:

    for i in lineListOut:       # run the list
        concordance(i)          # through the concordance function
    print conDict               # print results

My problem was that I didn't initialize the dictionary before I started using it; once I initialized it, the error went away. Here are my results:


C:\Python25>python concordance2.py someFile.txt
{'and': [[2, 3, 5, 6]], 'be': [[2, 3]], "don't": [[2]], 'is': [[2, 3, 4, 5, 6]],
'second': [[3]], 'know': [[2]], 'words': [[6]], 'in': [[6]], 'line': [[2, 3, 4,
6]], 'the': [[2, 3, 4, 5, 6]], 'are': [[6]], 'third': [[4]], 'how': [[2]], 'thi
s': [[2, 3, 4, 5, 6]], 'many': [[2]], 'there': [[2, 3, 6]], 'will': [[2, 3]], 'a
ssure': [[3]], '2': [[6]], 'so': [[5]], 'fourth': [[5]], 'you': [[3]], 'more': [
[2, 3]], 'first': [[2]]}

But this isn't alphabetized, and I do need to do that. Plus your output is much nicer than mine, as far as readability goes. I also started to use the unique flag. Your code does it right: if the word is in the line more than once, the line number should appear twice as an int. Mine doesn't do that, so I have to look at that. Thank you once again for your help. I learn so much from this message board. It's really great. I'm so glad you are here!!!
Jan 31 '07 #21
bvdet
2,851 Expert Mod 2GB
Glad to help. Dictionaries are unordered by design. If you need the data to be sorted, you will need a list.
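As a small illustration (Python 3 syntax; the sample dictionary is made up), sorted() builds such a list from the dictionary's keys. In Python 3.7 and later dictionaries do keep insertion order, but that is still not alphabetical order, so the sorting step is needed either way:

```python
conDict = {"the": [2, 3], "and": [2, 3, 5, 6], "line": [2, 3, 4, 6]}

# sorted() returns a new list of the keys in alphabetical order;
# the dictionary itself is left untouched.
ordered_words = sorted(conDict)

for word in ordered_words:
    print(word, conDict[word])
```
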
Jan 31 '07 #22
TMS
119 100+
OK, thanks again, so much. Now ONE MORE question:

Let's say my teacher wants to put 3 files on the command line, and my concordance is to run them all. This is how I've coded the script:

    #!/usr/bin/env python

    import sys
    import concordance

    for arg in sys.argv[1:]:
        concordance.prepare(arg)

When I call the script called concord from the command line, like this:

C:\Python25>python concord RoadDog.txt someFile.txt

I get the same file twice, instead of reading both files and processing them. (Honestly, I don't know how I would have gotten through this class without this message board!!!)
It's got to be how I'm writing my for loop to go through the arguments, but I started by having it print arg, and they both printed. prepare(), by the way, is what I call the function that turns the file into a list and prepares it for the concordance. Here is the code:

    def prepare(filename):
        temp = open(filename, 'r')                        # open the file
        lineListOut = []                                  # initialize a list
        for lineNum, line in enumerate(temp):             # iterate through the file
            for wordNum, word in enumerate(line.split()):
                lineListOut.append(word)                  # add each word to the list
        temp.close()                                      # close() needs parentheses to actually run
        for i in lineListOut:                             # run the list
            concordance(i)                                # through the concordance function
        listCopy = conDict.keys()                         # create a list for sorting
        listCopy.sort()
        for w in listCopy:                                # print sorted list
            print w, ":\n", "    ", filename, ":", conDict[w]

Jan 31 '07 #23
bvdet
2,851 Expert Mod 2GB
You should try to get away from accessing global variables inside your functions. Design your functions to receive arguments and return results. Assign the results to variable names in the calling function or script.
Example:
    def dim(value_string):
        ..................
        ..................
        return real_number

    def round_length_near(length, increment="1/16"):
        if increment == "0":
            return length
        else:
            return round(length/dim(increment)) * dim(increment)

    calculated_length = 122.3456

    rounded_length = round_length_near(calculated_length, "1/4")
    print rounded_length

    valueStr = "10'-2 3/8"
    value = dim(valueStr)
    print '\nThe dimension string %s evaluates to %0.4f.' % (valueStr, value)

Output:
    >>> 122.25

    The dimension string 10'-2 3/8 evaluates to 122.3750.
    >>>

Notice that the functions do not access variables outside their scope.
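Applied to a word count, the same idea might look like the sketch below. The names count_words and format_report are made up for illustration and are not from the code in this thread. It takes a list of lines instead of a filename so it runs without any files, and it uses print() with parentheses so it also runs on Python 3.

```python
def count_words(lines):
    """Return a dict mapping each word to its count; no globals are touched."""
    counts = {}
    for line in lines:
        for word in line.split():
            counts[word] = counts.get(word, 0) + 1
    return counts


def format_report(source_name, counts):
    """Build one report entry per word, sorted, from the dict passed in."""
    return ["%s:\n    %s: %d" % (word, source_name, counts[word])
            for word in sorted(counts)]


sample = ["the parrot is deceased", "the parrot is no more"]
result = count_words(sample)          # the caller owns the result
for entry in format_report("sample", result):
    print(entry)
```

Because every call to count_words starts from its own empty dict, counts from one input cannot leak into the next one.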

Use 'enumerate' only when you need both a running count and the items of the sequence. Example from your code:
    for lineNum, line in enumerate(temp):
        a = line.count(f)
        for wordNum, word in enumerate(line.split()):
            word = word.lower()

You do not use 'lineNum' or 'wordNum' anywhere else in the function 'prepare()'. Use this instead:
    for line in temp:
        a = line.count(f)
        for word in line.split():
            word = word.lower()

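For contrast, here is a small sketch of a case where the counter from 'enumerate' is actually used: reporting which lines contain a word. The function name line_numbers_for and the sample text are assumptions for illustration only.

```python
def line_numbers_for(lines, target):
    """Return the 1-based numbers of the lines that contain target as a word."""
    hits = []
    for line_num, line in enumerate(lines):
        if target in line.lower().split():
            hits.append(line_num + 1)  # enumerate starts counting at 0
    return hits


text = ["The parrot is deceased.", "", "The and some more and"]
print(line_numbers_for(text, "and"))
```

Here line_num is needed for the result, so 'enumerate' earns its place.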
Function 'prepare()' is not creating a list of unique words, so you are sending some of the words to 'concordance()' several times. The words in this list also still carry their punctuation and upper-case letters, yet they are being compared to stripped, lower-cased words.
Try this link for more info: http://www.bvdetailing.com/wordcount.htm
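A minimal sketch of both points, assuming that stripping punctuation from the ends of each word is enough for this assignment. The helper names normalize and unique_words are hypothetical, not taken from the linked page.

```python
import string


def normalize(word):
    """Lower-case a word and strip punctuation from both ends."""
    return word.strip(string.punctuation).lower()


def unique_words(lines):
    """Return the sorted list of distinct normalized words in lines."""
    seen = set()
    for line in lines:
        for raw in line.split():
            word = normalize(raw)
            if word:  # skip tokens that were nothing but punctuation
                seen.add(word)
    return sorted(seen)


sample = ["NewFile and, more new file.", "The and some more and"]
print(unique_words(sample))
```

Each distinct word then reaches the counting step once, in the same form it will be compared against. Punctuation inside a word, such as the apostrophe in "happen's", is left alone by this approach.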
Jan 31 '07 #24
TMS
119 100+
Thank you. That is good advice. My code was messy and I appreciate your input.

I'm still stuck, however, on the argument list. I can only seem to get it to process one file, even though my loop prints out all arguments I ask for. I've added the enumerate like this:

    for arg in enumerate(sys.argv[1:]):
        concordance2.prepare(arg)

If I simply ask it to print arg, like this:

    for arg in enumerate(sys.argv[1:]):
        print arg

And call the script like this:

python concord someFile.txt README.txt
I get this:

(0, 'someFile.txt')
(1, 'README.txt')

which tells me that my loop should be working, but it's not sending the appropriate file to prepare to go through the list. I also tried it with your code, just to see if there was a bug in mine (wouldn't surprise me...) but no difference.

Once I get this done, then I'm done. At least for this assignment :)
Jan 31 '07 #25
TMS
119 100+
Ok, I got it. nevermind....
Jan 31 '07 #26
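
The thread does not record what the fix was. One detail that is visible in the output in post #25: 'enumerate' yields (index, value) tuples, so a loop written as "for arg in enumerate(...)" hands a tuple, not a filename, to whatever it calls. The sketch below only illustrates that behaviour; the list file_args stands in for sys.argv[1:] and the function name describe is made up.

```python
def describe(args):
    """Show what a loop variable receives with and without enumerate."""
    plain = [arg for arg in args]                         # just the items
    paired = [arg for arg in enumerate(args)]             # (index, item) tuples
    unpacked = [name for index, name in enumerate(args)]  # items again, after unpacking
    return plain, paired, unpacked


file_args = ["someFile.txt", "README.txt"]  # stand-in for sys.argv[1:]
plain, paired, unpacked = describe(file_args)
print(paired)
print(unpacked)
```

If the index is not needed, looping over sys.argv[1:] directly, as in the first version of the script, passes each filename as a plain string.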
