numF += 1 doesn't return correctly

119 100+

Another assignment, and another question. I am writing a concordance, and just to begin I'm trying to get it to find a word and count it correctly. Here is my code:

import sys

#filename = sys.argv[1]        #at some point this will be used, not now

def concordance(f, unique= True):
    temp = open('someFile.txt')
    for line in temp:
        lineList = line.split(" ")
        numF = 0
        for word in lineList:
            num = word.find(f)
            if num == 0:
                print "num: ", num  #I want to see how its working
                numF +=1
                print "numF: ", numF  #I want to see how its adding
    return f, numF

print concordance("and")

The file I'm using is just gibberish:

NewFile and, more new file.
Now for some more myths, enjoy.
But what happen's
when I add an apostrophy?
Well, it seems to work, so (that) is good news!
The parrot is deceased.

The parrot is deceased.
The and some more and and some more and

I was using it for the pig latin test. Anyway, this is the result I get:

num:  0            #if it finds it shouldn't it be a 1??
numF:  1          #see, here is the 1st and
num:  0
numF:  1          #here is the next and... this should be 2
num:  0
numF:  2          #now it starts to count
num:  0
numF:  3
num:  0
numF:  4
('and', 4)

It should be ('and', 5), but you can see that I am not coding it correctly so that it adds up right. numF is 1 twice, and only then starts to increase.

Your help would be appreciated. :)

TMS

Jan 26 '07 #1


bartonc

6,596

Expert 4TB

Ok. A couple of things:

num = word.find(f)

is asking for the position of f (say, "and") in word. What you want to do is increment a counter

if word == f:
    numF += 1

unless you really want substrings to count as words.
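A quick illustration of the difference, as a minimal sketch (the sample words and the names `positions`, `starts_with` and `exact` are only for illustration). `str.find` returns the index of the first match, so 0 means "found at the very start of the word"; a miss is reported as -1, not 0.

```python
words = ["and", "and,", "android", "band", "The"]

# find() gives the position of the substring, or -1 when it is absent
positions = [w.find("and") for w in words]

# "find(...) == 0" is really a starts-with test, so "android" matches too
starts_with = [w for w in words if w.find("and") == 0]

# "==" only matches the exact token, so "and," (with the comma) is missed
exact = [w for w in words if w == "and"]
```

So the equality test avoids false hits such as "android", but tokens with punctuation still attached need to be cleaned up before comparing.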
If you want to keep track of which line/word you are at, I like python's enumerate function:

import sys

#filename = sys.argv[1]        #at some point this will be used, not now

def concordance(f, unique= True):
    numF = 0
    temp = open('someFile.txt')
    for lineNum, line in enumerate(temp):
        for wordNum, word in enumerate(line.split()): # split() defaults to any whitespace
            if word == f:
                numF += 1
                print lineNum, wordNum, word
    return numF
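
One change in this version is easy to miss: `numF = 0` now sits before the outer loop. In the original post it was inside `for line in temp:`, so the counter restarted on every line and the function returned only the last line's count, which would explain the 4 instead of 5. A minimal sketch of that effect, using an in-memory list of lines instead of the file (both function names are made up for illustration, and the syntax is chosen to run on current Python as well):

```python
lines = ["NewFile and, more new file.\n",
         "The parrot is deceased.\n",
         "The and some more and and some more and\n"]

def count_reset_per_line(target):
    for line in lines:
        count = 0                      # reset on every line (the original behaviour)
        for word in line.split():
            if word.find(target) == 0:
                count += 1
    return count                       # only the last line's total survives

def count_whole_file(target):
    count = 0                          # initialised once, before the loop
    for line in lines:
        for word in line.split():
            if word.find(target) == 0:
                count += 1
    return count
```

With these sample lines the first function reports only the hits on the last line, while the second reports the total for all lines.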

Jan 27 '07 #2

bvdet

2,851

Expert Mod 2GB

There is a better way to get a word count:

Use 'f.readlines()' to read the file into a list (fList)
Create a word list - wordList = "".join(fList).split()
Use list method wordList.count('word') to count the occurrences of the word.
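
The three steps above can be sketched as follows. This is only an illustration: `StringIO` stands in for the opened file so the snippet is self-contained, and the two sample lines are borrowed from the test file in the question.

```python
from io import StringIO

# stand-in for open('someFile.txt'); any file object would behave the same way
f = StringIO("NewFile and, more new file.\n"
             "The and some more and and some more and\n")

fList = f.readlines()                 # list of lines, newlines kept
wordList = "".join(fList).split()     # one flat list of whitespace-separated tokens
total = wordList.count("and")         # counts exact matches only
```

Note that `count` compares whole tokens, so a token such as "and," with its comma still attached is not included in the total.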

HTH :)
BV

Jan 27 '07 #3

bartonc

6,596

Expert 4TB

Nice, BV. I have to admit ignorance of list.count(). It shows that we can all read the docs more or do things like

help(list)

more often.
But a concordance really needs to know where the occurrence is located.

Jan 27 '07 #4

bvdet

2,851

Expert Mod 2GB

I overlooked punctuation, otherwise a word could be counted like this:

open(data_file, 'r').read().split().count('and')

Punctuation could be stripped from each word:

wordList = []
for word in wordList1:
    wordList.append(word.strip(",.!?:()[]/\\"))
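
A small sketch of what that `strip` call does, using a few tokens from the test file (`PUNCT`, `raw` and `cleaned` are illustrative names). `str.strip(chars)` only trims characters from the two ends of the string, so punctuation inside a word is left alone.

```python
PUNCT = ",.!?:()[]/\\"

raw = ["and,", "(that)", "news!", "happen's", "and"]
cleaned = [w.strip(PUNCT) for w in raw]

# the apostrophe inside "happen's" is untouched because it is not at either end
and_count = cleaned.count("and")
```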

I had to look up 'concordance'!

Jan 27 '07 #5

TMS

119

100+

I just tried the enumerate code and I get this:

8 1 and
8 4 and
8 5 and
8 8 and

which isn't correct. It seems to say that there are 8 lines and 8 times that "and" appears in the text. "And" appears only 5 times, so it's still the same problem. I do need to keep track of lines, and of which line the word appears on. At first I was working on just counting the times "and" appears, and my code had it off by 1.
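
A note on reading that output, with a minimal sketch (the two sample lines and the `hits` list are illustrative). Each printed row is `lineNum wordNum word`, and `enumerate` starts both numbers at 0, so the first column is the index of the line, not a running count. The exact comparison also skips the "and," on the first line, which would account for the missing hit.

```python
lines = ["NewFile and, more new file.",
         "The and some more and and some more and"]

hits = []
for lineNum, line in enumerate(lines):               # lineNum starts at 0
    for wordNum, word in enumerate(line.split()):    # wordNum restarts at 0 on each line
        if word == "and":
            hits.append((lineNum, wordNum, word))
```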

I will work on the other ideas now. Thank you

Jan 27 '07 #6

TMS

119

100+

ok, so, I reduced the amount of code by quite a bit, but I'm having the same problem. The count is off by 1.

import sys

#filename = sys.argv[1]

def concordance(f, unique= True):
    temp = open('someFile.txt')
    line = temp.readlines()
    numF = 0
    wordList = "".join(line).split()
    numF = wordList.count(f)
    return f, numF

print concordance("and")

Result:

>>>
('and', 4)
>>>

Why in the world won't it count all of them?
TMS

Jan 27 '07 #7

bvdet

2,851

Expert Mod 2GB

Here are a couple of options:

def concordance(f, unique= True):
    text = open('someFile.txt').read()    # whole file as one string
    wordList = []
    for word in text.split():
        wordList.append(word.strip(",.!?:()[]/\\"))    # drop leading/trailing punctuation
    numF = wordList.count(f)
    return f, numF

>>> word, quan = concordance('and')
>>> print "The word '%s' occurred %s times." % (word, quan)
The word 'and' occurred 5 times.

Borrowing from one of Barton's posts:

def concordance(f, unique= True):
    numF = 0
    temp = open(data_file, 'r')
    for lineNum, line in enumerate(temp):
        for wordNum, word in enumerate(line.split()): # split() defaults to any whitespace
            if word.strip(",.!?:()[]/\\") == f:
                numF += 1
                print lineNum, wordNum, word
    return f, numF    # return the word as well, so the call below can unpack two values

>>> word, quan = concordance('and')
>>> print "The word '%s' occurred %s times." % (word, quan)
0 1 and,
2 3 and.
8 1 and
8 4 and
8 5 and
The word 'and' occurred 5 times.

Jan 27 '07 #8

TMS

119

100+

Cool. That works, thank you.

Now I have to work on only counting one instance per line, and what line it(they) appear on. That is why the unique = True is part of the function.

I will think on that for a bit, and see if I can find a way to make that work. If not... I will be back.....

TMS

Jan 27 '07 #9

bartonc

6,596

Expert 4TB

I had to look up 'concordance'!

It's a biblical thing (the reason I know the def).

Jan 27 '07 #10

bartonc

6,596

Expert 4TB

def concordance(f, unique= True):
    numF = 0
    temp = open(data_file, 'r')
    for lineNum, line in enumerate(temp):
        for wordNum, word in enumerate(line.split()): # split() defaults to any whitespace
            if word.strip(",.!?:()[]/\\") == f:
                numF += 1
                print lineNum, wordNum, word
                break
    return numF
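
The `break` is what limits the result to one hit per line. Below is a minimal sketch of how the `unique` flag could switch that behaviour on and off, assuming (this is a guess at the assignment's intent) that `unique=False` should list a line once per occurrence. `line_numbers`, `PUNCT` and the in-memory `sample` lines are illustrative names, and the function takes a list of lines rather than opening `data_file`.

```python
PUNCT = ",.!?:()[]/\\"

def line_numbers(lines, target, unique=True):
    """Return the 1-based numbers of the lines that contain target."""
    found = []
    for lineNum, line in enumerate(lines):
        for word in line.split():
            if word.strip(PUNCT).lower() == target:
                found.append(lineNum + 1)
                if unique:
                    break          # one entry per line is enough
    return found

sample = ["NewFile and, more new file.",
          "The parrot is deceased.",
          "The and some more and and some more and"]

once_per_line = line_numbers(sample, "and")
every_hit = line_numbers(sample, "and", unique=False)
```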

Jan 27 '07 #11

TMS

119

100+

You figured it out! What was it?

I was thinking that the count started on 0, that might cause it to be 1 off, but on other words it seemed to work. I'm looking at the code, and I just tested it and it works, so I'm trying to understand what you changed other than adding the print and break statement.

Jan 27 '07 #12

TMS

119

100+

Please disregard my previous reply as it was premature.
At this point, I'm more concerned with keeping track of the line that the word shows up in. After re-reading the instructions, I only need to show the word, the number of lines it shows up on, and which lines.

I think he wants us to do this with a dictionary, but it seems to me that I will have to start with a list, then convert the list to a dictionary using zip or something.

Well, I'm still working on it. I hate to admit I dreamed about it last night, woke up several times going over it in my mind. I think I'm going crazy.

tms

:(

Jan 27 '07 #13

bartonc

6,596

Expert 4TB

That's the way it goes... Not crazy, maybe addicted. The problem is a good one. The cool thing about using a dictionary is that it could grow into a true concordance. For now it'll just have one word in it, so it may seem a bit overboard. It seems to me that you may want a dictionary whose keys are the words that your function has processed and whose values are two-item lists: a list of the line numbers where the word occurs, and the total number of qualifying lines:

def concordance(f, unique= True):
    concDict = {f:[[],0]}  # eventually, create this dict outside the function
    temp = open(data_file, 'r')
    for lineNum, line in enumerate(temp):
        for word in line.split(): # split() defaults to any whitespace
            if word.strip(",.!?:()[]/\\") == f:
                concDict[f][0].append(lineNum)
                concDict[f][1] += 1
                break
    return concDict

for key, value in concordance("and").items():
    print key, value

Jan 27 '07 #14

TMS

119

100+

ok, how about this?

def concordance(f, unique= True):
    lineNumber = []
    temp = open("someFile.txt", 'r')
    for lineNum, line in enumerate(temp):
        for wordNum, word in enumerate(line.split()): # split() defaults to any whitespace
            if word.strip(",.!?:()[]/\\") == f:
                lineNumber.append(lineNum)
                break
    return f, lineNumber

print concordance("and")

My text is this:

This is the first line and I don't know how many more there will be.
This is the second line and I assure you there will be more, and more.
This is the third line.
This is the fourth and so on.
This is the fifth, and there are 2 and words in this line.

and it returns this:
('and', [1, 2, 4, 5])

which is what I want because then I can put the words into a list, and the line numbers are already in a list. I can use zip or map (I need to investigate both) to combine them in a dictionary that would look like this:

dict1 = {'and':['1','2','4','5'], 'this':['1','2','3','4','5']} etc.

Then, the second part of the assignment requires that I write a script, not a module (?), called concord that compiles concordances for every file specified on its command line, merges them into one big one, then prints them out in alphabetical order of the keys, like this:

and (4):
someFile.txt: 1,2,4,5
this (5):
someFile.txt: 1,2,3,4,5
zoo (2):
someOtherfile.txt: 24
andAnotherText.txt: 36
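
One possible way to produce that layout, as a sketch only. It assumes the merged concordance is a dictionary of the form `{word: {filename: [line numbers]}}`, which is a guess rather than anything the assignment prescribes; `merged`, `format_concordance` and `report` are illustrative names, and the number in parentheses is taken to be the total count of listed lines.

```python
merged = {
    "this": {"someFile.txt": [1, 2, 3, 4, 5]},
    "and": {"someFile.txt": [1, 2, 4, 5]},
    "zoo": {"someOtherfile.txt": [24], "andAnotherText.txt": [36]},
}

def format_concordance(conc):
    out = []
    for word in sorted(conc):                          # alphabetical by key
        files = conc[word]
        total = sum(len(nums) for nums in files.values())
        out.append("%s (%d):" % (word, total))
        for name, nums in files.items():
            out.append("%s: %s" % (name, ",".join(str(n) for n in nums)))
    return "\n".join(out)

report = format_concordance(merged)
```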

There are idiosyncracies (wow, did I spell that right?) that I still need to work out and understand, but this first part is done, so I can start on the second part and ask some questions in class on Monday.

Thanks!!!

Jan 28 '07 #15

bartonc

6,596

Expert 4TB

You're welcome. Whichever way makes the most sense to you is the "best" way for you to do this. This looks great. Keep it up.

Jan 28 '07 #16

TMS

119

100+

OK, I'm stuck again.

It seems that my concordance won't work the way it is written because it requires the whole file to be run at once. In other words, the function will read in the file (it does that) and make a cross-listing keyed by every word in the file. The value associated with each key is a sequence whose elements are the name of the file and the line within that file on which the name appears. In other words, only one entry in the sequence for each instance of the word.

My problem is that if I bring the entire file in as a list, I would need to convert it to a dictionary ('keys'). Then each key has the line number associated with the key (the word) and if it appears more than once on a line, that line number has the value of how many times the word appears on the line.

Right now my function takes one word at a time. It should be printed on the previous post, but I'll put it here again, since I've changed it a bit:

import sys

filename= sys.argv[1]

def concordance(f, unique= True):
    lineNumber = []
    temp = open(filename, 'r')
    for lineNum, line in enumerate(temp):
        a = line.count(f)
        for wordNum, word in enumerate(line.split()):
            word = word.lower()
            if word.strip(",.!?:()[]/\\") == f:
                lineNumber.append(lineNum+1)    #list index starts with 0, chng to start with 1
                if a > 1:
                    words = {f: (a, [lineNumber])}
                else:
                    words = {f: [lineNumber]}
                break
    return words

print concordance("and")

So, my question is this: How do I iterate through the file and add each word to the dictionary? When I try I end up with the last word only. Also, this function gives a count if the word appears on a line more than once, but its not real effective because it doesn't assign it to the line, it just sort of counts it. Really what I want in its place is to count how many times the word shows up in the whole file. Later I will work on the individual lines (if I ever get that far).

I thought I was so close. Now I think I have to start over :(

Jan 30 '07 #17

TMS

119

100+

I think I'm closer, but I'm still having trouble iterating through the file. I decided I could use the code I have, IFF (the mathematical if and only if) I can iterate the file separately. Treat it like a state machine or something. Here is my code:

import sys

filename= sys.argv[1]

conDict = {}

def concordance(f, unique= True):
    lineNumber = []
    temp = open(filename, 'r')
    for lineNum, line in enumerate(temp):
        a = line.count(f)
        for wordNum, word in enumerate(line.split()):
            word = word.lower()
            if word.strip(",.!?:()[]/\\") == f:
                lineNumber.append(lineNum+1)    #list index starts with 0, change to 1
                if a > 1:
                    words = {f: (a, [lineNumber])}
                else:
                    words = {f: [lineNumber]}
                break
    conDict.update(words)
    return conDict

for word in enumerate(filename):
    concordance(word)
    word += 1

print conDict

but I get the following error:

C:\Python25>python concordance2.py someFile.txt
Traceback (most recent call last):
  File "concordance2.py", line 25, in <module>
    concordance(word)
  File "concordance2.py", line 10, in concordance
    a = line.count(f)
TypeError: expected a character buffer object

SO, any ideas? I would really like to hand in this project on Wednesday, and this is only the first part! The second part is making a script that compiles a whole bunch of files into one concordance..... using this module, of course.

Thank you for your help!
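
A note on where that TypeError most likely comes from, with a minimal sketch (the sample strings are illustrative, and the syntax is chosen to run on current Python). `filename` is a string, so `enumerate(filename)` walks its characters and yields `(index, character)` tuples rather than words from the file. Passing one of those tuples to `str.count()` is what raises the error. The `word += 1` line would fail on a tuple as well, and a `for` loop needs no manual increment anyway.

```python
filename = "someFile.txt"

# enumerate() over a string yields (index, character) pairs, not the file's words
pairs = list(enumerate(filename))[:3]

# handing such a tuple to str.count() is rejected with a TypeError
try:
    "some line of text".count(pairs[0])
    error = None
except TypeError as exc:
    error = exc
```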

Jan 30 '07 #18

bvdet

2,851

Expert Mod 2GB

I would go about it something like this:

conDict = {}
d, lines = wordList(data_file)

for item in d.items():
    conDict.update(concordance({item[0]: item[1]}, lines))

where d is a dictionary of unique words from the text file and lines is a list of lines from the text file. Function 'wordList' opens the file, reads the file with 'readlines()', closes the file, compiles a dictionary of unique words (already stripped and lowercased), and returns the word dictionary and line list. Each item in the dictionary would look like this: {'word': [[], 0]}. I have written the function 'wordList' and modified 'concordance', but I thought you might want to write them yourself. Here's my output:

 {'and': [[1, 3, 9, 9, 9], 5], 'enjoy': [[2], 1], 'latin': [[11], 1], 'anyway': [[], 0], 'get': [[11], 1], 'when': [[4], 1], 'is': [[5, 6, 8, 11], 4], 'some': [[2, 9, 9], 3], 'it': [[5, 11], 2], 'but': [[], 0], 'an': [[4], 1], 'this': [[], 0], 'good': [[5], 1], 'result': [[11], 1], 'file': [[1], 1], 'news': [[5], 1], 'using': [[11], 1], 'pig': [[11], 1], 'work': [[5], 1], 'newfile': [[], 0], 'well': [[], 0], 'what': [[3], 1], 'now': [[], 0], "happen's": [[3], 1], 'for': [[2, 11], 2], 'i': [[], 0], 'that': [[5], 1], 'seems': [[5], 1], 'apostrophy': [[4], 1], 'myths': [[2], 1], 'to': [[5], 1], 'add': [[4], 1], 'so': [[5], 1], 'test': [[11], 1], 'new': [[1], 1], 'the': [[11, 11], 2], 'was': [[11], 1], 'parrot': [[6, 8], 2], 'deceased': [[6, 8], 2], 'more': [[1, 2, 9, 9], 4]}
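
For readers who want to try this, here is a minimal single-pass sketch that produces entries in the same `[[line numbers], count]` shape. It is not the `wordList`/`concordance` pair described above (those were not posted); `build_concordance`, `PUNCT` and the in-memory `sample` lines are illustrative names, and the code is written to run on current Python.

```python
PUNCT = ",.!?:()[]/\\"

def build_concordance(lines):
    """Map each normalised word to [list of 1-based line numbers, total count]."""
    conc = {}
    for lineNum, line in enumerate(lines):
        for raw in line.split():
            word = raw.strip(PUNCT).lower()    # same clean-up for every token
            if not word:
                continue                       # token was punctuation only
            entry = conc.setdefault(word, [[], 0])
            entry[0].append(lineNum + 1)
            entry[1] += 1
    return conc

sample = ["NewFile and, more new file.",
          "The parrot is deceased.",
          "The and some more and and some more and"]

conc = build_concordance(sample)
```

`dict.setdefault` creates the empty entry the first time a word is seen, so every word in the file is added without a separate pass to collect the unique words first.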
 

Jan 30 '07 #19

bvdet

2,851

Expert Mod 2GB

I forgot to check for lower case in concordance() so some of the words were skipped.

>>> listCopy = d.keys()
>>> listCopy.sort()
>>> for w in listCopy:
...     print w, d[w]
...
add [[4], 1]
an [[4], 1]
and [[1, 3, 9, 9, 9], 5]
anyway [[11], 1]
apostrophy [[4], 1]
but [[3], 1]
deceased [[6, 8], 2]
enjoy [[2], 1]
file [[1], 1]
for [[2, 11], 2]
get [[11], 1]
good [[5], 1]
happen's [[3], 1]
i [[4, 11, 11], 3]
is [[5, 6, 8, 11], 4]
it [[5, 11], 2]
latin [[11], 1]
more [[1, 2, 9, 9], 4]
myths [[2], 1]
new [[1], 1]
newfile [[1], 1]
news [[5], 1]
now [[2], 1]
parrot [[6, 8], 2]
pig [[11], 1]
result [[11], 1]
seems [[5], 1]
so [[5], 1]
some [[2, 9, 9], 3]
test [[11], 1]
that [[5], 1]
the [[6, 8, 9, 11, 11], 5]
this [[11], 1]
to [[5], 1]
using [[11], 1]
was [[11], 1]
well [[5], 1]
what [[3], 1]
when [[4], 1]
work [[5], 1]
>>>

Jan 30 '07 #20

TMS

119

100+

very nice. I actually figured another way, but I like yours better. I did this:

for i in lineListOut:                                       # run the list
    concordance(i)                                          # through the concordance function

print conDict                                               # print results

My problem was that I didn't initialize the dictionary before I started using it, because when I initialized it the error went away. Here are my results:

C:\Python25>python concordance2.py someFile.txt
{'and': [[2, 3, 5, 6]], 'be': [[2, 3]], "don't": [[2]], 'is': [[2, 3, 4, 5, 6]],
'second': [[3]], 'know': [[2]], 'words': [[6]], 'in': [[6]], 'line': [[2, 3, 4, 6]],
'the': [[2, 3, 4, 5, 6]], 'are': [[6]], 'third': [[4]], 'how': [[2]],
'this': [[2, 3, 4, 5, 6]], 'many': [[2]], 'there': [[2, 3, 6]], 'will': [[2, 3]],
'assure': [[3]], '2': [[6]], 'so': [[5]], 'fourth': [[5]], 'you': [[3]],
'more': [[2, 3]], 'first': [[2]]}

But this isn't alphabetized, and I do need to do that. Plus your output is much nicer than mine, as far as readability. I also started to use the unique flag. Your code does it right. If it is in the line more than once, the line should appear twice as an int. Mine doesn't do that. So, I have to look at that. Thank you once again for your help. I learn so much from this message board. It's really great. I'm so glad you are here!!!

Jan 31 '07 #21

bvdet

2,851

Expert Mod 2GB

Glad to help. Dictionaries are unordered by design. If you need the data to be sorted, you will need a list.
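
A minimal sketch of that idea (the three entries are copied from the output above; `ordered` is an illustrative name). `sorted()` applied to a dictionary returns a new, alphabetically ordered list of its keys and leaves the dictionary itself untouched.

```python
conDict = {"this": [[2, 3, 4, 5, 6]], "and": [[2, 3, 5, 6]], "be": [[2, 3]]}

# build a list of (word, value) pairs in alphabetical order of the keys
ordered = [(word, conDict[word]) for word in sorted(conDict)]
```

The resulting list can then be printed entry by entry in whatever layout is needed.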

Jan 31 '07 #22

TMS

119

100+

OK, thanks again, so much. Now ONE MORE question:

Let's say my teacher wants to put 3 files on the command line, and my concordance is to run them all. This is how I've coded the script:

#!/usr/bin/env python

import sys
import concordance

for arg in sys.argv[1:]:
    concordance.prepare(arg)

When I call the script called concord from the command line, like this:

C:\Python25>python concord RoadDog.txt someFile.txt

I get the same file twice, instead of reading both files and processing them. (Honestly, I don't know how I would have gotten through this class without this message board!!!)
It's got to be how I'm writing my for loop to go through the arguments, but I started by having it print arg, and they both printed. prepare(), by the way, is what I call the function that turns the file into a list and prepares it for the concordance. Here is the code:

def prepare(filename):
    temp = open(filename, 'r')                       # open the file
    lineListOut = []                                 # initialize a list
    for lineNum, line in enumerate(temp):            # iterate through the lines
        for wordNum, word in enumerate(line.split()):
            lineListOut.append(word)                 # add each word to the list
    temp.close()                                     # parentheses needed: temp.close alone does nothing
    for i in lineListOut:                            # run the list
        concordance(i)                               # through the concordance funct
    listCopy = conDict.keys()                        # create a list for sorting
    listCopy.sort()
    for w in listCopy:                               # print sorted list
        print w, ":\n", "    ", filename, ":", conDict[w]

Jan 31 '07 #23

bvdet

2,851

Expert Mod 2GB

You should try to get away from accessing global variables inside your functions. Design your functions to receive arguments and return results. Assign the results to variable names from the calling function or script.
Example:

def dim(value_string):
    ..................
    ..................
    return real_number

def round_length_near(length, increment="1/16"):
    if increment == "0":
        return length
    else:
        return round(length/dim(increment)) * dim(increment)

calculated_length = 122.3456

rounded_length = round_length_near(calculated_length, "1/4")
print rounded_length

valueStr = "10'-2 3/8"
value = dim(valueStr)
print '\nThe dimension string %s evaluates to %0.4f.' % (valueStr, value)

Output:

>>> 122.25

The dimension string 10'-2 3/8 evaluates to 122.3750
>>>

Notice that the functions do not access variables outside their scope.
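
Applied to this thread, that advice may also explain the "same file twice" symptom. Assuming `concordance()` still opens the module-level `filename = sys.argv[1]` from the earlier post (the current module was not shown, so this is a guess), every call reads the first file on the command line no matter which name `prepare()` received. A minimal sketch of the argument-passing alternative follows; the two throwaway files and their contents are made up for illustration, and the code is written to run on current Python.

```python
import os
import tempfile

def concordance(filename, target):
    """Open the file named by the argument, not a module-level global."""
    found = []
    with open(filename) as handle:
        for lineNum, line in enumerate(handle):
            if target in line.lower().split():
                found.append(lineNum + 1)
    return found

# two throwaway files standing in for the command-line arguments
tmpdir = tempfile.mkdtemp()
paths = []
for name, text in [("a.txt", "this and that\nnothing\n"),
                   ("b.txt", "nothing\nnothing\nand finally\n")]:
    path = os.path.join(tmpdir, name)
    with open(path, "w") as handle:
        handle.write(text)
    paths.append(path)

results = [concordance(path, "and") for path in paths]   # one result per file
```

Because the file name travels in as a parameter and the result comes back as a return value, each file gets its own independent answer.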

Use 'enumerate' only when you need both the index and the item. Example from your code:

    for lineNum, line in enumerate(temp):
        a = line.count(f)
        for wordNum, word in enumerate(line.split()):
            word = word.lower()

You do not use 'lineNum' or 'wordNum' anywhere else in the function 'prepare()'. Use this instead:

    for line in temp:
        a = line.count(f)
        for word in line.split():
            word = word.lower()

Function 'prepare()' is not creating a list with unique words, so you are sending some of the words to 'concordance()' several times. The words in this list also have punctuation attached with some upper case letters and are being compared to stripped and lowered words.
Try this link for more info: http://www.bvdetailing.com/wordcount.htm

Jan 31 '07 #24

TMS

119

100+

Thank you. That is good advice. My code was messy and I appreciate your input.

I'm still stuck, however, on the argument list. I can only seem to get it to process one file, even though my loop prints out all arguments I ask for. I've added the enumerate like this:

for arg in enumerate(sys.argv[1:]):
    concordance2.prepare(arg)

If I simply ask it to print arg, like this:

for arg in enumerate(sys.argv[1:]):
    print arg

And call the script like this:

python concord someFile.txt README.txt
I get this:

(0, 'someFile.txt')
(1, 'README.txt')

which tells me that my loop should be working, but it's not sending the appropriate file to prepare() to go through the list. I also tried it with your code, just to see if there was a bug in mine (wouldn't surprise me...) but no difference.
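
One thing the printed tuples do show, sketched here with a stand-in for `sys.argv` (the `argv` list and the other names are illustrative). With `enumerate`, each loop value is an `(index, name)` tuple, so passing `arg` straight to a function that expects a file name hands it a tuple. Either drop `enumerate` or unpack the pair.

```python
argv = ["concord", "someFile.txt", "README.txt"]   # stand-in for sys.argv

# what the loop in the post iterates over: (index, name) tuples
as_tuples = [arg for arg in enumerate(argv[1:])]

# plain iteration yields the file names themselves
names_only = [arg for arg in argv[1:]]

# unpacking keeps the index available if it is actually wanted
numbered = ["%d: %s" % (i, name) for i, name in enumerate(argv[1:])]
```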

Once I get this done, then I'm done. At least for this assignment :)

Jan 31 '07 #25

TMS

119

100+

Ok, I got it. nevermind....

Jan 31 '07 #26
