
Word count from file help.

P: n/a
Hello,

I'm learning Python from the Python Bible, and I'm having some
problems with the code below. When I run it, I get nothing. It
should open the file poem.txt (which exists in the current
directory) and count the number of times any given word appears
in the text.

#!/usr/bin/python
# WordCount.py - Counts the words in a given text file (poem.txt)

import string

def CountWords(Text):
"Count how many times each word appears in Text"
# A string (above) after a def statement is a -
# "docstring" - a comment intended for documentation.
WordCount={}
# We will build up (and return) a dictionary whose keys
# are the words, and whose values are the corresponding
# number of occurrences.

CountWords=""
# To make the job cleaner, add a period at the end of the
# text; that way, we are guaranteed to be finished with
# the current word when we run out of letters:
Text=Text+"."

# We assume that ' and - don't break words, but any other
# nonalphabetic character does. This assumption isn't
# entirely accurate, but it's close enough for us.
# string.letters is a string of all alphabetic characters.
PiecesOfWords=string.letters+"'-"

# Iterate over each character in the text. The function
# len () returns the length of a sequence.
for CharacterIndex in range(0,len(Text)):
CurrentCharacter=Text[CharacterIndex]

# The find() method of a string finds the starting
# index of the first occurrence of a substring within
# a string, or returns -1 if it doesn't find a substring.
# The next line of code tests to see whether CurrentCharacter
# is part of a word:
if(PiecesOfWords.find(CurrentCharacter)!=-1):
# Append this letter to the current word.
CurrentWord=CurrentWord+CurrentCharacter
else:
# This character is not a letter.
if(CurrentWord!=""):
# We just finished a word.
# Convert to lowercase, so "The" and
"the"
# fall in the same bucket...

CurrentWord=string.lower(CurrentWord)

# Now increment this word's count.

CurrentCount=WordCount.get(CurrentWord,0)

WordCount[CurrentWord]=CurrentCount+1

# Start a new word.
CurrentWord=""
return(WordCount)
if (__name__=="__main__"):
# Read the text from the file
peom.txt.
TextFile=open("poem.txt","r")
Text=TextFile.read()
TextFile.close()

# Count the words in the text.
WordCount=CountWords(Text)
# Alphabetize the word list, and
print them all out.
SortedWords=WordCount.keys()
SortedWords.sort()
for Word in SortedWords:
print Word.WordCount[Word]
Jul 18 '05 #1
5 Replies


P: n/a
On Thu, 12 Feb 2004 01:04:20 GMT, jester.dev wrote:
I'm learning Python from Python Bible
Welcome, I hope you're enjoying learning the language.
problems with this code below. When I run it, I get nothing.
More information required:

How are you invoking it (what command do you type)? Does the program
appear to do something, then exit?

You've told us what you expect the program to do (thanks!):
It should open the file poem.txt (which exists in the current
directory) and count number of times any given word appears in the
text.
Diagnostics:

When you encounter unexpected behaviour in a complex piece of code, it's
best to test some assumptions.

What happens when the file "poem.txt" is not there? (Rename the file to
a different name.) This will tell you whether the program is even
attempting to read the file.

What happens when you import this into the interactive Python prompt,
then call CountWords on some text? This will tell you whether the
function is performing as expected.

And so on.
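
A minimal sketch of those two checks, written for Python 3 rather than the Python 2 used in this thread. The names read_text and count_words are hypothetical stand-ins for illustration, not the poster's code, and the sample sentence is made up. The point is only that a program which really tries to open a missing file fails visibly, and that a function can be exercised on a short string without any file at all:

```python
import os
import tempfile


def read_text(path):
    """Return the file's contents, or None if the file does not exist."""
    try:
        with open(path, "r") as text_file:
            return text_file.read()
    except FileNotFoundError:
        # A program that really tries to open a missing file ends up here.
        # Complete silence instead suggests the open() call never ran.
        return None


def count_words(text):
    """Hypothetical stand-in for the function under test."""
    counts = {}
    for word in text.lower().split():
        counts[word] = counts.get(word, 0) + 1
    return counts


# Check 1: a path inside a brand-new temporary directory cannot exist yet.
missing_path = os.path.join(tempfile.mkdtemp(), "poem.txt")
print(read_text(missing_path))

# Check 2: call the function directly on a short, known piece of text.
print(count_words("the cat and the hat"))
```

If either check prints nothing at all, the code in question is probably not being executed, which points at structure rather than logic.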
One possible problem that may be a mistake in the way you pasted the
text into your newsgroup message:
#!/usr/bin/python
[...]
import string

def CountWords(Text):
[...]
for CharacterIndex in range(0,len(Text)):
[...]
if(PiecesOfWords.find(CurrentCharacter)!=-1):
[...]
else:
if(CurrentWord!=""):
[...]
if (__name__=="__main__"):
[...]


Indentation defines structural language blocks in Python. The "def",
"for", "if" structures above will encompass *all* lines below them until
the next line at their own indentation level or less.

In other words, if the code looks the way you've pasted it here, the
"def" encompasses everything below it; the "for" encompasses everything
below it; and the "if(PiecesOfWords...):" encompasses everything below
it. Including the "if( __name__ == "__main__" ):" line.

Thus, as you've posted it here, the file imports the string module,
defines a function -- then does nothing with it.

Please be sure to paste the text literally in messages; or, if you've
pasted the text exactly as it is in the program, learn how Python
interprets indentation:

<http://www.python.org/doc/current/ref/indentation.html>
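
A small sketch of the effect, written for Python 3 and using made-up names (events, misplaced, well_placed) purely for illustration. A statement indented under a "def" belongs to the function and runs only when the function is called; a statement back at column 0 runs as soon as the file is executed:

```python
events = []


def misplaced():
    events.append("function body")
    # Still indented under the def, so this line is part of the function.
    # A main block pasted at this depth would never run on its own.
    events.append("main block, swallowed by the def")


def well_placed():
    events.append("function body")


# Back at column 0: outside every def, so this runs when the file is run.
events.append("main block, at top level")

print(events)
```

Running the file records only the top-level line, because neither function is ever called. That matches the symptom described: the script defines a function and then exits without output.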

--
\ "You've got the brain of a four-year-old boy, and I'll bet he |
`\ was glad to get rid of it." -- Groucho Marx |
_o__) |
Ben Finney <http://bignose.squidly.org/>
Jul 18 '05 #2

P: n/a
"jester.dev" <je********@comcast.net> wrote in message
news:oiAWb.9926$jk2.28236@attbi_s53...
Hello,

I'm learning Python from Python Bible, and having some
problems with this code below. When I run it, I get nothing. It
should open the file poem.txt (which exists in the current
directory) and count number of times any given word appears
in the text.

Try this:

# wordCount.py
#
# invoke using: python wordCount.py <filename>
#
from pyparsing import Word, alphas
import sys

# modify this word definition as you wish - whitespace is implicit separator
wordSpec = Word(alphas)

if len(sys.argv) > 1:
    infile = sys.argv[1]

    wordDict = {}
    filetext = "\n".join( file(infile).readlines() )
    for wd,locstart,locend in wordSpec.scanString(filetext):
        #~ curWord = string.lower(wd[0])
        curWord = wd[0].lower()
        if wordDict.has_key( curWord ):
            wordDict[curWord] += 1
        else:
            wordDict[curWord] = 1

    print "%s has %d different words." % ( infile, len(wordDict.keys()) )
    keylist = wordDict.keys()
    keylist.sort( lambda a,b:
        ( wordDict[b] - wordDict[a] ) or
        ( ( ( a > b ) and 1 ) or ( ( a < b ) and -1 ) or 0 ) )
    for k in keylist:
        print k, ":", wordDict[k]
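
The comparison function passed to sort() above orders words by descending count and breaks ties alphabetically. As a rough sketch of the same ordering in Python 3, where sort() no longer accepts a comparison function, a key function returning a tuple can be used instead. The word_counts data and the helper name by_count_then_word are made up for illustration:

```python
# Made-up sample counts, standing in for the dictionary built from a file.
word_counts = {"the": 3, "cat": 1, "and": 3, "hat": 1}


def by_count_then_word(word):
    # Negating the count sorts larger counts first; the word breaks ties.
    return (-word_counts[word], word)


ordered = sorted(word_counts, key=by_count_then_word)
for word in ordered:
    print(word, ":", word_counts[word])
```

Tuples compare element by element, so the second element only matters when two words have the same count.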
Jul 18 '05 #3

P: n/a
Oops, sorry, forgot to mention that this requires downloading pyparsing at
http://pyparsing.sourceforge.net.

"Paul McGuire" <pt***@users.sourceforge.net> wrote in message
news:HE*****************@fe2.texas.rr.com...
"jester.dev" <je********@comcast.net> wrote in message
news:oiAWb.9926$jk2.28236@attbi_s53...
Hello,

I'm learning Python from Python Bible, and having some
problems with this code below. When I run it, I get nothing. It
should open the file poem.txt (which exists in the current
directory) and count number of times any given word appears
in the text.
Try this:

# wordCount.py
#
# invoke using: python wordCount.py <filename>
#
from pyparsing import Word, alphas
import sys

# modify this word definition as you wish - whitespace is implicit separator
wordSpec = Word(alphas)

if len(sys.argv) > 1:
    infile = sys.argv[1]

    wordDict = {}
    filetext = "\n".join( file(infile).readlines() )
    for wd,locstart,locend in wordSpec.scanString(filetext):
        curWord = wd[0].lower()
        if wordDict.has_key( curWord ):
            wordDict[curWord] += 1
        else:
            wordDict[curWord] = 1

    print "%s has %d different words." % ( infile, len(wordDict.keys()) )
    keylist = wordDict.keys()
    keylist.sort( lambda a,b:
        ( wordDict[b] - wordDict[a] ) or
        ( ( ( a > b ) and 1 ) or ( ( a < b ) and -1 ) or 0 ) )
    for k in keylist:
        print k, ":", wordDict[k]

Jul 18 '05 #4

P: n/a
See inline.

Ben Finney wrote:
On Thu, 12 Feb 2004 01:04:20 GMT, jester.dev wrote:
I'm learning Python from Python Bible

How are you invoking it (what command do you type)? Does the program
appear to do something, then exit?

I made it executable: chmod 755 word_count.py
I also tried: python word_count.py

Diagnostics:

When you encounter unexpected behaviour in a complex piece of code, it's
best to test some assumptions.

What happens when the file "poem.txt" is not there? (Rename the file to
a different name.) This will tell you whether the program is even
attempting to read the file.

It does nothing either way. The first time I ran it, the file was not there.

What happens when you import this into the interactive Python prompt,
then call CountWords on some text? This will tell you whether the
function is performing as expected.

Nothing happens. :) So I guess what you said below is correct.

And so on.
One possible problem that may be a mistake in the way you pasted the
text into your newsgroup message:
#!/usr/bin/python
[...]
import string

def CountWords(Text):
[...]
for CharacterIndex in range(0,len(Text)):
[...]
if(PiecesOfWords.find(CurrentCharacter)!=-1):
[...]
else:
if(CurrentWord!=""):
[...]
if (__name__=="__main__"):
[...]


Indentation defines structural language blocks in Python. The "def",
"for", "if" structures above will encompass *all* lines below them until
the next line at their own indentation level or less.

In other words, if the code looks the way you've pasted it here, the
"def" encompasses everything below it; the "for" encompasses everything
below it; and the "if(PiecesOfWords...):" encompasses everything below
it. Including the "if( __name__ == "__main__" ):" line.

Thus, as you've posted it here, the file imports the string module,
defines a function -- then does nothing with it.

Please be sure to paste the text literally in messages; or, if you've
pasted the text exactly as it is in the program, learn how Python
interprets indentation:

<http://www.python.org/doc/current/ref/indentation.html>


Thanks for the link. I'm not really used to this whole indentation deal yet. I
am, however, using WingIDE, which indents for me.

JesterDev
Jul 18 '05 #5

P: n/a
On Thu, 12 Feb 2004 01:04:20 GMT in comp.lang.python, "jester.dev"
<je********@comcast.net> wrote:
Hello,

I'm learning Python from Python Bible, and having some
problems with this code below. When I run it, I get nothing. It
should open the file poem.txt (which exists in the current
directory) and count number of times any given word appears
in the text.
When I run it (after re-formatting - you can see below how it appears
in my newsreader), and after fixing the two error messages, it prints
the results just as you describe. Try this:

1) Add the line 'CurrentWord = ""' just before the line
'for CharacterIndex in range(0,len(Text)):'
2) Change the very last line to 'print Word, WordCount[Word]'

If that doesn't work for you then I suspect that the indenting in your
program is wrong (rather than just being mangled by posting it), but
I'm just guessing. It would be helpful if you posted the actual error
message (Traceback) that the Python interpreter prints; that makes it
much easier to find the problem.
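
For reference, a condensed sketch of the program with both fixes applied and the indentation restored. It is written for Python 3, so string.ascii_letters and the print() function replace the Python 2 forms in the original, and it counts a made-up sample sentence instead of reading poem.txt so that it runs on its own. The lowercase names are my own choice, not the book's:

```python
import string


def count_words(text):
    """Count how many times each word appears in text."""
    word_count = {}
    current_word = ""  # fix 1: initialise the word before the loop starts
    text = text + "."  # guarantees the last word is flushed
    pieces_of_words = string.ascii_letters + "'-"
    for character in text:
        if character in pieces_of_words:
            # Still inside a word: keep collecting letters.
            current_word += character
        elif current_word != "":
            # A word just ended: fold case and bump its count.
            current_word = current_word.lower()
            word_count[current_word] = word_count.get(current_word, 0) + 1
            current_word = ""
    return word_count


# At column 0, so it is outside the function and runs with the script.
if __name__ == "__main__":
    sample = "The cat saw the hat. The hat didn't move."
    counts = count_words(sample)
    for word in sorted(counts):
        print(word, counts[word])  # fix 2: a comma, not a dot
```

To count a real file, read its contents into a string first and pass that string to count_words.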

Dave

#!/usr/bin/python
# WordCount.py - Counts the words in a given text file (poem.txt)

import string

def CountWords(Text):
"Count how many times each word appears in Text"
# A string (above) after a def statement is a -
# "docstring" - a comment intended for documentation.
WordCount={}
# We will build up (and return) a dictionary whose keys
# are the words, and whose values are the corresponding
# number of occurrences.

CountWords=""
# To make the job cleaner, add a period at the end of the
# text; that way, we are guaranteed to be finished with
# the current word when we run out of letters:
Text=Text+"."

# We assume that ' and - don't break words, but any other
# nonalphabetic character does. This assumption isn't
# entirely accurate, but it's close enough for us.
# string.letters is a string of all alphabetic charactors.
PiecesOfWords=string.letters+"'-"

# Iterate over each character in the text. The function
# len () returns the length of a sequence.
for CharacterIndex in range(0,len(Text)):
CurrentCharacter=Text[CharacterIndex]

# The find() method of a string finds the starting
# index of the first occurrence of a substring within
# a string, or returns -1 of it doesn't find a substring.
# The next line of code tests to see wether CurrentCharacter
# is part of a word:
if(PiecesOfWords.find(CurrentCharacter)!=-1):
# Append this letter to the current word.
CurrentWord=CurrentWord+CurrentCharacter
else:
# This character is no a letter.
if(CurrentWord!=""):
# We just finished a word.
# Convert to lowercase, so "The" and
"the"
# fall in the same bucket...

CurrentWord=string.lower(CurrentWord)

# Now increment this word's count.

CurrentCount=WordCount.get(CurrentWord,0)

WordCount[CurrentWord]=CurrentCount+1

# Start a new word.
CurrentWord=""
return(WordCount)
if (__name__=="__main__"):
# Read the text from the file
peom.txt.
TextFile=open("poem.txt","r")
Text=TextFile.read()
TextFile.close()

# Count the words in the text.
WordCount=CountWords(Text)
# Alphabetize the word list, and
print them all out.
SortedWords=WordCount.keys()
SortedWords.sort()
for Word in SortedWords:
print Word.WordCount[Word]


Jul 18 '05 #6
