Word count from file help.

Hello,

I'm learning Python from Python Bible, and having some
problems with this code below. When I run it, I get nothing. It
should open the file poem.txt (which exists in the current
directory) and count number of times any given word appears
in the text.

#!/usr/bin/python
# WordCount.py - Counts the words in a given text file (poem.txt)

import string

def CountWords(Text):
    "Count how many times each word appears in Text"
    # A string (above) after a def statement is a
    # "docstring" - a comment intended for documentation.
    WordCount={}
    # We will build up (and return) a dictionary whose keys
    # are the words, and whose values are the corresponding
    # number of occurrences.

    CountWords=""
    # To make the job cleaner, add a period at the end of the
    # text; that way, we are guaranteed to be finished with
    # the current word when we run out of letters:
    Text=Text+"."

    # We assume that ' and - don't break words, but any other
    # nonalphabetic character does. This assumption isn't
    # entirely accurate, but it's close enough for us.
    # string.letters is a string of all alphabetic characters.
    PiecesOfWords=string.letters+"'-"

    # Iterate over each character in the text. The function
    # len() returns the length of a sequence.
    for CharacterIndex in range(0,len(Text)):
        CurrentCharacter=Text[CharacterIndex]

        # The find() method of a string finds the starting
        # index of the first occurrence of a substring within
        # a string, or returns -1 if it doesn't find the substring.
        # The next line of code tests to see whether CurrentCharacter
        # is part of a word:
        if(PiecesOfWords.find(CurrentCharacter)!=-1):
            # Append this letter to the current word.
            CurrentWord=CurrentWord+CurrentCharacter
        else:
            # This character is not a letter.
            if(CurrentWord!=""):
                # We just finished a word.
                # Convert to lowercase, so "The" and "the"
                # fall in the same bucket...
                CurrentWord=string.lower(CurrentWord)

                # Now increment this word's count.
                CurrentCount=WordCount.get(CurrentWord,0)
                WordCount[CurrentWord]=CurrentCount+1

                # Start a new word.
                CurrentWord=""
    return(WordCount)

if (__name__=="__main__"):
    # Read the text from the file poem.txt.
    TextFile=open("poem.txt","r")
    Text=TextFile.read()
    TextFile.close()

    # Count the words in the text.
    WordCount=CountWords(Text)
    # Alphabetize the word list, and print them all out.
    SortedWords=WordCount.keys()
    SortedWords.sort()
    for Word in SortedWords:
        print Word.WordCount[Word]
Jul 18 '05 #1
On Thu, 12 Feb 2004 01:04:20 GMT, jester.dev wrote:
> I'm learning Python from Python Bible

Welcome, I hope you're enjoying learning the language.

> problems with this code below. When I run it, I get nothing.

More information required:

How are you invoking it (what command do you type)? Does the program
appear to do something, then exit?

You've told us what you expect the program to do (thanks!):

> It should open the file poem.txt (which exists in the current
> directory) and count number of times any given word appears in the
> text.

Diagnostics:

When you encounter unexpected behaviour in a complex piece of code, it's
best to test some assumptions.

What happens when the file "poem.txt" is not there? (Rename the file to
a different name.) This will tell you whether the program is even
attempting to read the file.

What happens when you import this into the interactive Python prompt,
then call CountWords on some text? This will tell you whether the
function is performing as expected.

And so on.
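
The two checks above can be sketched as follows. This is a minimal Python 3 illustration, not the code from the book: `count_words` is a hypothetical, simplified stand-in for CountWords, and the temporary directory is there only so that poem.txt is guaranteed to be missing.

```python
import os
import tempfile

def count_words(text):
    """Simplified stand-in for CountWords: split on whitespace and tally."""
    counts = {}
    for word in text.lower().split():
        counts[word] = counts.get(word, 0) + 1
    return counts

# Check 1: opening a file that does not exist should fail loudly.
with tempfile.TemporaryDirectory() as empty_dir:
    missing_path = os.path.join(empty_dir, "poem.txt")
    try:
        with open(missing_path, "r"):
            pass
        open_failed = False
    except FileNotFoundError:
        open_failed = True

# Check 2: call the function directly on a short string with known counts.
result = count_words("the cat and The hat")
print(open_failed, result)
```

If the real script prints nothing and raises no error while poem.txt is missing, that suggests it never reaches the open() call at all.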

One possible problem, which may just be a mistake in the way you pasted
the text into your newsgroup message:

> #!/usr/bin/python
> [...]
> import string
>
> def CountWords(Text):
> [...]
> for CharacterIndex in range(0,len(Text)):
> [...]
> if(PiecesOfWords.find(CurrentCharacter)!=-1):
> [...]
> else:
> if(CurrentWord!=""):
> [...]
> if (__name__=="__main__"):
> [...]


Indentation defines structural language blocks in Python. The "def",
"for", "if" structures above will encompass *all* lines below them until
the next line at their own indentation level or less.

In other words, if the code looks the way you've pasted it here, the
"def" encompasses everything below it; the "for" encompasses everything
below it; and the "if(PiecesOfWords...):" encompasses everything below
it. Including the "if( __name__ == "__main__" ):" line.

Thus, as you've posted it here, the file imports the string module,
defines a function -- then does nothing with it.
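
A small sketch of that effect, assuming Python 3 and hypothetical names (`events`, `nested_version`, `fixed_version`): a block indented into a function body belongs to the function, so defining the function does not run it, and code placed after `return` never runs at all.

```python
events = []  # records which blocks actually executed

def nested_version():
    events.append("function body ran")
    return None
    # Mis-indented: this block is part of the function and follows the
    # return statement, so it can never execute.
    if __name__ == "__main__":
        events.append("nested main block ran")

# Defining a function executes nothing inside it.
events_after_def = list(events)

# Even when the function is called, the block after return is skipped.
nested_version()
nested_block_ran = "nested main block ran" in events

def fixed_version():
    events.append("fixed function ran")

# Dedented to column 0: this runs whenever the file is executed as a script.
if __name__ == "__main__":
    fixed_version()

print(events_after_def, nested_block_ran)
```

Only the dedented `if __name__ == "__main__":` block at the end is a top-level statement; the one inside `nested_version` is unreachable.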

Please be sure to paste the text literally in messages; or, if you've
pasted the text exactly as it is in the program, learn how Python
interprets indentation:

<http://www.python.org/doc/current/ref/indentation.html>

--
\ "You've got the brain of a four-year-old boy, and I'll bet he |
`\ was glad to get rid of it." -- Groucho Marx |
_o__) |
Ben Finney <http://bignose.squidly.org/>
Jul 18 '05 #2
"jester.dev" <je********@comcast.net> wrote in message
news:oiAWb.9926$jk2.28236@attbi_s53...
> Hello,
>
> I'm learning Python from Python Bible, and having some
> problems with this code below. When I run it, I get nothing. It
> should open the file poem.txt (which exists in the current
> directory) and count number of times any given word appears
> in the text.

Try this:

# wordCount.py
#
# invoke using: python wordCount.py <filename>
#
from pyparsing import Word, alphas
import sys

# modify this word definition as you wish - whitespace is implicit separator
wordSpec = Word(alphas)

if len(sys.argv) > 1:
    infile = sys.argv[1]

    wordDict = {}
    filetext = "\n".join( file(infile).readlines() )
    for wd,locstart,locend in wordSpec.scanString(filetext):
        #~ curWord = string.lower(wd[0])
        curWord = wd[0].lower()
        if wordDict.has_key( curWord ):
            wordDict[curWord] += 1
        else:
            wordDict[curWord] = 1

    print "%s has %d different words." % ( infile, len(wordDict.keys()) )
    keylist = wordDict.keys()
    keylist.sort( lambda a,b:
        ( wordDict[b] - wordDict[a] ) or
        ( ( ( a > b ) and 1 ) or ( ( a < b ) and -1 ) or 0 ) )
    for k in keylist:
        print k, ":", wordDict[k]
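
For readers who would rather not install pyparsing, roughly the same result can be sketched with the standard library alone. This is a Python 3 rewrite under stated assumptions, not the code posted above: the function names are made up for illustration, and the pattern `[A-Za-z]+` is only an approximation of `Word(alphas)`.

```python
import re
from collections import Counter

def count_words_stdlib(text):
    """Count runs of ASCII letters, ignoring case."""
    return Counter(word.lower() for word in re.findall(r"[A-Za-z]+", text))

def sorted_by_frequency(counts):
    """Order words by descending count, breaking ties alphabetically."""
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))

sample = "The cat saw the hat. The end."
ranking = sorted_by_frequency(count_words_stdlib(sample))
for word, count in ranking:
    print(word, ":", count)
```

The sort key `(-count, word)` gives the same ordering as the comparison function passed to keylist.sort above: most frequent words first, with equal counts listed alphabetically.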
Jul 18 '05 #3
Oops, sorry, forgot to mention that this requires downloading pyparsing at
http://pyparsing.sourceforge.net.

"Paul McGuire" <pt***@users.sourceforge.net> wrote in message
news:HE*****************@fe2.texas.rr.com...
"jester.dev" <je********@comcast.net> wrote in message
news:oiAWb.9926$jk2.28236@attbi_s53...
[...]

Jul 18 '05 #4
See inline.

Ben Finney wrote:
> On Thu, 12 Feb 2004 01:04:20 GMT, jester.dev wrote:
>> I'm learning Python from Python Bible
>
> How are you invoking it (what command do you type)? Does the program
> appear to do something, then exit?

I made it executable: chmod 755 word_count.py
I also tried: python word_count.py

> Diagnostics:
>
> When you encounter unexpected behaviour in a complex piece of code, it's
> best to test some assumptions.
>
> What happens when the file "poem.txt" is not there? (Rename the file to
> a different name.) This will tell you whether the program is even
> attempting to read the file.

It does nothing either way. First time I ran it the file was not there.

> What happens when you import this into the interactive Python prompt,
> then call CountWords on some text? This will tell you whether the
> function is performing as expected.

Nothing happens. :) So I guess what you said below is correct.

> And so on.
> One possible problem that may be a mistake in the way you pasted the
> text into your newsgroup message:
>
>> #!/usr/bin/python
>> [...]
>> import string
>>
>> def CountWords(Text):
>> [...]
>> for CharacterIndex in range(0,len(Text)):
>> [...]
>> if(PiecesOfWords.find(CurrentCharacter)!=-1):
>> [...]
>> else:
>> if(CurrentWord!=""):
>> [...]
>> if (__name__=="__main__"):
>> [...]
>
> Indentation defines structural language blocks in Python. The "def",
> "for", "if" structures above will encompass *all* lines below them until
> the next line at their own indentation level or less.
>
> In other words, if the code looks the way you've pasted it here, the
> "def" encompasses everything below it; the "for" encompasses everything
> below it; and the "if(PiecesOfWords...):" encompasses everything below
> it. Including the "if( __name__ == "__main__" ):" line.
>
> Thus, as you've posted it here, the file imports the string module,
> defines a function -- then does nothing with it.
>
> Please be sure to paste the text literally in messages; or, if you've
> pasted the text exactly as it is in the program, learn how Python
> interprets indentation:
>
> <http://www.python.org/doc/current/ref/indentation.html>

Thanks for the link. I'm not really used to this whole indentation deal yet. I
was, however, using WingIDE, which indents for me.

JesterDev
Jul 18 '05 #5
On Thu, 12 Feb 2004 01:04:20 GMT in comp.lang.python, "jester.dev"
<je********@comcast.net> wrote:
> Hello,
>
> I'm learning Python from Python Bible, and having some
> problems with this code below. When I run it, I get nothing. It
> should open the file poem.txt (which exists in the current
> directory) and count number of times any given word appears
> in the text.

When I run it (after re-formatting - you can see below how it appears
in my newsreader), and after fixing the two error messages, it prints
the results just as you describe. Try this:

1) Add the line 'CurrentWord = ""' just before the line
'for CharacterIndex in range(0,len(Text)):'
2) Change the very last line to 'print Word, WordCount[Word]'

If that doesn't work for you then I suspect that the indenting in your
program is wrong (rather than just being mangled by posting it), but
I'm just guessing. It would be helpful if you posted the actual error
message (Traceback) that the Python interpreter prints; that makes it
much easier to find the problem.
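
With both changes applied, the function might look like the sketch below. It is a Python 3 port rather than the book's listing: `string.letters`, `string.lower()` and the print statement no longer exist in Python 3, so `string.ascii_letters`, the `.lower()` method and `print()` are assumed as replacements, and a sample string stands in for poem.txt.

```python
import string

def CountWords(Text):
    "Count how many times each word appears in Text"
    WordCount = {}
    CurrentWord = ""  # change 1: initialise the word buffer before the loop
    # A trailing period guarantees the last word is flushed.
    Text = Text + "."
    # Apostrophes and hyphens are treated as parts of words.
    PiecesOfWords = string.ascii_letters + "'-"
    for CurrentCharacter in Text:
        if CurrentCharacter in PiecesOfWords:
            # Still inside a word: keep accumulating characters.
            CurrentWord = CurrentWord + CurrentCharacter
        elif CurrentWord != "":
            # A word just ended: fold case and bump its count.
            CurrentWord = CurrentWord.lower()
            WordCount[CurrentWord] = WordCount.get(CurrentWord, 0) + 1
            CurrentWord = ""
    return WordCount

if __name__ == "__main__":
    Counts = CountWords("The cat's hat, the hat!")
    for Word in sorted(Counts):
        print(Word, Counts[Word])  # change 2: a comma, not a dot
```

Note that a hyphen standing on its own between spaces would be counted as a word under this rule, as it would in the original code.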

Dave

#!/usr/bin/python
# WordCount.py - Counts the words in a given text file (poem.txt)
[...]
Jul 18 '05 #6
