
Problem loading a file of words

I've been spending today learning python and as an exercise I've ported
a program I wrote in java that unscrambles a word. Before describing
the problem, here's the code:

*--beginning of file--*
#!/usr/bin/python
# Filename: unscram.py

def sort_string(word):
    '''Returns word in lowercase sorted alphabetically'''
    word = str.lower(word)
    word_list = []
    for char in word:
        word_list.append(char)
    word_list.sort()
    sorted_word = ''
    for char in word_list:
        sorted_word += char
    return sorted_word

print 'Building dictionary...',

dictionary = { }

# Notice that you need to have a file named 'dictionary.txt'
# in the same directory as this file. The format is to have
# one word per line, such as the following (of course without
# the # marks):

#test
#hello
#quit
#night
#pear
#pare

f = file('dictionary.txt')

# This loop builds the dictionary, where the key is
# the string after calling sort_string(), and the value
# is the list of all 'regular' words (from the dictionary,
# not sorted) that passing to sort_string() returns the key

while True:
    line = f.readline()
    if len(line) == 0:
        break
    line = str.lower(line[:-1]) # convert to lowercase just in case and
                                # remove the return at the end of the line
    sline = sort_string(line)
    if sline in dictionary: # this key already exists, add to existing list
        dictionary[sline].append(line)
        print 'Added %s to key %s' % (line,sline) #for testing
    else: # create new key and list
        dictionary[sline] = [line]
        print 'Created key %s for %s' % (sline,line) #for testing
f.close()

print 'Ready!'

# This loop lets the user input a scrambled word, look for it in
# dictionary, and print all matching unscrambled words.
# If the user types 'quit' then the program ends.
while True:
    lookup = raw_input('Enter a scrambled word : ')

    results = dictionary[sort_string(lookup)]

    for x in results:
        print x,

    print

    if lookup == 'quit':
        break
*--end of file--*
If you create dictionary.txt as suggested in the comments, it should
work fine (assuming you pass a word that creates a valid key; I'll
have to add exceptions later). The problem is that with a large
dictionary.txt file (the dictionary I tested is 2.9 MB) it
always gives an error, specifically:
(Note: ccehimnostyz is for zymotechnics, which is in the large
dictionary)
*--beginning of example--*
Enter a scrambled word : ccehimnostyz
Traceback (most recent call last):
  File "unscram.py", line 62, in ?
    results = dictionary[sort_string(lookup)]
KeyError: 'ccehimnostyz'
*--end of example--*
If you'd like a copy of the dictionary I'm using, email me at teoryn at
gmail dot com or leave your email here and I'll send it to you (it's
702.2 KB compressed).

Thanks,
Kevin

Jul 25 '05 #1



Heh, it reminds me of the code I used to write.

def sort_string(word):
    return ''.join(sorted(list(word.lower())))

f = open('dictionary.txt','r')
lines = [line.rstrip('\n') for line in f.readlines()]
f.close()
dictionary = dict((sort_string(line),line) for line in lines)
lookup = ''
while lookup != 'quit':
    lookup = raw_input('Enter a scrambled word:')
    key = sort_string(lookup) # the keys are sorted letters, so sort the input too
    if dictionary.has_key(key):
        word = dictionary[key]
    else:
        word = 'Not found.'
    print word

You need python 2.4 to use this example.

Jul 25 '05 #2

An idiomatic Python 2.4 version of sort_string() would be:

def sort_string(word):
    word = word.lower()
    sorted_list = sorted(word)
    sorted_word = ''.join(sorted_list)
    return sorted_word

# this really should all be within a function, but let's just carry on
dictionary = {}
f = open('dictionary.txt')
try:
    # enclose this in a try: finally: block in case something goes wrong
    for line in f:
        line = line.strip().lower()
        sline = sort_string(line)
        val = dictionary.setdefault(sline, [])
        val.append(line)
        print "Added %s to key %s" % (line, sline)
finally:
    f.close()


Well, my version works (using /usr/share/dict/words from Debian as
dictionary.txt). Yours does, too. Are you sure that you are using the
right dictionary.txt?

--
Robert Kern
rk***@ucsd.edu

"In the fields of hell where the grass grows high
Are the graves of dreams allowed to die."
-- Richard Harter

Jul 25 '05 #3

Kevin,
I'm pretty new to Python too. I'm not sure why you're seeing this
problem... is it possible that this is an "out-by-one" error? Is
zymotechnics the *last* word in dictionary.txt? Try this slightly
simplified version of your program and see if you have the same problem....

def sort_string(word):
    '''Returns word in lowercase sorted alphabetically'''
    return "".join(sorted(list(word.lower())))

dictionary = {}
f = open('/usr/bin/words') # or whatever file you like
for line in f:
    sline = sort_string(line[:-1])
    if sline in dictionary:
        dictionary[sline].append(line)
    else:
        dictionary[sline] = [line]
f.close()

lookup = raw_input('Enter a scrambled word : ')
while lookup:
    try:
        results = dictionary[sort_string(lookup)]
        for x in results:
            print x,
        print
    except:
        print "?????"
    lookup = raw_input('Enter a scrambled word : ')

Good luck,

Nick.
Jul 25 '05 #4

Devan L wrote:
Heh, it reminds me of the code I used to write.

def sort_string(word):
    return ''.join(sorted(list(word.lower())))

f = open('dictionary.txt','r')
lines = [line.rstrip('\n') for line in f.readlines()]
f.close()
dictionary = dict((sort_string(line),line) for line in lines)


That's definitely not the kind of dictionary that he wants.

--
Robert Kern

Jul 25 '05 #5

Robert Kern wrote:
That's definitely not the kind of dictionary that he wants.



Oh, I missed the part where he put values in a list.

Jul 25 '05 #6

teoryn wrote:
I've been spending today learning python and as an exercise I've ported
a program I wrote in java that unscrambles a word. Before describing
the problem, here's the code:

[...]
    line = str.lower(line[:-1]) # convert to lowercase just in case
[...]

The problem is when using a large
dictionary.txt file (2.9 MB is the size of the dictionary I tested) it
always gives an error, specifically:
(Note: ccehimnostyz is for zymotechnics, which is in the large
dictionary)
*--beginning of example--*
Enter a scrambled word : ccehimnostyz
Traceback (most recent call last):
  File "unscram.py", line 62, in ?
    results = dictionary[sort_string(lookup)]
KeyError: 'ccehimnostyz'
*--end of example--*


If 'zymotechnics' is the last line and that line is missing a trailing
newline

line[:-1]

mutilates 'zymotechnics' to 'zymotechnic'. In that case the dictionary would
contain the key 'ccehimnotyz'. Another potential problem could be
leading/trailing whitespace. Both problems can be fixed by using
line.strip() instead of line[:-1] as in Robert Kern's code.
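
A minimal sketch of that failure mode, written for Python 3 (the thread itself uses Python 2). The in-memory list of lines is a stand-in for the real file and is an assumption for illustration only:

```python
def sort_string(word):
    """Return the letters of word, lowercased and sorted."""
    return "".join(sorted(word.lower()))

# Simulated file contents: the last line has no trailing newline.
lines = ["pear\n", "zymotechnics"]

# line[:-1] drops the last character whatever it is, so the final
# word loses its "s" before the key is built.
sliced_keys = [sort_string(line[:-1]) for line in lines]

# line.strip() removes only surrounding whitespace, so the word survives.
stripped_keys = [sort_string(line.strip()) for line in lines]

print(sliced_keys)    # ['aepr', 'ccehimnotyz']
print(stripped_keys)  # ['aepr', 'ccehimnostyz']
```

Only the second list contains the key that the lookup for 'ccehimnostyz' needs.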

Peter

Jul 25 '05 #7

On Sun, 24 Jul 2005 20:44:08 -0700, teoryn wrote:
I've been spending today learning python and as an exercise I've ported
a program I wrote in java that unscrambles a word. Before describing
the problem, here's the code:

*--beginning of file--*
#!/usr/bin/python
# Filename: unscram.py

def sort_string(word):
    '''Returns word in lowercase sorted alphabetically'''
    word = str.lower(word)

It is generally considered better form to write that line as:

    word = word.lower()

    word_list = []
    for char in word:
        word_list.append(char)

If you want a list of characters, the best way of doing that is just:

    word_list = list(word)

    word_list.sort()
    sorted_word = ''
    for char in word_list:
        sorted_word += char
    return sorted_word

And the last four lines above (everything after the sort) are best written as:

    return ''.join(word_list)

while True:
    line = f.readline()
    if len(line) == 0:
        break
    line = str.lower(line[:-1]) # convert to lowercase just in case
and
    # remove the return at the end of
the line
    sline = sort_string(line)
    if sline in dictionary: # this key already exist, add to
existing list
        dictionary[sline].append(line)
        print 'Added %s to key %s' % (line,sline) #for testing
    else: # create new key and list
        dictionary[sline] = [line]
        print 'Created key %s for %s' % (sline,line) #for
testing
f.close()

Your while-loop seems to have been mangled a little thanks to word-wrap.
In particular, I can't work out what that "and" is doing in the middle of
it.

Unless you are expecting really HUGE dictionary files (hundreds of
millions of lines) perhaps a better way of writing the above while-loop
would be:

print 'Building dictionary...',
dictionary = { }
f = file('dictionary.txt', 'r')
for line in f.readlines():
    line = line.strip() # remove whitespace at both ends
    if line: # line is not the empty string
        line = line.lower()
        sline = sort_string(line)
        if sline in dictionary:
            dictionary[sline].append(line)
            print 'Added %s to key %s' % (line,sline)
        else:
            dictionary[sline] = [line]
            print 'Created key %s for %s' % (sline,line)
f.close()

print 'Ready!'

# This loop lets the user input a scrambled word, look for it in
# dictionary, and print all matching unscrambled words.
# If the user types 'quit' then the program ends.
while True:
    lookup = raw_input('Enter a scrambled word : ')

    results = dictionary[sort_string(lookup)]

This will fail if the scrambled word you enter is not in the dictionary.

    for x in results:
        print x,

    print

    if lookup == 'quit':
        break

You probably want the test for quit to happen before printing the
"unscrambled" words.
*--end of file--*
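
The two remarks about the lookup loop (unknown words raise KeyError, and the quit test comes too late) could be addressed roughly as follows. This is a sketch in Python 3 rather than the thread's Python 2; the helper names `unscramble` and `run`, and the list of inputs that stands in for interactive typing, are assumptions made so the example runs without a terminal:

```python
def unscramble(index, lookup):
    """Return the words stored under lookup's sorted letters, or [] if none."""
    key = "".join(sorted(lookup.lower()))
    return index.get(key, [])  # .get avoids the KeyError for unknown keys

def run(index, inputs):
    """Handle inputs in order until 'quit'; return the lines that would be printed."""
    output = []
    for lookup in inputs:
        if lookup == "quit":  # test for quit before looking anything up
            break
        words = unscramble(index, lookup)
        output.append(" ".join(words) if words else "Not found.")
    return output

index = {"aepr": ["pear", "pare"], "ghint": ["night"]}
print(run(index, ["reap", "xyz", "quit", "thing"]))
# ['pear pare', 'Not found.']
```

In an interactive program the list of inputs would be replaced by repeated calls to input().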


If this error is always happening for the LAST line in the text file, I'm
guessing there is no newline after the word. So when you read the text
file and build the dictionary, you inadvertently remove the "s" from the
word before storing it in the dictionary.
--
Steven.

Jul 25 '05 #8

Thanks to everyone for all the help!

Here's the (at least for now) final script, although note I'm using
2.3.5, not 2.4, so I can't use some of the tips that were given.

#!/usr/bin/python
# Filename: unscram.py

def sort_string(word):
    '''Returns word in lowercase sorted alphabetically'''
    word_list = list(word.lower())
    word_list.sort()
    return ''.join(word_list)

print 'Building dictionary...',

dictionary = { }

f = file('/usr/share/dict/words', 'r')

for line in f.readlines():
    line = line.strip() # remove whitespace at both ends
    if line: # line is not the empty string
        line = line.lower()
        sline = sort_string(line)
        if sline in dictionary:
            dictionary[sline].append(line)
            #print 'Added %s to key %s' % (line,sline)
        else:
            dictionary[sline] = [line]
            #print 'Created key %s for %s' % (sline,line)
f.close()

print 'Ready!'

lookup = raw_input('Enter a scrambled word : ')
while lookup:
    try:
        results = dictionary[sort_string(lookup)]
        for x in results:
            print x,
        print
    except KeyError: # no word in the dictionary has these letters
        print "?????"
    lookup = raw_input('Enter a scrambled word : ')
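
For readers on a current interpreter, here is a rough Python 3 equivalent of the loading step, offered as a sketch and not as the script that was actually run. The function name `load_index` and the temporary demo file are assumptions; the demo file deliberately ends without a newline to show that strip() handles that case:

```python
import os
import tempfile

def sort_string(word):
    """Return the letters of word, lowercased and sorted."""
    return "".join(sorted(word.lower()))

def load_index(path):
    """Map each sorted-letter key to the list of words that produce it."""
    index = {}
    with open(path) as f:  # the with-block closes the file even on error
        for line in f:
            word = line.strip().lower()
            if word:  # skip blank lines
                index.setdefault(sort_string(word), []).append(word)
    return index

# Demo word list taken from the comments in the original post.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("test\nhello\nquit\nnight\npear\npare")  # no final newline
    demo_path = tmp.name

index = load_index(demo_path)
os.remove(demo_path)

print(index.get(sort_string("reap"), []))  # ['pear', 'pare']
```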

As for the end of the file idea, that word wasn't at the end of the
file, and there was a blank line, so that's out of the question. The
word list I was using was 272,520 words long, and I got it a while back
when doing this same thing in java, but as you can see now I'm just
using /usr/share/dict/words which I found after not finding it in the
place listed in Nick's comment.

I'm still lost as to why my old code would only work for the small
file, and another interesting note is that with the larger file, it
would only write "zzz for zzz" (or whatever each word was) instead of
"Created key zzz for zzz". However, it works now, so I'm happy.

Thanks for all the help,
Kevin

Jul 25 '05 #9

teoryn wrote:
I'm still lost as to why my old code would only work for the small
file, and another interesting note is that with the larger file, it
would only write "zzz for zzz" (or whatever each word was) instead of
"Created key zzz for zzz". However, it works now, so I'm happy.


Happy as long as you don't know what happened? How can that be?
Another guess then -- there may be inconsistent newlines, some "\n" and some
"\r\n":
>>> garbled = "garbled\r\n"[:-1]
>>> print "created key %s for %s" % ("".join(sorted(garbled)), garbled)
abdeglr for garbled
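
To make the invisible character visible, the same experiment can be repeated with repr(). This is a small Python 3 sketch (the thread uses Python 2), with variable names chosen here for illustration:

```python
garbled = "garbled\r\n"[:-1]      # the slice removes only "\n"; "\r" stays
key = "".join(sorted(garbled))    # "\r" sorts before every letter

message = "created key %s for %s" % (key, garbled)

print(repr(garbled))  # 'garbled\r'
print(repr(key))      # '\rabdeglr'
print(repr(message))  # 'created key \rabdeglr for garbled\r'
```

Because the key starts with a carriage return, a terminal moves the cursor back to the start of the line and overwrites "created key " with the rest of the message. It also means the stored key is '\rabdeglr', which a lookup for 'abdeglr' can never match.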

Peter

Jul 25 '05 #10

I was just happy that it worked, but was still curious as to why it
didn't before. Thanks for the idea, I'll look into it and see if this
is the case.

Thanks,
Kevin

Jul 25 '05 #11

I changed to using line = line.strip() instead of line = line[:-1] in
the original and it worked.

Thanks!

Jul 25 '05 #12

teoryn wrote:
I changed to using line = line.strip() instead of line = line[:-1] in
the original and it worked.


Just to be clear, these don't do nearly the same thing in general,
though in your specific case they might appear similar.

The line[:-1] idiom says 'return a string which is a copy of the
original but with the last character, if any, removed, regardless of
what character it is'.

The line.strip() idiom says 'return a string with all whitespace
characters removed from the end *and* start of the string'.

In certain cases, you might reasonably prefer .rstrip() (which removes
only from the right-hand side, or end), or even something like
.rstrip('\n') which would remove only newlines from the end.
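
A short side-by-side comparison of the four idioms, as a Python 3 sketch. The helper `variants` and the sample lines are made up for illustration:

```python
def variants(line):
    """Apply each newline-removal idiom to the same line."""
    return {
        "slice": line[:-1],                    # drops the last character, whatever it is
        "rstrip_newline": line.rstrip("\n"),   # drops trailing "\n" characters only
        "rstrip": line.rstrip(),               # drops all trailing whitespace
        "strip": line.strip(),                 # drops whitespace at both ends
    }

for sample in ["pear\n", "pear\r\n", "pear", "  pear \n"]:
    print(repr(sample), variants(sample))
```

For "pear" with no newline, the slice returns 'pea'. For "pear\r\n", both the slice and rstrip('\n') leave the carriage return in place, while rstrip() and strip() remove it.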

-Peter
Jul 25 '05 #13
