Problem loading a file of words

teoryn

I've been spending today learning python and as an exercise I've ported
a program I wrote in java that unscrambles a word. Before describing
the problem, here's the code:

*--beginning of file--*
#!/usr/bin/python
# Filename: unscram.py

def sort_string(word):
    '''Returns word in lowercase sorted alphabetically'''
    word = str.lower(word)
    word_list = []
    for char in word:
        word_list.append(char)
    word_list.sort()
    sorted_word = ''
    for char in word_list:
        sorted_word += char
    return sorted_word

print 'Building dictionary...',

dictionary = { }

# Notice that you need to have a file named 'dictionary.txt'
# in the same directory as this file. The format is to have
# one word per line, such as the following (of course without
# the # marks):

#test
#hello
#quit
#night
#pear
#pare

f = file('dictionary.txt')

# This loop builds the dictionary, where the key is
# the string after calling sort_string(), and the value
# is the list of all 'regular' words (from the dictionary,
# not sorted) that passing to sort_string() returns the key

while True:
    line = f.readline()
    if len(line) == 0:
        break
    line = str.lower(line[:-1]) # convert to lowercase just in case and
                                # remove the return at the end of the line
    sline = sort_string(line)
    if sline in dictionary: # this key already exist, add to existing list
        dictionary[sline].append(line)
        print 'Added %s to key %s' % (line,sline) #for testing
    else: # create new key and list
        dictionary[sline] = [line]
        print 'Created key %s for %s' % (sline,line) #for testing
f.close()

print 'Ready!'

# This loop lets the user input a scrambled word, look for it in
# dictionary, and print all matching unscrambled words.
# If the user types 'quit' then the program ends.
while True:
    lookup = raw_input('Enter a scrambled word : ')

    results = dictionary[sort_string(lookup)]

    for x in results:
        print x,

    print

    if lookup == 'quit':
        break
*--end of file--*
If you create dictionary.txt as suggested in the comments, it should
work fine (assuming you pass a word that creates a valid key; I'll
have to add exception handling later). The problem is that with a large
dictionary.txt file (the one I tested is 2.9 MB) it always gives an
error, specifically:
(Note: ccehimnostyz is for zymotechnics, which is in the large
dictionary)
*--beginning of example--*
Enter a scrambled word : ccehimnostyz
Traceback (most recent call last):
  File "unscram.py", line 62, in ?
    results = dictionary[sort_string(lookup)]
KeyError: 'ccehimnostyz'
*--end of example--*
If you'd like a copy of the dictionary I'm using email me at teoryn at
gmail dot com or leave your email here and I'll send it to you (It's
702.2 KB compressed)

Thanks,
Kevin

Jul 25 '05 #1

Devan L

Heh, it reminds me of the code I used to write.

def sort_string(word):
    return ''.join(sorted(list(word.lower())))
f = open('dictionary.txt','r')
lines = [line.rstrip('\n') for line in f.readlines()]
f.close()
dictionary = dict((sort_string(line),line) for line in lines)
lookup = ''
while lookup != 'quit':
    lookup = raw_input('Enter a scrambled word:')
    key = sort_string(lookup) # the keys are sorted letters, so sort the input too
    if dictionary.has_key(key):
        word = dictionary[key]
    else:
        word = 'Not found.'
    print word

You need python 2.4 to use this example.

Jul 25 '05 #2

Robert Kern

teoryn wrote:

*--beginning of file--*
#!/usr/bin/python
# Filename: unscram.py

def sort_string(word):
    '''Returns word in lowercase sorted alphabetically'''
    word = str.lower(word)
    word_list = []
    for char in word:
        word_list.append(char)
    word_list.sort()
    sorted_word = ''
    for char in word_list:
        sorted_word += char
    return sorted_word

An idiomatic Python 2.4 version of this function would be:

def sort_string(word):
    word = word.lower()
    sorted_list = sorted(word)
    sorted_word = ''.join(sorted_list)
    return sorted_word

while True:
    line = f.readline()
    if len(line) == 0:
        break
    line = str.lower(line[:-1]) # convert to lowercase just in case and
                                # remove the return at the end of the line
    sline = sort_string(line)
    if sline in dictionary: # this key already exist, add to existing list
        dictionary[sline].append(line)
        print 'Added %s to key %s' % (line,sline) #for testing
    else: # create new key and list
        dictionary[sline] = [line]
        print 'Created key %s for %s' % (sline,line) #for testing
f.close()

# this really should all be within a function, but let's just carry on
dictionary = {}
f = open('dictionary.txt')
try:
    # enclose this in a try: finally: block in case something goes wrong
    for line in f:
        line = line.strip().lower()
        sline = sort_string(line)
        val = dictionary.setdefault(sline, [])
        val.append(line)
        print "Added %s to key %s" % (line, sline)
finally:
    f.close()
*--end of file--*
If you create dictionary.txt as suggested in the comments, it should
work fine (assuming you pass a word that creates a valid key, I'll
have to add exceptions later). The problem is when using a large
dictionary.txt file (2.9 MB is the size of the dictionary I tested) it
always gives an error, specifically:
(Note: ccehimnostyz is for zymotechnics, which is in the large
dictionary)

Well, my version works (using /usr/share/dict/words from Debian as
dictionary.txt). Yours does, too. Are you sure that you are using the
right dictionary.txt?
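
For readers on a current interpreter, the same grouping can be sketched in Python 3. This is only an illustration, not code from the thread: `build_index` is a made-up name, `collections.defaultdict` stands in for `setdefault`, and the word list is supplied inline instead of being read from dictionary.txt.

```python
from collections import defaultdict

def sort_string(word):
    """Return the letters of word, lowercased and sorted."""
    return ''.join(sorted(word.lower()))

def build_index(lines):
    """Map each sorted-letter key to the list of words that produce it.

    build_index is a hypothetical helper; lines can be any iterable of
    strings, such as an open file.
    """
    index = defaultdict(list)
    for line in lines:
        word = line.strip().lower()
        if word:  # skip blank lines
            index[sort_string(word)].append(word)
    return index

# Inline sample data (an assumption); note the last entry has no newline.
index = build_index(["pear\n", "pare\n", "night\n", "\n", "zymotechnics"])
print(index[sort_string("reap")])          # ['pear', 'pare']
print(index[sort_string("ccehimnostyz")])  # ['zymotechnics']
```

Because the words are cleaned with strip(), a missing newline on the last line or a stray "\r" does not change the key.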

--
Robert Kern
rk***@ucsd.edu

"In the fields of hell where the grass grows high
Are the graves of dreams allowed to die."
-- Richard Harter

Jul 25 '05 #3

Terrance N. Phillip

Kevin,
I'm pretty new to Python too. I'm not sure why you're seeing this
problem... is it possible that this is an "out-by-one" error? Is
zymotechnics the *last* word in dictionary.txt? Try this slightly
simplified version of your program and see if you have the same problem....

def sort_string(word):
    '''Returns word in lowercase sorted alphabetically'''
    return "".join(sorted(list(word.lower())))

dictionary = {}
f = open('/usr/bin/words') # or whatever file you like
for line in f:
    sline = sort_string(line[:-1])
    if sline in dictionary:
        dictionary[sline].append(line)
    else:
        dictionary[sline] = [line]
f.close()

lookup = raw_input('Enter a scrambled word : ')
while lookup:
    try:
        results = dictionary[sort_string(lookup)]
        for x in results:
            print x,
        print
    except:
        print "?????"
    lookup = raw_input('Enter a scrambled word : ')

Good luck,

Nick.

Jul 25 '05 #4

Robert Kern

Devan L wrote:

Heh, it reminds me of the code I used to write.

def sort_string(word):
    return ''.join(sorted(list(word.lower())))
f = open('dictionary.txt','r')
lines = [line.rstrip('\n') for line in f.readlines()]
f.close()
dictionary = dict((sort_string(line),line) for line in lines)

That's definitely not the kind of dictionary that he wants.

--
Robert Kern
rk***@ucsd.edu

"In the fields of hell where the grass grows high
Are the graves of dreams allowed to die."
-- Richard Harter

Jul 25 '05 #5

Devan L

Robert Kern wrote:

That's definitely not the kind of dictionary that he wants.


Oh, I missed the part where he put values in a list.
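
To make the difference concrete, here is a small Python 3 sketch (not from the thread; the three-word list is an assumption chosen because the words are anagrams of each other). With one value per key, each anagram silently replaces the previous one, whereas a list per key keeps them all.

```python
def sort_string(word):
    return ''.join(sorted(word.lower()))

words = ["pear", "pare", "reap"]  # sample data, all share the key 'aepr'

# One value per key: each anagram replaces the previous one.
single = dict((sort_string(w), w) for w in words)

# A list per key: every anagram is kept.
grouped = {}
for w in words:
    grouped.setdefault(sort_string(w), []).append(w)

print(single)   # {'aepr': 'reap'}
print(grouped)  # {'aepr': ['pear', 'pare', 'reap']}
```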

Jul 25 '05 #6

Peter Otten

teoryn wrote:

I've been spending today learning python and as an exercise I've ported
a program I wrote in java that unscrambles a word. Before describing
the problem, here's the code:

[...]
    line = str.lower(line[:-1]) # convert to lowercase just in case
[...]

have to add exceptions later). The problem is when using a large
dictionary.txt file (2.9 MB is the size of the dictionary I tested) it
always gives an error, specifically:
(Note: ccehimnostyz is for zymotechnics, which is in the large
dictionary)
*--beginning of example--*
Enter a scrambled word : ccehimnostyz
Traceback (most recent call last):
  File "unscram.py", line 62, in ?
    results = dictionary[sort_string(lookup)]
KeyError: 'ccehimnostyz'
*--end of example--*

If 'zymotechnics' is the last line and that line is missing a trailing
newline

line[:-1]

mutilates 'zymotechnics' to 'zymotechnic'. In that case the dictionary would
contain the key 'ccehimnotyz'. Another potential problem could be
leading/trailing whitespace. Both problems can be fixed by using
line.strip() instead of line[:-1] as in Robert Kern's code.
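
The effect described here can be reproduced without a file. The following is a Python 3 sketch under the assumption that the last line of the word list has no trailing newline; the two-item list stands in for the file contents.

```python
def sort_string(word):
    return ''.join(sorted(word.lower()))

# Hypothetical file contents: the final line has no trailing newline.
lines = ["pear\n", "zymotechnics"]

sliced = [line[:-1] for line in lines]       # always drops the last character
stripped = [line.strip() for line in lines]  # drops surrounding whitespace only

print(sliced)    # ['pear', 'zymotechnic']
print(stripped)  # ['pear', 'zymotechnics']
print(sort_string(sliced[-1]))    # ccehimnotyz
print(sort_string(stripped[-1]))  # ccehimnostyz
```

A lookup for 'ccehimnostyz' only succeeds against the key built from the stripped word.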

Peter

Jul 25 '05 #7

Steven D'Aprano

On Sun, 24 Jul 2005 20:44:08 -0700, teoryn wrote:

*--beginning of file--*
#!/usr/bin/python
# Filename: unscram.py

def sort_string(word):
    '''Returns word in lowercase sorted alphabetically'''
    word = str.lower(word)

It is generally considered better form to write that line as:

    word = word.lower()

    word_list = []
    for char in word:
        word_list.append(char)

If you want a list of characters, the best way of doing that is just:

    word_list = list(word)

    word_list.sort()

    sorted_word = ''
    for char in word_list:
        sorted_word += char
    return sorted_word

And the above four lines are best written as:

    return ''.join(word_list)


while True:
    line = f.readline()
    if len(line) == 0:
        break
    line = str.lower(line[:-1]) # convert to lowercase just in case
and
# remove the return at the end of
the line
    sline = sort_string(line)
    if sline in dictionary: # this key already exist, add to
existing list
        dictionary[sline].append(line)
        print 'Added %s to key %s' % (line,sline) #for testing
    else: # create new key and list
        dictionary[sline] = [line]
        print 'Created key %s for %s' % (sline,line) #for
testing
f.close()

Your while-loop seems to have been mangled a little thanks to word-wrap.
In particular, I can't work out what that "and" is doing in the middle of
it.

Unless you are expecting really HUGE dictionary files (hundreds of
millions of lines) perhaps a better way of writing the above while-loop
would be:

print 'Building dictionary...',
dictionary = { }
f = file('dictionary.txt', 'r')
for line in f.readlines():
    line = line.strip() # remove whitespace at both ends
    if line: # line is not the empty string
        line = line.lower()
        sline = sort_string(line)
        if sline in dictionary:
            dictionary[sline].append(line)
            print 'Added %s to key %s' % (line,sline)
        else:
            dictionary[sline] = [line]
            print 'Created key %s for %s' % (sline,line)
f.close()

print 'Ready!'

# This loop lets the user input a scrambled word, look for it in
# dictionary, and print all matching unscrambled words.
# If the user types 'quit' then the program ends.
while True:
    lookup = raw_input('Enter a scrambled word : ')

    results = dictionary[sort_string(lookup)]

This will fail if the scrambled word you enter is not in the dictionary.

    for x in results:
        print x,

    print

    if lookup == 'quit':
        break

You probably want the test for quit to happen before printing the
"unscrambled" words.
*--end of file--*
If you create dictionary.txt as suggested in the comments, it should
work fine (assuming you pass a word that creates a valid key, I'll
have to add exceptions later). The problem is when using a large
dictionary.txt file (2.9 MB is the size of the dictionary I tested) it
always gives an error, specifically:
(Note: ccehimnostyz is for zymotechnics, which is in the large
dictionary)
*--beginning of example--*
Enter a scrambled word : ccehimnostyz
Traceback (most recent call last):
  File "unscram.py", line 62, in ?
    results = dictionary[sort_string(lookup)]
KeyError: 'ccehimnostyz'
*--end of example--*

If this error is always happening for the LAST line in the text file, I'm
guessing there is no newline after the word. So when you read the text
file and build the dictionary, you inadvertently remove the "s" from the
word before storing it in the dictionary.
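
The two remarks above about the lookup loop (it fails on unknown words, and the quit test comes too late) could be addressed along these lines. This is a Python 3 sketch rather than anything posted in the thread: `unscramble`, `run`, and the `read` parameter are names invented for illustration, and the small dictionary is made up.

```python
def sort_string(word):
    return ''.join(sorted(word.lower()))

def unscramble(dictionary, lookup):
    """Return the list of matches for lookup, or an empty list if none."""
    return dictionary.get(sort_string(lookup), [])

def run(dictionary, read=input):
    """Prompt until the user types 'quit'; read is injectable for testing."""
    while True:
        lookup = read('Enter a scrambled word : ')
        if lookup == 'quit':  # test for quit before looking anything up
            break
        results = unscramble(dictionary, lookup)
        print(' '.join(results) if results else 'Not found.')

# Hypothetical sample data in the same shape as the thread's dictionary.
dictionary = {'aepr': ['pear', 'pare']}
print(unscramble(dictionary, 'reap'))  # ['pear', 'pare']
print(unscramble(dictionary, 'xyz'))   # []
```

Calling run(dictionary) starts the interactive loop. Using dict.get with a default avoids the KeyError without a try/except around the whole body.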
--
Steven.

Jul 25 '05 #8

teoryn

Thanks to everyone for all the help!

Here's the (at least for now) final script, although note I'm using
2.3.5, not 2.4, so I can't use some of the tips that were given.

#!/usr/bin/python
# Filename: unscram.py

def sort_string(word):
    '''Returns word in lowercase sorted alphabetically'''
    word_list = list(word.lower())
    word_list.sort()
    return ''.join(word_list)

print 'Building dictionary...',

dictionary = { }

f = file('/usr/share/dict/words', 'r')

for line in f.readlines():
    line = line.strip() # remove whitespace at both ends
    if line: # line is not the empty string
        line = line.lower()
        sline = sort_string(line)
        if sline in dictionary:
            dictionary[sline].append(line)
            #print 'Added %s to key %s' % (line,sline)
        else:
            dictionary[sline] = [line]
            #print 'Created key %s for %s' % (sline,line)
f.close()

print 'Ready!'

lookup = raw_input('Enter a scrambled word : ')
while lookup:
    try:
        results = dictionary[sort_string(lookup)]
        for x in results:
            print x,
        print
    except:
        print "?????"
    lookup = raw_input('Enter a scrambled word : ')

As for the end of the file idea, that word wasn't at the end of the
file, and there was a blank line, so that's out of the question. The
word list I was using was 272,520 words long, and I got it a while back
when doing this same thing in java, but as you can see now I'm just
using /usr/share/dict/words which I found after not finding it in the
place listed in Nick's comment.

I'm still lost as to why my old code would only work for the small
file, and another interesting note is that with the larger file, it
would only write "zzz for zzz" (or whatever each word was) instead of
"Created key zzz for zzz". However, it works now, so I'm happy.

Thanks for all the help,
Kevin

Jul 25 '05 #9

Peter Otten

teoryn wrote:

I'm still lost as to why my old code would only work for the small
file, and another interesting note is that with the larger file, it
would only write "zzz for zzz" (or whatever each word was) instead of
"Created key zzz for zzz". However, it works now, so I'm happy.

Happy as long as you don't know what happened? How can that be?
Another guess then -- there may be inconsistent newlines, some "\n" and some
"\r\n":

garbled = "garbled\r\n"[:-1]
print "created key %s for %s" % ("".join(sorted(garbled)), garbled)

abdeglr for garbled
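
Printing with repr() makes the stray character visible instead of letting the terminal act on it. This is a Python 3 sketch of the same experiment, assuming a line that ends in "\r\n":

```python
garbled = "garbled\r\n"[:-1]     # only "\n" is removed; "\r" remains
key = "".join(sorted(garbled))   # "\r" sorts before the letters

print(repr(garbled))                # 'garbled\r'
print(repr(key))                    # '\rabdeglr'
print(repr("garbled\r\n".strip()))  # 'garbled'
```

The carriage return at the start of the key sends the cursor back to column 0, so "Created key " gets overwritten on screen. It would also explain the KeyError, since the stored key has one more character than the key built from the typed word.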

Peter

Jul 25 '05 #10

teoryn

I was just happy that it worked, but was still curious as to why it
didn't before. Thanks for the idea, I'll look into it and see if this
is the case.

Thanks,
Kevin

Jul 25 '05 #11

teoryn

I changed to using line = line.strip() instead of line = line[:-1] in
the original and it worked.

Thanks!

Jul 25 '05 #12

Peter Hansen

teoryn wrote:

I changed to using line = line.strip() instead of line = line[:-1] in
the original and it worked.

Just to be clear, these don't do nearly the same thing in general,
though in your specific case they might appear similar.

The line[:-1] idiom says 'return a string which is a copy of the
original but with the last character, if any, removed, regardless of
what character it is'.

The line.strip() idiom says 'return a string with all whitespace
characters removed from the end *and* start of the string'.

In certain cases, you might reasonably prefer .rstrip() (which removes
only from the right-hand side, or end), or even something like
.rstrip('\n') which would remove only newlines from the end.
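
A side-by-side comparison may help. This Python 3 sketch applies each option to one made-up line that has leading spaces and a "\r\n" ending:

```python
line = "  pear \r\n"  # sample input (an assumption), not from the word list

print(repr(line[:-1]))          # '  pear \r'  last character removed, whatever it is
print(repr(line.strip()))       # 'pear'       whitespace removed from both ends
print(repr(line.rstrip()))      # '  pear'     whitespace removed from the end only
print(repr(line.rstrip('\n')))  # '  pear \r'  only trailing newlines removed
print(repr(""[:-1]))            # ''           slicing an empty string is safe
```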

-Peter

Jul 25 '05 #13
