Performance issue

Tom Carrick

Hi,

In my attempted learning of python, I've decided to recode an old
anagram solving program I made in C++. The C++ version runs in less
than a second, while the python takes 30 seconds. I'm not willing to
think it's just python being slow, so I was hoping someone could find
a faster way of doing this. Also, I was wondering if there was a more
builtin, or just nicer way of converting a string to a list (or using
the sort function on a list) than making a function for it.

The words.txt here is just a copy of FreeBSD's /usr/share/dict/words

Anyway, the code:

import string

# Need a function to convert a string to a list to be
# able to use the sort() function
def string2list(s):
    l = []
    for i in range(0, len(s)):
        l.append(s[i])
    return l

words = []
found = []

anagram = raw_input("Find anagrams of word: ")

f = open('words.txt', 'r')
file = f.read()
f.close()

words = file.splitlines()

sorted_anagram = anagram.lower()
sorted_anagram = string2list(anagram)
sorted_anagram.sort(lambda x, y: cmp(x, y))
while words:
    if len(words[0]) == len(sorted_anagram):
        wordlist = string2list(words[0])
        wordlist.sort(lambda x, y: cmp(x, y))
        sorted_wordlist = wordlist
        if sorted_anagram == sorted_wordlist:
            found.append(words[0])
    del words[0]

print "Anagrams of " + anagram + ": "
while found:
    print found[0] + " "
    del found[0]

Jul 18 '05 #1


Irmen de Jong

Tom Carrick wrote:

> Hi,
>
> In my attempted learning of python, I've decided to recode an old
> anagram solving program I made in C++. The C++ version runs in less
> than a second, while the python takes 30 seconds. I'm not willing to
> think it's just python being slow, so I was hoping someone could find
> a faster way of doing this.

I like your attitude, not thinking that it's just Python that is slow :-)

> Also, I was wondering if there was a more
> builtin, or just nicer way of converting a string to a list (or using
> the sort function on a list) than making a function for it.

String to list: list("irmen") # --> ['i','r','m','e','n']
Sorted list: sorted("irmen") # --> ['e', 'i', 'm', 'n', 'r']
(the latter works in Python 2.4+)
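
A minimal sketch of how those two built-ins combine into an anagram test. It is written for Python 3 as an assumption (the thread itself targets Python 2.4), and the helper name `is_anagram` is made up for illustration:

```python
def is_anagram(a, b):
    """Return True if a and b consist of exactly the same letters."""
    # sorted() accepts any iterable, including a string, and returns a new list.
    return sorted(a.lower()) == sorted(b.lower())

print(list("irmen"))               # the string split into single characters
print(sorted("irmen"))             # the same characters in sorted order
print(is_anagram("stop", "Post"))  # case is ignored by the helper
```

Because `sorted()` already returns a list, no separate string-to-list step is needed before comparing.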

> The words.txt here is just a copy of FreeBSD's /usr/share/dict/words
>
> Anyway, the code:
>
> import string
>
> # Need a function to convert a string to a list to be
> # able to use the sort() function
> def string2list(s):
>     l = []
>     for i in range(0, len(s)):
>         l.append(s[i])
>     return l

.... see above... just replace string2list(s) with sorted(s)

> words = []
> found = []
>
> anagram = raw_input("Find anagrams of word: ")
>
> f = open('words.txt', 'r')
> file = f.read()
> f.close()

Style: don't use 'file' as a variable name, you're hiding the
builtin 'file' function

> words = file.splitlines()

You can obtain this list without reading the file in its entirety,
by using the readlines method of file objects:

words=open("words.txt").readlines()

> sorted_anagram = anagram.lower()
> sorted_anagram = string2list(anagram)
> sorted_anagram.sort(lambda x, y: cmp(x, y))

The lambda is optional and only slows it down :-)
But to get a sorted list of letters, just use sorted(s)
if you're on Python 2.4+

> while words:
>     if len(words[0]) == len(sorted_anagram):
>         wordlist = string2list(words[0])
>         wordlist.sort(lambda x, y: cmp(x, y))
>         sorted_wordlist = wordlist

(same here.. replacing this by sorted(words[0]) probably
will speed it up rather significantly, partly because
it avoids the creation of those temporary lists)

>         if sorted_anagram == sorted_wordlist:
>             found.append(words[0])
>     del words[0]
>
> print "Anagrams of " + anagram + ": "
> while found:
>     print found[0] + " "
>     del found[0]

print " ".join(found)
Cheers

--Irmen de Jong

Jul 18 '05 #2

Marc 'BlackJack' Rintsch

In <ma**************************************@python.org>, Tom Carrick
wrote:

> […] Also, I was wondering if there was a more
> builtin, or just nicer way of converting a string to a list (or using
> the sort function on a list) than making a function for it.

Use the `list()` builtin on the string and *just* the `sort()` method::

    In [2]: characters = list('hello')

    In [3]: characters
    Out[3]: ['h', 'e', 'l', 'l', 'o']

    In [4]: characters.sort()

    In [5]: characters
    Out[5]: ['e', 'h', 'l', 'l', 'o']

> sorted_anagram = anagram.lower()
> sorted_anagram = string2list(anagram)
> sorted_anagram.sort(lambda x, y: cmp(x, y))

    sorted_anagram = list(anagram.lower())
    sorted_anagram.sort()

> while words:
>     if len(words[0]) == len(sorted_anagram):
>         wordlist = string2list(words[0])
>         wordlist.sort(lambda x, y: cmp(x, y))
>         sorted_wordlist = wordlist
>         if sorted_anagram == sorted_wordlist:
>             found.append(words[0])
>     del words[0]

And here's the performance issue. Deleting the first element of a list
results in moving all remaining elements one index down. Better iterate
over the words in a for loop::

    for word in words:
        # use `word` instead of `words[0]` in the loop body.
        ...

Ciao,
Marc 'BlackJack' Rintsch

Jul 18 '05 #3

Thomas Rast

Tom Carrick <kn****@gmail.com> writes:

> In my attempted learning of python, I've decided to recode an old
> anagram solving program I made in C++. The C++ version runs in less
> than a second, while the python takes 30 seconds.

Indeed, your program can be improved to run about ten times as fast,
which (on my system, with 96274 entries in /usr/share/dict/words) is
below a second.

In general you should try to move the loops into C code, i.e. use
built-in functions instead of long 'for' blocks.

Some comments on your version:

> import string
>
> # Need a function to convert a string to a list to be
> # able to use the sort() function
> def string2list(s): [snipped]

list() achieves the same thing a lot faster.

> words = []

You do not need to initialize 'words' here, as you're overwriting it a
few lines afterwards.

> found = []
>
> anagram = raw_input("Find anagrams of word: ")
>
> f = open('words.txt', 'r')
> file = f.read()
> f.close()
>
> words = file.splitlines()

Try to avoid assigning to the names of built-in functions if you can.
Names like 'file', 'list', 'dict', 'map' etc. are often an obvious
choice, but overwriting them means that you don't "just" know what a
later use refers to.

> sorted_anagram = anagram.lower()
> sorted_anagram = string2list(anagram)
> sorted_anagram.sort(lambda x, y: cmp(x, y))

Unless you *really* have to, don't use comparison functions with
sort(), as they slow the operation considerably. In this (as in most)
cases, a plain sorted_anagram.sort() does the trick, and in version
2.4 you can achieve custom sort orders with the optional 'key'
argument. The sorted() built-in also comes in handy here.
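
A small sketch of the difference between a key function and a comparison function. It assumes Python 3, where the `cmp` argument no longer exists, so `functools.cmp_to_key` is used to emulate the old style; the sample list and the name `compare_lower` are made up for illustration:

```python
from functools import cmp_to_key

words = ["Pots", "stop", "Opts", "tops"]

# key= is called once per element; the sort then compares the keys directly.
by_key = sorted(words, key=str.lower)

# A comparison function is called for every pair the sort inspects.
def compare_lower(x, y):
    a, b = x.lower(), y.lower()
    return (a > b) - (a < b)   # negative, zero or positive, like the old cmp()

by_cmp = sorted(words, key=cmp_to_key(compare_lower))

print(by_key)
print(by_cmp == by_key)
```

Both calls give the same order; the key version simply does less Python-level work per comparison.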

> while words:
>     if len(words[0]) == len(sorted_anagram):
>         wordlist = string2list(words[0])
>         wordlist.sort(lambda x, y: cmp(x, y))
>         sorted_wordlist = wordlist
>         if sorted_anagram == sorted_wordlist:
>             found.append(words[0])
>     del words[0]

Avoid this style of looping at all times! Removing the first element
of a list is O(n), so looping through the whole list as above is
O(n**2). In most cases you should use a for loop:

    for word in words:
        # do something

which is O(n) of course. If you do have to loop destructively, pop()
from the end (which is the default) like so:

    while words:
        word = words.pop()
        # do something

This is also O(n), because removing the *last* element of a list is
O(1) (amortized; I suppose the implementation will occasionally shrink
the underlying array at linear cost).
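
The three ways of consuming a sequence can be put side by side as a sketch. It assumes Python 3; the function names are made up for illustration, and `collections.deque` is a swapped-in alternative that the post does not mention, shown for the case where front-to-back order matters:

```python
from collections import deque

def drain_from_front(items):
    """O(n**2) overall: every `del lst[0]` shifts all remaining elements."""
    lst = list(items)
    out = []
    while lst:
        out.append(lst[0])
        del lst[0]
    return out

def drain_from_end(items):
    """O(n) overall: list.pop() removes the last element in constant time."""
    lst = list(items)
    out = []
    while lst:
        out.append(lst.pop())
    return out

def drain_deque(items):
    """O(n) overall: deque.popleft() is constant time and keeps the order."""
    queue = deque(items)
    out = []
    while queue:
        out.append(queue.popleft())
    return out

print(drain_from_front("abc"), drain_from_end("abc"), drain_deque("abc"))
```

Note that popping from the end visits the elements in reverse order, which does not matter for the anagram search but may matter elsewhere.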

> print "Anagrams of " + anagram + ": "
> while found:
>     print found[0] + " "
>     del found[0]

I assume you meant not to print a newline between the words, which
'print' does by default. The best solution in that case is
" ".join(found).

A better version (2.4+ only):

-- 8< -- 8< --
anagram = raw_input("Find anagrams of word: ")

words = open('words.txt', 'r')

sorted_anagram = sorted(anagram.lower())

found = []

for word in words.read().splitlines():
    if len(word) == len(anagram) and sorted(word) == sorted_anagram:
        found.append(word)

print "Anagrams of %s: %s" % (anagram, ' '.join(found))
-- >8 -- >8 --

Interestingly, the length comparison makes quite a difference! I
removed it at first, thinking it was unnecessary. Here are some
timings:

* Your original version (for comparison):

$ time echo stop | python2.4 anagram_slow.py
[...]
real 0m9.090s
user 0m8.790s
sys 0m0.013s

* Your version, but with the O(n**2) loop replaced by an O(n) 'for':

$ time echo stop | python2.4 anagram_forloop.py
[...]
real 0m0.221s
user 0m0.134s
sys 0m0.014s

* My version but with the length comparison removed:

$ time echo stop | python2.4 anagram_no_lencmp.py
[...]
real 0m0.408s
user 0m0.353s
sys 0m0.010s

* My version as above:

$ time echo stop | python2.4 anagram_fast.py
[...]
real 0m0.144s
user 0m0.099s
sys 0m0.008s
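
One plausible reading of these numbers is that `and` short-circuits, so `sorted(word)` only runs for words of the right length. A sketch for trying this out, assuming Python 3; the function names and the tiny repeated sample list are made up and only stand in for a real word file:

```python
import timeit

def find_with_len(target, words):
    key = sorted(target.lower())
    n = len(target)
    # The cheap length test runs first; sorted(w) is skipped when it fails.
    return [w for w in words if len(w) == n and sorted(w) == key]

def find_without_len(target, words):
    key = sorted(target.lower())
    # Every word is sorted, whatever its length.
    return [w for w in words if sorted(w) == key]

# Made-up sample data, repeated to get a measurable running time.
sample = ["opts", "post", "stop", "step", "anagram", "performance", "python"] * 1000

t_with = timeit.timeit(lambda: find_with_len("stop", sample), number=20)
t_without = timeit.timeit(lambda: find_without_len("stop", sample), number=20)
print("with length check:    %.4fs" % t_with)
print("without length check: %.4fs" % t_without)
```

Both functions return the same words; the absolute timings depend on the machine and on the word list.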

Hope that helps :-)

- Thomas

--
If you want to reply by mail, substitute my first and last name for
'foo' and 'bar', respectively, and remove '.invalid'.

Jul 18 '05 #4

vincent wehren

"Tom Carrick" <kn****@gmail.com> wrote in message
news:ma**************************************@python.org...
| Hi,
|
| In my attempted learning of python, I've decided to recode an old
| anagram solving program I made in C++. The C++ version runs in less
| than a second, while the python takes 30 seconds. I'm not willing to
| think it's just python being slow, so I was hoping someone could find
| a faster way of doing this. Also, I was wondering if there was a more
| builtin, or just nicer way of converting a string to a list (or using
| the sort function on a list) than making a function for it.
|
| The words.txt here is just a copy of FreeBSD's /usr/share/dict/words
|
| Anyway, the code:
|
| import string

You're importing string, but never use it, so you can omit that line.

|
| # Need a function to convert a string to a list to be
| # able to use the sort() function
| def string2list(s):
|     l = []
|     for i in range(0, len(s)):
|         l.append(s[i])
|     return l

No need to write your own function. list(s) already does the trick.

|
| words = []
| found = []
|
| anagram = raw_input("Find anagrams of word: ")
|
| f = open('words.txt', 'r')
| file = f.read()
| f.close()
I don't have a copy of words.txt, but here's what I would try
(untested):

anagram = raw_input("Find anagrams of word: ")
sorted_anagram = list(sorted(anagram.lower()))
# If your Python is pre-2.4, use
# sorted_anagram = list(anagram.lower())
# sorted_anagram.sort() #--sort list in place
found = []
# assuming "words.txt" contains a word per line
# iterate over the lines of the file

for line in open("/path/to/words.txt"):
    word = line[:-1] # Get rid of trailing newline
    sorted_word = list(sorted(word.lower()))
    if sorted_word == sorted_anagram:
        found.append(word)
if found:
    print "Anagrams of %s:" % anagram
    for w in found:
        print w
else:
    print "No anagrams for %s" % anagram
--

Vincent Wehren

Jul 18 '05 #5

Marc 'BlackJack' Rintsch

In <42*********************@news.xs4all.nl>, Irmen de Jong wrote:

>> words = file.splitlines()
>
> You can obtain this list without reading the file in its entirety,
> by using the readlines method of file objects:
>
> words=open("words.txt").readlines()

This leaves the newline characters at the end of each line while
`str.splitlines()` removes them.
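
The difference is easy to see in a short sketch. It assumes Python 3 and uses `io.StringIO` with made-up contents in place of the real words.txt:

```python
import io

# In-memory stand-in for a file with one word per line.
text = "stop\npots\ntops\n"

with_newlines = io.StringIO(text).readlines()   # each item keeps its "\n"
without_newlines = text.splitlines()            # line endings are removed

print(with_newlines)
print(without_newlines)

# The trailing "\n" changes both the length and the sorted letters of a
# word, so strip it before comparing against the search word.
stripped = [line.rstrip("\n") for line in with_newlines]
print(stripped == without_newlines)
```

With the newline left in place, a length comparison against the search word would be off by one for every line.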

Ciao,
Marc 'BlackJack' Rintsch

Jul 18 '05 #6

Scott David Daniels

Thomas Rast wrote:

> Tom Carrick <kn****@gmail.com> writes:
>> In my attempted learning of python, I've decided to recode an old
>> anagram solving program I made in C++. The C++ version runs in less
>> than a second, while the python takes 30 seconds.
>
> Indeed, your program can be improved to run about ten times as fast, ...
> <great stuff>

This problem inspired an "all anagrams" program. Using it I was able
to find the largest anagram group in Shakespeare's first folio in about
the time you originally found anagrams for an individual word.

7: owers = rowse = sower = sowre = swore = woers = worse

====

def words(source):
    for line in source:
        for word in line.split():
            yield word

def all_anagrams(words):
    seen = dict()

    for word in words:
        word = word.lower()
        if word not in seen:
            dorw = ''.join(sorted(word))
            try:
                seen[dorw].append(word)
            except KeyError:
                seen[dorw] = [word]
            if word == dorw:
                continue
            seen[word] = ()
    for group in seen.itervalues():
        if len(group) > 1:
            yield -len(group), sorted(group) # conveniently sortable

def main(sources):
    for filename in sources:
        dictionary = open(filename, 'r')
        print "All anagrams from %s:" % filename
        try:
            for nsize, group in sorted(all_anagrams(words(dictionary))):
                print '%2s: %s' % (-nsize, ' = '.join(group))
        finally:
            dictionary.close()
        print

if __name__ == '__main__':
    import sys
    main(sys.argv[1:] or ['anagrams.py'])
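
A variant of the same grouping idea, sketched for Python 3 under the assumption that the words fit in memory. It is not the program above: the name `anagram_groups` and the sample list are made up, and it keeps one set per sorted-letter key so that repeated words are dropped without storing words and keys in the same dict:

```python
from collections import defaultdict

def anagram_groups(words):
    """Return groups of two or more anagrams, largest group first."""
    groups = defaultdict(set)
    for word in words:
        word = word.lower()
        # All anagrams of a word share the same sorted-letter key.
        groups["".join(sorted(word))].add(word)
    result = [sorted(group) for group in groups.values() if len(group) > 1]
    # Largest groups first; ties are broken alphabetically.
    result.sort(key=lambda group: (-len(group), group))
    return result

sample = ["worse", "swore", "Sower", "pets", "step", "pest", "sower", "xyzzy"]
for group in anagram_groups(sample):
    print("%2d: %s" % (len(group), " = ".join(group)))
```

Building the dict is a single pass over the input, after which every group is available at once.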

Jul 18 '05 #7

stelios xanthakis

Scott David Daniels wrote:

> if __name__ == '__main__':
>     import sys
>     main(sys.argv[1:] or ['anagrams.py'])

This is *exactly* the kind of testcases I'm looking for to test
the soon-to-be-released pyvm. Great! I'll be back with results.
For now, a fast anagrams.py is
--------------------------------------
import sys

WORDS = [ i.rstrip () for i in file ('/usr/share/dict/words') ]

def findana (anagram):
    sorted_anagram = sorted(anagram.lower())
    len_anagram = len (anagram)
    found = [ word for word in WORDS if len(word)==len_anagram and
              sorted(word)==sorted_anagram ]
    print "Anagrams of %s: %s" % (anagram, ' '.join(found))

for i in sys.argv [1:]:
    findana (i)
-----------------------------------------
And timings....

time python anagram.pyc stop step words lots pool eat fast slow lamp cold door xyzzy
Anagrams of stop: opts post pots spot stop tops
Anagrams of step: pest pets sept step
Anagrams of words: sword words
Anagrams of lots: lost lots slot
Anagrams of pool: loop polo pool
Anagrams of eat: ate eat tea
Anagrams of fast: fast fats
Anagrams of slow: lows owls slow
Anagrams of lamp: lamp palm
Anagrams of cold: clod cold
Anagrams of door: door odor
Anagrams of xyzzy:

real 0m1.491s
user 0m1.390s
sys 0m0.040s

time pyvm anagram.pyc stop step words lots pool eat fast slow lamp cold door xyzzy
Anagrams of stop: opts post pots spot stop tops
Anagrams of step: pest pets sept step
Anagrams of words: sword words
Anagrams of lots: lost lots slot
Anagrams of pool: loop polo pool
Anagrams of eat: ate eat tea
Anagrams of fast: fast fats
Anagrams of slow: lows owls slow
Anagrams of lamp: lamp palm
Anagrams of cold: clod cold
Anagrams of door: door odor
Anagrams of xyzzy:

real 0m0.923s
user 0m0.760s
sys 0m0.070s
-------
Stelios

Jul 18 '05 #8

Marc 'BlackJack' Rintsch

In <ma**************************************@python.org>, Tom Carrick
wrote:

> In my attempted learning of python, I've decided to recode an old
> anagram solving program I made in C++. The C++ version runs in less
> than a second, while the python takes 30 seconds. I'm not willing to
> think it's just python being slow, so I was hoping someone could find
> a faster way of doing this. Also, I was wondering if there was a more
> builtin, or just nicer way of converting a string to a list (or using
> the sort function on a list) than making a function for it.
>
> The words.txt here is just a copy of FreeBSD's /usr/share/dict/words

Here's my attempt which builds an anagram dictionary ("sorted word" ->
list of anagrams) for fast lookup of anagrams::

    #!/usr/bin/env python2.4
    from itertools import imap, ifilter

    WORDS = '/usr/share/dict/words'

    def make_anagram_map(words):
        anagram_map = dict()
        for word in imap(lambda w: w.strip().lower(), words):
            sorted_word = ''.join(sorted(list(word)))
            anagram_map.setdefault(sorted_word, list()).append(word)

        return dict(ifilter(lambda x: len(x[1]) > 1, anagram_map.iteritems()))

    def main():
        words_file = open(WORDS, 'r')
        anagram_map = make_anagram_map(words_file)
        words_file.close()

        while True:
            word = raw_input('Find anagrams of word (just enter to end): ')
            if not word:
                break
            try:
                print anagram_map[''.join(sorted(list(word.strip().lower())))]
            except KeyError:
                print 'No anagrams found for %r' % word

        # # Print all anagrams sorted by number of anagrams.
        # print '\n'.join(map(str, sorted(anagram_map.values(), key=len)))
        # print len(anagram_map)

    if __name__ == '__main__':
        main()

Ciao,
Marc 'BlackJack' Rintsch

Jul 18 '05 #9

Steven Bethard

Marc 'BlackJack' Rintsch wrote:

> def make_anagram_map(words):
>     anagram_map = dict()
>     for word in imap(lambda w: w.strip().lower(), words):
>         sorted_word = ''.join(sorted(list(word)))
>         anagram_map.setdefault(sorted_word, list()).append(word)
>
>     return dict(ifilter(lambda x: len(x[1]) > 1, anagram_map.iteritems()))

Or if you're afraid of map and filter like me, you can try:

def make_anagram_map(words):
    anagram_map = {}
    for word in (w.strip().lower() for w in words):
        anagram_map.setdefault(''.join(sorted(word)), []).append(word)
    return dict(sortedword_wordlist
                for sortedword_wordlist in anagram_map.iteritems()
                if len(sortedword_wordlist[1]) > 1)

py> make_anagram_map(['owers', 'pest', 'rowse', 'pets', 'sower', 'step'])
{'epst': ['pest', 'pets', 'step'], 'eorsw': ['owers', 'rowse', 'sower']}

STeVe

Jul 18 '05 #10
