473,394 Members | 1,701 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,394 software developers and data experts.

Performance issue

Hi,

In my attempted learning of python, I've decided to recode an old
anagram solving program I made in C++. The C++ version runs in less
than a second, while the python takes 30 seconds. I'm not willing to
think it's just python being slow, so I was hoping someone could find
a faster way of doing this. Also, I was wondering if there was a more
builtin, or just nicer way of converting a string to a list (or using
the sort function on a list) than making a function for it.

The words.txt here is just a copy of FreeBSD's /usr/share/dict/words

Anyway, the code:

import string

# Need a function to convert a string to a list to be
# able to use the sort() function
def string2list(s):
l = []
for i in range(0, len(s)):
l.append(s[i])
return l

words = []
found = []

anagram = raw_input("Find anagrams of word: ")

f = open('words.txt', 'r')
file = f.read()
f.close()

words = file.splitlines()

sorted_anagram = anagram.lower()
sorted_anagram = string2list(anagram)
sorted_anagram.sort(lambda x, y: cmp(x, y))
while words:
if len(words[0]) == len(sorted_anagram):
wordlist = string2list(words[0])
wordlist.sort(lambda x, y: cmp(x, y))
sorted_wordlist = wordlist
if sorted_anagram == sorted_wordlist:
found.append(words[0])
del words[0]

print "Anagrams of " + anagram + ": "
while found:
print found[0] + " "
del found[0]
Jul 18 '05 #1
9 1744
Tom Carrick wrote:
Hi,

In my attempted learning of python, I've decided to recode an old
anagram solving program I made in C++. The C++ version runs in less
than a second, while the python takes 30 seconds. I'm not willing to
think it's just python being slow, so I was hoping someone could find
a faster way of doing this.
I like your attitude, not thinking that it's just Python that is slow :-)
Also, I was wondering if there was a more
builtin, or just nicer way of converting a string to a list (or using
the sort function on a list) than making a function for it.
String to list: list("irmen") # --> ['i','r','m','e','n']
Sorted list: sorted("irmen") # --> ['e', 'i', 'm', 'n', 'r']
(the latter works in Python 2.4+)

The words.txt here is just a copy of FreeBSD's /usr/share/dict/words

Anyway, the code:

import string

# Need a function to convert a string to a list to be
# able to use the sort() function
def string2list(s):
l = []
for i in range(0, len(s)):
l.append(s[i])
return l
.... see above... just replace string2list(s) with sorted(s)

words = []
found = []

anagram = raw_input("Find anagrams of word: ")

f = open('words.txt', 'r')
file = f.read()
f.close()
Style: don't use 'file' as a variable name, you're hiding the
builtin 'file' function

words = file.splitlines()
You can obtain this list without reading the file in its entirety,
by using the readlines method of file objects:

words=open("words.txt").readlines()

sorted_anagram = anagram.lower()
sorted_anagram = string2list(anagram)
sorted_anagram.sort(lambda x, y: cmp(x, y))
The lambda is optional and only slows it down :-)
But to get a sorted list of letters, just use sorted(s)
if you're on Python 2.4+
while words:
if len(words[0]) == len(sorted_anagram):
wordlist = string2list(words[0])
wordlist.sort(lambda x, y: cmp(x, y))
sorted_wordlist = wordlist
(same here.. replacing this by sorted(words[0]) probably
will speed it up rather significantly, partly because
it avoids the creation of those temporary lists)
if sorted_anagram == sorted_wordlist:
found.append(words[0])
del words[0]

print "Anagrams of " + anagram + ": "
while found:
print found[0] + " "
del found[0]


print " ".join(found)
Cheers

--Irmen de Jong
Jul 18 '05 #2
In <ma**************************************@python.o rg>, Tom Carrick
wrote:
[…] Also, I was wondering if there was a more
builtin, or just nicer way of converting a string to a list (or using
the sort function on a list) than making a function for it.
Use the `list()` builtin on the string and *just* the `sort()` method::

In [2]: characters = list('hello')

In [3]: characters
Out[3]: ['h', 'e', 'l', 'l', 'o']

In [4]: characters.sort()

In [5]: characters
Out[5]: ['e', 'h', 'l', 'l', 'o']
sorted_anagram = anagram.lower()
sorted_anagram = string2list(anagram)
sorted_anagram.sort(lambda x, y: cmp(x, y))
sorted_anagram = list(anagram.lower())
sorted_anagram.sort()
while words:
if len(words[0]) == len(sorted_anagram):
wordlist = string2list(words[0])
wordlist.sort(lambda x, y: cmp(x, y))
sorted_wordlist = wordlist
if sorted_anagram == sorted_wordlist:
found.append(words[0])
del words[0]


And here's the performance issue. Deleting the first element of a list
results in moving all remaining elements one index down. Better iterate
over the words in a for loop::

for word in words:
# use `word` instead of `word[0]` in the loop body.
...

Ciao,
Marc 'BlackJack' Rintsch

Jul 18 '05 #3
Tom Carrick <kn****@gmail.com> writes:
In my attempted learning of python, I've decided to recode an old
anagram solving program I made in C++. The C++ version runs in less
than a second, while the python takes 30 seconds.
Indeed, your program can be improved to run about ten times as fast,
which (on my system, with 96274 entries in /usr/share/dict/words) is
below a second.

In general you should try to move the loops into C code, i.e. use
built-in functions instead of long 'for' blocks.

Some comments on your version:
import string

# Need a function to convert a string to a list to be
# able to use the sort() function
def string2list(s): [snipped]
list() achieves the same thing a lot faster.
words = []
You do not need to initialize 'words' here, as you're overwriting it a
few lines afterwards.
found = []

anagram = raw_input("Find anagrams of word: ")

f = open('words.txt', 'r')
file = f.read()
f.close()

words = file.splitlines()
Try to avoid assigning to the names of built-in functions if you can.
Names like 'file', 'list', 'dict', 'map' etc. are often an obvious
choice, but overwriting them means that you don't "just" know what a
later use refers to.
sorted_anagram = anagram.lower()
sorted_anagram = string2list(anagram)
sorted_anagram.sort(lambda x, y: cmp(x, y))
Unless you *really* have to, don't use comparison functions with
sort(), as they slow the operation considerably. In this (as in most)
cases, a plain sorted_anagram.sort() does the trick, and in version
2.4 you can achieve custom sort orders with the optional 'key'
argument. The sorted() built-in also comes in handy here.
while words:
if len(words[0]) == len(sorted_anagram):
wordlist = string2list(words[0])
wordlist.sort(lambda x, y: cmp(x, y))
sorted_wordlist = wordlist
if sorted_anagram == sorted_wordlist:
found.append(words[0])
del words[0]
Avoid this style of looping at all times! Removing the first element
of a list is O(n), so looping through the whole list as above is
O(n**2). In most cases you should use a for loop:

for word in words:
# do something

which is O(n) of course. If you do have to loop destructively, pop()
from the end (which is the default) like so:

while words:
word = words.pop()
# do something

This is also O(n), because removing the *last* element of a list is
O(1) (amortized; I suppose the implementation will occasionally shrink
the underlying array at linear cost).
print "Anagrams of " + anagram + ": "
while found:
print found[0] + " "
del found[0]


I assume you meant not to print a newline between the words, which
'print' does by default. The best solution in that case is "
".join(found).

A better version (2.4+ only):

-- 8< -- 8< --
anagram = raw_input("Find anagrams of word: ")

words = open('words.txt', 'r')

sorted_anagram = sorted(anagram.lower())

found = []

for word in words.read().splitlines():
if len(word) == len(anagram) and sorted(word) == sorted_anagram:
found.append(word)

print "Anagrams of %s: %s" % (anagram, ' '.join(found))
-- >8 -- >8 --

Interestingly, the length comparison makes quite a difference! I
removed it at first, thinking it was unnecessary. Here are some
timings:

* Your original version (for comparison):

$ time echo stop | python2.4 anagram_slow.py
[...]
real 0m9.090s
user 0m8.790s
sys 0m0.013s

* Your version, but with the O(n**2) loop replaced by an O(n) 'for':

$ time echo stop | python2.4 anagram_forloop.py
[...]
real 0m0.221s
user 0m0.134s
sys 0m0.014s

* My version but with the length comparison removed:

$ time echo stop | python2.4 anagram_no_lencmp.py
[...]
real 0m0.408s
user 0m0.353s
sys 0m0.010s

* My version as above:

$ time echo stop | python2.4 anagram_fast.py
[...]
real 0m0.144s
user 0m0.099s
sys 0m0.008s

Hope that helps :-)

- Thomas

--
If you want to reply by mail, substitute my first and last name for
'foo' and 'bar', respectively, and remove '.invalid'.
Jul 18 '05 #4
"Tom Carrick" <kn****@gmail.com> schrieb im Newsbeitrag
news:ma**************************************@pyth on.org...
| Hi,
|
| In my attempted learning of python, I've decided to recode an old
| anagram solving program I made in C++. The C++ version runs in less
| than a second, while the python takes 30 seconds. I'm not willing to
| think it's just python being slow, so I was hoping someone could find
| a faster way of doing this. Also, I was wondering if there was a more
| builtin, or just nicer way of converting a string to a list (or using
| the sort function on a list) than making a function for it.
|
| The words.txt here is just a copy of FreeBSD's /usr/share/dict/words
|
| Anyway, the code:
|
| import string

You're importing string, but never use it, so you can omit that line.

|
| # Need a function to convert a string to a list to be
| # able to use the sort() function
| def string2list(s):
| l = []
| for i in range(0, len(s)):
| l.append(s[i])
| return l

No need to write your own function. list(s) already does the trick.

|
| words = []
| found = []
|
| anagram = raw_input("Find anagrams of word: ")
|
| f = open('words.txt', 'r')
| file = f.read()
| f.close()
I don't have a copy of words.txt, but here's what I would try
(untested):

anagram = raw_input("Find anagrams of word: ")
sorted_anagram = list(sorted(anagram.lower()))
# If you're Python is pre 2.4 ise
# sorted_anagram = list(anagram.lower())
# sorted_anagram.sort() #--sort list in place
found = []
# assuming "words.txt" contains a word per line
# iterate over the lines of the file

for line in open("/path/to/words.txt"):
word = line[:-1] # Get rid of trailing newline
sorted_word = list(sorted(word.lower()))
if sorted_word == sorted_anagram:
found.append(word)
if found:
print "Anagrams of %s:" % anagram
for w in found:
print w
else:
print "No anagrams for %s" % anagram
--

Vincent Wehren


Jul 18 '05 #5
In <42*********************@news.xs4all.nl>, Irmen de Jong wrote:
words = file.splitlines()


You can obtain this list without reading the file in its entirety,
by using the readlines method of file objects:

words=open("words.txt").readlines()


This leaves the newline characters at the end of each line while
`str.splitlines()` removes them.

Ciao,
Marc 'BlackJack' Rintsch
Jul 18 '05 #6
Thomas Rast wrote:
Tom Carrick <kn****@gmail.com> writes:
In my attempted learning of python, I've decided to recode an old
anagram solving program I made in C++. The C++ version runs in less
than a second, while the python takes 30 seconds.


Indeed, your program can be improved to run about ten times as fast, ...
<great stuff>


This problem inspired an "all anagrams" program. Using it I was able
to find the largest anagram group in Shakespeare's first folio in about
the time you originally found anagrams for an individual word.

7: owers = rowse = sower = sowre = swore = woers = worse

====

def words(source):
for line in source:
for word in line.split():
yield word
def all_anagrams(words):
seen = dict()

for word in words:
word = word.lower()
if word not in seen:
dorw = ''.join(sorted(word))
try:
seen[dorw].append(word)
except KeyError:
seen[dorw] = [word]
if word == dorw:
continue
seen[word] = ()
for group in seen.itervalues():
if len(group) > 1:
yield -len(group), sorted(group) # conveniently sortable
def main(sources):
for filename in sources:
dictionary = open(filename, 'r')
print "All anagrams from %s:" % filename
try:
for nsize, group in sorted(all_anagrams(words(dictionary))):
print '%2s: %s' % (-nsize, ' = '.join(group))
finally:
dictionary.close()
print

if __name__ == '__main__':
import sys
main(sys.argv[1:] or ['anagrams.py'])
Jul 18 '05 #7
Scott David Daniels wrote:
if __name__ == '__main__':
import sys
main(sys.argv[1:] or ['anagrams.py'])


This is *exactly* the kind of testcases I'm looking for to test
the soon-to-be-released pyvm. Great! I'll be back with results.
For now, a fast anagrams.py is
--------------------------------------
import sys

WORDS = [ i.rstrip () for i in file ('/usr/share/dict/words') ]

def findana (anagram):
sorted_anagram = sorted(anagram.lower())
len_anagram = len (anagram)
found = [ word for word in WORDS if len(word)==len_anagram and
sorted(word)==sorted_anagram ]
print "Anagrams of %s: %s" % (anagram, ' '.join(found))

for i in sys.argv [1:]:
findana (i)
-----------------------------------------
And timings....

time python anagram.pyc stop step words lots pool eat fast slow lamp
cold door xyzzy
Anagrams of stop: opts post pots spot stop tops
Anagrams of step: pest pets sept step
Anagrams of words: sword words
Anagrams of lots: lost lots slot
Anagrams of pool: loop polo pool
Anagrams of eat: ate eat tea
Anagrams of fast: fast fats
Anagrams of slow: lows owls slow
Anagrams of lamp: lamp palm
Anagrams of cold: clod cold
Anagrams of door: door odor
Anagrams of xyzzy:

real 0m1.491s
user 0m1.390s
sys 0m0.040s

time pyvm anagram.pyc stop step words lots pool eat fast slow lamp cold
door xyzzy
Anagrams of stop: opts post pots spot stop tops
Anagrams of step: pest pets sept step
Anagrams of words: sword words
Anagrams of lots: lost lots slot
Anagrams of pool: loop polo pool
Anagrams of eat: ate eat tea
Anagrams of fast: fast fats
Anagrams of slow: lows owls slow
Anagrams of lamp: lamp palm
Anagrams of cold: clod cold
Anagrams of door: door odor
Anagrams of xyzzy:

real 0m0.923s
user 0m0.760s
sys 0m0.070s
-------
Stelios

Jul 18 '05 #8
In <ma**************************************@python.o rg>, Tom Carrick
wrote:
In my attempted learning of python, I've decided to recode an old
anagram solving program I made in C++. The C++ version runs in less
than a second, while the python takes 30 seconds. I'm not willing to
think it's just python being slow, so I was hoping someone could find
a faster way of doing this. Also, I was wondering if there was a more
builtin, or just nicer way of converting a string to a list (or using
the sort function on a list) than making a function for it.

The words.txt here is just a copy of FreeBSD's /usr/share/dict/words


Here's my attempt which builds an anagram dictionary ("sorted word" ->
list of anagrams) for fast lookup of anagrams::

#!/usr/bin/env python2.4
from itertools import imap, ifilter

WORDS = '/usr/share/dict/words'

def make_anagram_map(words):
anagram_map = dict()
for word in imap(lambda w: w.strip().lower(), words):
sorted_word = ''.join(sorted(list(word)))
anagram_map.setdefault(sorted_word, list()).append(word)

return dict(ifilter(lambda x: len(x[1]) > 1, anagram_map.iteritems()))
def main():
words_file = open(WORDS, 'r')
anagram_map = make_anagram_map(words_file)
words_file.close()

while True:
word = raw_input('Find anagrams of word (just enter to end): ')
if not word:
break
try:
print anagram_map[''.join(sorted(list(word.strip().lower())))]
except KeyError:
print 'No anagrams found for %r' % word

# # Print all anagrams sorted by number of anagrams.
# print '\n'.join(map(str, sorted(anagram_map.values(), key=len)))
# print len(anagram_map)
if __name__ == '__main__':
main()
Ciao,
Marc 'BlackJack' Rintsch
Jul 18 '05 #9
Marc 'BlackJack' Rintsch wrote:
def make_anagram_map(words):
anagram_map = dict()
for word in imap(lambda w: w.strip().lower(), words):
sorted_word = ''.join(sorted(list(word)))
anagram_map.setdefault(sorted_word, list()).append(word)

return dict(ifilter(lambda x: len(x[1]) > 1, anagram_map.iteritems()))


Or if you're afraid of map and filter like me, you can try:

def make_anagram_map(words):
anagram_map = {}
for word in (w.strip().lower() for w in words):
anagram_map.setdefault(''.join(sorted(word)), []).append(word)
return dict(sortedword_wordlist
for sortedword_wordlist in anagram_map.iteritems()
if len(sortedword_wordlist[1]) > 1)
py> make_anagram_map(['owers', 'pest', 'rowse', 'pets', 'sower', 'step'])
{'epst': ['pest', 'pets', 'step'], 'eorsw': ['owers', 'rowse', 'sower']}

STeVe
Jul 18 '05 #10

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

3
by: Paul Mateer | last post by:
Hi, I have been running some queries against a table in a my database and have noted an odd (at least it seems odd to me) performance issue. The table has approximately 5 million rows and...
10
by: **ham | last post by:
I know that's an old dirty issue; GDI+ almost -the slowest part of the framework - has bothered many developers using it in animations. Even in managed C++ the performance is awful. Now, any dude...
115
by: Mark Shelor | last post by:
I've encountered a troublesome inconsistency in the C-language Perl extension I've written for CPAN (Digest::SHA). The problem involves the use of a static array within a performance-critical...
13
by: bjarne | last post by:
Willy Denoyette wrote; > ... it > was not the intention of StrousTrup to the achieve the level of efficiency > of C when he invented C++, ... Ahmmm. It was my aim to match the performance...
7
by: James | last post by:
Hi Has anybody had any experience of ASP.Net performance counters not updating. In the performance monitor application when I try to add the groups ASP.NET and ASP.NET Applications the...
17
by: 57R4N63R | last post by:
I'm currently building a website for one of the client. There has been few errors here and there, but just recently the problem is getting worse. Basically the symptoms is that when the user try...
4
by: Steph | last post by:
Hi - Trying to chase down a baffling performance issue. Our database has been running very slow lately. So we are performance tuning the database. In doing so, we created a copy of our...
2
by: Brian Tabios | last post by:
Hello Everyone, I have a very complex performance issue with our production database. Here's the scenario. We have a production webserver server and a development web server. Both are running...
2
by: BTabios | last post by:
Hello Everyone, I have a very complex performance issue with our production database. Here's the scenario. We have a production webserver server and a development web server. Both are running...
5
by: Varangian | last post by:
Hi, I have a performance issue question? which is best (in terms of efficiency and performance, I don't care neatness in code)... building an ArrayList of Object Instances using SqlDataReader...
0
by: Charles Arthur | last post by:
How do i turn on java script on a villaon, callus and itel keypad mobile phone
0
by: ryjfgjl | last post by:
If we have dozens or hundreds of excel to import into the database, if we use the excel import function provided by database editors such as navicat, it will be extremely tedious and time-consuming...
0
by: emmanuelkatto | last post by:
Hi All, I am Emmanuel katto from Uganda. I want to ask what challenges you've faced while migrating a website to cloud. Please let me know. Thanks! Emmanuel
1
by: Sonnysonu | last post by:
This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...
0
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However,...
0
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can...
0
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers,...
0
by: Hystou | last post by:
Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows...
0
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.