473,770 Members | 4,552 Online
Bytes | Software Development & Data Engineering Community
+ Post

Home Posts Topics Members FAQ

Performance issue

Hi,

In my attempted learning of python, I've decided to recode an old
anagram solving program I made in C++. The C++ version runs in less
than a second, while the python takes 30 seconds. I'm not willing to
think it's just python being slow, so I was hoping someone could find
a faster way of doing this. Also, I was wondering if there was a more
builtin, or just nicer way of converting a string to a list (or using
the sort function on a list) than making a function for it.

The words.txt here is just a copy of FreeBSD's /usr/share/dict/words

Anyway, the code:

import string

# Need a function to convert a string to a list to be
# able to use the sort() function
def string2list(s):
l = []
for i in range(0, len(s)):
l.append(s[i])
return l

words = []
found = []

anagram = raw_input("Find anagrams of word: ")

f = open('words.txt ', 'r')
file = f.read()
f.close()

words = file.splitlines ()

sorted_anagram = anagram.lower()
sorted_anagram = string2list(ana gram)
sorted_anagram. sort(lambda x, y: cmp(x, y))
while words:
if len(words[0]) == len(sorted_anag ram):
wordlist = string2list(wor ds[0])
wordlist.sort(l ambda x, y: cmp(x, y))
sorted_wordlist = wordlist
if sorted_anagram == sorted_wordlist :
found.append(wo rds[0])
del words[0]

print "Anagrams of " + anagram + ": "
while found:
print found[0] + " "
del found[0]
Jul 18 '05 #1
9 1764
Tom Carrick wrote:
Hi,

In my attempted learning of python, I've decided to recode an old
anagram solving program I made in C++. The C++ version runs in less
than a second, while the python takes 30 seconds. I'm not willing to
think it's just python being slow, so I was hoping someone could find
a faster way of doing this.
I like your attitude, not thinking that it's just Python that is slow :-)
Also, I was wondering if there was a more
builtin, or just nicer way of converting a string to a list (or using
the sort function on a list) than making a function for it.
String to list: list("irmen") # --> ['i','r','m','e' ,'n']
Sorted list: sorted("irmen") # --> ['e', 'i', 'm', 'n', 'r']
(the latter works in Python 2.4+)

The words.txt here is just a copy of FreeBSD's /usr/share/dict/words

Anyway, the code:

import string

# Need a function to convert a string to a list to be
# able to use the sort() function
def string2list(s):
l = []
for i in range(0, len(s)):
l.append(s[i])
return l
.... see above... just replace string2list(s) with sorted(s)

words = []
found = []

anagram = raw_input("Find anagrams of word: ")

f = open('words.txt ', 'r')
file = f.read()
f.close()
Style: don't use 'file' as a variable name, you're hiding the
builtin 'file' function

words = file.splitlines ()
You can obtain this list without reading the file in its entirety,
by using the readlines method of file objects:

words=open("wor ds.txt").readli nes()

sorted_anagram = anagram.lower()
sorted_anagram = string2list(ana gram)
sorted_anagram. sort(lambda x, y: cmp(x, y))
The lambda is optional and only slows it down :-)
But to get a sorted list of letters, just use sorted(s)
if you're on Python 2.4+
while words:
if len(words[0]) == len(sorted_anag ram):
wordlist = string2list(wor ds[0])
wordlist.sort(l ambda x, y: cmp(x, y))
sorted_wordlist = wordlist
(same here.. replacing this by sorted(words[0]) probably
will speed it up rather significantly, partly because
it avoids the creation of those temporary lists)
if sorted_anagram == sorted_wordlist :
found.append(wo rds[0])
del words[0]

print "Anagrams of " + anagram + ": "
while found:
print found[0] + " "
del found[0]


print " ".join(foun d)
Cheers

--Irmen de Jong
Jul 18 '05 #2
In <ma************ *************** ***********@pyt hon.org>, Tom Carrick
wrote:
[…] Also, I was wondering if there was a more
builtin, or just nicer way of converting a string to a list (or using
the sort function on a list) than making a function for it.
Use the `list()` builtin on the string and *just* the `sort()` method::

In [2]: characters = list('hello')

In [3]: characters
Out[3]: ['h', 'e', 'l', 'l', 'o']

In [4]: characters.sort ()

In [5]: characters
Out[5]: ['e', 'h', 'l', 'l', 'o']
sorted_anagram = anagram.lower()
sorted_anagram = string2list(ana gram)
sorted_anagram. sort(lambda x, y: cmp(x, y))
sorted_anagram = list(anagram.lo wer())
sorted_anagram. sort()
while words:
if len(words[0]) == len(sorted_anag ram):
wordlist = string2list(wor ds[0])
wordlist.sort(l ambda x, y: cmp(x, y))
sorted_wordlist = wordlist
if sorted_anagram == sorted_wordlist :
found.append(wo rds[0])
del words[0]


And here's the performance issue. Deleting the first element of a list
results in moving all remaining elements one index down. Better iterate
over the words in a for loop::

for word in words:
# use `word` instead of `word[0]` in the loop body.
...

Ciao,
Marc 'BlackJack' Rintsch

Jul 18 '05 #3
Tom Carrick <kn****@gmail.c om> writes:
In my attempted learning of python, I've decided to recode an old
anagram solving program I made in C++. The C++ version runs in less
than a second, while the python takes 30 seconds.
Indeed, your program can be improved to run about ten times as fast,
which (on my system, with 96274 entries in /usr/share/dict/words) is
below a second.

In general you should try to move the loops into C code, i.e. use
built-in functions instead of long 'for' blocks.

Some comments on your version:
import string

# Need a function to convert a string to a list to be
# able to use the sort() function
def string2list(s): [snipped]
list() achieves the same thing a lot faster.
words = []
You do not need to initialize 'words' here, as you're overwriting it a
few lines afterwards.
found = []

anagram = raw_input("Find anagrams of word: ")

f = open('words.txt ', 'r')
file = f.read()
f.close()

words = file.splitlines ()
Try to avoid assigning to the names of built-in functions if you can.
Names like 'file', 'list', 'dict', 'map' etc. are often an obvious
choice, but overwriting them means that you don't "just" know what a
later use refers to.
sorted_anagram = anagram.lower()
sorted_anagram = string2list(ana gram)
sorted_anagram. sort(lambda x, y: cmp(x, y))
Unless you *really* have to, don't use comparison functions with
sort(), as they slow the operation considerably. In this (as in most)
cases, a plain sorted_anagram. sort() does the trick, and in version
2.4 you can achieve custom sort orders with the optional 'key'
argument. The sorted() built-in also comes in handy here.
while words:
if len(words[0]) == len(sorted_anag ram):
wordlist = string2list(wor ds[0])
wordlist.sort(l ambda x, y: cmp(x, y))
sorted_wordlist = wordlist
if sorted_anagram == sorted_wordlist :
found.append(wo rds[0])
del words[0]
Avoid this style of looping at all times! Removing the first element
of a list is O(n), so looping through the whole list as above is
O(n**2). In most cases you should use a for loop:

for word in words:
# do something

which is O(n) of course. If you do have to loop destructively, pop()
from the end (which is the default) like so:

while words:
word = words.pop()
# do something

This is also O(n), because removing the *last* element of a list is
O(1) (amortized; I suppose the implementation will occasionally shrink
the underlying array at linear cost).
print "Anagrams of " + anagram + ": "
while found:
print found[0] + " "
del found[0]


I assume you meant not to print a newline between the words, which
'print' does by default. The best solution in that case is "
".join(foun d).

A better version (2.4+ only):

-- 8< -- 8< --
anagram = raw_input("Find anagrams of word: ")

words = open('words.txt ', 'r')

sorted_anagram = sorted(anagram. lower())

found = []

for word in words.read().sp litlines():
if len(word) == len(anagram) and sorted(word) == sorted_anagram:
found.append(wo rd)

print "Anagrams of %s: %s" % (anagram, ' '.join(found))
-- >8 -- >8 --

Interestingly, the length comparison makes quite a difference! I
removed it at first, thinking it was unnecessary. Here are some
timings:

* Your original version (for comparison):

$ time echo stop | python2.4 anagram_slow.py
[...]
real 0m9.090s
user 0m8.790s
sys 0m0.013s

* Your version, but with the O(n**2) loop replaced by an O(n) 'for':

$ time echo stop | python2.4 anagram_forloop .py
[...]
real 0m0.221s
user 0m0.134s
sys 0m0.014s

* My version but with the length comparison removed:

$ time echo stop | python2.4 anagram_no_lenc mp.py
[...]
real 0m0.408s
user 0m0.353s
sys 0m0.010s

* My version as above:

$ time echo stop | python2.4 anagram_fast.py
[...]
real 0m0.144s
user 0m0.099s
sys 0m0.008s

Hope that helps :-)

- Thomas

--
If you want to reply by mail, substitute my first and last name for
'foo' and 'bar', respectively, and remove '.invalid'.
Jul 18 '05 #4
"Tom Carrick" <kn****@gmail.c om> schrieb im Newsbeitrag
news:ma******** *************** *************** @python.org...
| Hi,
|
| In my attempted learning of python, I've decided to recode an old
| anagram solving program I made in C++. The C++ version runs in less
| than a second, while the python takes 30 seconds. I'm not willing to
| think it's just python being slow, so I was hoping someone could find
| a faster way of doing this. Also, I was wondering if there was a more
| builtin, or just nicer way of converting a string to a list (or using
| the sort function on a list) than making a function for it.
|
| The words.txt here is just a copy of FreeBSD's /usr/share/dict/words
|
| Anyway, the code:
|
| import string

You're importing string, but never use it, so you can omit that line.

|
| # Need a function to convert a string to a list to be
| # able to use the sort() function
| def string2list(s):
| l = []
| for i in range(0, len(s)):
| l.append(s[i])
| return l

No need to write your own function. list(s) already does the trick.

|
| words = []
| found = []
|
| anagram = raw_input("Find anagrams of word: ")
|
| f = open('words.txt ', 'r')
| file = f.read()
| f.close()
I don't have a copy of words.txt, but here's what I would try
(untested):

anagram = raw_input("Find anagrams of word: ")
sorted_anagram = list(sorted(ana gram.lower()))
# If you're Python is pre 2.4 ise
# sorted_anagram = list(anagram.lo wer())
# sorted_anagram. sort() #--sort list in place
found = []
# assuming "words.txt" contains a word per line
# iterate over the lines of the file

for line in open("/path/to/words.txt"):
word = line[:-1] # Get rid of trailing newline
sorted_word = list(sorted(wor d.lower()))
if sorted_word == sorted_anagram:
found.append(wo rd)
if found:
print "Anagrams of %s:" % anagram
for w in found:
print w
else:
print "No anagrams for %s" % anagram
--

Vincent Wehren


Jul 18 '05 #5
In <42************ *********@news. xs4all.nl>, Irmen de Jong wrote:
words = file.splitlines ()


You can obtain this list without reading the file in its entirety,
by using the readlines method of file objects:

words=open("wor ds.txt").readli nes()


This leaves the newline characters at the end of each line while
`str.splitlines ()` removes them.

Ciao,
Marc 'BlackJack' Rintsch
Jul 18 '05 #6
Thomas Rast wrote:
Tom Carrick <kn****@gmail.c om> writes:
In my attempted learning of python, I've decided to recode an old
anagram solving program I made in C++. The C++ version runs in less
than a second, while the python takes 30 seconds.


Indeed, your program can be improved to run about ten times as fast, ...
<great stuff>


This problem inspired an "all anagrams" program. Using it I was able
to find the largest anagram group in Shakespeare's first folio in about
the time you originally found anagrams for an individual word.

7: owers = rowse = sower = sowre = swore = woers = worse

====

def words(source):
for line in source:
for word in line.split():
yield word
def all_anagrams(wo rds):
seen = dict()

for word in words:
word = word.lower()
if word not in seen:
dorw = ''.join(sorted( word))
try:
seen[dorw].append(word)
except KeyError:
seen[dorw] = [word]
if word == dorw:
continue
seen[word] = ()
for group in seen.itervalues ():
if len(group) > 1:
yield -len(group), sorted(group) # conveniently sortable
def main(sources):
for filename in sources:
dictionary = open(filename, 'r')
print "All anagrams from %s:" % filename
try:
for nsize, group in sorted(all_anag rams(words(dict ionary))):
print '%2s: %s' % (-nsize, ' = '.join(group))
finally:
dictionary.clos e()
print

if __name__ == '__main__':
import sys
main(sys.argv[1:] or ['anagrams.py'])
Jul 18 '05 #7
Scott David Daniels wrote:
if __name__ == '__main__':
import sys
main(sys.argv[1:] or ['anagrams.py'])


This is *exactly* the kind of testcases I'm looking for to test
the soon-to-be-released pyvm. Great! I'll be back with results.
For now, a fast anagrams.py is
--------------------------------------
import sys

WORDS = [ i.rstrip () for i in file ('/usr/share/dict/words') ]

def findana (anagram):
sorted_anagram = sorted(anagram. lower())
len_anagram = len (anagram)
found = [ word for word in WORDS if len(word)==len_ anagram and
sorted(word)==s orted_anagram ]
print "Anagrams of %s: %s" % (anagram, ' '.join(found))

for i in sys.argv [1:]:
findana (i)
-----------------------------------------
And timings....

time python anagram.pyc stop step words lots pool eat fast slow lamp
cold door xyzzy
Anagrams of stop: opts post pots spot stop tops
Anagrams of step: pest pets sept step
Anagrams of words: sword words
Anagrams of lots: lost lots slot
Anagrams of pool: loop polo pool
Anagrams of eat: ate eat tea
Anagrams of fast: fast fats
Anagrams of slow: lows owls slow
Anagrams of lamp: lamp palm
Anagrams of cold: clod cold
Anagrams of door: door odor
Anagrams of xyzzy:

real 0m1.491s
user 0m1.390s
sys 0m0.040s

time pyvm anagram.pyc stop step words lots pool eat fast slow lamp cold
door xyzzy
Anagrams of stop: opts post pots spot stop tops
Anagrams of step: pest pets sept step
Anagrams of words: sword words
Anagrams of lots: lost lots slot
Anagrams of pool: loop polo pool
Anagrams of eat: ate eat tea
Anagrams of fast: fast fats
Anagrams of slow: lows owls slow
Anagrams of lamp: lamp palm
Anagrams of cold: clod cold
Anagrams of door: door odor
Anagrams of xyzzy:

real 0m0.923s
user 0m0.760s
sys 0m0.070s
-------
Stelios

Jul 18 '05 #8
In <ma************ *************** ***********@pyt hon.org>, Tom Carrick
wrote:
In my attempted learning of python, I've decided to recode an old
anagram solving program I made in C++. The C++ version runs in less
than a second, while the python takes 30 seconds. I'm not willing to
think it's just python being slow, so I was hoping someone could find
a faster way of doing this. Also, I was wondering if there was a more
builtin, or just nicer way of converting a string to a list (or using
the sort function on a list) than making a function for it.

The words.txt here is just a copy of FreeBSD's /usr/share/dict/words


Here's my attempt which builds an anagram dictionary ("sorted word" ->
list of anagrams) for fast lookup of anagrams::

#!/usr/bin/env python2.4
from itertools import imap, ifilter

WORDS = '/usr/share/dict/words'

def make_anagram_ma p(words):
anagram_map = dict()
for word in imap(lambda w: w.strip().lower (), words):
sorted_word = ''.join(sorted( list(word)))
anagram_map.set default(sorted_ word, list()).append( word)

return dict(ifilter(la mbda x: len(x[1]) > 1, anagram_map.ite ritems()))
def main():
words_file = open(WORDS, 'r')
anagram_map = make_anagram_ma p(words_file)
words_file.clos e()

while True:
word = raw_input('Find anagrams of word (just enter to end): ')
if not word:
break
try:
print anagram_map[''.join(sorted( list(word.strip ().lower())))]
except KeyError:
print 'No anagrams found for %r' % word

# # Print all anagrams sorted by number of anagrams.
# print '\n'.join(map(s tr, sorted(anagram_ map.values(), key=len)))
# print len(anagram_map )
if __name__ == '__main__':
main()
Ciao,
Marc 'BlackJack' Rintsch
Jul 18 '05 #9
Marc 'BlackJack' Rintsch wrote:
def make_anagram_ma p(words):
anagram_map = dict()
for word in imap(lambda w: w.strip().lower (), words):
sorted_word = ''.join(sorted( list(word)))
anagram_map.set default(sorted_ word, list()).append( word)

return dict(ifilter(la mbda x: len(x[1]) > 1, anagram_map.ite ritems()))


Or if you're afraid of map and filter like me, you can try:

def make_anagram_ma p(words):
anagram_map = {}
for word in (w.strip().lowe r() for w in words):
anagram_map.set default(''.join (sorted(word)), []).append(word)
return dict(sortedword _wordlist
for sortedword_word list in anagram_map.ite ritems()
if len(sortedword_ wordlist[1]) > 1)
py> make_anagram_ma p(['owers', 'pest', 'rowse', 'pets', 'sower', 'step'])
{'epst': ['pest', 'pets', 'step'], 'eorsw': ['owers', 'rowse', 'sower']}

STeVe
Jul 18 '05 #10

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

3
5225
by: Paul Mateer | last post by:
Hi, I have been running some queries against a table in a my database and have noted an odd (at least it seems odd to me) performance issue. The table has approximately 5 million rows and includes the following columns: DocID (INTEGER, PRIMARY KEY, CLUSTERED) IsRecord (INTEGER, NONCLUSTERED)
10
2557
by: **ham | last post by:
I know that's an old dirty issue; GDI+ almost -the slowest part of the framework - has bothered many developers using it in animations. Even in managed C++ the performance is awful. Now, any dude out there does know any thing about this issue in VS 2005 + ..NET 2.0 ? Has Microsoft solved this performance problem, or we will have to again stick to that DX for simple animations in our applications? ( and since Microsoft doesn't support DDraw...
115
7642
by: Mark Shelor | last post by:
I've encountered a troublesome inconsistency in the C-language Perl extension I've written for CPAN (Digest::SHA). The problem involves the use of a static array within a performance-critical transform function. When compiling under gcc on my big-endian PowerPC (Mac OS X), declaring this array as "static" DECREASES the transform throughput by around 5%. However, declaring it as "static" on gcc/Linux/Intel INCREASES the throughput by...
13
2766
by: bjarne | last post by:
Willy Denoyette wrote; > ... it > was not the intention of StrousTrup to the achieve the level of efficiency > of C when he invented C++, ... Ahmmm. It was my aim to match the performance of C and I achieved that aim very early on. See, for example "The Design and Evolution of C++". -- Bjarne Stroustrup; http://www.research.att.com/~bs
7
4808
by: James | last post by:
Hi Has anybody had any experience of ASP.Net performance counters not updating. In the performance monitor application when I try to add the groups ASP.NET and ASP.NET Applications the counters all show 0. In addition, ASP.NET Applications does not show any of the ASP.NET instances that are running on the computer. The other counters for example % processor time work fine. The problem machine is a test server running Win2003. ...
17
2068
by: 57R4N63R | last post by:
I'm currently building a website for one of the client. There has been few errors here and there, but just recently the problem is getting worse. Basically the symptoms is that when the user try to access the page, it takes really long time to load. However, after up to 1 hour, the website will run fine again as normal. This issue has been there with the site. I usually just ask the system admin to restart the IIS Service. However, the...
4
3271
by: Steph | last post by:
Hi - Trying to chase down a baffling performance issue. Our database has been running very slow lately. So we are performance tuning the database. In doing so, we created a copy of our production database. In that database, I changed one clustered index on a table to try to improve performance. I ran one query - saw a slight improvement - but saw "lazy spool" in the execution plan. I tried to change it back to the original index by...
2
2424
by: Brian Tabios | last post by:
Hello Everyone, I have a very complex performance issue with our production database. Here's the scenario. We have a production webserver server and a development web server. Both are running SQL Server 2000. I encounted various performance issues with the production server with a particular query. It would take approximately 22 seconds to return 100 rows, thats about 0.22 seconds per row. Note: I ran the query in single user mode. So...
2
1566
by: BTabios | last post by:
Hello Everyone, I have a very complex performance issue with our production database. Here's the scenario. We have a production webserver server and a development web server. Both are running SQL Server 2000. I encounted various performance issues with the production server with a particular query. It would take approximately 22 seconds to return 100 rows, thats about 0.22 seconds per row. Note: I ran the query in single user mode. So...
5
2031
by: Varangian | last post by:
Hi, I have a performance issue question? which is best (in terms of efficiency and performance, I don't care neatness in code)... building an ArrayList of Object Instances using SqlDataReader OR using SqlDataAdapter to Fill a DataSet or DataTable ? Thanks!
0
9602
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However, people are often confused as to whether an ONU can Work As a Router. In this blog post, we’ll explore What is ONU, What Is Router, ONU & Router’s main usage, and What is the difference between ONU and Router. Let’s take a closer look ! Part I. Meaning of...
0
10071
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven tapestry of website design and digital marketing. It's not merely about having a website; it's about crafting an immersive digital experience that captivates audiences and drives business growth. The Art of Business Website Design Your website is...
1
10017
by: Hystou | last post by:
Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows Update option using the Control Panel or Settings app; it automatically checks for updates and installs any it finds, whether you like it or not. For most users, this new feature is actually very convenient. If you want to control the update process,...
0
8905
agi2029
by: agi2029 | last post by:
Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing, and deployment—without human intervention. Imagine an AI that can take a project description, break it down, write the code, debug it, and then launch it, all on its own.... Now, this would greatly impact the work of software developers. The idea...
1
7431
isladogs
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome a new presenter, Adolph Dupré who will be discussing some powerful techniques for using class modules. He will explain when you may want to use classes instead of User Defined Types (UDT). For example, to manage the data in unbound forms. Adolph will...
0
6690
by: conductexam | last post by:
I have .net C# application in which I am extracting data from word file and save it in database particularly. To store word all data as it is I am converting the whole word file firstly in HTML and then checking html paragraph one by one. At the time of converting from word file to html my equations which are in the word document file was convert into image. Globals.ThisAddIn.Application.ActiveDocument.Select();...
0
5326
by: TSSRALBI | last post by:
Hello I'm a network technician in training and I need your help. I am currently learning how to create and manage the different types of VPNs and I have a question about LAN-to-LAN VPNs. The last exercise I practiced was to create a LAN-to-LAN VPN between two Pfsense firewalls, by using IPSEC protocols. I succeeded, with both firewalls in the same network. But I'm wondering if it's possible to do the same thing, with 2 Pfsense firewalls...
2
3589
muto222
by: muto222 | last post by:
How can i add a mobile payment intergratation into php mysql website.
3
2832
bsmnconsultancy
by: bsmnconsultancy | last post by:
In today's digital era, a well-designed website is crucial for businesses looking to succeed. Whether you're a small business owner or a large corporation in Toronto, having a strong online presence can significantly impact your brand's success. BSMN Consultancy, a leader in Website Development in Toronto offers valuable insights into creating effective websites that not only look great but also perform exceptionally well. In this comprehensive...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.