473,699 Members | 2,501 Online
Bytes | Software Development & Data Engineering Community
+ Post

Home Posts Topics Members FAQ

optimization pointers?

Would anyone care to offer pointers on how the following code
might be optimized? Or alternatives? ...

---
def lz(string):
""" Return the Lempel-Ziv complexity (LZ78) of string. """

vocabulary = []
word = ''
for char in string:
word += char
if word not in vocabulary:
vocabulary.appe nd(word)
word = ''
return len(vocabulary)
---

On my 2 GHz system running PythonWin, the above code averages
about 1.3 hrs to process a 1 MB string. I'd hoped to compute
LZ78 for multi-GB files, but it seems infeasible. (?)

Thanks for any feedback.
--
r.e.s.
Jul 18 '05 #1
7 1619
In article <Bz************ *****@newsread1 .news.pas.earth link.net>,
"r.e.s." <r.*@XXmindspri ng.com> wrote:
Would anyone care to offer pointers on how the following code
might be optimized? Or alternatives? ...

---
def lz(string):
""" Return the Lempel-Ziv complexity (LZ78) of string. """

vocabulary = []
word = ''
for char in string:
word += char
if word not in vocabulary:
vocabulary.appe nd(word)
word = ''
return len(vocabulary)
---

On my 2 GHz system running PythonWin, the above code averages
about 1.3 hrs to process a 1 MB string. I'd hoped to compute
LZ78 for multi-GB files, but it seems infeasible. (?)


One simple improvement is to get rid of

if an_item not in a_list

which does a linear search over the list. A dictionary is faster.

Without a full benchmark, it seems that appending
def lz_complexity (s):
voc = {}
word = ''
for c in s:
word += c
if not voc.get (word, False):
voc[word] = True
word = ''
return len (voc.keys())

if __name__ == '__main__':
import time
text = file ('lzc.py', 'rt').read()
t0 = time.clock()
t0 = time.clock() - t0

c = lz (text)
t1 = time.clock()
c = lz (text)
t1 = time.clock() - t1

c = lz_complexity (text)
t2 = time.clock()
c = lz_complexity (text)
t2 = time.clock() - t2

print 'T1:', t1 - t0
print 'T2:', t2 - t0
will show about a x6 speedup.

Avoiding

a_string += another_string

is a traditional speedup, but it seems you'll be needing the
string form for every character processed, so it may not buy
much in this case.

Regards. Mel.
Jul 18 '05 #2
"r.e.s." <r.*@XXmindspri ng.com> wrote:
Would anyone care to offer pointers on how the following code
might be optimized? Or alternatives? ...

---
def lz(string):
""" Return the Lempel-Ziv complexity (LZ78) of string. """

vocabulary = []
word = ''
for char in string:
word += char
if word not in vocabulary:
vocabulary.appe nd(word)
word = ''
return len(vocabulary)
---

On my 2 GHz system running PythonWin, the above code averages
about 1.3 hrs to process a 1 MB string. I'd hoped to compute
LZ78 for multi-GB files, but it seems infeasible. (?)


On a 300 mhz celeron the script below takes about ten or twenty
seconds, the original lz is commented out as it never seems to finish
for longer strings.

Anton

p.s. the file I used is here:
http://sailor.gutenberg.org/etext97/1donq10.zip

import sets

def lz(string):
""" Return the Lempel-Ziv complexity (LZ78) of string. """

vocabulary = []
word = ''
for char in string:
word += char
if word not in vocabulary:
vocabulary.appe nd(word)
word = ''
return len(vocabulary)

def lzx(string):
""" Return the Lempel-Ziv complexity (LZ78) of string. """

vocabulary = sets.Set()
word = ''
for char in string:
word += char
if word not in vocabulary:
vocabulary.add( word)
word = ''
return len(vocabulary)

def test():
fn = '1donq10.txt'
s = file(fn,'rb').r ead(1000000)
print lzx(s)
#print lz(s)
if __name__=='__ma in__':
test()

Jul 18 '05 #3
"Mel Wilson" <mw*****@the-wire.com> wrote ...
One simple improvement is to get rid of

if an_item not in a_list

which does a linear search over the list.
A dictionary is faster.

Without a full benchmark, it seems that [...] will show about a x6 speedup.


I really appreciate the suggestions -- both yours
and Anton's.

After making the changes you indicated, the code
runs in about 1/500th the time of the original
(i.e. about 2 sec per MB for strings in RAM). The
sets idea also speeds things up tremendously, but
not as much -- it takes about 70% longer than the
dictionary approach.

Thanks again.
--
r.e.s.

Jul 18 '05 #4
"Mel Wilson" <mw*****@the-wire.com> wrote in message
news:rzO2/ks/KX*******@the-wire.com...

One simple improvement is to get rid of

if an_item not in a_list

which does a linear search over the list. A dictionary is faster.

Without a full benchmark, it seems that appending
def lz_complexity (s):
voc = {}
word = ''
for c in s:
word += c
if not voc.get (word, False):
voc[word] = True
word = ''
return len (voc.keys())

if __name__ == '__main__':
import time
text = file ('lzc.py', 'rt').read()
t0 = time.clock()
t0 = time.clock() - t0

c = lz (text)
t1 = time.clock()
c = lz (text)
t1 = time.clock() - t1

c = lz_complexity (text)
t2 = time.clock()
c = lz_complexity (text)
t2 = time.clock() - t2

print 'T1:', t1 - t0
print 'T2:', t2 - t0
will show about a x6 speedup.

Avoiding

a_string += another_string

is a traditional speedup, but it seems you'll be needing the
string form for every character processed, so it may not buy
much in this case.

Regards. Mel.


Heres my contribution, removing the string append in favour of slices. Buys
a little speed thanks to xrange.

def lz_comp_mine(te xt):
voc = {}
st = 0
for cur in xrange(1,len(te xt)+1):
if not voc.get(text[st:cur], False):
voc[text[st:cur]] = True
st = cur
return len(voc)

Anthony McDonald
Jul 18 '05 #5
In article <3f************ **********@news .club-internet.fr>,
"Anthony McDonald" <tonym1972(at)c lub-internet(in)fr> wrote:
Heres my contribution, removing the string append in favour of slices. Buys
a little speed thanks to xrange.

def lz_comp_mine(te xt):
voc = {}
st = 0
for cur in xrange(1,len(te xt)+1):
if not voc.get(text[st:cur], False):
voc[text[st:cur]] = True
st = cur
return len(voc)


Nice touch! I tried slices and took a huge performance
hit (almost 3x the list version) but I didn't use `for ... in
xrange ...`. It must have all been in the while-loop test
and index incrementing.

Regards. Mel.
Jul 18 '05 #6
"Mel Wilson" <mw*****@the-wire.com> wrote in message
news:xIf2/ks/Kv*******@the-wire.com...

Nice touch! I tried slices and took a huge performance
hit (almost 3x the list version) but I didn't use `for ... in
xrange ...`. It must have all been in the while-loop test
and index incrementing.

Regards. Mel.


Thanks.

My original attempt which incidentally used range was about half a second
slower than yours with 700K of test data. Just as I was about to close the
editor window I noticed your return statement.

return len (voc.keys())

Which creates a list of keys to then apply len to, but thats an unneeded
step as len applied directly to a dictionary returns the number of keys. I
made the change and gained 7 hundreds of a second. Not much, still behind
yours, but it suggested that chainging range to xrange and avoiding the list
creation might help. Viola!

The interesting thing that benchmarking with test data shows, is that the
difference in speed between our 2 routines is about 0.05secs per 700K
processed. That difference remains constant at least upto 80Mb of input
data, implying that slices are no more efficent than string appending, and
may actually in this case be less efficent.

Anthony McDonald
Jul 18 '05 #7
"Anthony McDonald" <tonym1972(at)c lub-internet(in)fr> wrote:
return len (voc.keys())

Which creates a list of keys to then apply len to, but thats an unneeded
step as len applied directly to a dictionary returns the number of keys.


Below are two more small optimization possibilities. Originally I
wasn't going to post them because they are way into micro optimization
country and because the setdefault solution is beautiful but blond. At
least on some of my machines however they are both faster than the
solutions offered so far.

Anton

def lzx(s):
word,D = '',{}
Dget, Dset = D.get, D.__setitem__
for c in s:
word += c
if not Dget(word):
Dset(word,True)
word = ''
return len(D)

def lzy(s):
j,D = 0,{}
func = D.setdefault
for i in xrange(1,len(s) +1):
if func(s[j:i],j) is j: j = i
return len(D)
Jul 18 '05 #8

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

9
1762
by: Pat | last post by:
Given A, B and C I would like to calculate the sum of each entry: A=B+C for each i,j The easiest way is to use for-loop, but I think the performance is not good. Is it possible to find out some faster way to do that? Thanks.
3
3325
by: PWalker | last post by:
Hi, I have written code that I would like to optimize. I need to push it to the limit interms of speed as the accuracy of results are proportional to runtime. First off, would anyone know any resources that explains how to optimize code i.e. give some rules on c++ optimization? e.g. using memcpy to copy an array (which i have done). Also, what is the best sorting algorithm out there for sorting an array of of size 100 or less? I have...
7
2285
by: Evangelista Sami | last post by:
Hi all i have implemented a list type as an array of pointer like this typedef struct { int nb_elements; void **elements; } list; to avoid having a pointer for each element as it is done with a recursive
7
2686
by: Rajeev | last post by:
Hello, I'm using gcc 3.4.2 on a Xeon (P4) platform, all kinds of speed optimizations turned on. For the following loop R=(evaluate here); // float N=(evaluate here); // N min=1 max=100 median=66 for (i=0;i<N;i++){ R+=A*B*K; // all variables are float=4 bytes
65
6678
by: He Shiming | last post by:
Hi, I just wrote a function that has over 200 "cases" wrapped in a "switch" statement. I'm wondering if there are performance issues in such implementation. Do I need to optimize it some way? In terms of generated machine code, how does hundreds of cases in a switch differ from hundreds of if-elses? Do compilers or processors do any optimization on such structured code? Do I need to worry about the performance, usually?
10
4563
by: MariusI | last post by:
I stumbled over an optimization (or lack of one, to be specific) when viewing IL opcodes generated by the compiler using ms .net 2003. I was testing fast pixel manipulation using Bitmap.LockBits and unsafe pointers to iterate over an image's pixels. The inner-loop looked like this: for (int x = 0; x < _buffer.Width*_buffer.Height; x++) { // do manipulations here.. }
1
1155
by: Daniel Fitzgerald | last post by:
Hi all, I need some pointers. I want to optimize my pgsql databases hardware-wise. I can do this in Sybase, Oracle, SQL Server, etc. by putting let's say, indexes on one device, the trasaction logs on another, data on another drive, etc. Can I do this in Postgresql ? I have only used the simple CREATE DATABASE newdatabase ( didn't find an alternate syntax ) so far. Is it possible to do a specific component optimization or do I just...
10
357
by: Markus Grunwald | last post by:
Hello, while implementing a simple algorithm for digital filters (IIR filters), I got very confused about the very bad performance. This is the code, discussion follows: /** <!--------------------------------------------------------------------------> * The working horse: Filters a signal *
2
2677
by: ciccio | last post by:
Hi, I was wondering what the main reason is why reinterpret_cast fails to work as expected when using optimizations. Here is the simple example code which fail to give the correct result when optimizing. #include <iostream> int main() { long D1 = 0xbcfbc4f0d9b65179; long D2 = 0xbcfbc4f0d9b65042;
0
8685
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However, people are often confused as to whether an ONU can Work As a Router. In this blog post, we’ll explore What is ONU, What Is Router, ONU & Router’s main usage, and What is the difference between ONU and Router. Let’s take a closer look ! Part I. Meaning of...
0
8613
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can effortlessly switch the default language on Windows 10 without reinstalling. I'll walk you through it. First, let's disable language synchronization. With a Microsoft account, language settings sync across devices. To prevent any complications,...
0
9032
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven tapestry of website design and digital marketing. It's not merely about having a website; it's about crafting an immersive digital experience that captivates audiences and drives business growth. The Art of Business Website Design Your website is...
1
8908
by: Hystou | last post by:
Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows Update option using the Control Panel or Settings app; it automatically checks for updates and installs any it finds, whether you like it or not. For most users, this new feature is actually very convenient. If you want to control the update process,...
0
7745
agi2029
by: agi2029 | last post by:
Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing, and deployment—without human intervention. Imagine an AI that can take a project description, break it down, write the code, debug it, and then launch it, all on its own.... Now, this would greatly impact the work of software developers. The idea...
0
5869
by: conductexam | last post by:
I have .net C# application in which I am extracting data from word file and save it in database particularly. To store word all data as it is I am converting the whole word file firstly in HTML and then checking html paragraph one by one. At the time of converting from word file to html my equations which are in the word document file was convert into image. Globals.ThisAddIn.Application.ActiveDocument.Select();...
0
4626
by: adsilva | last post by:
A Windows Forms form does not have the event Unload, like VB6. What one acts like?
1
3054
by: 6302768590 | last post by:
Hai team i want code for transfer the data from one system to another through IP address by using C# our system has to for every 5mins then we have to update the data what the data is updated we have to send another system
3
2008
bsmnconsultancy
by: bsmnconsultancy | last post by:
In today's digital era, a well-designed website is crucial for businesses looking to succeed. Whether you're a small business owner or a large corporation in Toronto, having a strong online presence can significantly impact your brand's success. BSMN Consultancy, a leader in Website Development in Toronto offers valuable insights into creating effective websites that not only look great but also perform exceptionally well. In this comprehensive...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.