473,698 Members | 2,330 Online
Bytes | Software Development & Data Engineering Community
+ Post

Home Posts Topics Members FAQ

performance degradation when looping through lists

I need to process large lists (in my real application, this is to parse the
content of a file). I noticed that the performance to access the individual list
elements degrades over runtime.

This can be reproduced easily using this code:

import time

N=100000
p=10000

A=[]
for i in range(N):
A.append(str(i) )

j = 0
t = time.clock()
for i in range(len(A)):
j += int(A[i])
if i % p == 0:
t = time.clock() - t
print t

(the string conversion only servers to increase the duration of each iteration;
you can observer the same effect with ints, too).

When running this, I get output like this:
0.0
0.37
0.03
0.4
0.06
0.43
0.09
0.46
0.13
0.49

I use Python 2.3.4 (#1, Sep 3 2004, 12:08:45)
[GCC 2.96 20000731 (Red Hat Linux 7.3 2.96-110)] on linux2

I wonder why
1. The execution times alternate between "fast" and "slow" (I observe the same
effect in my much more complex application)
2. The execution times increase steadily, from "0" to 0.13 for the "fast"
phases, and from 0.37 to 0.49 for the slow phases.

Within my application, the effect is more drastical as the following numbers
show (the processing time per line is roughly the same for each line!):
at line 10000 (32258 lines/s)
at line 20000 (478 lines/s)
at line 30000 (21276 lines/s)
at line 40000 (475 lines/s)
at line 50000 (15873 lines/s)
at line 60000 (471 lines/s)
at line 70000 (12658 lines/s)
at line 80000 (468 lines/s)
at line 90000 (10638 lines/s)
at line 100000 (464 lines/s)
at line 110000 (9090 lines/s)
at line 120000 (461 lines/s)
at line 130000 (7936 lines/s)
at line 140000 (457 lines/s)
at line 150000 (7042 lines/s)
at line 160000 (454 lines/s)
at line 170000 (6369 lines/s)
at line 180000 (451 lines/s)
at line 190000 (5780 lines/s)
at line 200000 (448 lines/s)
at line 210000 (4854 lines/s)
at line 220000 (444 lines/s)
at line 230000 (4504 lines/s)
at line 240000 (441 lines/s)
at line 250000 (4201 lines/s)
at line 260000 (438 lines/s)
at line 270000 (3952 lines/s)
at line 280000 (435 lines/s)
at line 290000 (3717 lines/s)
at line 300000 (432 lines/s)
at line 310000 (3508 lines/s)
at line 320000 (429 lines/s)
at line 330000 (3322 lines/s)
at line 340000 (426 lines/s)
at line 350000 (3154 lines/s)
at line 360000 (423 lines/s)
at line 370000 (3003 lines/s)
at line 380000 (421 lines/s)
at line 390000 (2873 lines/s)

Any ideas why this is like this, and what I could do about it? It really makes
may application non-scalable as the lines/s go down even further.

--
Joachim - reply to joachim at domain ccrl-nece dot de

Opinion expressed is personal and does not constitute
an opinion or statement of NEC Laboratories.
Apr 7 '06 #1
6 1471
Joachim Worringen wrote:
I need to process large lists (in my real application, this is to parse
the content of a file).
Then you probably want to use generators instead of lists. The problem
with large lists is that they eat a lot of memory - which can result in
swapping .
I noticed that the performance to access the
individual list elements degrades over runtime.


I leave this point to gurus, but it may have to do with swapping. Also,
this is not real-time, so variations may have to do with your OS tasks
scheduler.

My 2 cents
--
bruno desthuilliers
python -c "print '@'.join(['.'.join([w[::-1] for w in p.split('.')]) for
p in 'o****@xiludom. gro'.split('@')])"
Apr 7 '06 #2
bruno at modulix wrote:
Joachim Worringen wrote:
I need to process large lists (in my real application, this is to parse
the content of a file).


Then you probably want to use generators instead of lists. The problem
with large lists is that they eat a lot of memory - which can result in
swapping .


The effect also shows up in tiny examples (as the one posted) which surely don't
swap on a 512MB machine.

Also, I only read parts of the file into memory to avoid that memory becomes
exhausted.

Of course, using less memory is always a good idea - do you have a pointer on
how to use generators for this application (basically, buffering file content in
memory for faster access)? BTW, the effect also shows up with the linecache module.
I noticed that the performance to access the
individual list elements degrades over runtime.


I leave this point to gurus, but it may have to do with swapping. Also,
this is not real-time, so variations may have to do with your OS tasks
scheduler.


See above for the swapping. And the OS scheduler may create variations in
runtime, but not monotone degradation. I don't think these two effect come into
play here.

--
Joachim - reply to joachim at domain ccrl-nece dot de

Opinion expressed is personal and does not constitute
an opinion or statement of NEC Laboratories.
Apr 7 '06 #3
Joachim Worringen wrote:
I need to process large lists (in my real application, this is to parse
the content of a file). I noticed that the performance to access the
individual list elements degrades over runtime.

This can be reproduced easily using this code:

import time

N=100000
p=10000

A=[]
for i in range(N):
A.append(str(i) )

j = 0
t = time.clock()
for i in range(len(A)):
j += int(A[i])
if i % p == 0:
t = time.clock() - t
print t

(the string conversion only servers to increase the duration of each
iteration; you can observer the same effect with ints, too).

When running this, I get output like this:
0.0
0.37
0.03
0.4
0.06
0.43
0.09
0.46
0.13
0.49

I use Python 2.3.4 (#1, Sep 3 2004, 12:08:45)
[GCC 2.96 20000731 (Red Hat Linux 7.3 2.96-110)] on linux2

I wonder why
1. The execution times alternate between "fast" and "slow" (I observe the
same effect in my much more complex application)


Your timing code is buggy. Change it to

import time

N=100000
p=10000

A=[]
for i in range(N):
A.append(str(i) )

j = 0
start = time.clock()
for i in range(len(A)):
j += int(A[i])
if i % p == 0:
end = time.clock()
print end - start
start = end

Does the problem persist? I hope not.

Peter
Apr 7 '06 #4
Joachim Worringen on comp.lang.pytho n said:
I use Python 2.3.4 (#1, Sep 3 2004, 12:08:45)
[GCC 2.96 20000731 (Red Hat Linux 7.3 2.96-110)] on linux2


Check Peter Otten's answer, and remember as well that GCC 2.96 can lead to
highly strange issues whenever used.

--
Alan Franzoni <al************ ***@gmail.com>
-
Togli .xyz dalla mia email per contattarmi.
Rremove .xyz from my address in order to contact me.
-
GPG Key Fingerprint:
5C77 9DC3 BD5B 3A28 E7BC 921A 0255 42AA FE06 8F3E
Apr 7 '06 #5
Peter Otten wrote:
Your timing code is buggy. Change it to


Ooops, you're right. Everything is fine now... Thanks.

Joachim

--
Joachim - reply to joachim at domain ccrl-nece dot de

Opinion expressed is personal and does not constitute
an opinion or statement of NEC Laboratories.
Apr 7 '06 #6
Hi,

I wrote a program some days back and I was using lists heavily for
performing operations such as pop, remove, append. My list size was
around 1024x3 and there were around 20 different objects like this.

What I want to ask you is that my program also degraded over a period
of time. I cannot post the code as its lot of code.

But I want to ask a question why List degrade. What other alternative
for lists is a faster measure.

Eveyr help is greatly appreciated,

Apr 7 '06 #7

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

7
7353
by: Thomas | last post by:
Does anyone know if the stl string class implements some sort of pooling or chunking of memory for performance optimization? I know that the Lucent SCL has this built in, but not sure if the same is true of STL. The platform is Solaris 2.8. Thanks, Thomas
12
8348
by: serge | last post by:
I have an SP that is big, huge, 700-800 lines. I am not an expert but I need to figure out every possible way that I can improve the performance speed of this SP. In the next couple of weeks I will work on preparing SQL statements that will create the tables, insert sample record and run the SP. I would hope people will look at my SP and give me any hints on how I can better write the SP.
4
1932
by: Jason Heyes | last post by:
What can I do to circumvent the performance degradation associated with dynamic allocation and small objects? Thanks.
0
1193
by: Andrew Mayo | last post by:
This problem was discovered with MSDE2000 SP2 and under WinXP SP2. We are unsure whether it is more widespread as it has only been seen on one machine to date. The problem is related to name resolution. If you attempt to connect to a local database with a connect string using server=. rather than
6
2323
by: teedilo | last post by:
We have an application with a SQL Server 2000 back end that is fairly database intensive -- lots of fairly frequent queries, inserts, updates -- the gamut. The application does not make use of performance hogs like cursors, but I know there are lots of ways the application could be made more efficient database-wise. The server code is running VB6 of all things, using COM+ database interfaces. There are some clustered and non-clustered...
3
1707
by: adsheehan | last post by:
Hi all, Wondering if a GIL lock/unlock causes a re-schedule/contect swap when embedding Python in a multi-threaded C/C++ app on Unix ? If so, do I have any control or influence on this re-scheduling ? The app suffers from serious performance degradation (compared to pure c/C++) and high context switches that I suspect the GIL unlocking may be aggravating ?
22
3351
by: Kevin Murphy | last post by:
I'm using PG 7.4.3 on Mac OS X. I am disappointed with the performance of queries like 'select foo from bar where baz in (subquery)', or updates like 'update bar set foo = 2 where baz in (subquery)'. PG always seems to want to do a sequential scan of the bar table. I wish there were a way of telling PG, "use the index on baz in your plan, because I know that the subquery will return very few results". Where it really matters, I have...
11
4503
by: Richard Maher | last post by:
Hi, I have read many of the copius entries on the subject of IE performance (or the lack thereof) when populating Select Lists. I don't mind the insert performance so much, (I get 100x120byte rows inserted/sec up to 500, and 100rows/6secs up to 3000, which isn't great but then the Row Count is clicking away for the user to see and they can hit the "cancel" button at anytime, so overall I'm happy), what really disappoints me is the...
2
1004
by: ermayank | last post by:
We have a legacy application which is in VB. we are migrating it in vb.net (2005). While coding we found ourself in a fix, the scenario is -- In vb for fetching data we are using Recordset, While iterating this using while loop we found that iteration is taking 21 sec to fetch about 450 records and looping through them and assigning respective values to excel sheet cells. For the same thing in vb.net, using Datareader , time is 51...
0
8675
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However, people are often confused as to whether an ONU can Work As a Router. In this blog post, we’ll explore What is ONU, What Is Router, ONU & Router’s main usage, and What is the difference between ONU and Router. Let’s take a closer look ! Part I. Meaning of...
0
8604
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can effortlessly switch the default language on Windows 10 without reinstalling. I'll walk you through it. First, let's disable language synchronization. With a Microsoft account, language settings sync across devices. To prevent any complications,...
0
9029
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven tapestry of website design and digital marketing. It's not merely about having a website; it's about crafting an immersive digital experience that captivates audiences and drives business growth. The Art of Business Website Design Your website is...
1
8897
by: Hystou | last post by:
Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows Update option using the Control Panel or Settings app; it automatically checks for updates and installs any it finds, whether you like it or not. For most users, this new feature is actually very convenient. If you want to control the update process,...
0
8862
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each protocol has its own unique characteristics and advantages, but as a user who is planning to build a smart home system, I am a bit confused by the choice of these technologies. I'm particularly interested in Zigbee because I've heard it does some...
0
4370
by: TSSRALBI | last post by:
Hello I'm a network technician in training and I need your help. I am currently learning how to create and manage the different types of VPNs and I have a question about LAN-to-LAN VPNs. The last exercise I practiced was to create a LAN-to-LAN VPN between two Pfsense firewalls, by using IPSEC protocols. I succeeded, with both firewalls in the same network. But I'm wondering if it's possible to do the same thing, with 2 Pfsense firewalls...
0
4619
by: adsilva | last post by:
A Windows Forms form does not have the event Unload, like VB6. What one acts like?
2
2331
muto222
by: muto222 | last post by:
How can i add a mobile payment intergratation into php mysql website.
3
2002
bsmnconsultancy
by: bsmnconsultancy | last post by:
In today's digital era, a well-designed website is crucial for businesses looking to succeed. Whether you're a small business owner or a large corporation in Toronto, having a strong online presence can significantly impact your brand's success. BSMN Consultancy, a leader in Website Development in Toronto offers valuable insights into creating effective websites that not only look great but also perform exceptionally well. In this comprehensive...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.