gc penalty of 30-40% when manipulating large data structures?

Poking around I discovered somewhere someone saying that
Python gc adds a 4-7% speed penalty.

So since I was pretty sure I was not creating
reference cycles in nucular I tried running the tests with garbage
collection disabled.

To my delight I found that index builds run 30-40% faster without
gc. This is really nice because testing gc.collect() afterward
shows that gc was not actually doing anything.
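A minimal sketch of the experiment described above, assuming a stand-in workload: `build_index`, `gc_disabled`, `timed`, and the size `N` are hypothetical names for illustration, not part of nucular. It times the same build with the collector on and off, then calls `gc.collect()`, which returns the number of unreachable objects it found.

```python
import gc
import time
from contextlib import contextmanager


@contextmanager
def gc_disabled():
    """Switch the cyclic collector off for the block, then restore its prior state."""
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        yield
    finally:
        if was_enabled:
            gc.enable()


def build_index(n):
    """Hypothetical stand-in for an index build: many small containers, no cycles."""
    return {i: [str(i), (i, i + 1)] for i in range(n)}


def timed(func, *args):
    """Return (result, elapsed seconds) for one call."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


N = 100_000  # arbitrary size chosen for illustration

index, with_gc = timed(build_index, N)
del index
gc.collect()  # clear anything left over before the second run

with gc_disabled():
    gc_was_off = not gc.isenabled()
    index, without_gc = timed(build_index, N)

# gc.collect() returns the number of unreachable objects it found;
# a result of 0 means the collector had nothing to do for this workload.
unreachable = gc.collect()
print(f"gc on: {with_gc:.3f}s  gc off: {without_gc:.3f}s  unreachable: {unreachable}")
```

The timings depend heavily on the workload and the Python version, so the 30-40% figure should be measured rather than assumed. Reference counting still frees acyclic objects while the collector is off; only cycles are left uncollected.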

I haven't analyzed memory consumption but I suspect that should
be significantly improved also, since the index builds construct
some fairly large data structures with lots of references for a
garbage collector to keep track of.

Somewhere someone should mention the possibility that disabling
gc can greatly improve performance with no down side if you
don't create reference cycles. I couldn't find anything like this
on the Python site or elsewhere. As Paul (I think) said, this should
be a FAQ.

Further, maybe Python should include some sort of "backoff"
heuristic which might go like this: If gc didn't find anything and
memory size is stable, wait longer for the next gc cycle. It's
silly to have gc kicking in thousands of times in a multi-hour
run, finding nothing every time.
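One way the "backoff" idea could be approximated in user code is sketched below. It assumes automatic collection is disabled and the program calls `tick()` once per unit of work; `BackoffCollector`, `base_interval`, and `max_interval` are hypothetical names, and the sketch leaves out the "memory size is stable" check from the proposal.

```python
import gc


class BackoffCollector:
    """Manual collection scheduler that waits longer after each fruitless pass."""

    def __init__(self, base_interval=1000, max_interval=1_000_000):
        self.base_interval = base_interval
        self.max_interval = max_interval
        self.interval = base_interval  # ticks to wait before the next collection
        self._ticks = 0

    def tick(self, collect=gc.collect):
        """Call once per unit of work; returns True when a collection ran."""
        self._ticks += 1
        if self._ticks < self.interval:
            return False
        self._ticks = 0
        found = collect()  # number of unreachable objects found
        if found == 0:
            # Nothing to reclaim: double the wait, up to the cap.
            self.interval = min(self.interval * 2, self.max_interval)
        else:
            # Garbage turned up: go back to the original schedule.
            self.interval = self.base_interval
        return True


if __name__ == "__main__":
    gc.disable()  # take over scheduling from the automatic collector
    try:
        collector = BackoffCollector()
        for record in range(10_000):
            _ = [record]  # placeholder for real work
            collector.tick()
        print("final interval:", collector.interval)
    finally:
        gc.enable()
```

The `collect` parameter exists only so the schedule can be exercised without running a real collection; in normal use the default `gc.collect` is what gets called.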

Just my 2c.
-- Aaron Watters

nucular full text fielded indexing: http://nucular.sourceforge.net
===
http://www.xfeedme.com/nucular/pydis...=dingus%20fish
Nov 16 '07 #1
On Nov 16, 2007 8:34 AM, Aaron Watters <aa***********@gmail.com> wrote:
> Poking around I discovered somewhere someone saying that
> Python gc adds a 4-7% speed penalty.
>
> So since I was pretty sure I was not creating
> reference cycles in nucular I tried running the tests with garbage
> collection disabled.
>
> To my delight I found that index builds run 30-40% faster without
> gc. This is really nice because testing gc.collect() afterward
> shows that gc was not actually doing anything.
>
> I haven't analyzed memory consumption but I suspect that should
> be significantly improved also, since the index builds construct
> some fairly large data structures with lots of references for a
> garbage collector to keep track of.
>
> Somewhere someone should mention the possibility that disabling
> gc can greatly improve performance with no down side if you
> don't create reference cycles. I couldn't find anything like this
> on the Python site or elsewhere. As Paul (I think) said, this should
> be a FAQ.
>
> Further, maybe Python should include some sort of "backoff"
> heuristic which might go like this: If gc didn't find anything and
> memory size is stable, wait longer for the next gc cycle. It's
> silly to have gc kicking in thousands of times in a multi-hour
> run, finding nothing every time.

The GC has a heuristic where it kicks in when (allocations -
deallocations) exceeds a certain threshold, which has (sometimes quite
severe) implications for building large indexes. This doesn't seem to
be very well known (it's come up at least 3-4 times on this list in
the last 6 months) and the heuristic is probably not a very good one.
If you have some ideas for improvements, you can read about the
current GC in the gc module docs (as well as in the source) and can
post them on python-ideas.
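The threshold mentioned above can be inspected and adjusted through the gc module. The sketch below is an illustration only: `raised_threshold` and the value 100,000 are hypothetical choices, and the default thresholds differ between CPython versions, so the code prints them rather than assuming particular numbers.

```python
import gc
from contextlib import contextmanager

# Current thresholds as (threshold0, threshold1, threshold2). A collection of
# the youngest generation is triggered when allocations minus deallocations
# of tracked container objects exceeds threshold0.
print("thresholds:", gc.get_threshold())
print("current counts:", gc.get_count())


@contextmanager
def raised_threshold(threshold0):
    """Temporarily raise threshold0 so automatic collections run less often."""
    original = gc.get_threshold()
    gc.set_threshold(threshold0, *original[1:])
    try:
        yield original
    finally:
        gc.set_threshold(*original)


with raised_threshold(100_000):
    # Hypothetical bulk build: far fewer automatic collections fire in here.
    table = {i: (i, [i]) for i in range(50_000)}

print("restored:", gc.get_threshold())
```

Raising the threshold keeps cycle collection available while reducing how often it runs, which is a milder alternative to calling `gc.disable()` outright.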
Nov 16 '07 #2
On Nov 16, 10:59 am, "Chris Mellon" <arka...@gmail.com> wrote:
> The GC has a heuristic where it kicks in when (allocations -
> deallocations) exceeds a certain threshold,

As the available RAM increases, this threshold is reached more
easily. Ever since I moved to 2 GB of RAM I have stumbled upon issues
that were easily solved by turning the gc off. (The truth is that more
RAM made me lazier: I'm a little less keen to keep memory consumption
down for occasional jobs, and overly cavalier about generating lists
1 GB in size...)

One example: when moving from a list size of 1 million to 10 million
I hit this threshold. Nowadays I disable the gc during data
initialization.
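To see how often the collector fires while a large list is initialized, the collection counters from `gc.get_stats()` can be compared before and after the build. This is a rough sketch: `build_rows`, `collections_so_far`, and `count_collections` are hypothetical helpers, and the list size is arbitrary.

```python
import gc


def collections_so_far():
    """Total collections run so far, summed over all generations."""
    return sum(gen["collections"] for gen in gc.get_stats())


def build_rows(n):
    """Hypothetical data initialization: a large list of small tuples."""
    return [(i, str(i)) for i in range(n)]


def count_collections(n, disable):
    """Build n rows and report how many collections ran during the build."""
    was_enabled = gc.isenabled()
    if disable:
        gc.disable()
    else:
        gc.enable()
    try:
        before = collections_so_far()
        rows = build_rows(n)
        after = collections_so_far()
        return after - before, len(rows)
    finally:
        # Put the collector back the way it was found.
        if was_enabled:
            gc.enable()
        else:
            gc.disable()


with_gc, _ = count_collections(200_000, disable=False)
without_gc, _ = count_collections(200_000, disable=True)
print(f"collections with gc enabled: {with_gc}, with gc disabled: {without_gc}")
```

With the collector disabled no automatic collections run during the build, so the second count stays at zero; the first count depends on the thresholds in effect and on the Python version.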

i.

Nov 16 '07 #3
