473,224 Members | 1,602 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,224 software developers and data experts.

gc penalty of 30-40% when manipulating large data structures?

Poking around I discovered somewhere someone saying that
Python gc adds a 4-7% speed penalty.

So since I was pretty sure I was not creating
reference cycles in nucular I tried running the tests with garbage
collection disabled.

To my delight I found that index builds run 30-40% faster without
gc. This is really nice because testing gc.collect() afterward
shows that gc was not actually doing anything.

I haven't analyzed memory consumption but I suspect that should
be significantly improved also, since the index builds construct
some fairly large data structures with lots of references for a
garbage collector to keep track of.

Somewhere someone should mention the possibility that disabling
gc can greatly improve performance with no down side if you
don't create reference cycles. I couldn't find anything like this
on the Python site or elsewhere. As Paul (I think) said, this should
be a FAQ.

Further, maybe Python should include some sort of "backoff"
heuristic which might go like this: If gc didn't find anything and
memory size is stable, wait longer for the next gc cycle. It's
silly to have gc kicking in thousands of times in a multi-hour
run, finding nothing every time.

Just my 2c.
-- Aaron Watters

nucular full text fielded indexing: http://nucular.sourceforge.net
===
http://www.xfeedme.com/nucular/pydis...=dingus%20fish
Nov 16 '07 #1
2 1149
On Nov 16, 2007 8:34 AM, Aaron Watters <aa***********@gmail.comwrote:
Poking around I discovered somewhere someone saying that
Python gc adds a 4-7% speed penalty.

So since I was pretty sure I was not creating
reference cycles in nucular I tried running the tests with garbage
collection disabled.

To my delight I found that index builds run 30-40% faster without
gc. This is really nice because testing gc.collect() afterward
shows that gc was not actually doing anything.

I haven't analyzed memory consumption but I suspect that should
be significantly improved also, since the index builds construct
some fairly large data structures with lots of references for a
garbage collector to keep track of.

Somewhere someone should mention the possibility that disabling
gc can greatly improve performance with no down side if you
don't create reference cycles. I couldn't find anything like this
on the Python site or elsewhere. As Paul (I think) said, this should
be a FAQ.

Further, maybe Python should include some sort of "backoff"
heuristic which might go like this: If gc didn't find anything and
memory size is stable, wait longer for the next gc cycle. It's
silly to have gc kicking in thousands of times in a multi-hour
run, finding nothing every time.
The GC has a heuristic where it kicks in when (allocations -
deallocations) exceeds a certain threshold, which has (sometimes quite
severe) implications for building large indexes. This doesn't seem to
be very well known (it's come up at least 3-4 times on this list in
the last 6 months) and the heuristic is probably not a very good one.
If you have some ideas for improvements, you can read about the
current GC in the gc module docs (as well as in the source) and can
post them on python-ideas.
Nov 16 '07 #2
On Nov 16, 10:59 am, "Chris Mellon" <arka...@gmail.comwrote:
The GC has a heuristic where it kicks in when (allocations -
deallocations) exceeds a certain threshold,
As the available ram increases this threshold can be more easily
reached. Ever since I moved to 2Gb ram I stumbled upon issues that
were easily solved by turning the gc off (the truth is that more ram
made me lazier, I'm a little less keen to keep memory consumption down
for occasional jobs, being overly cavalier with generating lists of
1Gb in size...)

One example, when moving from a list size from 1 million to 10 million
I hit this threshold. Nowadays I disable the gc during data
initialization.

i.

Nov 16 '07 #3

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

5
by: Kenneth McDonald | last post by:
Now that I'm back to Python and all the new (to me) cool features, I find I'm using properties a lot, i.e. I'm defining: foo = property(fset=..., fget=...) for a number of properties, in many...
7
by: Michael Andersson | last post by:
Hi! Does the use of exception handling induce a performance penalty during the execution of non exception handling code? Regards, /Michael
9
by: A. Saksena | last post by:
Hi, Do anybody have an idea of the performance penalty while using exception handling (specially with g++) Abhishek
8
by: alex goldman | last post by:
class c { int x; public: inline c() : x(0) {} inline c(int i) { x = i; } inline operator const int& () const { return x; } inline operator int& () { return x; } };
0
by: EP | last post by:
Ed Leafe wrote in response to the "Python vs. Access VBA" thread: > You might want to look at Dabo, which is a database application > framework for Python. In about 30 seconds you can create an...
3
by: Rob Nicholson | last post by:
This is a question about the following KB article: http://support.microsoft.com/kb/q262161 We've got a problem with the Infragistics NetAdvantage presentation controls which they've suggested...
7
by: Rob Nicholson | last post by:
We're using a well known presentation layer library to implement complex controls on an intranet site. IE has the following limitation which effectively means that you can only have 30 <STYLE> tags...
2
by: John | last post by:
I created a number of pictureboxes in a panel, and want to draw lines in those pictureboxes but I cannot. Please see the following code and make corrections. Thanks. Private Sub...
3
by: orbitus | last post by:
I know that I am overcomplicating this. I have records that need to be printed. Lets say 536 records, some on two or more pages in the report. I want to print 30 records, then 30 more till the...
6
by: Daniel Austria | last post by:
Sorry, how can i convert a string like "10, 20, 30" to a list what i can do is: s = "10, 20, 30" tmp = '' l = eval(tmp)
1
isladogs
by: isladogs | last post by:
The next online meeting of the Access Europe User Group will be on Wednesday 6 Dec 2023 starting at 18:00 UK time (6PM UTC) and finishing at about 19:15 (7.15PM). In this month's session, Mike...
0
by: veera ravala | last post by:
ServiceNow is a powerful cloud-based platform that offers a wide range of services to help organizations manage their workflows, operations, and IT services more efficiently. At its core, ServiceNow...
0
by: VivesProcSPL | last post by:
Obviously, one of the original purposes of SQL is to make data query processing easy. The language uses many English-like terms and syntax in an effort to make it easy to learn, particularly for...
0
by: mar23 | last post by:
Here's the situation. I have a form called frmDiceInventory with subform called subfrmDice. The subform's control source is linked to a query called qryDiceInventory. I've been trying to pick up the...
0
by: abbasky | last post by:
### Vandf component communication method one: data sharing ​ Vandf components can achieve data exchange through data sharing, state sharing, events, and other methods. Vandf's data exchange method...
2
by: jimatqsi | last post by:
The boss wants the word "CONFIDENTIAL" overlaying certain reports. He wants it large, slanted across the page, on every page, very light gray, outlined letters, not block letters. I thought Word Art...
0
by: fareedcanada | last post by:
Hello I am trying to split number on their count. suppose i have 121314151617 (12cnt) then number should be split like 12,13,14,15,16,17 and if 11314151617 (11cnt) then should be split like...
0
Git
by: egorbl4 | last post by:
Скачал я git, хотел начать настройку, а там вылезло вот это Что это? Что мне с этим делать? ...
1
by: davi5007 | last post by:
Hi, Basically, I am trying to automate a field named TraceabilityNo into a web page from an access form. I've got the serial held in the variable strSearchString. How can I get this into the...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.