Bytes IT Community

gc penalty of 30-40% when manipulating large data structures?

Poking around, I discovered someone mentioning that
Python's gc adds a 4-7% speed penalty.

Since I was pretty sure I was not creating
reference cycles in nucular, I tried running the tests with garbage
collection disabled.

To my delight I found that index builds run 30-40% faster without
gc. This is really nice because testing gc.collect() afterward
shows that gc was not actually doing anything.
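The experiment above can be reproduced with the standard `gc` module. The nucular build code isn't shown in the post, so the dictionary below is a stand-in for a large, acyclic index structure:

```python
import gc

gc.disable()                      # switch off cyclic garbage collection

# Stand-in for an index build: a large acyclic structure with many
# references. Reference counting alone still frees all of this.
index = {i: [i, str(i)] for i in range(100_000)}

unreachable = gc.collect()        # count of objects found in cycles
gc.enable()                       # restore normal collection
print(unreachable)                # 0 means gc had nothing to do
```

If `gc.collect()` reports 0, the collector was pure overhead during the build.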

I haven't analyzed memory consumption but I suspect that should
be significantly improved also, since the index builds construct
some fairly large data structures with lots of references for a
garbage collector to keep track of.

Somewhere someone should mention the possibility that disabling
gc can greatly improve performance with no downside if you
don't create reference cycles. I couldn't find anything like this
on the Python site or elsewhere. As Paul (I think) said, this should
be a FAQ.

Further, maybe Python should include some sort of "backoff"
heuristic which might go like this: If gc didn't find anything and
memory size is stable, wait longer for the next gc cycle. It's
silly to have gc kicking in thousands of times in a multi-hour
run, finding nothing every time.
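The backoff idea might look something like this sketch. It is purely illustrative (nothing like `make_backoff_collector` exists in Python; the doubling factor and cap are arbitrary):

```python
import gc

def make_backoff_collector(collect=gc.collect, max_interval=1024):
    """Hypothetical backoff: if a collection finds no cyclic garbage,
    wait twice as long (up to max_interval calls) before the next one."""
    state = {"interval": 1, "skipped": 0}

    def maybe_collect():
        state["skipped"] += 1
        if state["skipped"] < state["interval"]:
            return None                       # not yet time to collect
        state["skipped"] = 0
        found = collect()
        if found == 0:
            # Nothing to do: back off exponentially.
            state["interval"] = min(state["interval"] * 2, max_interval)
        else:
            state["interval"] = 1             # garbage found: be eager again
        return found

    maybe_collect.state = state               # exposed for inspection
    return maybe_collect
```

In a multi-hour run that never creates cycles, the interval quickly grows to the cap and the collector all but disappears from the profile.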

Just my 2c.
-- Aaron Watters

nucular full text fielded indexing: http://nucular.sourceforge.net
===
http://www.xfeedme.com/nucular/pydis...=dingus%20fish
Nov 16 '07 #1
2 Replies


On Nov 16, 2007 8:34 AM, Aaron Watters <aa***********@gmail.com> wrote:
[snip]
Further, maybe Python should include some sort of "backoff"
heuristic which might go like this: If gc didn't find anything and
memory size is stable, wait longer for the next gc cycle. It's
silly to have gc kicking in thousands of times in a multi-hour
run, finding nothing every time.
The GC has a heuristic where it kicks in when (allocations -
deallocations) exceeds a certain threshold, which has (sometimes quite
severe) implications for building large indexes. This doesn't seem to
be very well known (it's come up at least 3-4 times on this list in
the last 6 months) and the heuristic is probably not a very good one.
If you have some ideas for improvements, you can read about the
current GC in the gc module docs (as well as in the source) and can
post them on python-ideas.
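The heuristic Chris describes is tunable via `gc.set_threshold()`, which is a middle ground between the default behavior and disabling gc outright. The numbers below are illustrative, not a recommendation:

```python
import gc

# threshold0 is the net (allocations - deallocations) count that
# triggers a generation-0 collection; CPython's default is
# typically (700, 10, 10).
print(gc.get_threshold())

# Raising threshold0 makes collections far less frequent during a
# phase that allocates many long-lived objects, without turning
# the cycle collector off entirely.
gc.set_threshold(100_000, 10, 10)
```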
Nov 16 '07 #2

On Nov 16, 10:59 am, "Chris Mellon" <arka...@gmail.com> wrote:
The GC has a heuristic where it kicks in when (allocations -
deallocations) exceeds a certain threshold,
As available RAM increases, this threshold is more easily
reached. Ever since I moved to 2 GB of RAM I have stumbled on issues
that were easily solved by turning the gc off (the truth is that more
RAM made me lazier; I'm a little less keen to keep memory consumption
down for occasional jobs, and overly cavalier about generating lists
1 GB in size...)

One example: when a list grew from 1 million to 10 million
elements, I hit this threshold. Nowadays I disable the gc during data
initialization.
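A tidy way to do that (a sketch of the disable-during-initialization pattern, not code from the post) is a small context manager, so gc is restored even if initialization raises:

```python
import gc
from contextlib import contextmanager

@contextmanager
def gc_disabled():
    """Turn off cyclic GC for a bulk-initialization phase, then restore it."""
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        yield
    finally:
        if was_enabled:
            gc.enable()

with gc_disabled():
    # Bulk data initialization: no reference cycles are created here,
    # so the collector would find nothing anyway.
    data = [[i] for i in range(10_000)]
```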

i.

Nov 16 '07 #3
