By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
444,078 Members | 1,846 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 444,078 IT Pros & Developers. It's quick & easy.

how not to run out of memory in cursor.execute

P: n/a
I am using cx_Oracle and MySQLdb to pull a lot of data from some tables
and I find that the cursor.execute method uses a lot of memory that
never gets garbage collected. Using fetchmany instead of fetchall does
not seem to make any difference, since it's the execute that uses
memory. Breaking the query down to build lots of small tables doesn't
help, since execute doesn't give its memory back, after reading enough
small tables execute returns a memory error. What is the trick to get
memory back from execute in cx_Oracle and MySQLdb?

Jun 4 '06 #1
Share this Question
Share on Google+
8 Replies


P: n/a

jo**************@gmail.com wrote:
I am using cx_Oracle and MySQLdb to pull a lot of data from some tables
and I find that the cursor.execute method uses a lot of memory that
never gets garbage collected. Using fetchmany instead of fetchall does
not seem to make any difference, since it's the execute that uses
memory. Breaking the query down to build lots of small tables doesn't
help, since execute doesn't give its memory back, after reading enough
small tables execute returns a memory error. What is the trick to get
memory back from execute in cx_Oracle and MySQLdb?


cx_Oracle and MySQLdb must be handled differently, due to the fact that
MySQL does not actually have cursors (MySQLdb fakes them for you).

To handle large resultsets efficiently in cx_Oracle simply use the
cursor iteration idiom:

for row in cursor:
# do stuff with the row

cx_Oracle takes care of the fetching for you, and your memory usage
should remain fairly constant when using this idiom.

To handle large resultsets in MySQLdb, you have to resort to multiple
queries:

l = 1000
o = 0

cursor.execute('SELECT foo FROM Bar LIMIT %d OFFSET %d', (l, o))
rows = cursor.fetchall()
while len(rows) > 0:
# process the fetched rows
o += l
cursor.execute('SELECT foo FROM Bar LIMIT %d OFFSET %d', (l, o))
rows = cursor.fetchall()

cursor.close()

As you can see, the MySQLdb version is more involved, due to the lack
of real cursor support in the MySQL database. Any database with good
cursor support will likely have good cursor iteration support in the
corresponding DBAPI driver.

Hope this helps,

L. Daniel Burr

Jun 5 '06 #2

P: n/a
amberite wrote:
jo**************@gmail.com wrote:
I am using cx_Oracle and MySQLdb to pull a lot of data from some tables
and I find that the cursor.execute method uses a lot of memory that
never gets garbage collected. Using fetchmany instead of fetchall does
not seem to make any difference, since it's the execute that uses
memory. Breaking the query down to build lots of small tables doesn't
help, since execute doesn't give its memory back, after reading enough
small tables execute returns a memory error. What is the trick to get
memory back from execute in cx_Oracle and MySQLdb?

cx_Oracle and MySQLdb must be handled differently, due to the fact that
MySQL does not actually have cursors (MySQLdb fakes them for you).

To handle large resultsets efficiently in cx_Oracle simply use the
cursor iteration idiom:

for row in cursor:
# do stuff with the row

cx_Oracle takes care of the fetching for you, and your memory usage
should remain fairly constant when using this idiom.

To handle large resultsets in MySQLdb, you have to resort to multiple
queries:

l = 1000
o = 0

cursor.execute('SELECT foo FROM Bar LIMIT %d OFFSET %d', (l, o))
rows = cursor.fetchall()
while len(rows) > 0:
# process the fetched rows
o += l
cursor.execute('SELECT foo FROM Bar LIMIT %d OFFSET %d', (l, o))
rows = cursor.fetchall()

cursor.close()

As you can see, the MySQLdb version is more involved, due to the lack
of real cursor support in the MySQL database. Any database with good
cursor support will likely have good cursor iteration support in the
corresponding DBAPI driver.

Hope this helps,

L. Daniel Burr

The MySQLdb solution you give is way more complicated than it needs to
be, thereby skewing your opinion towards cx_Oracle unnecessarily.

Look up the .fetchmany() method of cursors in the DB API. There is only
any need to execute a single query no matter how large the result set:
you simply need to keep calling .fetchmany(N) (where N is whatever
you've decided by testing is your optimum chunk size) until it returns
less than N rows, at which point you have exhausted the query.

It's very little more effort to wrap this all up as a generator that
effectively allows you to use the same solution as you quote for cx_Oracle.

regards
Steve

--
Steve Holden +44 150 684 7255 +1 800 494 3119
Holden Web LLC/Ltd http://www.holdenweb.com
Love me, love my blog http://holdenweb.blogspot.com
Recent Ramblings http://del.icio.us/steve.holden

Jun 5 '06 #3

P: n/a
On Mon, Jun 05, 2006 at 07:34:05PM +0100, Steve Holden wrote:
amberite wrote:
jo**************@gmail.com wrote:
I am using cx_Oracle and MySQLdb to pull a lot of data from some tables
and I find that the cursor.execute method uses a lot of memory that
never gets garbage collected. Using fetchmany instead of fetchall does
not seem to make any difference, since it's the execute that uses
memory. Breaking the query down to build lots of small tables doesn't
help, since execute doesn't give its memory back, after reading enough
small tables execute returns a memory error. What is the trick to get
memory back from execute in cx_Oracle and MySQLdb?

cx_Oracle and MySQLdb must be handled differently, due to the fact that
MySQL does not actually have cursors (MySQLdb fakes them for you).

To handle large resultsets efficiently in cx_Oracle simply use the
cursor iteration idiom:

for row in cursor:
# do stuff with the row

cx_Oracle takes care of the fetching for you, and your memory usage
should remain fairly constant when using this idiom.

To handle large resultsets in MySQLdb, you have to resort to multiple
queries:

l = 1000
o = 0

cursor.execute('SELECT foo FROM Bar LIMIT %d OFFSET %d', (l, o))
rows = cursor.fetchall()
while len(rows) > 0:
# process the fetched rows
o += l
cursor.execute('SELECT foo FROM Bar LIMIT %d OFFSET %d', (l, o))
rows = cursor.fetchall()

cursor.close()

As you can see, the MySQLdb version is more involved, due to the lack
of real cursor support in the MySQL database. Any database with good
cursor support will likely have good cursor iteration support in the
corresponding DBAPI driver.

Hope this helps,

L. Daniel Burr

The MySQLdb solution you give is way more complicated than it needs to
be, thereby skewing your opinion towards cx_Oracle unnecessarily.

Look up the .fetchmany() method of cursors in the DB API. There is only
any need to execute a single query no matter how large the result set:
you simply need to keep calling .fetchmany(N) (where N is whatever
you've decided by testing is your optimum chunk size) until it returns
less than N rows, at which point you have exhausted the query.

It's very little more effort to wrap this all up as a generator that
effectively allows you to use the same solution as you quote for cx_Oracle.


MySQL will keep table locks until the results are all fetched so even though
the DB API allows fetchone() or fetchmany() using those with MySQLdb is
dangerous.

-Jack

Jun 5 '06 #4

P: n/a

Steve Holden wrote:

The MySQLdb solution you give is way more complicated than it needs to
be, thereby skewing your opinion towards cx_Oracle unnecessarily.

I respectfully disagree with your assertion here. The code I presented
for MySQLdb is what you *have* to do, to avoid using up too much
memory. This thread is about how to avoid running out of memory when
handling large resultsets, and the approach I outlined is really the
only way to do that with MySQLdb's DBAPI support.
Look up the .fetchmany() method of cursors in the DB API. There is only
any need to execute a single query no matter how large the result set:
you simply need to keep calling .fetchmany(N) (where N is whatever
you've decided by testing is your optimum chunk size) until it returns
less than N rows, at which point you have exhausted the query.

..fetchmany() buys you nothing in the case of large resultsets. The
memory usage will continue to climb with each call to fetchmany. This
isn't the fault of MySQLdb, it is just that MySQL doesn't support
cursors, so MySQLdb has to fake it.
It's very little more effort to wrap this all up as a generator that
effectively allows you to use the same solution as you quote for cx_Oracle.


Again, I respectfully disagree. Your proposed solution, while
resulting in shorter code, will continue to eat memory until the entire
resultset has been delivered.

L. Daniel Burr

Jun 6 '06 #5

P: n/a
Jack Diederich <ja**@performancedrivers.com> writes:
[...]
MySQL will keep table locks until the results are all fetched so even though
the DB API allows fetchone() or fetchmany() using those with MySQLdb is
dangerous.

[...]

That's not true of InnoDB tables.
John
Jun 6 '06 #6

P: n/a
whenever you are using a package that leaks memory.
it can be appropriate to use Rpyc (http://rpyc.wikispaces.com/) to run
the leaking code in a different process, and restart it from time to
time.
I've been using this method to avoid the leaks of matplotlib.

Jun 7 '06 #7

P: n/a
>>>>> "yairchu@gmail" == yairchu@gmail com <ya*****@gmail.com> writes:

yairchu@gmail> whenever you are using a package that leaks memory.
yairchu@gmail> it can be appropriate to use Rpyc
yairchu@gmail> (http://rpyc.wikispaces.com/) to run the leaking
yairchu@gmail> code in a different process, and restart it from
yairchu@gmail> time to time. I've been using this method to avoid
yairchu@gmail> the leaks of matplotlib.

The only known leak in matplotlib is in the tkagg backend which we
believe comes from tkinter and is not in matplotlib proper. There are
a variety of ways to make it look like matplotlib is leaking memory,
eg overplotting when you want to first clear the plot, or failing to
close figures properly. We have unit tests to check for leaks, and
they are passing. Perhaps you can post some code which exposes the
problem.

JDH
Jun 7 '06 #8

P: n/a
mjp
jo**************@gmail.com wrote:
I am using cx_Oracle and MySQLdb to pull a lot of data from some tables
and I find that the cursor.execute method uses a lot of memory that
never gets garbage collected. Using fetchmany instead of fetchall does
not seem to make any difference, since it's the execute that uses
memory. [...]
For MySQLdb, the SSCursor class ("Server Side Cursor"), rather than the
default cursor class, may do what you want: retrieve the result set
row-by-row on demand, rather than all at once at the time of
..execute(). You'll need to remember to .fetch...() every row and call
..close(), however.

See the docstrings for CursorUseResultMixIn and CursorStoreResultMixIn
classes in MySQLdb.cursors for more information.
[...] Breaking the query down to build lots of small tables doesn't
help, since execute doesn't give its memory back, after reading enough
small tables execute returns a memory error. What is the trick to get
memory back from execute in cx_Oracle and MySQLdb?


Generally, the trick is to avoid consuming the memory in the first
place. :)

Regards,
Mike

Jun 11 '06 #9

This discussion thread is closed

Replies have been disabled for this discussion.