Bytes | Software Development & Data Engineering Community

Estimating memory use?

I've got a large text-processing task to attack (it's actually a genomics
task: matching DNA probes against bacterial genomes). I've got roughly
200,000 probes, each of which is a 25-character text string. My first
thought is to compile these into 200,000 regexes, but before I launch into
that, I want to make a back-of-the-envelope estimate of how much memory that
will take.

Is there any easy way to find out how much memory a Python object takes?
If there were, it would be simple to compile a small random sample of
these patterns (say, 100 of them) and add up the sizes of the resulting
regex objects to get a rough idea of how much memory I'll need. I realize
I could just compile them all and watch the size of the Python process
grow, but that seems somewhat brute-force.
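On a modern Python, the sampling approach described above can be sketched with sys.getsizeof() (added in Python 2.6, after this thread). This is a minimal sketch under stated assumptions: the random A/C/G/T strings are hypothetical stand-ins for real probes, and sys.getsizeof() is shallow, so it does not follow references to other objects. The sketch adds the pattern string each compiled object keeps, but it still ignores other referenced objects and allocator overhead, so treat the result as a lower bound.

```python
import random
import re
import sys

PROBE_LENGTH = 25        # from the post: each probe is 25 characters
TOTAL_PROBES = 200_000   # from the post: roughly 200,000 probes
SAMPLE_SIZE = 100        # from the post: "say, 100 of them"

def random_probe(length=PROBE_LENGTH):
    """Hypothetical stand-in for a real probe: a random A/C/G/T string."""
    return "".join(random.choice("ACGT") for _ in range(length))

def estimate_total_bytes(sample_size=SAMPLE_SIZE, total=TOTAL_PROBES):
    """Compile a sample, sum the shallow sizes, and scale up linearly."""
    sample = [re.compile(random_probe()) for _ in range(sample_size)]
    # Size of each pattern object plus the pattern string it references.
    sampled = sum(sys.getsizeof(p) + sys.getsizeof(p.pattern) for p in sample)
    return sampled * total // sample_size

print(f"~{estimate_total_bytes() / 2**20:.1f} MiB for the compiled patterns")
```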
Nov 27 '05 #1
Hi,

What is your 'static' data (your database), and what is your input data?
Are those 200,000 probes your database? Perhaps they can be stored as
pickled compiled regexes and loaded in pickled form; then you
don't need to keep them all in memory at once, if you fear that
memory usage will be too big.

I also wonder whether other string-matching techniques could be used;
I guess you don't need the full power of regexes to match DNA
string patterns.
Perhaps you should investigate that a bit and do some performance
tests?
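Along the lines suggested above, and assuming every probe is exactly 25 characters and only exact matches are wanted, one alternative to 200,000 regexes is to put the probes in a set and slide a fixed-length window over the genome, doing one hash lookup per position. This is a sketch, not a drop-in solution: the function name find_probe_hits is hypothetical, and reverse-complement matches and mismatches are not handled.

```python
def find_probe_hits(genome, probes, k=25):
    """Return (position, probe) for every exact occurrence of a probe in genome.

    Assumes all probes have length k; each window is checked with a set lookup.
    """
    probe_set = set(probes)
    if any(len(p) != k for p in probe_set):
        raise ValueError(f"all probes must be exactly {k} characters long")
    hits = []
    for i in range(len(genome) - k + 1):
        window = genome[i:i + k]
        if window in probe_set:
            hits.append((i, window))
    return hits

# Tiny demo with k=3 instead of 25.
print(find_probe_hits("AACGTTACG", ["ACG", "GTT"], k=3))
# [(1, 'ACG'), (3, 'GTT'), (6, 'ACG')]
```

The set only holds the 200,000 probe strings, so its memory use can be estimated with the same per-object techniques discussed in this thread.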

cheers,

--Tim

Nov 27 '05 #2
Roy Smith <ro*@panix.com> wrote:
...
Is there any easy way to find out how much memory a Python object takes?


No, but there are a few early attempts out there at supplying SOME ways
(not necessarily "easy", but SOME). For example, PySizer, at
<http://pysizer.8325.org/>.
Alex
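For readers on a modern Python 3, the standard library now includes tracemalloc (since Python 3.4), which records memory allocated through Python's allocators. A rough sketch, assuming random A/C/G/T strings are an acceptable stand-in for real probes, is to trace just the compilation of a 100-pattern sample and extrapolate. This is a more targeted version of "watch the process grow". The helper names here are hypothetical.

```python
import random
import re
import tracemalloc

def random_probe(length=25):
    """Hypothetical stand-in for a real probe: a random A/C/G/T string."""
    return "".join(random.choice("ACGT") for _ in range(length))

def traced_compile_bytes(patterns):
    """Compile patterns with tracemalloc on; return (bytes held, peak bytes, compiled)."""
    re.purge()  # empty re's internal cache so the sample is compiled fresh
    tracemalloc.start()
    try:
        compiled = [re.compile(p) for p in patterns]
        current, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return current, peak, compiled

sample = [random_probe() for _ in range(100)]
held, peak, compiled = traced_compile_bytes(sample)
print(f"~{held / len(sample):.0f} bytes per compiled probe, "
      f"~{held * 200_000 / len(sample) / 2**20:.1f} MiB extrapolated to 200,000")
```

The "held" figure includes the result list and any re cache entries created while tracing, so it is a rough, slightly generous estimate.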

Nov 27 '05 #3
In article <1h***************************@mail.comcast.net>,
al***@mail.comcast.net (Alex Martelli) wrote:
Roy Smith <ro*@panix.com> wrote:
...
Is there any easy way to find out how much memory a Python object takes?


No, but there are a few early attempts out there at supplying SOME ways
(not necessarily "easy", but SOME). For example, PySizer, at
<http://pysizer.8325.org/>.
Alex


Looks interesting, thanks.

I've already discovered one (very) surprising thing -- if I build a dict
containing all my regexes (takes about 3 minutes on my PowerBook) and
pickle them to a file, re-loading the pickle takes just about as long as
compiling them did in the first place.
Nov 27 '05 #4
Roy Smith wrote:
I've already discovered one (very) surprising thing -- if I build a dict
containing all my regexes (takes about 3 minutes on my PowerBook) and
pickle them to a file, re-loading the pickle takes just about as long as
compiling them did in the first place.


The internal RE bytecode format is version-dependent, so pickle stores the
pattern strings instead, and unpickling has to compile them all over again.

</F>
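This explanation is easy to check on a modern CPython, where the re module registers a copyreg reducer that turns a compiled pattern into its pattern text and flags. The sketch below uses a made-up 25-character probe and shows that the pickle carries the pattern string, and that loading it yields a working, recompiled pattern. That recompilation is why unpickling takes about as long as compiling did.

```python
import pickle
import re

probe = "ACGTTGCAACGTTGCAACGTTGCAA"  # made-up 25-character probe
pattern = re.compile(probe)

data = pickle.dumps(pattern)
# The pickle contains the pattern text (and flags), not compiled bytecode.
print(probe.encode("ascii") in data)

# Loading calls back into re to compile the pattern again.
restored = pickle.loads(data)
print(restored.pattern == pattern.pattern, restored.flags == pattern.flags)
print(restored.match(probe) is not None)
```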

Nov 27 '05 #5
There is a function mx_sizeof() in the mx.Tools module from eGenix
which may be helpful. More at
<http://www.egenix.com/files/python/eGenix-mx-Extensions.html#mxTools>

/Jean Brouwers
PS) This is an approximation for memory usage which is useful in
certain, simple cases.

Each built-in type has an attribute __basicsize__, which is the size in
bytes of the fixed part of an instance of that type. For example,
str.__basicsize__ returns 24 and int.__basicsize__ returns 12 (the exact
values depend on the Python version and platform).

However, __basicsize__ does not include the space needed to store the
object's value. For a string, the length of the string has to be added
(times the character width). For example, the size of the string "abcd"
would be approximately str.__basicsize__ + len("abcd") bytes,
assuming single-byte characters.

In addition, memory alignment should be taken into account by rounding
the size up to the next multiple of 8 (or maybe 16, depending on the
platform, etc.).

An approximation for the amount of memory used by a string S (of
single-byte characters) aligned to A bytes, where A is a power of two,
would be

(str.__basicsize__ + len(S) + A - 1) & ~(A - 1)
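A minimal sketch of this approximation in Python 3 follows. One assumption to flag: Python 3's str uses a different, flexible internal layout (PEP 393), so the sketch applies the formula to bytes, which plays the role that the 2005-era byte-string str did. The item width comes from __itemsize__, and the rounding uses the mask form above, which is only valid when the alignment is a power of two.

```python
import sys

def align_up(n: int, a: int) -> int:
    """Round n up to the next multiple of a; a must be a power of two."""
    if a <= 0 or a & (a - 1):
        raise ValueError("alignment must be a positive power of two")
    return (n + a - 1) & ~(a - 1)

def approx_bytes_size(b: bytes, align: int = 8) -> int:
    """Fixed header plus one item per element, rounded up to the alignment."""
    raw = type(b).__basicsize__ + len(b) * type(b).__itemsize__
    return align_up(raw, align)

probe = b"ACGTACGTACGTACGTACGTACGTA"  # a 25-character probe
# Compare the aligned estimate with the unaligned figure from sys.getsizeof().
print(len(probe), approx_bytes_size(probe), sys.getsizeof(probe))
```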

Things are more complicated for container types such as list, tuple and
dict, and for instances of a class.
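For containers, one common approach is to walk the object graph and add up sys.getsizeof() for every object reached, counting each object only once. This is a sketch, not a complete tool: it follows only dict, list, tuple, set and frozenset, ignores instance __dict__ and __slots__, and counts shared objects (such as small cached ints) as if the container owned them. The name deep_getsizeof is hypothetical.

```python
import sys

def deep_getsizeof(obj, seen=None) -> int:
    """Approximate size of obj plus everything reachable through built-in containers."""
    if seen is None:
        seen = set()
    if id(obj) in seen:          # already counted (also stops reference cycles)
        return 0
    seen.add(id(obj))
    size = sys.getsizeof(obj)    # shallow size of this object only
    if isinstance(obj, dict):
        size += sum(deep_getsizeof(k, seen) + deep_getsizeof(v, seen)
                    for k, v in obj.items())
    elif isinstance(obj, (list, tuple, set, frozenset)):
        size += sum(deep_getsizeof(item, seen) for item in obj)
    return size

probes = {i: f"probe-{i:05d}" for i in range(1000)}
print(sys.getsizeof(probes), deep_getsizeof(probes))
```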

Nov 27 '05 #6
The name of the function in mx.Tools is sizeof() and not mx_sizeof().
My apologies.

Also, it turns out that the value returned by mx.Tools.sizeof() is not
aligned. For example, mx.Tools.sizeof("abcde") returns 29, which is
fine, but not entirely "accurate".

/Jean Brouwers

Nov 27 '05 #7
[Fredrik Lundh]
the internal RE byte code format is version dependent, so pickle
stores the patterns instead.


Oh! Nice to know. That explains why, when I was learning Python, my
initial experiment with pickles left me with the (probably wrong)
feeling that they were not worth the trouble.

It might be worth a note in the documentation, somewhere appropriate.

--
François Pinard http://pinard.progiciels-bpi.ca
Nov 28 '05 #8