best way to discover this process's current memory usage, cross-platform?

Having fixed a memory leak (not the leak of a Python reference, some
other stuff I wasn't properly freeing in certain cases) in a C-coded
extension I maintain, I need a way to test that the leak is indeed
fixed. Being in a hurry, I originally used a q&d hack...:
import gc, os, sys

if sys.platform in ('linux2', 'darwin'):
    def _memsize():
        """ this function tries to return a measurement of how much memory
            this process is consuming, in some arbitrary unit (if it doesn't
            manage to, it returns 0).
        """
        gc.collect()
        try:
            x = int(os.popen('ps -p %d -o vsz|tail -1' % os.getpid()).read())
        except:
            x = 0
        return x
else:
    def _memsize():
        return 0

Having a _memsize() function available, the test then does:
before = _memsize()
# a lot of repeated executions of code that should not consume
# any net memory, but used to when the leak was there
after = _memsize()
and checks that after==before.
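
The before/after check described above can be sketched as a small helper. This is only a sketch under assumptions: check_no_leak, warmup and slack are hypothetical names, and memsize stands for whatever measuring function (such as _memsize) is available. The slack parameter is there because an exact after==before comparison may be too strict on some machines.

```python
import gc


def check_no_leak(workload, memsize, repetitions=1000, warmup=10, slack=0):
    """Run workload repeatedly and compare memory readings before and after.

    Returns (before, after, ok), where ok is True when the growth in the
    reading is at most `slack` units (units are whatever memsize reports).
    """
    # Warm-up runs let one-time allocations (caches, lazy imports) happen
    # before the first reading is taken.
    for _ in range(warmup):
        workload()
    gc.collect()
    before = memsize()
    for _ in range(repetitions):
        workload()
    gc.collect()
    after = memsize()
    return before, after, (after - before) <= slack
```

A test would then call something like check_no_leak(exercise_extension, _memsize, slack=4), where exercise_extension is a hypothetical function that runs the formerly leaking code once.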

However, that _memsize is just too much of a hack, and I really want to
clean it up. It's also not cross-platform enough. Besides, I got a bug
report from a user on a Linux platform different from those I had tested
myself, and it boils down to the fact that once in a while on his
machine it turns out that after is before+4 (for any large number of
repetitions of the code in the above comment) -- I'm not sure what the
unit of measure is supposed to be (maybe blocks of 512 bytes, with a page
size of 2048? whatever...), but clearly an extra page is getting used
somewhere.
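
For reference, the same ps-based measurement can be written without a shell pipeline. This is a hedged sketch, not a fix for the portability problem: memsize_via_ps is a hypothetical name, it assumes a ps that accepts the POSIX "-o vsz=" form (the trailing "=" suppresses the header line), and it needs a Python with subprocess.run. The procps man page on Linux documents vsz in KiB; other platforms may differ, so the unit is worth checking rather than assuming.

```python
import os
import subprocess


def memsize_via_ps():
    """Return the vsz figure that ps reports for this process, or 0 on failure."""
    try:
        out = subprocess.run(
            ["ps", "-o", "vsz=", "-p", str(os.getpid())],
            capture_output=True, text=True, check=True, timeout=10,
        ).stdout
        return int(out.strip())
    except (OSError, subprocess.SubprocessError, ValueError):
        # ps missing, ps rejected the options, or the output was not a number
        return 0
```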

So, I thought I'd turn to the "wisdom of crowds"... how would YOU guys
go about adding to your automated regression tests one that checks that
a certain memory leak has not recurred, as cross-platform as feasible?
In particular, how would you code _memsize() "cross-platformly"? (I can
easily use C rather than Python if needed, adding it as an auxiliary
function for testing purposes to my existing extension).
TIA,

Alex
Nov 22 '05 #1
35 Replies


Not sure if I should start a new thread or not, but
since this is closely related, I'll just leave it as is.

Alex Martelli wrote:
Having fixed a memory leak (not the leak of a Python reference, some
other stuff I wasn't properly freeing in certain cases) in a C-coded
extension I maintain, I need a way to test that the leak is indeed
fixed.


I would like to investigate how much memory is used by
Python objects. My motive is 98% pure intellectual
curiosity and 2% optimization.

I wonder whether I can do something like this:

obj = something()
bytes_used = sizeof(obj)

(obviously there is no built-in function sizeof...
wait, let me check... nope, not a built-in)

I've read the docs for gc and pdb and nothing stands
out to me as doing anything like this.
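
For what it's worth, later Python versions (2.6 and up) do provide something close to this as sys.getsizeof. A minimal sketch, with shallow_sizes as a hypothetical helper name; note that the figure is shallow, i.e. it covers the object itself and not the objects it refers to.

```python
import sys


def shallow_sizes(*objects):
    """Return (type name, shallow size in bytes) for each object given."""
    return [(type(o).__name__, sys.getsizeof(o)) for o in objects]


for name, size in shallow_sizes([], "abc", 1, {}):
    print(name, size)
```

The exact numbers depend on the interpreter version and build, so none are shown here.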
--
Steven.

Nov 22 '05 #2

Steven D'Aprano <st***@REMOVEMEcyber.com.au> wrote:
Not sure if I should start a new thread or not, but
since this is closely related, I'll just leave it as is.

Alex Martelli wrote:
Having fixed a memory leak (not the leak of a Python reference, some
other stuff I wasn't properly freeing in certain cases) in a C-coded
extension I maintain, I need a way to test that the leak is indeed
fixed.


I would like to investigate how much memory is used by
Python objects. My motive is 98% pure intellectual
curiosity and 2% optimization.


I believe that's the purpose of the PySizer project (one of the "Google
Summer of Code" projects), which was recently announced on this group
(I'm sure any search engine will be able to direct you to it, anyway).

I have not checked it out, because my purpose is different -- mine is
not a Python-related leak at all, just a leak within C code (which
happens coincidentally to be a Python extension module).
Alex
Nov 22 '05 #3

Alex Martelli wrote:

So, I thought I'd turn to the "wisdom of crowds"... how would YOU guys
go about adding to your automated regression tests one that checks that
a certain memory leak has not recurred, as cross-platform as feasible?
In particular, how would you code _memsize() "cross-platformly"? (I can
easily use C rather than Python if needed, adding it as an auxiliary
function for testing purposes to my existing extension).


If you are doing Unix, can you use getrusage(2)?
import resource
r = resource.getrusage(resource.RUSAGE_SELF)
print r[2:5]


I get zeroes on my gentoo amd64 box. Not sure why. I thought maybe it
was Python, but C gives the same results.
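
The slice r[2:5] corresponds to the fields ru_maxrss, ru_ixrss and ru_idrss. A sketch that reads them by name, in current Python syntax (memory_fields is a hypothetical helper; which fields the kernel actually fills in depends on the platform and kernel version, so zeros are a possible result):

```python
import resource


def memory_fields(usage):
    """Pick the memory-related fields that r[2:5] selects, by name."""
    return {
        "ru_maxrss": usage.ru_maxrss,  # maximum resident set size
        "ru_ixrss": usage.ru_ixrss,    # shared memory size
        "ru_idrss": usage.ru_idrss,    # unshared memory size
    }


usage = resource.getrusage(resource.RUSAGE_SELF)
print(memory_fields(usage))
```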

Another possibility is to call sbrk(0), which should return the top of
the heap. You could then return this value and check it. It requires
a tiny C module, but should be easy and work on most unixes. You can
determine the direction the heap grows by comparing it with id(0), which
should have been allocated early in the interpreter's life.

I realize this isn't perfect as memory becomes fragmented, but it might
work. Since 2.3 and beyond use pymalloc, fragmentation may not be much
of an issue, as memory is allocated in a big hunk, then doled out as
necessary.
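
As an alternative to the tiny C module, sbrk(0) can be reached through ctypes on Pythons that ship it. This is a sketch under assumptions: heap_top is a hypothetical name, ctypes.CDLL(None) is assumed to expose the C library symbols already loaded into the process (true on typical Linux builds, not guaranteed elsewhere), and the function returns None when that does not work.

```python
import ctypes


def heap_top():
    """Return the current program break (sbrk(0)) as an int, or None."""
    try:
        libc = ctypes.CDLL(None)  # symbols already loaded into this process
        sbrk = libc.sbrk
    except (OSError, AttributeError, TypeError):
        return None  # no usable sbrk on this platform
    sbrk.restype = ctypes.c_void_p
    sbrk.argtypes = [ctypes.c_ssize_t]
    # An increment of 0 only queries the break; it does not move it.
    return sbrk(0)


print(heap_top())
```

Whether the value moves when memory is allocated depends on how the platform's malloc obtains memory, so an unchanged reading does not by itself prove the absence of a leak.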

These techniques could apply to Windows with some caveats. If you are
interested in Windows, see:
http://msdn.microsoft.com/library/de...l/UCMGch09.asp

Can't think of anything fool-proof though.

HTH,
n

Nov 22 '05 #4

My suggestion would also be to use sbrk() as it provides a high-water
mark for the memory usage of the process.

Below is the function hiwm() I used on Linux (RedHat). MacOS X and
Unix versions are straightforward. Not sure about Windows.

/Jean Brouwers

#if _LINUX
#include <malloc.h>

size_t hiwm (void) {
    /* info.arena    - number of bytes allocated
     * info.hblkhd   - size of the mmap'ed space
     * info.uordblks - number of bytes used (?)
     */
    struct mallinfo info = mallinfo();
    size_t s = (size_t) info.arena + (size_t) info.hblkhd;
    return (s);
}

#elif _MACOSX || _UNIX
#include <unistd.h>

size_t hiwm (void) {
    size_t s = (size_t) sbrk(0);
    return (s);
}

#elif _WINDOWS
size_t hiwm (void) {
    size_t s = (size_t) 0; /* ??? */
    return (s);
}

#endif

Nov 22 '05 #5

Neal Norwitz <nn******@gmail.com> wrote:
> Alex Martelli wrote:
>
>> So, I thought I'd turn to the "wisdom of crowds"... how would YOU guys
>> go about adding to your automated regression tests one that checks that
>> a certain memory leak has not recurred, as cross-platform as feasible?
>> In particular, how would you code _memsize() "cross-platformly"? (I can
>> easily use C rather than Python if needed, adding it as an auxiliary
>> function for testing purposes to my existing extension).
>
> If you are doing Unix, can you use getrusage(2)?


On Unix, I could; on Linux, nope. According to man getrusage on Linux,

"""
The above struct was taken from BSD 4.3 Reno. Not all fields are
meaningful under Linux. Right now (Linux 2.4, 2.6) only the
fields ru_utime, ru_stime, ru_minflt, ru_majflt, and ru_nswap are
maintained.
"""
and indeed the memory-usage parts are zero.

> import resource
> r = resource.getrusage(resource.RUSAGE_SELF)
> print r[2:5]
>
> I get zeroes on my gentoo amd64 box. Not sure why. I thought maybe it
> was Python, but C gives the same results.


Yep -- at least, on Linux, this misbehavior is clearly documented in the
manpage; on Darwin, aka MacOSX, you _also_ get zeros but there is no
indication in the manpage leading you to expect that.

Unfortunately I don't have any "real Unix" box around -- only Linux and
Darwin... I could try booting up OpenBSD again to check that it works
there, but given that I know it doesn't work under the most widespread
unixoid systems, it wouldn't be much use anyway, sigh.

> Another possibility is to call sbrk(0) which should return the top of
> the heap. You could then return this value and check it. It requires
> a tiny C module, but should be easy and work on most unixes. You can

As I said, I'm looking for leaks in a C-coded module, so it's no problem
to add some auxiliary C code to that module to help test it --
unfortunately, this approach doesn't work, see below...

> determine direction heap grows by comparing it with id(0) which should
> have been allocated early in the interpreter's life.
>
> I realize this isn't perfect as memory becomes fragmented, but might
> work. Since 2.3 and beyond use pymalloc, fragmentation may not be much
> of an issue. As memory is allocated in a big hunk, then doled out as
> necessary.

But exactly because of that, sbrk(0) doesn't mean much. Consider the
tiny extension which I've just uploaded to
http://www.aleax.it/Python/memtry.c -- it essentially exposes a type
that does malloc when constructed and free when freed, and a function
sbrk0 which returns sbrk(0). What I see on my MacOSX 10.4, Python
2.4.1, gcc 4.1, is (with a little auxiliary memi.py module that does
    from memtry import *
    import os
    def memsiz():
        return int(os.popen('ps -p %d -o vsz|tail -1' % os.getpid()).read())
)...:

Helen:~/memtry alex$ python -ic 'import memi'
>>> memi.memsiz()
35824
>>> memi.sbrk0()
16809984
>>> a=memi.mem(999999)
>>> memi.sbrk0()
16809984
>>> memi.memsiz()
40900

See? While the process's memory size grows as expected (by 500+ "units"
when allocating one meg, confirming the hypothesis that a unit is
2Kbyte), sbrk(0) just doesn't budge.

As the MacOSX "man sbrk" says,
"""
The brk and sbrk functions are historical curiosities left over from
earlier days before the advent of virtual memory management.
"""
and apparently it's now quite hard to make any USE of those quaint
oddities, in presence of any attempt, anywhere in any library linked
with the process, to do some "smart" memory allocation &c.

> These techniques could apply to Windows with some caveats. If you are
> interested in Windows, see:
> http://msdn.microsoft.com/library/de...n-us/dnucmg/html/UCMGch09.asp
>
> Can't think of anything fool-proof though.


Fool-proof is way beyond what I'm looking for now -- I'd settle for
"reasonably clean, works in Linux, Mac and Windows over 90% of the time,
and I can detect somehow when it isn't working";-)
Since people DO need to keep an eye on their code's memory consumption,
I'm getting convinced that the major functional lack in today's Python
standard library is some minimal set of tools to help with that task.
PySizer appears to be a start in the right direction (although it may be
at too early a stage to make sense for the standard library of Python
2.5), but (unless I'm missing something about it) it won't help with
memory leaks not directly related to Python. Maybe we SHOULD have some
function in sys to return the best guess at current memory consumption
of the whole process, implemented by appropriate techniques on each
platform -- right now, though, I'm trying to find out which these
appropriate techniques are on today's most widespread unixoid systems,
Linux and MacOSX. (As I used to be a Win32 API guru in a previous life,
I'm confident that I can find out about _that_ platform by sweating
enough blood on MSDN -- problem here is I don't have any Windows machine
with the appropriate development system to build Python, so testing
would be pretty hard, but maybe I can interest somebody who DOES have
such a setup...;-)
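
As one concrete data point for the Linux side of such a function: Linux exposes per-process figures under /proc, which has not come up in this thread so far. The sketch below is an assumption-laden illustration, not a proposal for the standard library. process_vm_bytes is a hypothetical name, it relies on the first field of /proc/self/statm being the total program size in pages, and it returns None on any platform where it has no technique, so the caller can detect when it isn't working.

```python
import os
import sys


def process_vm_bytes():
    """Best-effort virtual memory size of this process in bytes, or None."""
    if sys.platform.startswith("linux"):
        try:
            with open("/proc/self/statm") as f:
                pages = int(f.read().split()[0])  # total program size, in pages
            return pages * os.sysconf("SC_PAGE_SIZE")
        except (OSError, ValueError, IndexError):
            return None
    # Mac and Windows would need their own techniques; none is sketched here.
    return None


print(process_vm_bytes())
```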
Alex
Nov 22 '05 #6

MrJean1 <Mr*****@gmail.com> wrote:
> My suggestion would also be to use sbrk() as it provides a high-water
> mark for the memory usage of the process.

That's definitely what I would have used in the '70s -- nowadays, alas,
it ain't that easy.

> Below is the function hiwm() I used on Linux (RedHat). MacOS X and
> Unix versions are straightforward. Not sure about Windows.

The MacOSX version using sbrk is indeed straightforward, it just doesn't
work. See my response to Neal's post and my little Python extension
module at http://www.aleax.it/Python/memtry.c -- on a Mac (OSX 10.4,
Python 2.4.1, gcc 4.1) sbrk(0) returns the same value as the process's
virtual memory consumption goes up and down (as revealed by ps). As the
MacOSX's manpage says, "The brk and sbrk functions are historical
curiosities left over from earlier days before the advent of virtual
memory management."

Guess I'll now try the linux version you suggest, with mallinfo:
> #if _LINUX
> #include <malloc.h>
>
> size_t hiwm (void) {
>     /* info.arena    - number of bytes allocated
>      * info.hblkhd   - size of the mmap'ed space
>      * info.uordblks - number of bytes used (?)
>      */
>     struct mallinfo info = mallinfo();
>     size_t s = (size_t) info.arena + (size_t) info.hblkhd;
>     return (s);
> }


and see if and how it works.

I do wonder why both Linux and MacOSX "implemented" getrusage, which
would be the obviously right way to do it, as such a useless empty husk
(as far as memory consumption is concerned). Ah well!-(
Alex
Nov 22 '05 #7

For some more details on Linux' mallinfo, see
<ftp://gee.cs.oswego.edu/pub/misc/malloc.h> and maybe function mSTATs()
in glibc/malloc/malloc.c (RedHat).

/Jean Brouwers

Nov 22 '05 #8

Perhaps you could extend Valgrind (http://www.valgrind.org) so it works
with python C extensions? (x86 only)

matt

Nov 22 '05 #10

matt <ma*************@gmail.com> wrote:
Perhaps you could extend Valgrind (http://www.valgrind.org) so it works
with python C extensions? (x86 only)


Alas, if it's x86 only I won't even look into the task (which does sound
quite daunting as the way to solve the apparently-elementary question
"how much virtual memory is this process using right now?"...!), since I
definitely cannot drop support for all PPC-based Macs (nor would I WANT
to, since they're my favourite platform anyway).
Alex
Nov 22 '05 #11

This may work on MacOS X. An initial, simple test does yield credible
values.

However, I am not a MacOS X expert. It is unclear which field of the
malloc_statistics_t struct to use and how malloc_zone_statistics with
zone NULL accumulates the stats for all zones.

/Jean Brouwers

#if _MACOSX
#include <malloc/malloc.h>
/* typedef struct malloc_statistics_t {
 *     unsigned blocks_in_use;
 *     size_t size_in_use;
 *     size_t max_size_in_use;  -- high water mark of touched memory
 *     size_t size_allocated;   -- reserved in memory
 * } malloc_statistics_t;
 */
size_t hiwm (size_t since)
{
    size_t s;
    malloc_statistics_t t;
    /* get cumulative (?) stats for all zones */
    malloc_zone_statistics(NULL, &t);
    s = t.size_allocated; /* or t.max_size_in_use? */
    return (s - since);
}
#endif

Nov 22 '05 #12

MrJean1 <Mr*****@gmail.com> wrote:
> This may work on MacOS X. An initial, simple test does yield credible
> values.

Definitely looks promising, thanks for the pointer.

> However, I am not a MacOS X expert. It is unclear which field of the
> malloc_statistics_t struct to use and how malloc_zone_statistics with
> zone NULL accumulates the stats for all zones.


It appears that all of this stuff is barely documented (if at all), not
just online but also in books on advanced MacOS X programming. Still, I
can research it further, since, after all, the opendarwin sources ARE
online. Thanks again!
Alex
Nov 22 '05 #13

Alex Martelli wrote:
matt <ma*************@gmail.com> wrote:
Perhaps you could extend Valgrind (http://www.valgrind.org) so it works
with python C extensions? (x86 only)


Alas, if it's x86 only I won't even look into the task (which does sound
quite daunting as the way to solve the apparently-elementary question
"how much virtual memory is this process using right now?"...!), since I
definitely cannot drop support for all PPC-based Macs (nor would I WANT
to, since they're my favourite platform anyway).


Valgrind actually runs on PPC (32 only?) and amd64, but I don't think
that's the way to go for this problem.

Here's a really screwy thought that I think should be portable to all
Unixes which have dynamic linking. LD_PRELOAD.

You can create your own version of malloc (and friends) and free. You
intercept each call to malloc and free (by making use of LD_PRELOAD),
keep track of the info (pointers and size) and pass the call along to
the real malloc/free. You then have all information you should need.
It increases the scope of the problem, but I think it makes it soluble
and somewhat cross-platform. Using LD_PRELOAD requires that the app be
dynamically linked, which shouldn't be too big of a deal. If you are
using C++, you can hook into new/delete directly.
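
The bookkeeping half of this idea (not the LD_PRELOAD interposition itself, which has to be done in C) can be sketched as follows. AllocationTracker and its method names are hypothetical; in a real interposer the two hooks would be called from the replacement malloc and free with the actual pointer values.

```python
class AllocationTracker:
    """Record live allocations so outstanding memory can be reported."""

    def __init__(self):
        self.live = {}  # pointer value -> size in bytes

    def on_malloc(self, ptr, size):
        # A null pointer means the allocation failed; nothing to track.
        if ptr:
            self.live[ptr] = size

    def on_free(self, ptr):
        # free(NULL) and frees of untracked pointers are ignored.
        self.live.pop(ptr, None)

    def outstanding_bytes(self):
        return sum(self.live.values())

    def outstanding_blocks(self):
        return len(self.live)
```

A leak test would compare outstanding_bytes() before and after the code under test, instead of asking the operating system for the size of the whole process.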

n

Nov 22 '05 #14

Neal Norwitz wrote:
> Valgrind actually runs on PPC (32 only?) and amd64, but I don't think
> that's the way to go for this problem.

+1 for understatement of the week.

> Here's a really screwy thought that I think should be portable to all
> Unixes which have dynamic linking. LD_PRELOAD.


Similar work is described here:

http://www.hpl.hp.com/personal/Hans_Boehm/gc/leak.html

On the subject of memory statistics, I'm surprised no-one has mentioned
"top" in this thread (as far as I'm aware): I would have thought such
statistics would have been available to "top" and presented by that
program.

Paul

Nov 22 '05 #16

> On the subject of memory statistics, I'm surprised no-one has mentioned
> "top" in this thread (as far as I'm aware): I would have thought such
> statistics would have been available to "top" and presented by that
> program.


Talking about "top", this article may be useful:

On measuring memory usage
http://www.kdedevelopers.org/node/1445

--
Nicola Larosa - ni*******@m-tekNico.net

....Linux security has been better than many rivals. However, even
the best systems today are totally inadequate. Saying Linux is
more secure than Windows isn't really addressing the bigger issue
- neither is good enough. -- Alan Cox, September 2005
Nov 22 '05 #18

On Tue, Nov 15, 2005 at 10:10:42PM -0800, Neal Norwitz wrote:
Alex Martelli wrote:
matt <ma*************@gmail.com> wrote:
Perhaps you could extend Valgrind (http://www.valgrind.org) so it works
with python C extensions? (x86 only)


Alas, if it's x86 only I won't even look into the task (which does sound
quite daunting as the way to solve the apparently-elementary question
"how much virtual memory is this process using right now?"...!), since I
definitely cannot drop support for all PPC-based Macs (nor would I WANT
to, since they're my favourite platform anyway).


Valgrind actually runs on PPC (32 only?) and amd64, but I don't think
that's the way to go for this problem.

Here's a really screwy thought that I think should be portable to all
Unixes which have dynamic linking. LD_PRELOAD.

You can create your own version of malloc (and friends) and free. You
intercept each call to malloc and free (by making use of LD_PRELOAD),
keep track of the info (pointers and size) and pass the call along to
the real malloc/free. You then have all information you should need.
It increases the scope of the problem, but I think it makes it soluble
and somewhat cross-platform. Using LD_PRELOAD, requires the app be
dynamically linked which shouldn't be too big of a deal. If you are
using C++, you can hook into new/delete directly.


Electric Fence[1] uses the LD_PRELOAD method. I've successfully used it to
track down leaks in a python C extension. If you look at the setup.py in
probstat[2] you'll see
#libraries = ["efence"] # uncomment to use ElectricFence
which is a holdover from developing.

-Jack

[1] http://perens.com/FreeSoftware/ElectricFence/
[2] http://probstat.sourceforge.net/
Nov 22 '05 #20

Paul Boddie <pa**@boddie.org.uk> wrote:
> Neal Norwitz wrote:
>> Valgrind actually runs on PPC (32 only?) and amd64, but I don't think
>> that's the way to go for this problem.
>
> +1 for understatement of the week.
>
>> Here's a really screwy thought that I think should be portable to all
>> Unixes which have dynamic linking. LD_PRELOAD.
>
> Similar work is described here:
>
> http://www.hpl.hp.com/personal/Hans_Boehm/gc/leak.html


Interesting considerations. Taking a step back, it does feel a bit as
if the amount of infrastructure needed for a process to ask about its
resource consumption is out of whack, though -- I don't understand why
Unix-like systems such as Linux and Darwin can't just fully support some
call such as getrusage. Ah well...

> On the subject of memory statistics, I'm surprised no-one has mentioned
> "top" in this thread (as far as I'm aware): I would have thought such
> statistics would have been available to "top" and presented by that
> program.


It seems to me that top, like ps and other platform-dependent programs
such as vmmap on Darwin (MacOSX), tend to be at the very least owned by
group kmem and setgid, if not simply setuid root, because the way they
do their job is rooting through /dev/kmem and that requires privileges.
To let a Python extension module know how much VM the process is
currently using, we'd have to have the executable itself for Python be
setgid kmem or setuid root, which somehow doesn't seem appealing;-)

On MacOSX specifically, I've been pointed to an open-source third-party
utility named MemoryCell which does manage to learn about VM use for any
process w/o needing to be setuid or setgid. It does so in a module
that's 300+ lines of ObjectiveC, so it would require quite a bit of
reverse engineering to integrate into a pure-C Python extension, but at
least it serves as proof of existence;-)
Alex
Nov 22 '05 #22

Neal Norwitz wrote:
Here's a really screwy thought that I think should be portable to all
Unixes which have dynamic linking. LD_PRELOAD.

You can create your own version of malloc (and friends) and free. You
intercept each call to malloc and free (by making use of LD_PRELOAD),
keep track of the info (pointers and size) and pass the call along to
the real malloc/free. You then have all information you should need.


That'll only get you memory usage from malloc/free calls, which could
be vastly less than the process' memory usage in plausible scenarios
(e.g. a media player that uses mmap() to read the file, or anything
that uses large shared memory segments generated with mmap() or SysV
IPC, etc).

In the real world, malloc() and mmap() are probably sufficient to get a
good picture of process usage for most processes. But I guess defining
exactly what counts as the process' current memory would be a starting
place (specifically how to deal with shared memory).

Nov 22 '05 #24

sj*******@yahoo.com <sj*******@yahoo.com> wrote:
> ...
> Neal Norwitz wrote:
>> Here's a really screwy thought that I think should be portable to all
>> Unixes which have dynamic linking. LD_PRELOAD.
>>
>> You can create your own version of malloc (and friends) and free. You
>> intercept each call to malloc and free (by making use of LD_PRELOAD),
>> keep track of the info (pointers and size) and pass the call along to
>> the real malloc/free. You then have all information you should need.
>
> That'll only get you memory usage from malloc/free calls, which could
> be vastly less than the process' memory usage in plausible scenarios
> (e.g. a media player that uses mmap() to read the file, or anything
> that uses large shared memory segments generated with mmap() or SysV
> IPC, etc).


True. But hopefully a cross-platform program's memory leaks will mostly
be based on malloc (it couldn't use SysV's IPC and still be
cross-platform, for example; and while mmap might be a possibility,
perhaps it might be tracked by a similar trick as malloc might).

> In the real world, malloc() and mmap() are probably sufficient to get a
> good picture of process usage for most processes. But I guess defining
> exactly what counts as the process' current memory would be a starting
> place (specifically how to deal with shared memory).


Considering that the main purpose is adding regression tests to confirm
that a hopefully-fixed memory leak does not recur, I'm not sure why
shared memory should be a problem. What scenarios would "leak shared
memory"? If some shared library gets loaded once and stays in memory
that doesn't appear to me as something that would normally be called "a
memory leak" -- unless I'm failing to see some cross-platform scenario
that would erroneously re-load the same library over and over again,
taking up growing amounts of shared memory with time?
Alex

Nov 22 '05 #26

P: n/a
Alex Martelli wrote:

Considering that the main purpose is adding regression tests to confirm
that a hopefully-fixed memory leak does not recur, I'm not sure why
shared memory should be a problem. What scenarios would "leak shared
memory"?


Apache leaks SHM segments in some scenarios. SysV SHM segments aren't
freed automatically when the process exits. They're easy enough to clean
up by hand, but it's nicer to avoid that.

Nov 22 '05 #28

P: n/a

Jack Diederich wrote:
Electric Fence[1] uses the LD_PRELOAD method. I've successfully used it to
track down leaks in a python C extension. If you look at the setup.py in
probstat[2] you'll see
#libraries = ["efence"] # uncomment to use ElectricFence
which is a holdover from developing.


I also successfully used Electric Fence many years ago to track down
leaks in a C/Linux program. Since then Electric Fence has been forked
into the DUMA project, http://duma.sourceforge.net/, and ported to
Windows, so perhaps it has become cross-platform? I've never tried it on
anything besides Linux.
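
Since LD_PRELOAD is read by the dynamic linker when a process starts, a test suite that wants to use such a library has to launch a child interpreter with the variable already set. A rough sketch of that follows; `preload_env` and `run_test_with_preload` are hypothetical helper names, and the library path passed in is whatever your system calls the preload library (the file name used in the usage note is an assumption).

```python
import os
import subprocess
import sys


def preload_env(library, base_env=None):
    """Return a copy of the environment with `library` first in LD_PRELOAD."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("LD_PRELOAD")
    # ld.so accepts a colon-separated list of objects to preload.
    env["LD_PRELOAD"] = library if not existing else library + ":" + existing
    return env


def run_test_with_preload(script, library):
    """Run `script` in a child interpreter with `library` preloaded."""
    # Setting the variable in the current process would have no effect,
    # because this interpreter has already been linked and started.
    completed = subprocess.run([sys.executable, script],
                               env=preload_env(library))
    return completed.returncode
```

Something like `run_test_with_preload("leaktest.py", "libefence.so")` would then run the (hypothetical) test script under the preloaded allocator, on platforms whose dynamic linker honours LD_PRELOAD.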

Nov 22 '05 #30

P: n/a
Alex Martelli wrote:
MrJean1 <Mr*****@gmail.com> wrote:
This may work on MacOS X. An initial, simple test does yield credible
values.


Definitely looks promising, thanks for the pointer.
However, I am not a MacOS X expert. It is unclear which field of the
malloc_statistics_t struct to use and how malloc_zone_statistics with
zone NULL accumulates the stats for all zones.


It appears that all of this stuff is barely documented (if at all), not
just online but also in books on advanced MacOS X programming. Still, I
can research it further, since, after all, the opendarwin sources ARE
online. Thanks again!
Alex

"mallinfo" is available on most UNIX-like systems (Linux, Solaris, QNX,
etc.) and is also included in the dlmalloc library (which works on
win32).

There is a small C extension module at
http://hathawaymix.org/Software/Sketches/
which should give access to mallinfo() and thus byte accurate memory
usage information.

I have to admit that I have not tried it myself...

Nov 22 '05 #32

P: n/a

RalfGB wrote:
Alex Martelli wrote:
MrJean1 <Mr*****@gmail.com> wrote:
This may work on MacOS X. An initial, simple test does yield credible
values.


Definitely looks promising, thanks for the pointer.
However, I am not a MacOS X expert. It is unclear which field of the
malloc_statistics_t struct to use and how malloc_zone_statistics with
zone NULL accumulates the stats for all zones.


It appears that all of this stuff is barely documented (if at all), not
just online but also in books on advanced MacOS X programming. Still, I
can research it further, since, after all, the opendarwin sources ARE
online. Thanks again!
Alex

"mallinfo" is available on most UNIX-like systems (Linux, Solaris, QNX,
etc.) and is also included in the dlmalloc library (which works on
win32).

There is a small C extension module at
http://hathawaymix.org/Software/Sketches/
which should give access to mallinfo() and thus byte accurate memory
usage information.

I have to admit that I have not tried it myself...


I tried it; it doesn't work for dlmalloc.
I'll have to look somewhere else.
Does anybody know how to report correct memory usage when using the
dlmalloc package?

Thanks in advance.

Mickie

Dec 1 '05 #34

P: n/a
mi********@gmail.com wrote:
"mallinfo" is available on most UNIX-like systems (Linux, Solaris, QNX,
etc.) and is also included in the dlmalloc library (which works on
win32).

There is a small C extension module at
http://hathawaymix.org/Software/Sketches/
which should give access to mallinfo() and thus byte accurate memory
usage information.

I have to admit that I have not tried it myself...
I tried it; it doesn't work for dlmalloc.
I'll have to look somewhere else.


dlmalloc supports the mallinfo API unless you compile it with the
NO_MALLINFO option. see the comments in the beginning of
the malloc.c file for details.
Does anybody know how to report correct memory usage when using the
dlmalloc package?


make sure you have a dlmalloc that has mallinfo support enabled. when you've
done that, define "doesn't work for dlmalloc" (build problems, exceptions, crashes,
compiler errors, ... ???)

</F>

Dec 1 '05 #35

P: n/a
Did you try the function I posted on Nov 15? It returns the high water
mark, like sbrk(0) and works for RH Linux (which is dlmalloc, AFAIK).

/Jean Brouwers

PS) Here is that code again (for RH Linux only!)

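#include <malloc.h>   /* struct mallinfo, mallinfo() */
#include <stddef.h>   /* size_t */

size_t hiwm (void) {
    /* info.arena    - bytes obtained from the system without mmap (the heap)
     * info.hblkhd   - bytes in mmap'ed regions
     * info.uordblks - bytes currently in use by allocations
     */
    struct mallinfo info = mallinfo();
    size_t s = (size_t) info.arena + (size_t) info.hblkhd;
    return (s);
}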
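
For anyone who would rather not build a C extension, the same figure can probably be read from Python through ctypes. The following is only a sketch under assumptions: it assumes the C library exports mallinfo() with the ten-int `struct mallinfo` layout used by glibc, and `MallInfo` and `malloc_footprint` are names invented here. Where mallinfo() is missing it returns None instead of failing.

```python
import ctypes
import ctypes.util


class MallInfo(ctypes.Structure):
    # Assumed layout: glibc's `struct mallinfo`, ten C ints in this order.
    _fields_ = [(name, ctypes.c_int) for name in (
        "arena", "ordblks", "smblks", "hblks", "hblkhd",
        "usmblks", "fsmblks", "uordblks", "fordblks", "keepcost")]


def malloc_footprint():
    """Return arena + hblkhd in bytes, or None if mallinfo() is unavailable."""
    name = ctypes.util.find_library("c")
    if name is None:
        return None
    try:
        mallinfo = ctypes.CDLL(name).mallinfo
    except (OSError, AttributeError):
        return None
    mallinfo.restype = MallInfo   # the struct is returned by value
    mallinfo.argtypes = []
    info = mallinfo()
    # The fields are C ints, so they can wrap once a value passes 2 GiB.
    return info.arena + info.hblkhd
```

Like the C version, this only sees memory that went through the C library's malloc, so it says nothing about regions a program maps directly.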

Dec 2 '05 #36
