mmap caching

I've been trying to track down a memory leak (which I initially
attributed erroneously to numpy) and it turns out to be caused by a
memory-mapped file. It seems that mmap caches the chunks it reads
without limit: memory usage grows to several hundred MB according to
the Windows Task Manager before the program dies with a MemoryError.
I'm positive that these chunks are not referenced anywhere else; in
fact, if I change the mmap object to a normal file, memory usage
remains constant. The documentation of mmap doesn't mention anything
about this. Can the caching strategy be modified at the user level?

George

Jan 21 '07 #1
George Sakkis <ge***********@gmail.com> wrote:
> I've been trying to track down a memory leak (which I initially
> attributed erroneously to numpy) and it turns out to be caused by a
> memory-mapped file. It seems that mmap caches the chunks it reads
> without limit: memory usage grows to several hundred MB according to
> the Windows Task Manager before the program dies with a MemoryError.
> I'm positive that these chunks are not referenced anywhere else; in
> fact, if I change the mmap object to a normal file, memory usage
> remains constant. The documentation of mmap doesn't mention anything
> about this. Can the caching strategy be modified at the user level?
I'm not familiar with mmap() on windows, but assuming it works the
same way as unix...

The point of mmap() is to map files into memory. It is completely up
to the OS to bring pages into memory for you to read / write to, and
completely up to the OS to get rid of them again.

What you would expect is that the file is demand paged into memory as
you access bits of it. These pages will remain in memory until the OS
feels some memory pressure, at which point they will be written out
(if dirty) and then dropped.

The OS will try to keep hold of pages as long as possible just in case
you need them again. The pages dropped should be the least recently
used pages.

I wouldn't have expected a MemoryError though...

Did you do mmap.flush() after writing?
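Nick's flush() question only matters for a writable mapping. For reference, a minimal Python 3 sketch of writing through a mapping and forcing the dirty pages back to disk (the helper `patch_file` and the temporary file are hypothetical; note that slice assignment into an mmap cannot change its length):

```python
import mmap
import os
import tempfile

def patch_file(path: str, offset: int, payload: bytes) -> None:
    """Overwrite len(payload) bytes of an existing, non-empty file in place."""
    with open(path, "r+b") as f:
        # Length 0 maps the whole file (this raises ValueError on an empty file).
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_WRITE) as mm:
            mm[offset:offset + len(payload)] = payload
            # Write dirty pages back now instead of whenever the OS evicts them.
            mm.flush()

path = os.path.join(tempfile.mkdtemp(), "demo.bin")
with open(path, "wb") as f:
    f.write(b"hello world")
patch_file(path, 6, b"WORLD")
with open(path, "rb") as f:
    print(f.read())  # b'hello WORLD'
```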

--
Nick Craig-Wood <ni**@craig-wood.com> -- http://www.craig-wood.com/nick
Jan 21 '07 #2
Nick Craig-Wood wrote:
> I'm not familiar with mmap() on windows, but assuming it works the
> same way as unix...
>
> The point of mmap() is to map files into memory. It is completely up
> to the OS to bring pages into memory for you to read / write to, and
> completely up to the OS to get rid of them again.
>
> What you would expect is that the file is demand paged into memory as
> you access bits of it. These pages will remain in memory until the OS
> feels some memory pressure, at which point they will be written out
> (if dirty) and then dropped.
>
> The OS will try to keep hold of pages as long as possible just in case
> you need them again. The pages dropped should be the least recently
> used pages.
>
> I wouldn't have expected a MemoryError though...
>
> Did you do mmap.flush() after writing?
The file is written once and then opened as read-only; there's no
flushing. So if caching is completely up to the OS, I take it that my
options are either (1) to modify my algorithms so that they work in
fixed-size batches instead of arbitrarily long sequences, or (2) to
implement my own memory-mapping scheme to fit my algorithms. I guess
(1) would be less trouble overall, or is there a way to hint to the
OS how large a cache it can use?
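Python's mmap module itself has no setting for how much the OS caches. On Unix-like systems, though, Python 3.8+ offers `mmap.madvise()`, which lets a program hint how it will access a mapping; nothing equivalent is assumed here for Windows, the platform in this thread. A minimal sketch that scans a file in fixed-size chunks and advises the kernel along the way (`CHUNK` and `scan_with_hints` are hypothetical names; the hints are simply skipped where `madvise` or the `MADV_*` constants are missing):

```python
import mmap
import os
import tempfile

# Hypothetical batch size; a multiple of PAGESIZE keeps every start offset page-aligned.
CHUNK = 16 * mmap.PAGESIZE

def scan_with_hints(path: str) -> int:
    """Sum all bytes of a non-empty file through a read-only mapping, chunk by chunk."""
    total = 0
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        can_advise = hasattr(mm, "madvise")  # Python 3.8+ on platforms with madvise()
        if can_advise and hasattr(mmap, "MADV_SEQUENTIAL"):
            mm.madvise(mmap.MADV_SEQUENTIAL)  # hint: expect a front-to-back scan
        for start in range(0, len(mm), CHUNK):
            chunk = mm[start:start + CHUNK]  # copies this slice into a bytes object
            total += sum(chunk)
            if can_advise and hasattr(mmap, "MADV_DONTNEED"):
                # Tell the kernel this process no longer needs these pages resident.
                mm.madvise(mmap.MADV_DONTNEED, start, len(chunk))
    return total

path = os.path.join(tempfile.mkdtemp(), "data.bin")
with open(path, "wb") as f:
    f.write(bytes(range(256)) * 1000)
print(scan_with_hints(path))  # 32640000
```

`MADV_DONTNEED` on a read-only file mapping only releases the pages from this process; if they are touched again, they are faulted back in from the file.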

George

Jan 21 '07 #3
George Sakkis wrote:
> I've been trying to track down a memory leak (which I initially
> attributed erroneously to numpy) and it turns out to be caused by a
> memory-mapped file. It seems that mmap caches the chunks it reads
> without limit: memory usage grows to several hundred MB according to
> the Windows Task Manager before the program dies with a MemoryError.
You must be misinterpreting what you are seeing. It's the operating
system that decides which parts of a memory-mapped file are held in
memory, and that is certainly not without limits.

Notice that there are several values that can be called "memory
usage" (such as the size of the committed address space, the working
set size, etc.); you don't mention which of these values grows by
several hundred MB.

Regards,
Martin
Jan 21 '07 #4
Martin v. Löwis wrote:
> You must be misinterpreting what you are seeing. It's the operating
> system that decides which parts of a memory-mapped file are held in
> memory, and that is certainly not without limits.
Sure; what I meant was that whatever the limit is, it's high
enough that a MemoryError is raised before the limit is reached.
> Notice that there are several values that can be called "memory
> usage" (such as the size of the committed address space, the working
> set size, etc.); you don't mention which of these values grows by
> several hundred MB.
It's the one in the 'Processes' tab of the Windows Task Manager (XP
Professional). By the way, I ran the same program on a box with more
physical memory and the memory usage stops growing at around 430MB, by
which time the whole file is most likely cached. I'd be interested in
any suggestions other than "buy more RAM" :) (these are not my machines
anyway).

Thanks,
George

Jan 21 '07 #5
George Sakkis wrote:
>> You must be misinterpreting what you are seeing. It's the operating
>> system that decides which parts of a memory-mapped file are held in
>> memory, and that is certainly not without limits.
>
> Sure; what I meant was that whatever the limit is, it's high
> enough that a MemoryError is raised before the limit is reached.
The operating system will absolutely, definitely, certainly release
any cached data it can purge before reporting it is out of memory.

So if you get a MemoryError, it is *not* because the operating system
has cached too much data.

In fact, memory that is read in because of mmap should *never* cause
a MemoryError. Python calls MapViewOfFile when mmap.mmap is invoked,
at which point the operating system commits to providing that much
address space to the application, along with backing storage on disk
(typically the file being mapped, unless it is an anonymous map).
Later access to the mapped range cannot fail (except for hardware
errors), and if it did, you wouldn't see a MemoryError.
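In Python terms, the extent of the view is fixed when `mmap.mmap()` returns, and indexing outside it raises `IndexError` rather than extending anything. A small sketch (the helper `probe_mapping` and the temporary file are hypothetical):

```python
import mmap
import os
import tempfile

def probe_mapping(path: str):
    """Return (mapped length, first byte, whether reading one byte past the end raised IndexError)."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file; the mapped range is decided here, once.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            first = mm[0]  # indexing an mmap returns an int in Python 3
            try:
                mm[len(mm)]
                out_of_range = False
            except IndexError:
                out_of_range = True
            return len(mm), first, out_of_range

path = os.path.join(tempfile.mkdtemp(), "fixed.bin")
with open(path, "wb") as f:
    f.write(b"x" * 10000)
print(probe_mapping(path))  # (10000, 120, True)
```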
> It's the one in the 'Processes' tab of the Windows Task Manager (XP
> Professional). By the way, I ran the same program on a box with more
> physical memory and the memory usage stops growing at around 430MB, by
> which time the whole file is most likely cached. I'd be interested in
> any suggestions other than "buy more RAM" :) (these are not my machines
> anyway).
As a starting point, try to understand better what is really happening.
Enable the "Virtual Memory Size" column in "View/Select Columns", and
perhaps a few additional counters as well. Also take a look at the
"Commit Charge", which takes swap file usage into account as well. Try
increasing the size of the swap file.

Regards,
Martin
Jan 22 '07 #6
Martin v. Löwis <ma****@v.loewis.de> wrote:
> In fact, memory that is read in because of mmap should *never* cause
> a MemoryError. Python calls MapViewOfFile when mmap.mmap is invoked,
> at which point the operating system commits to providing that much
> address space to the application, along with backing storage on disk
> (typically the file being mapped, unless it is an anonymous map).
> Later access to the mapped range cannot fail (except for hardware
> errors), and if it did, you wouldn't see a MemoryError.
So presumably it is Python that raises the MemoryError: it asks for a
new bit of memory, the request fails, and so it raises MemoryError.

Could memory allocation under Windows be affected by a large chunk of
an mmap()ed file being physically swapped in at the time of the
allocation?

--
Nick Craig-Wood <ni**@craig-wood.com> -- http://www.craig-wood.com/nick
Jan 22 '07 #7
George Sakkis <ge***********@gmail.com> wrote:
> The file is written once and then opened as read-only; there's no
> flushing. So if caching is completely up to the OS, I take it that my
> options are either (1) to modify my algorithms so that they work in
> fixed-size batches instead of arbitrarily long sequences, or (2) to
> implement my own memory-mapping scheme to fit my algorithms. I guess
> (1) would be less trouble overall, or is there a way to hint to the
> OS how large a cache it can use?
The above behaviour isn't as expected. So either there is something
going on in your program that we don't know about or there is a bug
somewhere, either in the OS or in python.

Can you make a short program to replicate the problem? That will help
narrow down the problem.

--
Nick Craig-Wood <ni**@craig-wood.com> -- http://www.craig-wood.com/nick
Jan 22 '07 #8
Dennis Lee Bieber wrote:
> On 21 Jan 2007 13:32:19 -0800, "George Sakkis" <ge***********@gmail.com>
> declaimed the following in comp.lang.python:
>
>> The file is written once and then opened as read-only; there's no
>> flushing. So if caching is completely up to the OS, I take it that my
>
> How large is said file? While the OS should handle swapping pages as
> needed, you do have to recall that those pages are /mapped/ into the
> process virtual address space. Trying to mmap a 2GB file into a process
> that is already using 1GB of memory may not work (what is the default
> Windows split? 2GB process and 2GB shared OS?)
It's around 400MB. As I said, I cannot reproduce the MemoryError
locally since I have 1GB of physical memory, but IIRC the user who
reported it had less. Actually, I am less concerned about whether a
MemoryError is raised in this case and more about the fact that, even
if there's no exception, the program may suffer from severe thrashing
due to constant swapping. That's an issue with the specific
program/algorithm rather than with Python or the OS.
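Dennis's question can be turned into a rough check. Taking his 2GB user-address-space figure for a 32-bit Windows process as the assumption, the sketch below only compares totals; it ignores fragmentation, so passing it does not guarantee that a mapping will succeed. The helper names are hypothetical:

```python
import struct

MIB = 1 << 20
GIB = 1 << 30

def pointer_bits() -> int:
    """32 for a 32-bit Python build, 64 for a 64-bit one."""
    return struct.calcsize("P") * 8

def fits_in_user_space(map_size: int, in_use: int, user_space: int = 2 * GIB) -> bool:
    """Upper-bound check: is there enough *total* free address space for the view?
    A real mapping also needs one contiguous free region of map_size bytes."""
    return in_use + map_size <= user_space

print(pointer_bits())
print(fits_in_user_space(2 * GIB, 1 * GIB))    # False: the 2GB-file case Dennis describes
print(fits_in_user_space(400 * MIB, 1 * GIB))  # True: George's ~400MB file, on paper
```

On a 64-bit build the user address space is far larger than 2GB, so this particular limit largely stops mattering.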

George

Jan 22 '07 #9
> In fact, memory that is read in because of mmap should *never* cause
> a MemoryError.
This is certainly not true. You can run out of virtual address space by
reading data from a memory-mapped file.
> Python calls MapViewOfFile when mmap.mmap is invoked,
> at which point the operating system commits to providing that much
> address space to the application, along with backing storage on disk
> (typically the file being mapped, unless it is an anonymous map).
> Later access to the mapped range cannot fail (except for hardware
> errors), and if it did, you wouldn't see a MemoryError.
Hmm, maybe I'm wrong. Are you sure that Windows allocates the size of
the whole file in terms of memory address space? I also wrote a program
before (in Delphi) that played a memory-mapped wave file. In the Task
Manager, I saw "used memory" growing as the program played the wave
file. To me, this indicates that Windows extends the mapped address
space in chunks.

Regards,

Laszlo

Jan 22 '07 #10
> It's around 400MB. As I said, I cannot reproduce the MemoryError
> locally since I have 1GB of physical memory, but IIRC the user who
> reported it had less. Actually, I am less concerned about whether a
> MemoryError is raised in this case and more about the fact that, even
> if there's no exception, the program may suffer from severe thrashing
> due to constant swapping. That's an issue with the specific
> program/algorithm rather than with Python or the OS.
Well, if the same program runs when you have 1GB of physical memory,
then the problem is probably not that you ran out of virtual address
space. It would help if you posted the relevant code from your program.

Laszlo

Jan 22 '07 #11
Laszlo Nagy wrote:
>> In fact, memory that is read in because of mmap should *never* cause
>> a MemoryError.
> This is certainly not true. You can run out of virtual address space by
> reading data from a memory-mapped file.
That is true, but not what I said. I said you cannot run out of memory
*while reading it*. You can only run out of virtual address space when
you invoke mmap.mmap itself (and when the application later tries to
allocate more virtual address space through VirtualAlloc).
>> Python calls MapViewOfFile when mmap.mmap is invoked,
>> at which point the operating system commits to providing that much
>> address space to the application, along with backing storage on disk
>> (typically the file being mapped, unless it is an anonymous map).
>> Later access to the mapped range cannot fail (except for hardware
>> errors), and if it did, you wouldn't see a MemoryError.
> Hmm, maybe I'm wrong. Are you sure that Windows allocates the size of
> the whole file in terms of memory address space?
Yes, I am. See MapViewOfFile, at

http://msdn2.microsoft.com/en-us/library/aa366761.aspx

"Mapping a file makes the specified portion of a file visible in the
address space of the calling process."

Notice that allocating address space doesn't consume much memory (it
consumes a little memory for the page tables).
> I also wrote a program
> before (in Delphi) that played a memory-mapped wave file. In the Task
> Manager, I saw "used memory" growing as the program played the wave
> file. To me, this indicates that Windows extends the mapped address
> space in chunks.
You are misinterpreting the data. I'm not sure what precisely
"used memory" is; most likely it is the working set of the process, i.e.
the number of physical pages that are allocated to the process. That is
typically much smaller than the address space, since many pages will be
paged out (or not yet read in at all).

You need to display the virtual address space in the task manager
to determine how much address space the application is using.
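The distinction Martin draws between address space and working set can be observed directly. Task Manager is not scriptable here, so this sketch swaps in Linux's `/proc/self/status`, where `VmSize` is the virtual address space and `VmRSS` is roughly the resident set (the closest analogue of the working set). It is Linux-only, and the 16 MiB size and the helper names are hypothetical. Typically `VmSize` jumps by about the file size as soon as the mapping is created, while `VmRSS` grows only as pages are actually touched:

```python
import mmap
import os
import tempfile

def status_kb(field: str) -> int:
    """Return a /proc/self/status field such as 'VmSize' or 'VmRSS', in kB (Linux only)."""
    with open("/proc/self/status") as f:
        for line in f:
            if line.startswith(field + ":"):
                return int(line.split()[1])
    raise KeyError(field)

def measure(size: int = 16 * 1024 * 1024):
    """Map a file of `size` bytes (rounded down to whole MiB) and return the
    (VmSize, VmRSS) deltas in kB right after mmap() and after touching every page."""
    path = os.path.join(tempfile.mkdtemp(), "mapped.bin")
    block = b"\x01" * (1024 * 1024)
    with open(path, "wb") as f:
        for _ in range(size // len(block)):
            f.write(block)
    with open(path, "rb") as f:
        vsz0, rss0 = status_kb("VmSize"), status_kb("VmRSS")
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        vsz1, rss1 = status_kb("VmSize"), status_kb("VmRSS")
        for off in range(0, len(mm), mmap.PAGESIZE):
            _ = mm[off]  # read one byte per page so each page is faulted in
        vsz2, rss2 = status_kb("VmSize"), status_kb("VmRSS")
        mm.close()
    os.remove(path)
    return (vsz1 - vsz0, rss1 - rss0), (vsz2 - vsz0, rss2 - rss0)

after_map, after_touch = measure()
print("after mmap():   VmSize %+d kB, VmRSS %+d kB" % after_map)
print("after touching: VmSize %+d kB, VmRSS %+d kB" % after_touch)
```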

Regards,
Martin
Jan 22 '07 #12
Nick Craig-Wood wrote:
> So presumably it is Python that raises the MemoryError: it asks for a
> new bit of memory, the request fails, and so it raises MemoryError.
>
> Could memory allocation under Windows be affected by a large chunk of
> an mmap()ed file being physically swapped in at the time of the
> allocation?
To my knowledge, no. There might be virtual memory quotas, but I don't
think Windows supports such a concept.

More likely, this is entirely unrelated to the mmap issue. I would
guess that the machine on which the problem occurs is close to
exhausting its swap file (because of other activities in the system),
so Python's regular allocations occasionally manage to exhaust it
(memory-mapped files don't contribute to swap file usage, as they have
their own disk backing, namely the file being mapped).

Regards,
Martin
Jan 22 '07 #13
George Sakkis wrote:
> It's around 400MB.
On Windows you may not be able to map a file of this size into memory
because of virtual address space fragmentation. A Win32 process has
only 2GB of virtual address space, and DLLs tend to get scattered
throughout that address space.
> As I said, I cannot reproduce the MemoryError
> locally since I have 1GB of physical memory, but IIRC the user who
> reported it had less.
Virtual address space fragmentation isn't affected by the amount of
physical memory in your system. A system with 64MB of RAM might be
able to map a 400MB file while a system with 3GB of RAM might not,
because of how DLLs got loaded into the process.
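One way to act on this (and on George's option (2)) is to map a bounded window of the file at a time instead of all 400MB, so the process never needs one large contiguous hole in its address space. A minimal sketch using the `length` and `offset` arguments of Python's `mmap.mmap`; offsets must be multiples of `mmap.ALLOCATIONGRANULARITY` (typically 64KB on Windows), and the 64MiB default window and the helper name are hypothetical choices:

```python
import hashlib
import mmap
import os
import tempfile

def sha256_windowed(path: str, window: int = 64 * 1024 * 1024) -> str:
    """Hash a file while mapping at most `window` bytes of it at any one time."""
    gran = mmap.ALLOCATIONGRANULARITY
    window = max(gran, window - window % gran)  # round down to a valid offset step
    size = os.path.getsize(path)
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for offset in range(0, size, window):
            length = min(window, size - offset)
            # Only [offset, offset + length) is mapped; the previous view has
            # already been unmapped by the time this one is created.
            with mmap.mmap(f.fileno(), length, access=mmap.ACCESS_READ,
                           offset=offset) as mm:
                h.update(mm)  # hashlib reads the mapping via the buffer protocol
    return h.hexdigest()

# Usage: hash a small file in several windows and compare with a one-shot hash.
path = os.path.join(tempfile.mkdtemp(), "big.bin")
data = os.urandom(3 * mmap.ALLOCATIONGRANULARITY + 123)
with open(path, "wb") as f:
    f.write(data)
print(sha256_windowed(path, window=mmap.ALLOCATIONGRANULARITY) == hashlib.sha256(data).hexdigest())  # True
```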
> Actually, I am less concerned about whether a
> MemoryError is raised in this case and more about the fact that, even
> if there's no exception, the program may suffer from severe thrashing
> due to constant swapping.
Well, that's what you're asking for when you use mmap. The same
mechanism that creates virtual memory using a swap file is used to
create a virtual memory mapping of your file. When you read from the
mmap, pages from the file are swapped into memory and stay there
until they need to be swapped out to make room for something else. If
you don't want this behaviour, don't use mmap.
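This matches what George saw at the start: with an ordinary file, memory usage stayed flat. A minimal sketch of option (1) from earlier in the thread, reading fixed-size batches with plain buffered I/O so that only one batch needs to be alive at a time (`iter_batches` and the 1 MiB default are hypothetical):

```python
import os
import tempfile
from typing import Iterator

def iter_batches(path: str, batch_size: int = 1 << 20) -> Iterator[bytes]:
    """Yield consecutive batch_size-byte pieces of a file (the last may be shorter)."""
    with open(path, "rb") as f:
        while True:
            batch = f.read(batch_size)
            if not batch:  # b"" signals end of file
                return
            yield batch

path = os.path.join(tempfile.mkdtemp(), "records.bin")
with open(path, "wb") as f:
    f.write(b"abcdefghij" * 25)  # 250 bytes
print([len(b) for b in iter_batches(path, batch_size=100)])  # [100, 100, 50]
```

Memory use stays bounded by `batch_size` as long as the caller does not keep earlier batches around; the cost compared with mmap is an explicit copy into each batch.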

Ross Ridge

Jan 22 '07 #14
