473,897 Members | 2,656 Online
Bytes | Software Development & Data Engineering Community
+ Post

Home Posts Topics Members FAQ

mmap caching

I've been trying to track down a memory leak (which I initially
attributed erroneously to numpy) and it turns out to be caused by a
memory mapped file. It seems that mmap caches without limit the chunks
it reads, as the memory usage grows to several hundreds MBs according
to the Windows task manager before it dies with a MemoryError. I'm
positive that these chunks are not referenced anywhere else; in fact if
I change the mmap object to a normal file, memory usage remains
constant. The documentation of mmap doesn't mention anything about
this. Can the caching strategy be modified at the user level ?

George

Jan 21 '07 #1
13 3536
George Sakkis <ge***********@ gmail.comwrote:
I've been trying to track down a memory leak (which I initially
attributed erroneously to numpy) and it turns out to be caused by a
memory mapped file. It seems that mmap caches without limit the chunks
it reads, as the memory usage grows to several hundreds MBs according
to the Windows task manager before it dies with a MemoryError. I'm
positive that these chunks are not referenced anywhere else; in fact if
I change the mmap object to a normal file, memory usage remains
constant. The documentation of mmap doesn't mention anything about
this. Can the caching strategy be modified at the user level ?
I'm not familiar with mmap() on windows, but assuming it works the
same way as unix...

The point of mmap() is to map files into memory. It is completely up
to the OS to bring pages into memory for you to read / write to, and
completely up to the OS to get rid of them again.

What you would expect is that the file is demand paged into memory as
you access bits of it. These pages will remain in memory until the OS
feels some memory pressure when the pages will be written out if dirty
and then dropped.

The OS will try to keep hold of pages as long as possible just in case
you need them again. The pages dropped should be the least recently
used pages.

I wouldn't have expected a MemoryError though...

Did you do mmap.flush() after writing?

--
Nick Craig-Wood <ni**@craig-wood.com-- http://www.craig-wood.com/nick
Jan 21 '07 #2
Nick Craig-Wood wrote:
George Sakkis <ge***********@ gmail.comwrote:
I've been trying to track down a memory leak (which I initially
attributed erroneously to numpy) and it turns out to be caused by a
memory mapped file. It seems that mmap caches without limit the chunks
it reads, as the memory usage grows to several hundreds MBs according
to the Windows task manager before it dies with a MemoryError. I'm
positive that these chunks are not referenced anywhere else; in fact if
I change the mmap object to a normal file, memory usage remains
constant. The documentation of mmap doesn't mention anything about
this. Can the caching strategy be modified at the user level ?

I'm not familiar with mmap() on windows, but assuming it works the
same way as unix...

The point of mmap() is to map files into memory. It is completely up
to the OS to bring pages into memory for you to read / write to, and
completely up to the OS to get rid of them again.

What you would expect is that the file is demand paged into memory as
you access bits of it. These pages will remain in memory until the OS
feels some memory pressure when the pages will be written out if dirty
and then dropped.

The OS will try to keep hold of pages as long as possible just in case
you need them again. The pages dropped should be the least recently
used pages.

I wouldn't have expected a MemoryError though...

Did you do mmap.flush() after writing?
The file is written once and then opened as read-only, there's no
flushing. So if caching is completely up to the OS, I take it that my
options are either (1) modify my algorithms so that they work in
fixed-size batches instead of arbitrarily long sequences or (2)
implement my own memory-mapping scheme to fit my algorithms. I guess
(1) would be the less trouble overall, or is there a way to give a hint
to the OS on how large cache can it use ?

George

Jan 21 '07 #3
George Sakkis schrieb:
I've been trying to track down a memory leak (which I initially
attributed erroneously to numpy) and it turns out to be caused by a
memory mapped file. It seems that mmap caches without limit the chunks
it reads, as the memory usage grows to several hundreds MBs according
to the Windows task manager before it dies with a MemoryError.
You must be misinterpreting what you are seeing. It's the operating
system that decides what part of a memory-mapped file are held in
memory, and that is certainly not without limits.

Notice that there are several values that can be called "memory
usage" (such as the size of the committed address space, the working
set size, etc); you don't mention which of these values grows several
hundreds MB.

Regards,
Martin
Jan 21 '07 #4
Martin v. Löwis wrote:
George Sakkis schrieb:
I've been trying to track down a memory leak (which I initially
attributed erroneously to numpy) and it turns out to be caused by a
memory mapped file. It seems that mmap caches without limit the chunks
it reads, as the memory usage grows to several hundreds MBs according
to the Windows task manager before it dies with a MemoryError.

You must be misinterpreting what you are seeing. It's the operating
system that decides what part of a memory-mapped file are held in
memory, and that is certainly not without limits.
Sure; what I meant was that that whatever the limit is, it's high
enough that a MemoryError is raised before the limit is reached.
Notice that there are several values that can be called "memory
usage" (such as the size of the committed address space, the working
set size, etc); you don't mention which of these values grows several
hundreds MB.
It's the one in the 'Processes' tab of the Windows task manager (XP
proffesional). By the way, I ran the same program on a box with more
physical memory and the mem. usage stops growing at around 430MB, by
which time the whole file is most likely cached. I'd be interested in
any suggestions other than "buy more RAM" :) (these are not my machines
anyway).

Thanks,
George

Jan 21 '07 #5
George Sakkis schrieb:
>You must be misinterpreting what you are seeing. It's the operating
system that decides what part of a memory-mapped file are held in
memory, and that is certainly not without limits.

Sure; what I meant was that that whatever the limit is, it's high
enough that a MemoryError is raised before the limit is reached.
The operating system will absolutely, definitely, certainly release
any cached data it can purge before reporting it is out of memory.

So if you get a MemoryError, it is *not* because the operating system
has cached too much data.

In fact, memory that is read in because of mmap should *never* cause
a MemoryError. Python calls MapViewOfFile when mmap.mmap is invoked,
at which point the operating commits to providing that much address
space to the application, along with backing storage on disk
(typically, from the file being mapped, unless it is an anonymous
map). Later access to the mapped range cannot fail (except for
hardware errors), and if it would, you wouldn't see a MemoryError.
It's the one in the 'Processes' tab of the Windows task manager (XP
proffesional). By the way, I ran the same program on a box with more
physical memory and the mem. usage stops growing at around 430MB, by
which time the whole file is most likely cached. I'd be interested in
any suggestions other than "buy more RAM" :) (these are not my machines
anyway).
As a starting point, try understanding better what is really happening.
Turn on "Virtual Memory Size" in "View/Select Columns" also, and perhaps
a few additional counters as well. Also take a look at the "Commit
Charge", which takes into account swap file usage as well. Try
increasing the size of the swap file.

Regards,
Martin
Jan 22 '07 #6
Martin v. Löwis <ma****@v.loewi s.dewrote:
In fact, memory that is read in because of mmap should *never* cause
a MemoryError. Python calls MapViewOfFile when mmap.mmap is invoked,
at which point the operating commits to providing that much address
space to the application, along with backing storage on disk
(typically, from the file being mapped, unless it is an anonymous
map). Later access to the mapped range cannot fail (except for
hardware errors), and if it would, you wouldn't see a MemoryError.
So presumably it is python generating a MemoryError. It is asking for
a new bit of memory and it is failing so it throws a MemoryError.

Could memory allocation under windows be affected by a large chunk of
mmap()ed file which is physically swapped in at the time of the
allocation?

--
Nick Craig-Wood <ni**@craig-wood.com-- http://www.craig-wood.com/nick
Jan 22 '07 #7
George Sakkis <ge***********@ gmail.comwrote:
The file is written once and then opened as read-only, there's no
flushing. So if caching is completely up to the OS, I take it that my
options are either (1) modify my algorithms so that they work in
fixed-size batches instead of arbitrarily long sequences or (2)
implement my own memory-mapping scheme to fit my algorithms. I guess
(1) would be the less trouble overall, or is there a way to give a hint
to the OS on how large cache can it use ?
The above behaviour isn't as expected. So either there is something
going on in your program that we don't know about or there is a bug
somewhere, either in the OS or in python.

Can you make a short program to replicate the problem? That will help
narrow down the problem.

--
Nick Craig-Wood <ni**@craig-wood.com-- http://www.craig-wood.com/nick
Jan 22 '07 #8
Dennis Lee Bieber wrote:
On 21 Jan 2007 13:32:19 -0800, "George Sakkis" <ge***********@ gmail.com>
declaimed the following in comp.lang.pytho n:

The file is written once and then opened as read-only, there's no
flushing. So if caching is completely up to the OS, I take it that my

How large is said file? While the OS should handle swapping pages as
needed, you do have to recall that those pages are /mapped/ into the
process virtual address space. Trying to mmap a 2GB file into a process
that is already using 1GB of memory may not work (what is the default
Windows split? 2GB process and 2GB shared OS?)
It's around 400MB. As I said, I cannot reproduce the MemoryError
locally since I have 1GB physical space but IIRC the user who reported
it had less. Actually I am less concerned about whether a MemoryError
is raised or not in this case and more about the fact that even if
there's no exception, the program may suffer from severe thrashing due
to constant swapping. That's an issue with the specific
program/algorithm rather with Python or the OS.

George

Jan 22 '07 #9
In fact, memory that is read in because of mmap should *never* cause
a MemoryError.
This is certainly not true. You can run out of virtual address space by
reading data from a memory mapped file.
Python calls MapViewOfFile when mmap.mmap is invoked,
at which point the operating commits to providing that much address
space to the application, along with backing storage on disk
(typically, from the file being mapped, unless it is an anonymous
map). Later access to the mapped range cannot fail (except for
hardware errors), and if it would, you wouldn't see a MemoryError.
Hmm, maybe I'm wrong. Are you sure that Windows allocates the size of
the whole file in terms of memory address space? I also wrote a program
before (in Delphi). That program was playing a memory mapped wave file.
From the task manager, I have seen that "used memory" was growing as
the program was playing the wave file. For me, this indicates that
Windows extends the mapped address space in chunks.

Regards,

Laszlo

Jan 22 '07 #10

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

4
3548
by: Hao Xu | last post by:
Hi everyone! I found that if you want to write to the memory got by mmap(), you have to get the file descriptor for mmap() in O_RDWR mode. If you got the file descriptor in O_WRONLY mode, then writing to the memory got by mmap() will lead to segmentation fault. Anyone knows why? Is this a rule or a bug? What if I just want to write to the file and nothing else?
7
3113
by: Michael | last post by:
I'm writing an application that decodes a file containing binary records. Each record is a particular event type. Each record is translated into ASCII and then written to a file. Each file contains the same events. At the moment each record is processed one after the other. It taks about 1m40s to process a large file containing 70,000 records. Would my application benifit from multiple threads and mmap? If so what is the best way to...
4
3716
by: Fabiano Sidler | last post by:
Hi folks! I created an mmap object like so: --- snip --- from mmap import mmap,MAP_ANONYMOUS,MAP_PRIVATE fl = file('/dev/zero','rw') mm = mmap(fl.fileno(), 1, MAP_PRIVATE|MAP_ANONYMOUS) --- snap --- Now, when I try to resize mm to 10 byte
2
4935
by: beejisbrigit | last post by:
Hi there, I was wondering if anyone had experience with File I/O in Java vs. C++ using mmap(), and knew if the performance was better in one that the other, or more or less negligible. My instinct would say C++ is faster, but Java has made some improvements with its FileChannel class.
1
5480
by: James T. Dennis | last post by:
I've been thinking about the Python mmap module quite a bit during the last couple of days. Sadly most of it has just been thinking ... and reading pages from Google searches ... and very little of it as been coding. Mostly it's just academic curiosity (I might be teaching an "overview of programming" class in a few months, and I'd use Python for most of the practical examples to cover a broad range of programming topics, including the...
1
2702
by: koara | last post by:
Hello all, i am using the mmap module (python2.4) to access contents of a file. My question regards the relative performance of mmap.seek() vs mmap.tell(). I have a generator that returns stuff from the file, piece by piece. Since other things may happen to the mmap object in between consecutive next() calls (such as another iterator's next()), i have to store the file position before yield and restore it afterwards by means of tell()...
2
4645
by: Neal Becker | last post by:
On linux, I don't understand why: f = open ('/dev/eos', 'rw') m = mmap.mmap(f.fileno(), 1000000, prot=mmap.PROT_READ|mmap.PROT_WRITE, flags=mmap.MAP_SHARED) gives 'permission denied', but this c++ code works: #include <sys/mman.h> #include <fcntl.h>
0
1156
by: Kris Kennaway | last post by:
If I do the following: def mmap_search(f, string): fh = file(f) mm = mmap.mmap(fh.fileno(), 0, mmap.MAP_SHARED, mmap.PROT_READ) return mm.find(string) def mmap_is_in(f, string): fh = file(f)
0
1128
by: Gabriel Genellina | last post by:
En Thu, 29 May 2008 19:17:05 -0300, Kris Kennaway <kris@FreeBSD.org> escribió: Looks like you should define the sq_contains member in mmap_as_sequence, and the type should have the Py_TPFLAGS_HAVE_SEQUENCE_IN flag set (all in mmapmodule.c) -- Gabriel Genellina
0
9837
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can effortlessly switch the default language on Windows 10 without reinstalling. I'll walk you through it. First, let's disable language synchronization. With a Microsoft account, language settings sync across devices. To prevent any complications,...
0
11250
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers, it seems that the internal comparison operator "<=>" tries to promote arguments from unsigned to signed. This is as boiled down as I can make it. Here is my compilation command: g++-12 -std=c++20 -Wnarrowing bit_field.cpp Here is the code in...
0
10478
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each protocol has its own unique characteristics and advantages, but as a user who is planning to build a smart home system, I am a bit confused by the choice of these technologies. I'm particularly interested in Zigbee because I've heard it does some...
0
7185
by: conductexam | last post by:
I have .net C# application in which I am extracting data from word file and save it in database particularly. To store word all data as it is I am converting the whole word file firstly in HTML and then checking html paragraph one by one. At the time of converting from word file to html my equations which are in the word document file was convert into image. Globals.ThisAddIn.Application.ActiveDocument.Select();...
0
5873
by: TSSRALBI | last post by:
Hello I'm a network technician in training and I need your help. I am currently learning how to create and manage the different types of VPNs and I have a question about LAN-to-LAN VPNs. The last exercise I practiced was to create a LAN-to-LAN VPN between two Pfsense firewalls, by using IPSEC protocols. I succeeded, with both firewalls in the same network. But I'm wondering if it's possible to do the same thing, with 2 Pfsense firewalls...
0
6074
by: adsilva | last post by:
A Windows Forms form does not have the event Unload, like VB6. What one acts like?
1
4698
by: 6302768590 | last post by:
Hai team i want code for transfer the data from one system to another through IP address by using C# our system has to for every 5mins then we have to update the data what the data is updated we have to send another system
2
4293
muto222
by: muto222 | last post by:
How can i add a mobile payment intergratation into php mysql website.
3
3300
bsmnconsultancy
by: bsmnconsultancy | last post by:
In today's digital era, a well-designed website is crucial for businesses looking to succeed. Whether you're a small business owner or a large corporation in Toronto, having a strong online presence can significantly impact your brand's success. BSMN Consultancy, a leader in Website Development in Toronto offers valuable insights into creating effective websites that not only look great but also perform exceptionally well. In this comprehensive...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.