Bytes | Software Development & Data Engineering Community

mmap thoughts


I've been thinking about the Python mmap module quite a bit
during the last couple of days. Sadly most of it has just been
thinking ... and reading pages from Google searches ... and
very little of it has been coding.

Mostly it's just academic curiosity (I might be teaching an "overview
of programming" class in a few months, and I'd use Python for most
of the practical examples to cover a broad range of programming
topics, including the whole concept of memory mapping, used on the
one hand as a file access abstraction and on the other as a form of
inter-process shared memory).

Initial observations:

* The standard library reference could use some good examples.
At least one of those should show use of both anonymous and
named mmap objects as shared memory.
* On Linux (various versions) using Python 2.4.x (for at
least 2.4.4 and 2.4.2), if I create an mmap'ing in one
process, then open the same file using 'w', 'w+' or 'w+b'
in another process, my first process dies with "Bus Error".
(Those modes truncate the file, so the first process's mapped
pages no longer have any file data behind them.)

This should probably be documented.

(It's fine if I use 'a' (append) modes for opening the file).

* It seems that it's also necessary to extend a file to a given
size before creating a mapping on it. In other words you can't
mmap a newly created, 0-length file.

So it seems like the simplest example of a newly created,
non-anonymous file mapping would be something like:

import mmap
sz = (1024 * 1024 * 1024 * 2) - 1
f = open('/tmp/mmtst.tmp', 'w+b')
f.seek(sz)
f.write('\0')
f.flush()
mm = mmap.mmap(f.fileno(), sz, mmap.MAP_SHARED)
f.close()

Even creating a zero-length file and trying to create a
zero-length mapping on it (with mmap(f.fileno(), 0, ...)), with
a mind towards using mmap's .resize() method on it later,
doesn't work (it raises EnvironmentError: "Errno 22: Invalid
argument"). BTW: the call to f.flush() does seem to be
required, at least in my environments (Linux 2.6 kernels on
various distributions, with the aforementioned 2.4.2 and 2.4.4
versions of Python).

* The mmtst.tmp file is "sparse" of course. So its size in
the example above is 2GB ... but the disk usage (du command)
on it is only a few KB (depending on your filesystem cluster
size, etc.).

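* Using something like len(mm[:]) copies the whole mapping into a
string first, which forces the kernel's filesystem to return a huge
stream of NUL characters (and might thrash your system caches a
little); len(mm) returns the mapping's size without the copy.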

* On my SuSE/Novell 10.1 system, using Python 2.4.2 (their RPM
2.4.2-18), I found that anonymous mmaps would raise an
EnvironmentError. The same code on 2.4.4 on my Debian
and Fedora Core 6 systems worked with no problem:

anonmm = mmap.mmap(-1, 4096, mmap.MAP_ANONYMOUS|mmap.MAP_SHARED)

... and I also tested their 2.4.2-18.5 update, with the same
results:

Python 2.4.2 (#1, Oct 13 2006, 17:11:24)
[GCC 4.1.0 (SUSE Linux)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import mmap
>>> mm = mmap.mmap(-1, 4096, mmap.MAP_ANONYMOUS|mmap.MAP_SHARED)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
EnvironmentError: [Errno 22] Invalid argument
>>>
jadestar@dhcphostname:~> uname -a
Linux dhcphostname 2.6.16.13-4-default #1 Wed May 3 ...

* On the troublesome SuSE/Novell box, using:

f = open('/dev/zero', 'w+')
anonmm = mmap.mmap(f.fileno(), 4096,
                   mmap.MAP_ANONYMOUS|mmap.MAP_SHARED)

... seems to work. However, a .resize() on that raises the same
EnvironmentError I was getting before.

* As noted in a few discussions in the past, Python's mmap() function
doesn't take an "offset" parameter ... it always uses an offset of 0.
(It seems like a patch is slated for inclusion in some future release?)

* On 32-bit Linux systems (or on systems running a 32-bit compilation
of Python) 2GB is, of course, the upper limit of an mmap'ing.
The ability to map portions of larger files is a key motivation for
including the previously mentioned "offset" patch.
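For comparison, here is a minimal sketch for a current Python 3 (not the 2.4.x versions discussed above). There the "offset" parameter exists (it was added in Python 2.6) and must be a multiple of mmap.ALLOCATIONGRANULARITY, and f.truncate() can extend a freshly created file to the desired size instead of the seek()/write() trick. The helper names make_sparse_file() and map_window() and the temporary path are hypothetical, just for illustration.

```python
import mmap
import os
import tempfile

# Offsets passed to mmap must be a multiple of this value
# (the page size on most Unix systems).
GRANULE = mmap.ALLOCATIONGRANULARITY

def make_sparse_file(path, size):
    """Create (or truncate) `path` and extend it to `size` bytes.

    truncate() grows the file without writing any data, so on most
    Unix filesystems the new area is a zero-filled "hole" (sparse)."""
    with open(path, "w+b") as f:
        f.truncate(size)

def map_window(path, offset, length):
    """Return a shared, read/write mapping of `length` bytes at `offset`."""
    if offset % GRANULE:
        raise ValueError("offset must be a multiple of ALLOCATIONGRANULARITY")
    with open(path, "r+b") as f:
        # The mapping stays usable after the file object is closed.
        return mmap.mmap(f.fileno(), length, offset=offset)

if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "mmtst.tmp")
    make_sparse_file(path, 16 * GRANULE)
    mm = map_window(path, 4 * GRANULE, GRANULE)  # maps only the 5th granule
    mm[0:5] = b"hello"
    print(len(mm))   # GRANULE: the mapping's size, no copy (unlike len(mm[:]))
    mm.flush()
    mm.close()
    with open(path, "rb") as f:
        f.seek(4 * GRANULE)
        print(f.read(5))  # b'hello'
```

Only the requested window is mapped, which is what makes files larger than the address space workable on 32-bit builds.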

Other thoughts:

(Going beyond initial observations, now)

* I haven't tested this, but I presume that anonymous|shared mappings
on UNIX can only be shared with child/descendant processes ... since
there's no sort of "handle" or "key" that can be passed to unrelated
processes via any other IPC method, only fork()-based inheritance
will work.

* Another thing I haven't tested, yet: how robust are shared mappings
to adjacent/non-overlapping concurrent writes by multiple processes?
I'm hoping that you can reliably have processes writing updates to
small, pre-assigned, blocks in the mmap'ing without contention issues.

I plan to write some "hammer test" code to test this theory
... and run it for a while on a few multi-core/SMP systems.

* It would be nice to build something like the threading Queue
and/or POSH support for multi-process communication using nothing
but pure Python (presumably using the mmap module to pass serialized
objects around).

* The limitations of Python threading and the GIL for scaling on
SMP and multi-core systems are notorious; having a first-class,
and reasonably portable standard library for supporting multi-PROCESS
scaling would be of tremendous benefit now that such MP systems
are becoming the norm.

* There don't seem to be any currently maintained SysV IPC
(shm, message, and semaphore) modules for Python. I guess some
people have managed to hack something together using ctypes;
but I haven't actually read, much less tested, any of that code.

* The key to robust and efficient use of shared memory is going to be
in the design and implementation of locking primitives for using it.

* I'm guessing that some additional IPC method will be required to
co-ordinate the locking --- something like Unix domain sockets,
perhaps. At least I think it would be a practically unavoidable
requirement for unrelated processes to share memory.

* For related processes I could imagine a scheme whereby the parent
of each process passes a unique "mailbox" offset to each child
and where that might be used to implement a locking scheme.

It might work something like this:

The master process (parent) creates the mapping and initializes
a lock at mm[0:4], a child counter and a lock counter (also at
pre-defined offsets), and a set of mailboxes (one per child, up
to a maximum number of children, or using blocks of the mm in a
linked list).

For each sub-process (fork()'d child) the master increments
the counter, and passes a mailbox offset (counter + current
mailbox offset) to it.

The master then goes into a loop, scanning the mailboxes
(or goes idle with a SIGUSR* handler that scans the mailboxes).

Whenever there are any non-empty mailboxes the master appends
the corresponding PIDs to a lock-request queue; then it pops
those PIDs and writes them into the "lock" offset at mm[0]
(perhaps sending a SIGUSR* to the new lock holder, too).

That process now has the lock and can work on the shared memory.
When it's done it would clear the lock and signal the master.

All processes have read access to the memory while it's not
locked. However, they have to read the lock counter first,
copy the data into their own address space, then verify that the
lock counter has not been incremented in the interim. (All
reads are double-checked to ensure that no changes could
have occurred during the copying.)
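The double-checked read described above is essentially the pattern usually called a seqlock. Here is a minimal Python 3 sketch of just the counter protocol, under my own assumptions: an 8-byte counter at offset 0, the payload right after it, and the writer making the counter odd while an update is in progress (the description above only says the counter is incremented). Writers must still be serialized by the master's lock, and the mmap module gives no atomicity or memory-ordering guarantees, so this illustrates the protocol rather than being a safe implementation. seq_write() and seq_read() are hypothetical names.

```python
import mmap
import struct

COUNTER = struct.Struct("<Q")   # 8-byte sequence counter at offset 0
DATA_OFF = COUNTER.size         # payload starts right after the counter

def seq_write(mm, payload):
    """Writer side; the caller must already hold the master-granted lock."""
    (seq,) = COUNTER.unpack_from(mm, 0)
    COUNTER.pack_into(mm, 0, seq + 1)   # odd value: update in progress
    mm[DATA_OFF:DATA_OFF + len(payload)] = payload
    COUNTER.pack_into(mm, 0, seq + 2)   # even value: update finished

def seq_read(mm, length, max_tries=1000):
    """Reader side: copy the payload, retrying if a write overlapped it."""
    for _ in range(max_tries):
        (before,) = COUNTER.unpack_from(mm, 0)
        if before % 2:                  # a writer is mid-update; retry
            continue
        snapshot = mm[DATA_OFF:DATA_OFF + length]  # slicing copies to bytes
        (after,) = COUNTER.unpack_from(mm, 0)
        if after == before:             # counter unchanged: copy is consistent
            return snapshot
    raise RuntimeError("no consistent snapshot after %d tries" % max_tries)

if __name__ == "__main__":
    mm = mmap.mmap(-1, 4096)            # anonymous, zero-filled
    seq_write(mm, b"hello")
    print(seq_read(mm, 5))              # b'hello'
```

Readers never block the writer; they simply retry, which suits the "read while not locked" rule in the scheme.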

... there are a lot of details I haven't considered about such a
scheme (I'm sure they'll come up if I prototype such a system).

Obviously one could envision more complex data structures which
essentially create a sort of shared "filesystem" in the shared
memory ... where the "master" process is analogous to the filesystem
"driver" for it. Interesting cases for handling dead processes
come up (the master could handle SIGCHLD by clearing locks held by
the dearly departed) ... and timeouts for catatonic processes might be
defined (the master process kills the child before forcibly removing the
lock). Recovery from the loss of the "master" process itself might be
possible (define a portion of the shared memory pool that holds the
list of processes who would become the new master ... the first living
one on that list assumes control). But that raises new issues (a new
master can't depend on SIGCHLD in such a scheme; checking for living
processes would have to be done via kill(pid, 0) calls, for example).
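The kill(pid, 0) liveness check mentioned above is a small wrapper around os.kill(). A sketch for Python 3, where is_alive() is a hypothetical helper name. Note that it only tells you that some process with that PID exists: an unreaped zombie still counts, and PIDs get reused, so it can be fooled.

```python
import os

def is_alive(pid):
    """Probe whether a process with this PID exists, via kill(pid, 0).

    Signal 0 performs only the existence/permission checks; nothing is sent.
    """
    try:
        os.kill(pid, 0)
    except ProcessLookupError:   # ESRCH: no such process
        return False
    except PermissionError:      # EPERM: it exists but belongs to another user
        return True
    return True

if __name__ == "__main__":
    print(is_alive(os.getpid()))   # True: we are certainly running
```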

It's easy to see how complicated all this could become. The question
is, how simple could we make it and still have something useful?
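The simplest version of several ideas above (an anonymous shared mapping inherited across fork(), each child writing only to its own pre-assigned block, and serialized objects passed through the mapping) can be sketched in Python 3 as follows. SLOT_SIZE, the 4-byte length prefix, and the helper names are hypothetical choices. The parent only reads the slots after waitpid(), so no locking is needed in this toy case; concurrent readers would need something like the locking discussed above. It relies on os.fork(), so it is Unix-only.

```python
import mmap
import os
import pickle
import struct

SLOT_SIZE = 256                 # bytes per child "mailbox" (arbitrary choice)
LENGTH = struct.Struct("<I")    # 4-byte length prefix at the start of each slot

def write_slot(mm, index, obj):
    """Pickle `obj` into slot `index` as <length><pickle bytes>."""
    data = pickle.dumps(obj)
    if LENGTH.size + len(data) > SLOT_SIZE:
        raise ValueError("object too large for its slot")
    start = index * SLOT_SIZE + LENGTH.size
    mm[start:start + len(data)] = data
    # Written after the payload; a non-zero length marks the slot as filled.
    LENGTH.pack_into(mm, index * SLOT_SIZE, len(data))

def read_slot(mm, index):
    """Return the object stored in slot `index`, or None if it is empty."""
    (n,) = LENGTH.unpack_from(mm, index * SLOT_SIZE)
    if n == 0:                  # anonymous mappings start out zero-filled
        return None
    start = index * SLOT_SIZE + LENGTH.size
    return pickle.loads(mm[start:start + n])

def run(nchildren=4):
    # fileno -1 gives an anonymous mapping; on Unix the default flags are
    # MAP_SHARED, so the pages stay shared with children created by fork().
    mm = mmap.mmap(-1, nchildren * SLOT_SIZE)
    pids = []
    for i in range(nchildren):
        pid = os.fork()
        if pid == 0:            # child: write only to its own slot, then exit
            try:
                write_slot(mm, i, {"child": i, "pid": os.getpid()})
            finally:
                os._exit(0)
        pids.append(pid)
    for pid in pids:            # wait for every child before reading
        os.waitpid(pid, 0)
    results = [read_slot(mm, i) for i in range(nchildren)]
    mm.close()
    return results

if __name__ == "__main__":
    for record in run():
        print(record)
```

A "hammer test" version would loop the writes many times per child and check every slot's contents afterwards.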

--
Jim Dennis,
Starshine: Signed, Sealed, Delivered

May 11 '07
In article <1178924977.507496@smirk>,
"James T. Dennis" <ja******@idiom.com> wrote:
> * There don't seem to be any currently maintained SysV IPC
> (shm, message, and semaphore) modules for Python. I guess some
> people have managed to hack something together using ctypes;
> but I haven't actually read, much less tested, any of that code.

http://NikitaTheSpider.com/python/shm/
Enjoy =)

--
Philip
http://NikitaTheSpider.com/
Whole-site HTML validation, link checking and more
May 12 '07
