using mmap on large (> 2 Gig) files

Hi
Anyone ever done this? It looks like Python 2.4 won't take a length arg
> 2 Gig since it's not seen as an int.
Mathew

Oct 23 '06 #1
my*****@jpl.nasa.gov wrote:
> Anyone ever done this? It looks like Python 2.4 won't take a length
> arg > 2 Gig since it's not seen as an int.

What architecture are you on? On a 32-bit architecture, it's likely
impossible to map in 2GiB, anyway (since it likely won't fit into the
available address space).

On a 64-bit architecture, this is a known limitation of Python 2.4:
you can't have containers with more than 2Gi items. This limitation
was removed in Python 2.5, so I recommend upgrading. Note that
the code has seen little testing, due to lack of proper hardware,
so I suggest that you review the mmap code first before using
it (or just test it out and report bugs as you find them).
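
For reference, a minimal sketch of mapping a whole file read-only. It assumes a current Python 3 interpreter rather than the 2.4/2.5 releases discussed here, and the helper name `map_whole_file` is made up for illustration. The check against `sys.maxsize` reflects the point above: the mapping has to be indexable by the interpreter, which on a 32-bit build rules out files of 2 GiB and more.

```python
import mmap
import os
import sys


def map_whole_file(path):
    """Map an entire (non-empty) file read-only and return the mmap object.

    Hypothetical helper: it refuses files whose size cannot be used as an
    index on this interpreter build.
    """
    size = os.path.getsize(path)
    if size > sys.maxsize:
        raise OverflowError("file too large to index on this build")
    with open(path, "rb") as f:
        # Length 0 asks for the whole file; the mapping stays valid after
        # the file object is closed.
        return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
```

The returned object supports `len()` and slicing like a byte string, and should be closed with `.close()` when no longer needed.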

Regards,
Martin
Oct 23 '06 #2
Martin v. Löwis wrote:
> my*****@jpl.nasa.gov wrote:
> > Anyone ever done this? It looks like Python 2.4 won't take a length
> > arg > 2 Gig since it's not seen as an int.
>
> What architecture are you on? On a 32-bit architecture, it's likely
> impossible to map in 2GiB, anyway (since it likely won't fit into the
> available address space).
>
> On a 64-bit architecture, this is a known limitation of Python 2.4:
> you can't have containers with more than 2Gi items. This limitation
> was removed in Python 2.5, so I recommend upgrading. Note that
> the code has seen little testing, due to lack of proper hardware,

NumPy uses the mmap object and I saw a paper at SciPy 2006 that used
Python 2.5 + mmap + numpy to do some pretty nice and relatively fast
manipulations of very large data sets.

So, the very useful changes by Martin have seen more testing than he is
probably aware of.

-Travis

Oct 23 '06 #3

my*****@jpl.nasa.gov wrote:
> Anyone ever done this? It looks like Python 2.4 won't take a length arg

http://docs.python.org/lib/module-mmap.html

It seems that Python does take a length argument, but not an offset
argument (unlike Windows' CreateFileMapping/MapViewOfFile and UNIX'
mmap), so you always map from the beginning of the file. Of course, if
you have ever worked with memory-mapped files in C, you will probably
have experienced that mapping a large file from beginning to end is a
major slowdown. And if the file is big enough, it does not even fit
inside the 32-bit address space of your process. Thus you have to limit
the portion of the file that is mapped, using the offset and the length
arguments.
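
The windowing idea above can be sketched as follows. This assumes a Python version whose `mmap.mmap` accepts an `offset` keyword, which the 2.4/2.5 releases discussed in this thread do not have (it arrived later, in 2.6 as far as I know). `map_window` and `read_range` are hypothetical helper names. The offset has to be a multiple of `mmap.ALLOCATIONGRANULARITY`, so the sketch rounds the start down and skips the extra leading bytes.

```python
import mmap


def map_window(fileobj, start, length):
    """Map `length` bytes beginning at byte `start` of an open file.

    Returns (mapping, skip): the requested bytes are mapping[skip:skip+length].
    The caller must make sure start + length does not exceed the file size.
    """
    gran = mmap.ALLOCATIONGRANULARITY
    aligned = (start // gran) * gran      # round down to a legal offset
    skip = start - aligned                # bytes mapped before `start`
    mapping = mmap.mmap(fileobj.fileno(), skip + length,
                        access=mmap.ACCESS_READ, offset=aligned)
    return mapping, skip


def read_range(path, start, length):
    """Return bytes [start, start+length) of `path` via a small mapping."""
    with open(path, "rb") as f:
        mapping, skip = map_window(f, start, length)
        try:
            return mapping[skip:skip + length]
        finally:
            mapping.close()
```

Only the window is mapped into the address space, so the file itself can be far larger than what a 32-bit process could map in one piece.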

But the question remains whether Python's "mmap" qualifies as a "memory
mapping" at all. Memory mapping a file means that the file is "mapped"
into the process address space. So if you access a certain address
(using a pointer type in C), you will actually read from or write to
the file. On Windows, this mechanism is even used to access "files"
that do not live on the file system. E.g. if CreateFileMapping is
called with the file handle set to INVALID_HANDLE_VALUE, it creates a
file mapping backed by the OS paging file. That is, you actually obtain
a shared memory segment, usable e.g. for inter-process communication.
How would you use Python's mmap for something like this?
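
For what it's worth, a minimal sketch of an anonymous (not file-backed) mapping in Python, assuming an interpreter where `mmap.mmap(-1, size)` is accepted. `anonymous_buffer` is a made-up name. The sketch only creates and uses the buffer within one process. Sharing it with another process (for example through the Windows-only `tagname` argument, or by forking on UNIX) is left out.

```python
import mmap


def anonymous_buffer(size):
    """Create an anonymous mapping of `size` bytes (no file behind it)."""
    # Passing -1 as the file descriptor requests memory that is not
    # backed by a file on disk.
    return mmap.mmap(-1, size)


if __name__ == "__main__":
    buf = anonymous_buffer(4096)
    buf[0:5] = b"hello"          # write through slice assignment
    print(buf[0:5])              # read the same bytes back
    buf.close()
```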

I haven't looked at the source, but I'd be surprised if Python actually
maps the file into the process image when mmap is called. I believe
Python is not memory mapping at all; rather, it just opens a file in
the file system and uses fseek to move around. That is, you can use
slicing operators on Python's "memory mapped file object" as if it were
a list or a string, but it's not really memory mapping, it's just a
syntactic convenience. Because of this, you even need to manually
"flush" the memory mapping object. If you were talking to a real memory
mapped file, flushing would obviously not be required.
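
The slicing and flushing described above look roughly like this. It is a sketch assuming a current Python 3 interpreter, with `patch_bytes` as a made-up helper name. It only illustrates the calls and makes no claim about what the module does underneath.

```python
import mmap


def patch_bytes(path, position, payload):
    """Overwrite len(payload) bytes at `position` via an mmap object."""
    with open(path, "r+b") as f:
        mapping = mmap.mmap(f.fileno(), 0)   # length 0: map the whole file
        try:
            # Slice assignment must keep the size unchanged.
            mapping[position:position + len(payload)] = payload
            mapping.flush()                  # the explicit flush in question
        finally:
            mapping.close()
```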

This probably means that your problem is irrelevant. Even if the file
is too large to fit inside a 32-bit process image, Python's memory
mapping would not be affected by this, as it is not memory mapping the
file when "mmap" is called.

Oct 23 '06 #4

Martin v. Löwis wrote:
> What architecture are you on? On a 32-bit architecture, it's likely
> impossible to map in 2GiB, anyway (since it likely won't fit into the
> available address space).

Indeed. But why does Python's memory mapping need to be flushed? And
why doesn't Python's mmap take an offset argument to handle large
files? Is Python actually memory mapping with mmap or just faking it
with fseek? If Python isn't memory mapping, there would be no limit
imposed by the 32-bit address space.

Oct 24 '06 #5

my*****@jpl.nasa.gov wrote:
> Hi
> Anyone ever done this? It looks like Python 2.4 won't take a length
> arg > 2 Gig since it's not seen as an int.

Looking at Python's source (mmapmodule.c), it seems that "mmap.mmap"
always sets the offset argument in Windows' MapViewOfFile and UNIX'
mmap to 0. This means that it is always mapping from the beginning of
the file. Thus, Python's mmap module is useless for large files. This
is really bad coding. Whoever wrote mmapmodule.c didn't consider
the possibility that a 64-bit file system like NTFS can hold files
too large to fit in a 32-bit address space. Thus, mmapmodule.c needs to
be fixed before it can be used for large files.

Oct 24 '06 #8
Well, compiling Python 2.5 on Solaris 10 on an x86 is no walk in the
park. pyconfig.h seems to think SIZEOF_LONG is 4 and I SEGV during my
build, even after modifying the Makefile and pyconfig.h.
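
A quick way to see which widths an interpreter was actually built with, once one is running (a sketch for a current Python 3 interpreter; `build_widths` is a made-up name). A C `long` of 4 bytes together with a 4-byte pointer usually means a 32-bit build, in which case mapping more than 2 GiB is off the table regardless of the Python version.

```python
import struct
import sys


def build_widths():
    """Report the C type widths this interpreter was compiled with."""
    return {
        "long_bytes": struct.calcsize("l"),      # sizeof(long)
        "pointer_bytes": struct.calcsize("P"),   # sizeof(void *)
        # sys.maxsize is the largest index value; add 1 for the sign bit.
        "maxsize_bits": sys.maxsize.bit_length() + 1,
    }


if __name__ == "__main__":
    for name, value in build_widths().items():
        print(name, value)
```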

Mathew

Martin v. Löwis wrote:
> my*****@jpl.nasa.gov wrote:
> > Anyone ever done this? It looks like Python 2.4 won't take a length
> > arg > 2 Gig since it's not seen as an int.
>
> What architecture are you on? On a 32-bit architecture, it's likely
> impossible to map in 2GiB, anyway (since it likely won't fit into the
> available address space).
>
> On a 64-bit architecture, this is a known limitation of Python 2.4:
> you can't have containers with more than 2Gi items. This limitation
> was removed in Python 2.5, so I recommend upgrading. Note that
> the code has seen little testing, due to lack of proper hardware,
> so I suggest that you review the mmap code first before using
> it (or just test it out and report bugs as you find them).
>
> Regards,
> Martin
Oct 24 '06 #9
sturlamolden wrote:
> Looking at Python's source (mmapmodule.c), it seems that "mmap.mmap"
> always sets the offset argument in Windows' MapViewOfFile and UNIX'
> mmap to 0. This means that it is always mapping from the beginning of
> the file. Thus, Python's mmap module is useless for large files. This
> is really bad coding. Whoever wrote mmapmodule.c didn't consider
> the possibility that a 64-bit file system like NTFS can hold files
> too large to fit in a 32-bit address space. Thus, mmapmodule.c needs to
> be fixed before it can be used for large files.

if you've gotten that far, maybe you could come up with a patch, instead
of stating that someone else "needs to fix it" ?

</F>

Oct 24 '06 #10
