Hi,
Is there any special support for sparse file handling in Python? My
initial search didn't bring up much (not a thorough search). I wrote
the following piece of code:
options.size = 6442450944
options.ranges = ["4096,1024","30000,314572800"]
fd = open("testfile", "w")
fd.seek(options.size-1)
fd.write("a")
for drange in options.ranges:
off = int(drange.split(",")[0])
len = int(drange.split(",")[1])
print "off =", off, " len =", len
fd.seek(off)
for x in range(len):
fd.write("a")
fd.close()
This piece of code takes a very long time, and in fact I had to kill it as
the Linux system started doing a lot of swapping. Am I doing something
wrong here? Is there a better way to create/modify sparse files?
Thanks,
Raghu.
[dr*******@gmail.com wrote]
> Hi,
> Is there any special support for sparse file handling in Python? My
> initial search didn't bring up much (not a thorough search). I wrote
> the following piece of code:
> options.size = 6442450944
> options.ranges = ["4096,1024","30000,314572800"]
> fd = open("testfile", "w")
> fd.seek(options.size-1)
> fd.write("a")
> for drange in options.ranges:
> off = int(drange.split(",")[0])
> len = int(drange.split(",")[1])
> print "off =", off, " len =", len
> fd.seek(off)
> for x in range(len):
> fd.write("a")
> fd.close()
> This piece of code takes a very long time, and in fact I had to kill it as
> the Linux system started doing a lot of swapping. Am I doing something
> wrong here? Is there a better way to create/modify sparse files?
test_largefile.py in the Python test suite does this kind of thing and
doesn't take very long for me to run on Linux (SuSE 9.0 box).
Trent
--
Trent Mick Tr****@ActiveState.com
In <11**********************@f14g2000cwb.googlegroups.com>, dr*******@gmail.com wrote:
> options.size = 6442450944
> options.ranges = ["4096,1024","30000,314572800"]
> fd = open("testfile", "w")
> fd.seek(options.size-1)
> fd.write("a")
> for drange in options.ranges:
> off = int(drange.split(",")[0])
> len = int(drange.split(",")[1])
> print "off =", off, " len =", len
> fd.seek(off)
> for x in range(len):
> fd.write("a")
> fd.close()
> This piece of code takes a very long time, and in fact I had to kill it as
> the Linux system started doing a lot of swapping. Am I doing something
> wrong here? Is there a better way to create/modify sparse files?
`range(len)` creates a list of size `len` *in memory* so you are trying to
build a list with 314,572,800 numbers. That seems to eat up all your RAM
and causes the swapping.
You can use `xrange(len)` instead which uses a constant amount of memory.
But be prepared to wait some time because now you are writing 314,572,800
characters *one by one* into the file. It would be faster to write larger
strings in each step.
Ciao,
Marc 'BlackJack' Rintsch
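
The "write larger strings in each step" advice might look roughly like the sketch below. It assumes Python 3 and a file opened in binary mode; the helper name `write_filled_range` and the 1 MiB chunk size are illustrative choices, not something taken from the thread.

```python
import os
import tempfile

CHUNK_SIZE = 1024 * 1024  # assumed chunk size (1 MiB); tune as needed


def write_filled_range(fd, offset, length, fill=b"a", chunk_size=CHUNK_SIZE):
    """Write `length` copies of the single byte `fill` starting at `offset`."""
    chunk = fill * chunk_size      # built once, reused for every write
    fd.seek(offset)
    remaining = length
    while remaining > 0:
        n = min(remaining, chunk_size)
        fd.write(chunk[:n])        # the last write may be shorter than a chunk
        remaining -= n


def demo():
    # Small sizes so the demo runs quickly; the thread used far larger ranges.
    path = os.path.join(tempfile.mkdtemp(), "testfile")
    with open(path, "wb") as fd:
        write_filled_range(fd, 4096, 1024)
        write_filled_range(fd, 30000, 3 * CHUNK_SIZE + 5)
    return path
```

With this shape, a 314,572,800-byte range needs 300 calls to `write` instead of hundreds of millions, and no large list is ever built.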
<dr*******@gmail.com> wrote in message
news:11**********************@f14g2000cwb.googlegroups.com...
> Is there any special support for sparse file handling in python?
Since I have not heard of such in several years, I suspect not. CPython,
normally compiled, uses the standard C stdio lib. If your system+C has a
sparseIO lib, you would probably have to compile specially to use it.
> options.size = 6442450944
> options.ranges = ["4096,1024","30000,314572800"]
options.ranges = [(4096,1024),(30000,314572800)] # makes below nicer
> fd = open("testfile", "w")
> fd.seek(options.size-1)
> fd.write("a")
> for drange in options.ranges:
> off = int(drange.split(",")[0])
> len = int(drange.split(",")[1])
off,len = map(int, drange.split(",")) # or
off,len = [int(s) for s in drange.split(",")] # or, for tuples as suggested above:
off,len = drange
> print "off =", off, " len =", len
> fd.seek(off)
> for x in range(len):
If I read the above right, the 2nd len is 300,000,000+ making the space
needed for the range list a few gigabytes. I suspect this is where you
started thrashing ;-). Instead:
for x in xrange(len): # this is what xrange is for ;-)
> fd.write("a")
Without indent, this is a syntax error, so if your code ran at all, this
cannot be an exact copy. Even with the xrange fix, 300,000,000 writes will be
slow. I would expect that a real application should create or accumulate
chunks larger than single chars.
> fd.close()
> This piece of code takes a very long time, and in fact I had to kill it as
> the Linux system started doing a lot of swapping. Am I doing something
> wrong here?
See above
> Is there a better way to create/modify sparse files?
Unless you can access built-in facilities, create your own mapping index.
Terry J. Reedy
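
The unpacking suggestions above can be sketched as a small helper. This is a Python 3 illustration; the name `parse_range` is hypothetical, and `length` is used instead of `len` so the built-in function is not shadowed.

```python
def parse_range(spec):
    """Turn an "offset,length" string into a pair of ints."""
    offset, length = map(int, spec.split(","))
    return offset, length


# Parse once up front, then iterate over plain (offset, length) tuples.
ranges = [parse_range(s) for s in ["4096,1024", "30000,314572800"]]
for offset, length in ranges:
    print("off =", offset, " len =", length)
```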
Thanks for the info on xrange. Writing a single char is just to get going
quickly. I knew that I would have to improve on that. I would like to
write chunks of 1MB, which would require that I have a 1MB string to
write. Is there any simple way of generating this 1MB string (other
than appending to a string until it reaches 1MB in length)? I don't care
about the actual value of the string itself.
Thanks,
Raghu.
<dr*******@gmail.com> wrote in message
news:11**********************@z14g2000cwz.googlegroups.com...
> Thanks for the info on xrange. Writing single char is just to get going
> quickly. I knew that I would have to improve on that. I would like to
> write chunks of 1MB which would require that I have 1MB string to
> write. Is there any simple way of generating this 1MB string
megastring = 1000000*'a' # t < 1 sec on my machine
> (other than keep appending to a string until it reaches 1MB len)?
You mean like (unexecuted)
s = ''
for i in xrange(1000000): s += 'a' #?
This will allocate, copy, and deallocate 1000000 successively longer
temporary strings and is a noticeable O(n**2) operation. Since strings are
immutable, you cannot 'append' to them the way you can to lists.
Terry J. Reedy
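
For comparison, here is a sketch of three ways to build the same one-million-character string in Python 3. The function names are made up for illustration, and no timings are claimed here because they depend on the interpreter version and machine.

```python
import timeit

N = 1000000  # string length used in the thread


def by_multiplication():
    # One allocation of the final size.
    return "a" * N


def by_join():
    # Collect the pieces first, then concatenate them in a single pass.
    return "".join("a" for _ in range(N))


def by_repeated_concat():
    # The pattern discussed above; its cost depends on interpreter optimizations.
    s = ""
    for _ in range(N):
        s += "a"
    return s


if __name__ == "__main__":
    for func in (by_multiplication, by_join, by_repeated_concat):
        seconds = timeit.timeit(func, number=3)
        print(func.__name__, round(seconds, 4))
```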
[dr*******@gmail.com]
> Is there any simple way of generating this 1MB string (other than keep
> appending to a string until it reaches 1MB len)?
You might of course use 'x' * 1000000 for fairly quickly generating a
single string holding one million `x'.
Yet, your idea of generating a sparse file is interesting. I never
tried it with Python, but would not see why Python would not allow
it. Has anyone ever played with sparse files in Python? (One problem
with sparse files is that it is next to impossible for a normal user to
create an exact copy. There is no fast way to read them either.)
--
François Pinard http://pinard.progiciels-bpi.ca
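
On the copying problem: one common heuristic is to read the source in blocks and seek over any block that is entirely zero instead of writing it. The sketch below assumes Python 3; `copy_sparse` and the 4096-byte block size are illustrative. It reproduces the contents exactly, but it cannot tell a real hole from a block of stored zeros, so the on-disk allocation of the copy may differ from the original.

```python
import os
import tempfile

BLOCK_SIZE = 4096  # assumed block size; real filesystems vary


def copy_sparse(src_path, dst_path, block_size=BLOCK_SIZE):
    """Copy a file, seeking over all-zero blocks instead of writing them."""
    zero_block = bytes(block_size)
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            block = src.read(block_size)
            if not block:
                break
            if block == zero_block[:len(block)]:
                dst.seek(len(block), os.SEEK_CUR)  # skip ahead, leaving a gap
            else:
                dst.write(block)
        # If the source ended with zeros, extend the copy to the full length.
        dst.truncate()


def demo():
    directory = tempfile.mkdtemp()
    src = os.path.join(directory, "src")
    dst = os.path.join(directory, "dst")
    with open(src, "wb") as f:
        f.seek(100000)
        f.write(b"abc")
        f.seek(300000)
        f.write(b"\x00")
    copy_sparse(src, dst)
    return src, dst
```

Every byte of the source is still read, which matches the remark above that there is no fast way to read such files with ordinary file calls.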
On 17 Aug 2005 11:53:39 -0700, "dr*******@gmail.com" <dr*******@gmail.com> wrote:
> Hi,
> Is there any special support for sparse file handling in Python? My
> initial search didn't bring up much (not a thorough search). I wrote
> the following piece of code:
> options.size = 6442450944
> options.ranges = ["4096,1024","30000,314572800"]
> fd = open("testfile", "w")
> fd.seek(options.size-1)
> fd.write("a")
> for drange in options.ranges:
> off = int(drange.split(",")[0])
> len = int(drange.split(",")[1])
> print "off =", off, " len =", len
> fd.seek(off)
> for x in range(len):
> fd.write("a")
> fd.close()
> This piece of code takes a very long time, and in fact I had to kill it as
> the Linux system started doing a lot of swapping. Am I doing something
> wrong here? Is there a better way to create/modify sparse files?
> Thanks
I'm unclear as to what your goal is. Do you just need an object that provides
an interface like a file object, but internally is more efficient than
a normal file object when you access it as above[1], or do you need to create
a real file and record all the bytes in full (with what default for gaps?)
on disk, so that it can be opened by another program and read as an ordinary file?
Some operating system file systems may have some support for virtual zero-block runs
and lazy allocation/representation of non-zero blocks in files. It's easy to imagine
the rudiments, but I don't know of such a file system, not having looked ;-)
You could write your own "sparse-file"-representation object, and maybe use pickle
for persistence. Or maybe you could use zipfiles. The kind of data you are creating above
would probably compress really well ;-)
[1] writing 314+ million identical bytes one by one is silly, of course ;-)
BTW, len is a built-in function, and using built-in names for variables
is frowned upon as a bug-prone practice.
Regards,
Bengt Richter
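
A home-grown "sparse-file" representation with pickle persistence could be sketched along these lines. This is a Python 3 illustration under stated assumptions: the class name `SparseBuffer`, its methods, and the 4096-byte block size are all hypothetical, and gaps are assumed to read back as zero bytes.

```python
import pickle


class SparseBuffer:
    """In-memory store that keeps only the blocks that were written."""

    def __init__(self, block_size=4096):
        self.block_size = block_size
        self.blocks = {}   # block index -> bytearray of block_size bytes
        self.size = 0      # logical length, including unwritten gaps

    def write(self, offset, data):
        if not data:
            return
        i = 0
        while i < len(data):
            index, start = divmod(offset + i, self.block_size)
            n = min(self.block_size - start, len(data) - i)
            block = self.blocks.setdefault(index, bytearray(self.block_size))
            block[start:start + n] = data[i:i + n]
            i += n
        self.size = max(self.size, offset + len(data))

    def read(self, offset, length):
        end = min(offset + length, self.size)
        out = bytearray()
        pos = offset
        while pos < end:
            index, start = divmod(pos, self.block_size)
            n = min(self.block_size - start, end - pos)
            block = self.blocks.get(index)
            # Blocks that were never written read back as zeros.
            out += block[start:start + n] if block is not None else bytes(n)
            pos += n
        return bytes(out)

    def dumps(self):
        # Pickle only built-in types so the payload is easy to reload.
        return pickle.dumps((self.block_size, self.size, self.blocks))

    @classmethod
    def loads(cls, payload):
        block_size, size, blocks = pickle.loads(payload)
        obj = cls(block_size)
        obj.size, obj.blocks = size, blocks
        return obj
```

Writing one byte at offset 6442450943 and 1024 bytes at offset 4096 stores just two blocks while the logical size is reported as 6442450944.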
Terry Reedy wrote:
> megastring = 1000000*'a' # t < 1 sec on my machine
>> (other than keep appending to a string until it reaches 1MB len)?
> You mean like (unexecuted)
> s = ''
> for i in xrange(1000000): s += 'a' #?
> This will allocate, copy, and deallocate 1000000 successively longer
> temporary strings and is a noticeable O(n**2) operation.
Not exactly. CPython 2.4 added an optimization of "+=" for strings.
The for loop above takes about 1 second to execute on my machine. You
are correct in that it will take *much* longer on 2.3.
--
Benji York
My goal is very simple: to have a mechanism to create sparse files and
modify them by writing arbitrary ranges of bytes at arbitrary offsets.
I did get the information I wanted (xrange instead of range, and a simple
way to generate a 1MB string in memory). Thanks for pointing out the problem
with using "len" as a variable. It is indeed silly.
My only assumption about the underlying OS/file system is that if I seek
past the end of the file and write some data, it doesn't generate blocks for
the data in between. This is indeed true on Linux (I tested on ext3).
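
One way to check that assumption on a given system is to compare a file's logical size with the space actually allocated for it. The sketch below assumes Python 3; the helper names are illustrative, and `st_blocks` (counted in 512-byte units) is not available on every platform, so the code falls back to `None` there.

```python
import os
import tempfile


def make_sparse_file(path, size):
    """Create a file of logical length `size` by writing only its last byte."""
    with open(path, "wb") as fd:
        fd.seek(size - 1)
        fd.write(b"a")


def allocated_bytes(path):
    """Bytes allocated on disk, or None where st_blocks is unavailable."""
    st = os.stat(path)
    blocks = getattr(st, "st_blocks", None)
    return None if blocks is None else blocks * 512


def demo(size=64 * 1024 * 1024):
    # 64 MiB keeps the demo cheap even on a file system without sparse support.
    path = os.path.join(tempfile.mkdtemp(), "testfile")
    make_sparse_file(path, size)
    return path, os.path.getsize(path), allocated_bytes(path)
```

If the allocated figure comes back far smaller than the logical size, the file system left the skipped region unallocated.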
Thanks,
Raghu.
"dr*******@gmail.com" <dr*******@gmail.com> writes:
> My goal is very simple. Have a mechanism to create sparse files and
> modify them by writing arbitrary ranges of bytes at arbitrary offsets.
> I did get the information I want (xrange instead of range, and a simple
> way to generate 1MB string in memory). Thanks for pointing out about
> using "len" as variable. It is indeed silly.
> My only assumption from underlying OS/file system is that if I seek
> past end of file and write some data, it doesn't generate blocks for
> data in between. This is indeed true on Linux (I tested on ext3).
This better be true for anything claiming to be Unix. The results on
systems that break this aren't pretty.
<mike
--
Mike Meyer <mw*@mired.org> http://www.mired.org/home/mwm/
Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.