Bytes | Developer Community

creating/modifying sparse files on linux


Hi,

Is there any special support for sparse file handling in Python? My
initial search didn't bring up much (not a thorough search). I wrote
the following piece of code:

options.size = 6442450944
options.ranges = ["4096,1024","30000,314572800"]
fd = open("testfile", "w")
fd.seek(options.size-1)
fd.write("a")
for drange in options.ranges:
    off = int(drange.split(",")[0])
    len = int(drange.split(",")[1])
    print "off =", off, " len =", len
    fd.seek(off)
    for x in range(len):
        fd.write("a")

fd.close()

This piece of code takes a very long time, and in fact I had to kill it as
the Linux system started doing a lot of swapping. Am I doing something
wrong here? Is there a better way to create/modify sparse files?

Thanks,
Raghu.

Aug 17 '05 #1
[dr*******@gmail.com wrote]

Hi,

Is there any special support for sparse file handling in Python? My
initial search didn't bring up much (not a thorough search). I wrote
the following piece of code:

options.size = 6442450944
options.ranges = ["4096,1024","30000,314572800"]
fd = open("testfile", "w")
fd.seek(options.size-1)
fd.write("a")
for drange in options.ranges:
    off = int(drange.split(",")[0])
    len = int(drange.split(",")[1])
    print "off =", off, " len =", len
    fd.seek(off)
    for x in range(len):
        fd.write("a")

fd.close()

This piece of code takes a very long time, and in fact I had to kill it as
the Linux system started doing a lot of swapping. Am I doing something
wrong here? Is there a better way to create/modify sparse files?


test_largefile.py in the Python test suite does this kind of thing and
doesn't take very long for me to run on Linux (SuSE 9.0 box).

Trent

--
Trent Mick
Tr****@ActiveState.com
Aug 17 '05 #2
In <11**********************@f14g2000cwb.googlegroups.com>,
dr*******@gmail.com wrote:
options.size = 6442450944
options.ranges = ["4096,1024","30000,314572800"]
fd = open("testfile", "w")
fd.seek(options.size-1)
fd.write("a")
for drange in options.ranges:
    off = int(drange.split(",")[0])
    len = int(drange.split(",")[1])
    print "off =", off, " len =", len
    fd.seek(off)
    for x in range(len):
        fd.write("a")

fd.close()

This piece of code takes a very long time, and in fact I had to kill it as
the Linux system started doing a lot of swapping. Am I doing something
wrong here? Is there a better way to create/modify sparse files?


`range(len)` creates a list of size `len` *in memory* so you are trying to
build a list with 314,572,800 numbers. That seems to eat up all your RAM
and causes the swapping.

You can use `xrange(len)` instead which uses a constant amount of memory.
But be prepared to wait some time because now you are writing 314,572,800
characters *one by one* into the file. It would be faster to write larger
strings in each step.
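A minimal sketch of that chunked approach in modern Python 3 (the helper name `write_range` is made up for illustration; the demo writes to an in-memory buffer rather than a real file):

```python
import io

def write_range(f, off, length, byte=b"a", chunk=1 << 20):
    # Write `length` copies of `byte` starting at offset `off`, in
    # chunks of up to `chunk` bytes, so memory use stays bounded
    # no matter how large `length` is.
    f.seek(off)
    block = byte * chunk
    while length > 0:
        n = min(length, chunk)
        f.write(block[:n])
        length -= n

# Demo against an in-memory buffer; a real sparse file would use
# open("testfile", "wb") instead.
buf = io.BytesIO()
write_range(buf, 10, 5)
print(buf.getvalue())  # 10 zero bytes, then b"aaaaa"
```

Seeking past the end and writing is what leaves the hole; the loop only bounds how much is held in memory per write.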

Ciao,
Marc 'BlackJack' Rintsch
Aug 17 '05 #3

<dr*******@gmail.com> wrote in message
news:11**********************@f14g2000cwb.googlegroups.com...
Is there any special support for sparse file handling in python?
Since I have not heard of such in several years, I suspect not. CPython,
normally compiled, uses the standard C stdio lib. If your system+C has a
sparseIO lib, you would probably have to compile specially to use it.
options.size = 6442450944
options.ranges = ["4096,1024","30000,314572800"]

options.ranges = [(4096,1024),(30000,314572800)] # makes below nicer

fd = open("testfile", "w")
fd.seek(options.size-1)
fd.write("a")
for drange in options.ranges:
    off = int(drange.split(",")[0])
    len = int(drange.split(",")[1])

off,len = map(int, drange.split(",")) # or
off,len = [int(s) for s in drange.split(",")] # or, for tuples as suggested
above,
off,len = drange

    print "off =", off, " len =", len
    fd.seek(off)
    for x in range(len):

If I read the above right, the 2nd len is 300,000,000+, making the space
needed for the range list a few gigabytes. I suspect this is where you
started thrashing ;-). Instead:

for x in xrange(len): # this is what xrange is for ;-)

        fd.write("a")

Without the indent, this is a syntax error, so if your code ran at all, this
cannot be an exact copy. Even with the xrange fix, 300,000,000+ writes will be
slow. I would expect that a real application should create or accumulate
chunks larger than single chars.

fd.close()

This piece of code takes a very long time, and in fact I had to kill it as
the Linux system started doing a lot of swapping. Am I doing something
wrong here?
See above
Is there a better way to create/modify sparse files?


Unless you can access built-in facilities, create your own mapping index.

Terry J. Reedy

Aug 17 '05 #4

Thanks for the info on xrange. Writing a single char was just to get going
quickly. I knew that I would have to improve on that. I would like to
write chunks of 1MB, which would require that I have a 1MB string to
write. Is there any simple way of generating this 1MB string (other
than repeatedly appending to a string until it reaches 1MB in length)? I don't
care about the actual value of the string itself.

Thanks,
Raghu.

Aug 17 '05 #5

<dr*******@gmail.com> wrote in message
news:11**********************@z14g2000cwz.googlegroups.com...

Thanks for the info on xrange. Writing a single char was just to get going
quickly. I knew that I would have to improve on that. I would like to
write chunks of 1MB, which would require that I have a 1MB string to
write. Is there any simple way of generating this 1MB string

megastring = 1000000*'a' # t < 1 sec on my machine

(other than repeatedly appending to a string until it reaches 1MB in length)?


You mean like (unexecuted):

s = ''
for i in xrange(1000000): s += 'a' #?

This will allocate, copy, and deallocate 1,000,000 successively longer
temporary strings and is a noticeable O(n**2) operation. Since strings are
immutable, you cannot 'append' to them the way you can to lists.
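As a rough illustration of the difference (Python 3 syntax; timings will vary by machine and interpreter version):

```python
# Single allocation: the fast way to build a large constant string.
mega = "a" * 1_000_000
print(len(mega))  # 1000000

# The quadratic pattern described above -- each += may copy the whole
# string accumulated so far (shown small here; avoid for large counts).
s = ""
for _ in range(1000):
    s += "a"

# When the pieces differ, str.join is the usual linear alternative.
s2 = "".join("a" for _ in range(1000))
print(s == s2)  # True
```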

Terry J. Reedy

Aug 17 '05 #6
[dr*******@gmail.com]
Is there any simple way of generating this 1MB string (other than repeatedly
appending to a string until it reaches 1MB in length)?


You might of course use 'x' * 1000000 to fairly quickly generate a
single string holding one million 'x' characters.

Yet, your idea of generating a sparse file is interesting. I never
tried it with Python, but I don't see why Python would not allow
it. Has anyone ever played with sparse files in Python? (One problem
with sparse files is that it is next to impossible for a normal user to
create an exact copy. There is no fast way to read them either.)

--
François Pinard http://pinard.progiciels-bpi.ca
Aug 17 '05 #7
On 17 Aug 2005 11:53:39 -0700, "dr*******@gmail.com" <dr*******@gmail.com> wrote:

Hi,

Is there any special support for sparse file handling in Python? My
initial search didn't bring up much (not a thorough search). I wrote
the following piece of code:

options.size = 6442450944
options.ranges = ["4096,1024","30000,314572800"]
fd = open("testfile", "w")
fd.seek(options.size-1)
fd.write("a")
for drange in options.ranges:
    off = int(drange.split(",")[0])
    len = int(drange.split(",")[1])
    print "off =", off, " len =", len
    fd.seek(off)
    for x in range(len):
        fd.write("a")

fd.close()

This piece of code takes a very long time, and in fact I had to kill it as
the Linux system started doing a lot of swapping. Am I doing something
wrong here? Is there a better way to create/modify sparse files?

Thanks

I'm unclear as to what your goal is. Do you just need an object that provides
an interface like a file object, but internally is more efficient than
a normal file object when you access it as above[1], or do you need to create
a real file and record all the bytes in full (with what default for gaps?)
on disk, so that it can be opened by another program and read as an ordinary file?

Some operating system file systems may have some support for virtual zero-block runs
and lazy allocation/representation of non-zero blocks in files. It's easy to imagine
the rudiments, but I don't know of such a file system, not having looked ;-)

You could write your own "sparse-file"-representation object, and maybe use pickle
for persistence. Or maybe you could use zipfiles. The kind of data you are creating above
would probably compress really well ;-)

[1] writing 314+ million identical bytes one by one is silly, of course ;-)
BTW, len is a built-in function, and using built-in names for variables
is frowned upon as a bug-prone practice.
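For example (Python 3, illustrative only):

```python
values = [1, 2, 3]
print(len(values))  # 3 -- the built-in works as expected

len = 99            # shadows the built-in len in this module's scope
try:
    len(values)
except TypeError as e:
    print(e)        # 'int' object is not callable

del len             # remove the shadowing name; the built-in is visible again
print(len(values))  # 3
```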

Regards,
Bengt Richter
Aug 18 '05 #8
Terry Reedy wrote:
megastring = 1000000*'a' # t < 1 sec on my machine

(other than keep appending to a string until it reaches 1MB len)?


You mean like (unexecuted):

s = ''
for i in xrange(1000000): s += 'a' #?

This will allocate, copy, and deallocate 1,000,000 successively longer
temporary strings and is a noticeable O(n**2) operation.


Not exactly. CPython 2.4 added an optimization of "+=" for strings.
The for loop above takes about 1 second to execute on my machine. You
are correct in that it will take *much* longer on 2.3.
--
Benji York

Aug 18 '05 #9

My goal is very simple: have a mechanism to create sparse files and
modify them by writing arbitrary ranges of bytes at arbitrary offsets.
I did get the information I wanted (xrange instead of range, and a simple
way to generate a 1MB string in memory). Thanks for pointing out the problem
with using "len" as a variable. It is indeed silly.

My only assumption about the underlying OS/file system is that if I seek
past the end of the file and write some data, it doesn't allocate blocks for
the data in between. This is indeed true on Linux (I tested on ext3).
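One way to check that assumption (a Python 3 sketch; `st_blocks` is POSIX-only, and whether the gap stays unallocated depends on the filesystem):

```python
import os
import tempfile

fd, path = tempfile.mkstemp()
try:
    with os.fdopen(fd, "wb") as f:
        f.seek(1 << 20)   # seek 1 MiB past the start of the empty file
        f.write(b"a")     # one byte at the end; the gap becomes a hole
    st = os.stat(path)
    print("logical size:", st.st_size)       # 1048577
    # On filesystems with sparse support (e.g. ext3/ext4), the bytes
    # actually allocated are far smaller than the logical size.
    print("allocated:", st.st_blocks * 512)
finally:
    os.remove(path)
```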

Thanks,
Raghu.

Aug 18 '05 #10
"dr*******@gmail.com" <dr*******@gmail.com> writes:
My goal is very simple: have a mechanism to create sparse files and
modify them by writing arbitrary ranges of bytes at arbitrary offsets.
I did get the information I wanted (xrange instead of range, and a simple
way to generate a 1MB string in memory). Thanks for pointing out the problem
with using "len" as a variable. It is indeed silly.

My only assumption about the underlying OS/file system is that if I seek
past the end of the file and write some data, it doesn't allocate blocks for
the data in between. This is indeed true on Linux (I tested on ext3).


This better be true for anything claiming to be Unix. The results on
systems that break this aren't pretty.

<mike
--
Mike Meyer <mw*@mired.org> http://www.mired.org/home/mwm/
Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.
Aug 19 '05 #11
