473,414 Members | 1,775 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,414 software developers and data experts.

creating/modifying sparse files on linux


Hi,

Is there any special support for sparse file handling in python? My
initial search didn't bring up much (not a thorough search). I wrote
the following pice of code:

options.size = 6442450944
options.ranges = ["4096,1024","30000,314572800"]
fd = open("testfile", "w")
fd.seek(options.size-1)
fd.write("a")
for drange in options.ranges:
off = int(drange.split(",")[0])
len = int(drange.split(",")[1])
print "off =", off, " len =", len
fd.seek(off)
for x in range(len):
fd.write("a")

fd.close()

This piece of code takes very long time and in fact I had to kill it as
the linux system started doing lot of swapping. Am I doing something
wrong here? Is there a better way to create/modify sparse files?

Thanks,
Raghu.

Aug 17 '05 #1
10 5636
[dr*******@gmail.com wrote]

Hi,

Is there any special support for sparse file handling in python? My
initial search didn't bring up much (not a thorough search). I wrote
the following pice of code:

options.size = 6442450944
options.ranges = ["4096,1024","30000,314572800"]
fd = open("testfile", "w")
fd.seek(options.size-1)
fd.write("a")
for drange in options.ranges:
off = int(drange.split(",")[0])
len = int(drange.split(",")[1])
print "off =", off, " len =", len
fd.seek(off)
for x in range(len):
fd.write("a")

fd.close()

This piece of code takes very long time and in fact I had to kill it as
the linux system started doing lot of swapping. Am I doing something
wrong here? Is there a better way to create/modify sparse files?


test_largefile.py in the Python test suite does this kind of thing and
doesn't take very long for me to run on Linux (SuSE 9.0 box).

Trent

--
Trent Mick
Tr****@ActiveState.com
Aug 17 '05 #2
In <11**********************@f14g2000cwb.googlegroups .com>,
dr*******@gmail.com wrote:
options.size = 6442450944
options.ranges = ["4096,1024","30000,314572800"]
fd = open("testfile", "w")
fd.seek(options.size-1)
fd.write("a")
for drange in options.ranges:
off = int(drange.split(",")[0])
len = int(drange.split(",")[1])
print "off =", off, " len =", len
fd.seek(off)
for x in range(len):
fd.write("a")

fd.close()

This piece of code takes very long time and in fact I had to kill it as
the linux system started doing lot of swapping. Am I doing something
wrong here? Is there a better way to create/modify sparse files?


`range(len)` creates a list of size `len` *in memory* so you are trying to
build a list with 314,572,800 numbers. That seems to eat up all your RAM
and causes the swapping.

You can use `xrange(len)` instead which uses a constant amount of memory.
But be prepared to wait some time because now you are writing 314,572,800
characters *one by one* into the file. It would be faster to write larger
strings in each step.

Ciao,
Marc 'BlackJack' Rintsch
Aug 17 '05 #3

<dr*******@gmail.com> wrote in message
news:11**********************@f14g2000cwb.googlegr oups.com...
Is there any special support for sparse file handling in python?
Since I have not heard of such in several years, I suspect not. CPython,
normally compiled, uses the standard C stdio lib. If your system+C has a
sparseIO lib, you would probably have to compile specially to use it.
options.size = 6442450944
options.ranges = ["4096,1024","30000,314572800"]
options.ranges = [(4096,1024),(30000,314572800)] # makes below nicer
fd = open("testfile", "w")
fd.seek(options.size-1)
fd.write("a")
for drange in options.ranges:
off = int(drange.split(",")[0])
len = int(drange.split(",")[1])
off,len = map(int, drange.split(",")) # or
off,len = [int(s) for s in drange.split(",")] # or for tuples as suggested
above
off,len = drange
print "off =", off, " len =", len
fd.seek(off)
for x in range(len):
If I read the above right, the 2nd len is 300,000,000+ making the space
needed for the range list a few gigabytes. I suspect this is where you
started thrashing ;-). Instead:

for x in xrange(len): # this is what xrange is for ;-)
fd.write("a")
Without indent, this is syntax error, so if your code ran at all, this
cannot be an exact copy. Even with xrange fix, 300,000,000 writes will be
slow. I would expect that an real application should create or accumulate
chunks larger than single chars.
fd.close()

This piece of code takes very long time and in fact I had to kill it as
the linux system started doing lot of swapping. Am I doing something
wrong here?
See above
Is there a better way to create/modify sparse files?


Unless you can access builting facilities, create your own mapping index.

Terry J. Reedy

Aug 17 '05 #4

Thanks for the info on xrange. Writing single char is just to get going
quickly. I knew that I would have to improve on that. I would like to
write chunks of 1MB which would require that I have 1MB string to
write. Is there any simple way of generating this 1MB string (other
than keep appending to a string until it reaches 1MB len)? I don't care
about the actual value of the string itself.

Thanks,
Raghu.

Aug 17 '05 #5

<dr*******@gmail.com> wrote in message
news:11**********************@z14g2000cwz.googlegr oups.com...

Thanks for the info on xrange. Writing single char is just to get going
quickly. I knew that I would have to improve on that. I would like to
write chunks of 1MB which would require that I have 1MB string to
write. Is there any simple way of generating this 1MB string
megastring = 1000000*'a' # t < 1 sec on my machine
(other than keep appending to a string until it reaches 1MB len)?


You mean like (unexecuted)
s = ''
for i in xrange(1000000): s += 'a' #?

This will allocate, copy, and deallocate 1000000 successively longer
temporary strings and is a noticeable O(n**2) operation. Since strings are
immutable, you cannot 'append' to them the way you can to lists.

Terry J. Reedy

Aug 17 '05 #6
[dr*******@gmail.com]
Is there any simple way of generating this 1MB string (other than keep
appending to a string until it reaches 1MB len)?


You might of course use 'x' * 1000000 for fairly quickly generating a
single string holding one million `x'.

Yet, your idea of generating a sparse file is interesting. I never
tried it with Python, but would not see why Python would not allow
it. Did someone ever played with sparse files in Python? (One problem
with sparse files is that it is next to impossible for a normal user to
create an exact copy. There is no fast way to read read them either.)

--
François Pinard http://pinard.progiciels-bpi.ca
Aug 17 '05 #7
On 17 Aug 2005 11:53:39 -0700, "dr*******@gmail.com" <dr*******@gmail.com> wrote:

Hi,

Is there any special support for sparse file handling in python? My
initial search didn't bring up much (not a thorough search). I wrote
the following pice of code:

options.size = 6442450944
options.ranges = ["4096,1024","30000,314572800"]
fd = open("testfile", "w")
fd.seek(options.size-1)
fd.write("a")
for drange in options.ranges:
off = int(drange.split(",")[0])
len = int(drange.split(",")[1])
print "off =", off, " len =", len
fd.seek(off)
for x in range(len):
fd.write("a")

fd.close()

This piece of code takes very long time and in fact I had to kill it as
the linux system started doing lot of swapping. Am I doing something
wrong here? Is there a better way to create/modify sparse files?

Thanks

I'm unclear as to what your goal is. Do you just need an object that provides
an interface like a file object, but internally is more efficient than an
a normal file object when you access it as above[1], or do you need to create
a real file and record all the bytes in full (with what default for gaps?)
on disk, so that it can be opened by another program and read as an ordinary file?

Some operating system file systems may have some support for virtual zero-block runs
and lazy allocation/representation of non-zero blocks in files. It's easy to imagine
the rudiments, but I don't know of such a file system, not having looked ;-)

You could write your own "sparse-file"-representation object, and maybe use pickle
for persistence. Or maybe you could use zipfiles. The kind of data you are creating above
would probably compress really well ;-)

[1] writing 314+ million identical bytes one by one is silly, of course ;-)
BTW, len is a built-in function, and using built-in names for variables
is frowned upon as a bug-prone practice.

Regards,
Bengt Richter
Aug 18 '05 #8
Terry Reedy wrote:
megastring = 1000000*'a' # t < 1 sec on my machine

(other than keep appending to a string until it reaches 1MB len)?


You mean like (unexecuted)
s = ''
for i in xrange(1000000): s += 'a' #?

This will allocate, copy, and deallocate 1000000 successively longer
temporary strings and is a noticeable O(n**2) operation.


Not exactly. CPython 2.4 added an optimization of "+=" for strings.
The for loop above takes about 1 second do execute on my machine. You
are correct in that it will take *much* longer on 2.3.
--
Benji York

Aug 18 '05 #9

My goal is very simple. Have a mechanism to create sparse files and
modify them by writing arbitratry ranges of bytes at arbitrary offsets.
I did get the information I want (xrange instead of range, and a simple
way to generate 1Mb string in memory). Thanks for pointing out about
using "len" as variable. It is indeed silly.

My only assumption from underlying OS/file system is that if I seek
past end of file and write some data, it doesn't generate blocks for
data in between. This is indeed true on Linux (I tested on ext3).

Thanks,
Raghu.

Aug 18 '05 #10
"dr*******@gmail.com" <dr*******@gmail.com> writes:
My goal is very simple. Have a mechanism to create sparse files and
modify them by writing arbitratry ranges of bytes at arbitrary offsets.
I did get the information I want (xrange instead of range, and a simple
way to generate 1Mb string in memory). Thanks for pointing out about
using "len" as variable. It is indeed silly.

My only assumption from underlying OS/file system is that if I seek
past end of file and write some data, it doesn't generate blocks for
data in between. This is indeed true on Linux (I tested on ext3).


This better be true for anything claiming to be Unix. The results on
systems that break this aren't pretty.

<mike
--
Mike Meyer <mw*@mired.org> http://www.mired.org/home/mwm/
Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.
Aug 19 '05 #11

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

6
by: Troy | last post by:
Hi- I am attempting to set up an RSS feed using PHP. It would be convenient for me to embed PHP into an xml file like I would do to an HTML file in order to create the XML, however the apache...
0
by: Hu Nan | last post by:
DB2 UDB 8.1 on Red Hat Linux 8: Just installed two Linux machines with UDB 8.1. When creating stored procedure, like "create procedure p1() begin end", got error like: SQL0035N The file...
1
by: Barkster | last post by:
I've been using Dreamweaver to create my php pages and love the functionality but when I start modifying code I start wondering if I should be something else because it always throws off the...
38
by: djhulme | last post by:
Hi, I'm using GCC. Please could you tell me, what is the maximum number of array elements that I can create in C, i.e. char* anArray = (char*) calloc( ??MAX?? , sizeof(char) ) ; I've...
4
by: deLenn | last post by:
Hi, Does scipy have an equivalent to Matlab's 'find' function, to list the indices of all nonzero elements in a sparse matrix? Cheers.
3
by: mediratta | last post by:
Hi, I want to allocate memory for a large matrix, whose size will be around 2.5 million x 17000. Three fourth of its rows will have all zeroes, but it is not known which will be those rows. If I...
5
by: adam.kleinbaum | last post by:
Hi there, I'm a novice C programmer working with a series of large (30,000 x 30,000) sparse matrices on a Linux system using the GCC compiler. To represent and store these matrices, I'd like to...
7
by: DanielJohnson | last post by:
I have a small project which has around 10 .py files and I run this project using command line arguments. I have to distribute this project to somebody. I was wondering how can I make an...
4
by: ishakteyran | last post by:
hello to all.. i have a realy tough assignment which requires me to add, substract, multiply, and get inverse of non-sparse and sparse matrixes.. in a more clear way it wants me to to the...
0
by: emmanuelkatto | last post by:
Hi All, I am Emmanuel katto from Uganda. I want to ask what challenges you've faced while migrating a website to cloud. Please let me know. Thanks! Emmanuel
1
by: Sonnysonu | last post by:
This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...
0
by: Hystou | last post by:
There are some requirements for setting up RAID: 1. The motherboard and BIOS support RAID configuration. 2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...
0
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However,...
0
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can...
0
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers,...
0
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each...
0
agi2029
by: agi2029 | last post by:
Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing,...
0
by: conductexam | last post by:
I have .net C# application in which I am extracting data from word file and save it in database particularly. To store word all data as it is I am converting the whole word file firstly in HTML and...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.