creating/modifying sparse files on linux


Hi,

Is there any special support for sparse file handling in python? My
initial search didn't bring up much (not a thorough search). I wrote
the following piece of code:

options.size = 6442450944
options.ranges = ["4096,1024","30000,314572800"]
fd = open("testfile", "w")
fd.seek(options.size-1)
fd.write("a")
for drange in options.ranges:
    off = int(drange.split(",")[0])
    len = int(drange.split(",")[1])
    print "off =", off, " len =", len
    fd.seek(off)
    for x in range(len):
    fd.write("a")

fd.close()

This piece of code takes a very long time, and in fact I had to kill it as
the Linux system started doing a lot of swapping. Am I doing something
wrong here? Is there a better way to create/modify sparse files?

Thanks,
Raghu.

Aug 17 '05 #1
[dr*******@gmail.com wrote]

> Hi,
>
> Is there any special support for sparse file handling in python? My
> initial search didn't bring up much (not a thorough search). I wrote
> the following piece of code:
>
> options.size = 6442450944
> options.ranges = ["4096,1024","30000,314572800"]
> fd = open("testfile", "w")
> fd.seek(options.size-1)
> fd.write("a")
> for drange in options.ranges:
>     off = int(drange.split(",")[0])
>     len = int(drange.split(",")[1])
>     print "off =", off, " len =", len
>     fd.seek(off)
>     for x in range(len):
>     fd.write("a")
>
> fd.close()
>
> This piece of code takes a very long time, and in fact I had to kill it as
> the Linux system started doing a lot of swapping. Am I doing something
> wrong here? Is there a better way to create/modify sparse files?


test_largefile.py in the Python test suite does this kind of thing and
doesn't take very long for me to run on Linux (SuSE 9.0 box).

Trent

--
Trent Mick
Tr****@ActiveState.com
Aug 17 '05 #2
In <11**********************@f14g2000cwb.googlegroups.com>,
dr*******@gmail.com wrote:

> options.size = 6442450944
> options.ranges = ["4096,1024","30000,314572800"]
> fd = open("testfile", "w")
> fd.seek(options.size-1)
> fd.write("a")
> for drange in options.ranges:
>     off = int(drange.split(",")[0])
>     len = int(drange.split(",")[1])
>     print "off =", off, " len =", len
>     fd.seek(off)
>     for x in range(len):
>     fd.write("a")
>
> fd.close()
>
> This piece of code takes a very long time, and in fact I had to kill it as
> the Linux system started doing a lot of swapping. Am I doing something
> wrong here? Is there a better way to create/modify sparse files?


`range(len)` creates a list of size `len` *in memory* so you are trying to
build a list with 314,572,800 numbers. That seems to eat up all your RAM
and causes the swapping.

You can use `xrange(len)` instead which uses a constant amount of memory.
But be prepared to wait some time because now you are writing 314,572,800
characters *one by one* into the file. It would be faster to write larger
strings in each step.

Ciao,
Marc 'BlackJack' Rintsch
Aug 17 '05 #3
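
A minimal sketch of the "write larger strings in each step" idea from the reply above. It is written in Python 3 syntax rather than the Python 2 used in this thread, and the helper name `fill_range`, its parameters, and the 1 MiB chunk size are assumptions of this sketch, not something taken from the posts.

```python
CHUNK = 1024 * 1024  # 1 MiB per write; an arbitrary choice for this sketch


def fill_range(fd, offset, length, byte=b"a", chunk_size=CHUNK):
    """Write `length` copies of `byte` starting at `offset`, one chunk at a time.

    `fd` is any seekable binary file object (hypothetical helper, not a
    standard-library function).
    """
    block = byte * chunk_size      # built once, reused for every write
    fd.seek(offset)
    remaining = length
    while remaining > 0:
        n = min(remaining, chunk_size)
        fd.write(block[:n])        # the last chunk may be shorter
        remaining -= n
```

With this shape, the 314,572,800-byte range from the original post would need 300 calls to `write` instead of one call per byte, and no list of loop indices is ever built.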

<dr*******@gmail.com> wrote in message
news:11**********************@f14g2000cwb.googlegroups.com...
> Is there any special support for sparse file handling in python?

Since I have not heard of such in several years, I suspect not. CPython,
normally compiled, uses the standard C stdio lib. If your system+C has a
sparseIO lib, you would probably have to compile specially to use it.

> options.size = 6442450944
> options.ranges = ["4096,1024","30000,314572800"]

options.ranges = [(4096,1024),(30000,314572800)] # makes below nicer

> fd = open("testfile", "w")
> fd.seek(options.size-1)
> fd.write("a")
> for drange in options.ranges:
>     off = int(drange.split(",")[0])
>     len = int(drange.split(",")[1])

    off,len = map(int, drange.split(",")) # or
    off,len = [int(s) for s in drange.split(",")] # or, for tuples as suggested above:
    off,len = drange

>     print "off =", off, " len =", len
>     fd.seek(off)
>     for x in range(len):

If I read the above right, the 2nd len is 300,000,000+, making the space
needed for the range list a few gigabytes. I suspect this is where you
started thrashing ;-). Instead:

    for x in xrange(len): # this is what xrange is for ;-)

>     fd.write("a")

Without indent, this is a syntax error, so if your code ran at all, this
cannot be an exact copy. Even with the xrange fix, 300,000,000 writes will be
slow. I would expect that a real application should create or accumulate
chunks larger than single chars.

> fd.close()
>
> This piece of code takes a very long time, and in fact I had to kill it as
> the Linux system started doing a lot of swapping. Am I doing something
> wrong here?

See above.

> Is there a better way to create/modify sparse files?

Unless you can access built-in facilities, create your own mapping index.

Terry J. Reedy

Aug 17 '05 #4
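
The suggestion above to keep the ranges as tuples can be sketched roughly as follows, in Python 3 syntax. The function name `parse_ranges` and the variable names `offset` and `length` are this sketch's own choices, not part of the original code.

```python
def parse_ranges(specs):
    """Turn ["4096,1024", ...] into [(4096, 1024), ...] as (offset, length) pairs."""
    ranges = []
    for spec in specs:
        # int() tolerates surrounding whitespace, so "4096, 1024" also works.
        offset, length = (int(part) for part in spec.split(","))
        ranges.append((offset, length))
    return ranges
```

After this step the main loop can unpack each pair directly, for example `for offset, length in parse_ranges(options.ranges):`, assuming `options.ranges` still holds the comma-separated strings.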

Thanks for the info on xrange. Writing a single char was just to get going
quickly. I knew that I would have to improve on that. I would like to
write chunks of 1MB, which would require that I have a 1MB string to
write. Is there any simple way of generating this 1MB string (other
than appending to a string until it reaches 1MB in length)? I don't care
about the actual value of the string itself.

Thanks,
Raghu.

Aug 17 '05 #5

<dr*******@gmail.com> wrote in message
news:11**********************@z14g2000cwz.googlegroups.com...

> Thanks for the info on xrange. Writing a single char was just to get going
> quickly. I knew that I would have to improve on that. I would like to
> write chunks of 1MB, which would require that I have a 1MB string to
> write. Is there any simple way of generating this 1MB string

megastring = 1000000*'a' # t < 1 sec on my machine

> (other than appending to a string until it reaches 1MB in length)?

You mean like (unexecuted)
s = ''
for i in xrange(1000000): s += 'a' #?

This will allocate, copy, and deallocate 1000000 successively longer
temporary strings and is a noticeable O(n**2) operation. Since strings are
immutable, you cannot 'append' to them the way you can to lists.

Terry J. Reedy

Aug 17 '05 #6
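
For comparison, a small Python 3 sketch of the two ways to build a buffer that come up here. It uses `bytes` because a file opened in binary mode takes bytes in Python 3; the names `MEGABYTE`, `block`, `pieces` and `joined` are illustrative only.

```python
MEGABYTE = 1024 * 1024

# Repetition builds the whole buffer in a single step.
block = b"a" * MEGABYTE

# If the pieces really do arrive one at a time, collecting them in a list
# and joining once avoids depending on how the interpreter handles
# repeated concatenation.
pieces = [b"a" for _ in range(1000)]
joined = b"".join(pieces)
```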
[dr*******@gmail.com]
> Is there any simple way of generating this 1MB string (other than keep
> appending to a string until it reaches 1MB len)?


You might of course use 'x' * 1000000 for fairly quickly generating a
single string holding one million `x'.

Yet, your idea of generating a sparse file is interesting. I never
tried it with Python, but would not see why Python would not allow
it. Did someone ever play with sparse files in Python? (One problem
with sparse files is that it is next to impossible for a normal user to
create an exact copy. There is no fast way to read them either.)

--
François Pinard http://pinard.progiciels-bpi.ca
Aug 17 '05 #7
On 17 Aug 2005 11:53:39 -0700, "dr*******@gmail.com" <dr*******@gmail.com> wrote:

> Hi,
>
> Is there any special support for sparse file handling in python? My
> initial search didn't bring up much (not a thorough search). I wrote
> the following piece of code:
>
> options.size = 6442450944
> options.ranges = ["4096,1024","30000,314572800"]
> fd = open("testfile", "w")
> fd.seek(options.size-1)
> fd.write("a")
> for drange in options.ranges:
>     off = int(drange.split(",")[0])
>     len = int(drange.split(",")[1])
>     print "off =", off, " len =", len
>     fd.seek(off)
>     for x in range(len):
>     fd.write("a")
>
> fd.close()
>
> This piece of code takes a very long time, and in fact I had to kill it as
> the Linux system started doing a lot of swapping. Am I doing something
> wrong here? Is there a better way to create/modify sparse files?
>
> Thanks

I'm unclear as to what your goal is. Do you just need an object that provides
an interface like a file object, but internally is more efficient than
a normal file object when you access it as above[1], or do you need to create
a real file and record all the bytes in full (with what default for gaps?)
on disk, so that it can be opened by another program and read as an ordinary file?

Some operating system file systems may have some support for virtual zero-block runs
and lazy allocation/representation of non-zero blocks in files. It's easy to imagine
the rudiments, but I don't know of such a file system, not having looked ;-)

You could write your own "sparse-file"-representation object, and maybe use pickle
for persistence. Or maybe you could use zipfiles. The kind of data you are creating above
would probably compress really well ;-)

[1] writing 314+ million identical bytes one by one is silly, of course ;-)
BTW, len is a built-in function, and using built-in names for variables
is frowned upon as a bug-prone practice.

Regards,
Bengt Richter
Aug 18 '05 #8
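
One possible shape for the "sparse-file"-representation object mentioned above, as a rough Python 3 sketch. The class name `SparseBuffer`, its methods, and the fixed 4096-byte block size are all assumptions made for illustration; it keeps data in memory only and leaves persistence (pickle or otherwise) out.

```python
class SparseBuffer:
    """In-memory stand-in for a sparse file: only touched blocks are stored."""

    def __init__(self, block_size=4096):
        self.block_size = block_size
        self.blocks = {}   # block index -> bytearray of block_size bytes
        self.size = 0      # logical length, comparable to a file's size

    def write(self, offset, data):
        """Store `data` at `offset`, allocating only the blocks it touches."""
        if not data:
            return
        view = memoryview(data)
        pos = offset
        while len(view) > 0:
            index, start = divmod(pos, self.block_size)
            n = min(self.block_size - start, len(view))
            block = self.blocks.get(index)
            if block is None:
                block = self.blocks[index] = bytearray(self.block_size)
            block[start:start + n] = view[:n]
            view = view[n:]
            pos += n
        self.size = max(self.size, offset + len(data))

    def read(self, offset, length):
        """Return up to `length` bytes from `offset`; gaps read back as zeros."""
        end = min(offset + length, self.size)
        out = bytearray()
        pos = offset
        while pos < end:
            index, start = divmod(pos, self.block_size)
            n = min(self.block_size - start, end - pos)
            block = self.blocks.get(index)
            out += block[start:start + n] if block is not None else bytes(n)
            pos += n
        return bytes(out)
```

Writing one byte at offset 6442450943 and a short range near the start stores only the few blocks involved, while `size` still reports the full logical length. Reads from untouched regions return zero bytes, which is the default this sketch picks for gaps.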
Terry Reedy wrote:
> megastring = 1000000*'a' # t < 1 sec on my machine
>
>> (other than keep appending to a string until it reaches 1MB len)?
>
> You mean like (unexecuted)
> s = ''
> for i in xrange(1000000): s += 'a' #?
>
> This will allocate, copy, and deallocate 1000000 successively longer
> temporary strings and is a noticeable O(n**2) operation.


Not exactly. CPython 2.4 added an optimization of "+=" for strings.
The for loop above takes about 1 second to execute on my machine. You
are correct in that it will take *much* longer on 2.3.
--
Benji York

Aug 18 '05 #9

My goal is very simple. Have a mechanism to create sparse files and
modify them by writing arbitrary ranges of bytes at arbitrary offsets.
I did get the information I want (xrange instead of range, and a simple
way to generate a 1MB string in memory). Thanks for pointing out about
using "len" as a variable. It is indeed silly.

My only assumption about the underlying OS/file system is that if I seek
past the end of the file and write some data, it doesn't generate blocks for
the data in between. This is indeed true on Linux (I tested on ext3).

Thanks,
Raghu.

Aug 18 '05 #10
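
The assumption in the last post can be checked with something like the following Python 3 sketch. The function names `make_sparse_file` and `disk_usage` are hypothetical, and whether the gap is actually left unallocated depends on the file system, so the second number is something to inspect rather than rely on.

```python
import os


def make_sparse_file(path, size):
    """Create `path` with logical length `size` by writing only its last byte."""
    with open(path, "wb") as fd:
        fd.seek(size - 1)   # seek past the end of the (empty) file
        fd.write(b"a")      # only this byte is written explicitly


def disk_usage(path):
    """Return (logical size, bytes actually allocated) for `path`."""
    st = os.stat(path)
    # On Linux st_blocks counts 512-byte units; the attribute is not
    # available on every platform (for example Windows).
    return st.st_size, st.st_blocks * 512
```

On a file system that supports sparse files, `disk_usage` should report an allocated size far smaller than the logical size for a file created this way. On one that does not, the two numbers will be close.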
