writing large files quickly

rbt
I've been doing some file system benchmarking. In the process, I need to
create a large file to copy around to various drives. I'm creating the
file like this:

fd = file('large_file.bin', 'wb')
for x in xrange(409600000):
    fd.write('0')
fd.close()

This takes a few minutes to do. How can I speed up the process?

Thanks!
Jan 27 '06 #1
One way to speed this up is to write larger strings:

fd = file('large_file.bin', 'wb')
for x in xrange(51200000):
    fd.write('00000000')
fd.close()

However, I bet within an hour or so you will have a much better answer
or 10. =)

Jan 27 '06 #2

rbt wrote:
I've been doing some file system benchmarking. In the process, I need to
create a large file to copy around to various drives. I'm creating the
file like this:

fd = file('large_file.bin', 'wb')
for x in xrange(409600000):
    fd.write('0')
fd.close()

This takes a few minutes to do. How can I speed up the process?

Thanks!


Untested, but this should be faster.

block = '0' * 409600
fd = file('large_file.bin', 'wb')
for x in range(1000):
    fd.write('0')
fd.close()

Jan 27 '06 #3
> Untested, but this should be faster.

block = '0' * 409600
fd = file('large_file.bin', 'wb')
for x in range(1000):
    fd.write('0')
fd.close()


Just checking...you mean

fd.write(block)

right? :) Otherwise, you end up with just 1000 "0" characters in
your file :)

Is there anything preventing one from just doing the following?

fd.write("0" * 409600000)

It's one huge string for a very short time. It skips all the
looping and allows Python to pump the file out to the disk as
fast as the OS can handle it. (and sorta as fast as Python can
generate this humongous string)

-tkc

Jan 27 '06 #4
Tim Chase <py*********@tim.thechases.com> writes:
Is there anything preventing one from just doing the following?
fd.write("0" * 409600000)
It's one huge string for a very short time. It skips all the looping
and allows Python to pump the file out to the disk as fast as the OS
can handle it. (and sorta as fast as Python can generate this
humongous string)


That's large enough that it might exceed your PC's memory and cause
swapping. Try strings of about 64k (65536).
Jan 27 '06 #5
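For instance, a minimal sketch of that chunked approach (the 64 KiB block size and the file name are illustrative assumptions, not anything from the thread):

# Write ~400 MB of ASCII '0' characters in 64 KiB chunks to keep memory use low.
BLOCK = '0' * 65536                      # one 64 KiB chunk
TOTAL = 409600000                        # target size in bytes
fd = file('large_file.bin', 'wb')
for _ in xrange(TOTAL // len(BLOCK)):    # 6250 full blocks
    fd.write(BLOCK)
fd.write('0' * (TOTAL % len(BLOCK)))     # remainder (zero in this case)
fd.close()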
On 2006-01-27, rbt <rb*@athop1.ath.vt.edu> wrote:
I've been doing some file system benchmarking. In the process, I need to
create a large file to copy around to various drives. I'm creating the
file like this:

fd = file('large_file.bin', 'wb')
for x in xrange(409600000):
    fd.write('0')
fd.close()

This takes a few minutes to do. How can I speed up the process?


Don't write so much data.

f = file('large_file.bin','wb')
f.seek(409600000-1)
f.write('\x00')
f.close()

That should be almost instantaneous in that the time required
for those 4 lines of code is negligible compared to interpreter
startup and shutdown.

--
Grant Edwards   grante at visi.com
Yow! These PRESERVES should be FORCE-FED to PENTAGON OFFICIALS!!
Jan 27 '06 #6
Oops. I did mean

fd.write(block)

The only limit is available memory. I've used 1MB block sizes when I
did read/write tests. I was comparing NFS vs. local disk performance. I
know Python can do at least 100MB/sec.

Jan 27 '06 #7
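As a rough sketch of that kind of read/write test (the paths, the 1 MB block size and the MB/sec reporting are all illustrative assumptions, not the poster's actual script):

import time

def copy_and_time(src, dst, block_size=1024 * 1024):
    # Copy src to dst in block_size chunks and report throughput in MB/sec.
    total = 0
    t0 = time.time()
    fin = file(src, 'rb')
    fout = file(dst, 'wb')
    while True:
        block = fin.read(block_size)
        if not block:
            break
        fout.write(block)
        total += len(block)
    fout.close()
    fin.close()
    elapsed = time.time() - t0
    print "%d bytes in %.2f s (%.1f MB/sec)" % (total, elapsed, total / (1e6 * elapsed))

# example paths; adjust to your own drives/mounts
copy_and_time('large_file.bin', '/mnt/nfs/large_file.bin')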
>> fd.write('0')
[cut]

f = file('large_file.bin','wb')
f.seek(409600000-1)
f.write('\x00')


While a mindblowingly simple/elegant/fast solution (kudos!), the
OP's file ends up full of the character zero (ASCII 0x30),
while your solution ends up full of the NUL character (ASCII 0x00):

tkc@oblique:~/temp$ xxd op.bin
0000000: 3030 3030 3030 3030 3030 0000000000
tkc@oblique:~/temp$ xxd new.bin
0000000: 0000 0000 0000 0000 0000 ..........

(using only length 10 instead of 400 megs to save time and disk
space...)

-tkc

Jan 27 '06 #8
On 2006-01-27, Tim Chase <py*********@tim.thechases.com> wrote:
fd.write('0')

[cut]

f = file('large_file.bin','wb')
f.seek(409600000-1)
f.write('\x00')


While a mindblowingly simple/elegant/fast solution (kudos!), the
OP's file ends up full of the character zero (ASCII 0x30),
while your solution ends up full of the NUL character (ASCII 0x00):


Oops. I missed the fact that he was writing 0x30 and not 0x00.

Yes, the "hole" in the file will read as 0x00 bytes. If the OP
actually requires that the file contain something other than
0x00 bytes, then my solution won't work.

--
Grant Edwards   grante at visi.com
Yow! I want the presidency so bad I can already taste the hors d'oeuvres.
Jan 27 '06 #9
rbt
Grant Edwards wrote:
On 2006-01-27, Tim Chase <py*********@tim.thechases.com> wrote:
fd.write('0')


[cut]
f = file('large_file.bin','wb')
f.seek(409600000-1)
f.write('\x00')


While a mindblowingly simple/elegant/fast solution (kudos!), the
OP's file ends up full of the character zero (ASCII 0x30),
while your solution ends up full of the NUL character (ASCII 0x00):

Oops. I missed the fact that he was writing 0x30 and not 0x00.

Yes, the "hole" in the file will read as 0x00 bytes. If the OP
actually requires that the file contain something other than
0x00 bytes, then my solution won't work.


Won't work!? It's absolutely fabulous! I just need something big, quick
and zeros work great.

How the heck does that make a 400 MB file that fast? It literally takes
a second or two while every other solution takes at least 2 - 5 minutes.
Awesome... thanks for the tip!!!

Thanks to all for the advice... one can really learn things here :)
Jan 27 '06 #10
rbt
Grant Edwards wrote:
On 2006-01-27, rbt <rb*@athop1.ath.vt.edu> wrote:

I've been doing some file system benchmarking. In the process, I need to
create a large file to copy around to various drives. I'm creating the
file like this:

fd = file('large_file.bin', 'wb')
for x in xrange(409600000):
    fd.write('0')
fd.close()

This takes a few minutes to do. How can I speed up the process?

Don't write so much data.

f = file('large_file.bin','wb')
f.seek(409600000-1)
f.write('\x00')
f.close()


OK, I'm still trying to pick my jaw up off of the floor. One question...
how big of a file could this method create? 20GB, 30GB, limit depends
on filesystem, etc?
That should be almost instantaneous in that the time required
for those 4 lines of code is negligible compared to interpreter
startup and shutdown.

Jan 27 '06 #11
In article <dr**********@solaris.cc.vt.edu>,
rbt <rb*@athop1.ath.vt.edu> wrote:
Won't work!? It's absolutely fabulous! I just need something big, quick
and zeros work great.

How the heck does that make a 400 MB file that fast? It literally takes
a second or two while every other solution takes at least 2 - 5 minutes.
Awesome... thanks for the tip!!!


Because it isn't really writing the zeros. You can make these
files all day long and not run out of disk space, because this
kind of file doesn't take very many blocks. The blocks that
were never written are virtual blocks, inasmuch as read() at
that location will cause the filesystem to return a block of NULs.

Donn Cave, do**@u.washington.edu
Jan 27 '06 #12
rbt
Donn Cave wrote:
In article <dr**********@solaris.cc.vt.edu>,
rbt <rb*@athop1.ath.vt.edu> wrote:
Won't work!? It's absolutely fabulous! I just need something big, quick
and zeros work great.

How the heck does that make a 400 MB file that fast? It literally takes
a second or two while every other solution takes at least 2 - 5 minutes.
Awesome... thanks for the tip!!!

Because it isn't really writing the zeros. You can make these
files all day long and not run out of disk space, because this
kind of file doesn't take very many blocks.


Hmmm... when I copy the file to a different drive, it takes up
409,600,000 bytes. Also, an md5 checksum on the generated file and on
copies placed on other drives are the same. It looks like a regular, big
file... I don't get it.

The blocks that
were never written are virtual blocks, inasmuch as read() at
that location will cause the filesystem to return a block of NULs.

Donn Cave, do**@u.washington.edu

Jan 27 '06 #13
rbt wrote:
Hmmm... when I copy the file to a different drive, it takes up
409,600,000 bytes. Also, an md5 checksum on the generated file and on
copies placed on other drives are the same. It looks like a regular, big
file... I don't get it.


google("sparse files")

--
Robert Kern
ro*********@gmail.com

"In the fields of hell where the grass grows high
Are the graves of dreams allowed to die."
-- Richard Harter

Jan 27 '06 #14
On 2006-01-27, rbt <rb*@athop1.ath.vt.edu> wrote:
> fd.write('0')

[cut]

f = file('large_file.bin','wb')
f.seek(409600000-1)
f.write('\x00')

While a mindblowingly simple/elegant/fast solution (kudos!), the
OP's file ends up full of the character zero (ASCII 0x30),
while your solution ends up full of the NUL character (ASCII 0x00):
Oops. I missed the fact that he was writing 0x30 and not 0x00.

Yes, the "hole" in the file will read as 0x00 bytes. If the OP
actually requires that the file contain something other than
0x00 bytes, then my solution won't work.


Won't work!? It's absolutely fabulous! I just need something big, quick
and zeros work great.


Then Bob's your uncle, eh?
How the heck does that make a 400 MB file that fast?


Most of the file isn't really there, it's just a big "hole" in
a sparse array containing a single allocation block that
contains the single '0x00' byte that was written:

$ ls -l large_file.bin
-rw-r--r-- 1 grante users 409600000 Jan 27 15:02 large_file.bin
$ du -h large_file.bin
12K large_file.bin

The filesystem code in the OS is written so that it returns
'0x00' bytes when you attempt to read data from the "hole" in
the file. So, if you open the file and start reading, you'll
get 400MB of 0x00 bytes before you get an EOF return. But the
file really only takes up a couple "chunks" of disk space, and
chunks are usually on the order of 4KB.

--
Grant Edwards   grante at visi.com
Yow! Is this an out-take from the "BRADY BUNCH"?
Jan 27 '06 #15
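As a rough cross-check from Python itself (POSIX-only, since it relies on st_blocks; the file name is just an example), one could compare the apparent size with the space actually allocated:

import os

st = os.stat('large_file.bin')
print "apparent size:    ", st.st_size, "bytes"
# st_blocks counts 512-byte units actually allocated on disk (POSIX)
print "allocated on disk:", st.st_blocks * 512, "bytes"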
On 2006-01-27, rbt <rb*@athop1.ath.vt.edu> wrote:
Hmmm... when I copy the file to a different drive, it takes up
409,600,000 bytes. Also, an md5 checksum on the generated file and on
copies placed on other drives are the same. It looks like a regular, big
file... I don't get it.


Because the filesystem code keeps track of where you are in
that 400MB stream, and returns 0x00 anytime you're reading from
a "hole". The "cp" program and the "md5sum" just open the file
and start read()ing. The filesystem code returns 0x00 bytes
for all of the read positions that are in the "hole", just like
Don said:
The blocks that were never written are virtual blocks,
inasmuch as read() at that location will cause the filesystem
to return a block of NULs.


--
Grant Edwards   grante at visi.com
Yow! They collapsed... like nuns in the street... they had no teen appeal!
Jan 27 '06 #16
Grant Edwards wrote:
Because the filesystem code keeps track of where you are in
that 400MB stream, and returns 0x00 anytime you're reading from
a "hole". The "cp" program and the "md5sum" just open the file
and start read()ing. The filesystem code returns 0x00 bytes
for all of the read positions that are in the "hole", just like
Don said:


And, this file is of course useless for FS benchmarking, since you're
barely reading data from disk at all. You'll just be testing the FS's
handling of sparse files. I suggest you go for one of the suggestions
with larger block sizes. That's probably your best bet.

Regards,

Erik Brandstadmoen
Jan 27 '06 #17
On 2006-01-27, rbt <rb*@athop1.ath.vt.edu> wrote:
OK, I'm still trying to pick my jaw up off of the floor. One
question... how big of a file could this method create? 20GB,
30GB, limit depends on filesystem, etc?


Right. Back in the day, the old libc and ext2 code had a 2GB
file size limit at one point (it used a signed 32-bit value
to keep track of file size/position). That was back when a 1GB
drive was something to brag about, so it wasn't a big deal for
most people.

I think everything has large-file support enabled by default
now, so the limit is 2^63 for most "modern" filesystems --
that's the limit of the file size you can create using the
seek() trick. The limit for actual on-disk bytes may not be
that large.

Here's a good link:

http://www.suse.de/~aj/linux_lfs.html

--
Grant Edwards   grante at visi.com
Yow! I'm GLAD I remembered to XEROX all my UNDERSHIRTS!!
Jan 27 '06 #18
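A hedged sketch for probing that on your own filesystem (the 20 GB figure and file name are arbitrary; on a filesystem without large-file or sparse-file support this will fail or allocate real space):

import os

size = 20 * 1024 ** 3                  # 20 GiB apparent size
f = file('huge_sparse.bin', 'wb')
f.seek(size - 1)
f.write('\x00')
f.close()
print "apparent size:", os.path.getsize('huge_sparse.bin'), "bytes"
os.remove('huge_sparse.bin')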
rbt
Grant Edwards wrote:
On 2006-01-27, rbt <rb*@athop1.ath.vt.edu> wrote:

Hmmm... when I copy the file to a different drive, it takes up
409,600,000 bytes. Also, an md5 checksum on the generated file and on
copies placed on other drives are the same. It looks like a regular, big
file... I don't get it.

Because the filesystem code keeps track of where you are in
that 400MB stream, and returns 0x00 anytime you're reading from
a "hole". The "cp" program and the "md5sum" just open the file
and start read()ing. The filesystem code returns 0x00 bytes
for all of the read positions that are in the "hole", just like
Don said:


OK I finally get it. It's too good to be true :)

I'm going back to using _real_ files... not files that look as if they
are there but really aren't. BTW, the file 'size' and 'size on disk' were
identical on win 2003. That's a bit deceptive. According to the NTFS
docs, they should be drastically different... 'size on disk' should be
like 64K or something.

The blocks that were never written are virtual blocks,
inasmuch as read() at that location will cause the filesystem
to return a block of NULs.


Jan 27 '06 #19
On 2006-01-27, Erik Andreas Brandstadmoen <er**@brandstadmoen.net> wrote:
Grant Edwards wrote:
Because the filesystem code keeps track of where you are in
that 400MB stream, and returns 0x00 anytime you're reading from
a "hole". The "cp" program and the "md5sum" just open the file
and start read()ing. The filesystem code returns 0x00 bytes
for all of the read positions that are in the "hole", just like
Don said:
And, this file is of course useless for FS benchmarking, since
you're barely reading data from disk at all.


Quite right. Copying such a sparse file is probably only
really testing the write performance of the filesystem
containing the destination file.
You'll just be testing the FS's handling of sparse files.
Which may be a useful thing to know, but I rather doubt it.
I suggest you go for one of the suggestions with larger block
sizes. That's probably your best bet.


--
Grant Edwards   grante at visi.com
Yow! Now I'm concentrating on a specific tank battle toward the end of World War II!
Jan 27 '06 #20
On 2006-01-27, rbt <rb*@athop1.ath.vt.edu> wrote:
OK I finally get it. It's too good to be true :)
Sorry about that. I should have paid closer attention to what
you were going to do with the file.
I'm going back to using _real_ files... not files that look
as if they are there but really aren't. BTW, the file 'size' and
'size on disk' were identical on win 2003. That's a bit
deceptive.
What?! Windows lying to the user? I don't believe it!
According to the NTFS docs, they should be drastically
different... 'size on disk' should be like 64K or something.


Probably.

--
Grant Edwards   grante at visi.com
Yow! Where's th' DAFFY DUCK EXHIBIT??
Jan 27 '06 #21
On Fri, 27 Jan 2006 12:30:49 -0800, Donn Cave wrote:
In article <dr**********@solaris.cc.vt.edu>,
rbt <rb*@athop1.ath.vt.edu> wrote:
Won't work!? It's absolutely fabulous! I just need something big, quick
and zeros work great.

How the heck does that make a 400 MB file that fast? It literally takes
a second or two while every other solution takes at least 2 - 5 minutes.
Awesome... thanks for the tip!!!


Because it isn't really writing the zeros. You can make these
files all day long and not run out of disk space, because this
kind of file doesn't take very many blocks. The blocks that
were never written are virtual blocks, inasmuch as read() at
that location will cause the filesystem to return a block of NULs.

Isn't this a file system specific solution though? Won't your file system
need to have support for "sparse files", or else it won't work?
Here is another possible solution: if you are running Linux, farm the real
work out to some C code optimised for writing blocks to the disk:

# untested and, it goes without saying, untimed
import os
os.system("dd if=/dev/zero of=largefile.bin bs=64K count=16384")
That should make a 1GB file as fast as possible. If you have lots and lots
of memory, you could try upping the block size (bs=...).

--
Steven.

Jan 27 '06 #22
Steven D'Aprano wrote:
Isn't this a file system specific solution though? Won't your file system
need to have support for "sparse files", or else it won't work?


Yes, but AFAIK the only "modern" (meaning: in wide use today) file
system that doesn't have this support is FAT/FAT32.
Jan 27 '06 #23
On 2006-01-27, Steven D'Aprano <st***@REMOVETHIScyber.com.au> wrote:
Because it isn't really writing the zeros. You can make these
files all day long and not run out of disk space, because this
kind of file doesn't take very many blocks. The blocks that
were never written are virtual blocks, inasmuch as read() at
that location will cause the filesystem to return a block of NULs.
Isn't this a file system specific solution though? Won't your file system
need to have support for "sparse files", or else it won't work?


If your fs doesn't support sparse files, then you'll end up with a
file that really does have 400MB of 0x00 bytes in it. Which is
what the OP really needed in the first place.
Here is another possible solution: if you are running Linux, farm the real
work out to some C code optimised for writing blocks to the disk:

# untested and, it goes without saying, untimed
import os
os.system("dd if=/dev/zero of=largefile.bin bs=64K count=16384")

That should make a 1GB file as fast as possible. If you have lots and lots
of memory, you could try upping the block size (bs=...).


I agree. That probably is the optimal solution for Unix boxes.
I messed around with something like that once, and block sizes
bigger than 64k didn't make much difference.

--
Grant Edwards   grante at visi.com
Yow! As President I have to go vacuum my coin collection!
Jan 27 '06 #24
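A quick-and-dirty sketch for checking that claim on your own machine (the block sizes, the 64 MB test size and the file name are arbitrary; OS write caching will dominate at these sizes, so treat the numbers loosely):

import os, time

def time_write(path, block_size, total=64 * 1024 * 1024):
    # Time how long it takes to write `total` bytes of NULs in `block_size` chunks.
    block = '\x00' * block_size
    t0 = time.time()
    f = file(path, 'wb')
    for _ in xrange(total // block_size):
        f.write(block)
    f.close()
    return time.time() - t0

for bs in (4096, 65536, 1024 * 1024):
    print "block size %7d: %.2f seconds" % (bs, time_write('speed_test.bin', bs))
os.remove('speed_test.bin')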
Donn wrote:
How the heck does that make a 400 MB file that fast? It literally takes
a second or two while every other solution takes at least 2 - 5 minutes.
Awesome... thanks for the tip!!!
Because it isn't really writing the zeros. You can make these
files all day long and not run out of disk space, because this
kind of file doesn't take very many blocks. The blocks that
were never written are virtual blocks, inasmuch as read() at
that location will cause the filesystem to return a block of NULs.


Under which operating system/file system?

As far as I know this should be file system dependent at least under
Linux, as the calls to open and seek are served by the file system driver.

Jens

Jan 28 '06 #25
Ivan wrote:
Steven D'Aprano wrote:
Isn't this a file system specific solution though? Won't your file system
need to have support for "sparse files", or else it won't work?

Yes, but AFAIK the only "modern" (meaning: in wide use today) file
system that doesn't have this support is FAT/FAT32.


I don't think ext2fs does this either. At least the du and df commands
tell something different.

Actually I'm not sure what this optimisation should give you anyway. The
only circumstance under which files with only zeroes are meaningful is
testing, and that's exactly when you don't want that optimisation.

On compressing filesystems such as NTFS you will get this behaviour as a
special case of compression, and compression makes more sense.

Jens

Jan 28 '06 #26
Donn wrote:
Because it isn't really writing the zeros. You can make these
files all day long and not run out of disk space, because this
kind of file doesn't take very many blocks. The blocks that
were never written are virtual blocks, inasmuch as read() at
that location will cause the filesystem to return a block of NULs.


Are you sure that's not just a case of asynchronous writing that can be
done in a particularly efficient way? df quite clearly tells me that I'm
running out of disk space on my ext2fs linux when I dump it full of
zeroes.

Jens
Jan 28 '06 #27
Jens Theisen wrote:
Ivan wrote:
Yes, but AFAIK the only "modern" (meaning: in wide use today) file
system that doesn't have this support is FAT/FAT32.


I don't think ext2fs does this either. At least the du and df commands
tell something different.


ext2 is a reimplementation of BSD UFS, so it does. Here:

f = file('bigfile', 'w')
f.seek(1024*1024)
f.write('a')

$ l afile
-rw-r--r-- 1 ivoras wheel 1048577 Jan 28 14:57 afile
$ du afile
8 afile
Actually I'm not sure what this optimisation should give you anyway. The
only circumstance under which files with only zeroes are meaningful is
testing, and that's exactly when you don't want that optimisation.


I read somewhere that it has a use in database software, but the only
thing I can imagine for this is when using heap queues
(http://python.active-venture.com/lib/node162.html).
Jan 28 '06 #28
Ivan wrote:
ext2 is a reimplementation of BSD UFS, so it does. Here:

f = file('bigfile', 'w')
f.seek(1024*1024)
f.write('a')

$ l afile
-rw-r--r-- 1 ivoras wheel 1048577 Jan 28 14:57 afile
$ du afile
8 afile
Interesting:

cp bigfile bigfile2

cat bigfile > bigfile3

du bigfile*
8 bigfile2
1032 bigfile3

So it's not consuming 0's. It just doesn't store unwritten data. And I
can think of an application for that: an application might want to write
the beginning of a file at a later point, so this makes it more efficient.

I wonder how other file systems behave.
I read somewhere that it has a use in database software, but the only
thing I can imagine for this is when using heap queues
(http://python.active-venture.com/lib/node162.html).


That's an article about the heap data structure. Was it your
intention to link this?

Jens
Jan 28 '06 #29
Jens Theisen wrote:
Ivan wrote:
I read somewhere that it has a use in database software, but the only
thing I can imagine for this is when using heap queues
(http://python.active-venture.com/lib/node162.html).


I've used this feature eons ago where the file was essentially a single
large address space (memory mapped array) that was expected to never
fill all that full. I was tracking data from a year of (thousands? of)
students seeing Drill-and-practice questions from a huge database of
questions. The research criticism we got was that our analysis did not
rule out any kid seeing the same question more than once, and getting
"practice" that would improve performance w/o learning. I built a
bit-filter and copied tapes dropping any repeats seen by students.
We then just ran the same analysis we had on the raw data, and found
no significant difference.

The nice thing is that file size grew over time, so (for a while) I
could run on the machine with other users. By the last block of
tapes I was sitting alone in the machine room at 3:00 AM on Sat mornings
afraid to so much as fire up an editor.

--
-Scott David Daniels
sc***********@acm.org
Jan 28 '06 #30
Jens Theisen wrote:
cp bigfile bigfile2

cat bigfile > bigfile3

du bigfile*
8 bigfile2
1032 bigfile3

So it's not consumings 0's. It's just doesn't store unwritten data. And I


Very possibly cp "understands" sparse files and cat (doing what it's
meant to do) doesn't :)

I read somewhere that it has a use in database software, but the only
thing I can imagine for this is when using heap queues
(http://python.active-venture.com/lib/node162.html).


That's an article about the heap data structure. Was it your
intention to link this?


Yes. The idea is that in implementing such a structure, in which each
level is 2^x wide (x = "level" of the structure, dependent on the
number of entries the structure must hold), most of the blocks could
exist and never be written to (i.e. they'd be "empty"). Using sparse
files would save space :)

(It has nothing to do with Python; I remembered the article so I linked
to it. The sparse-file issue is useful only when implementing heaps
directly on a file or in an mmapped file.)
Jan 28 '06 #31
[Jens Theisen]
...
Actually I'm not sure what this optimisation should give you anyway. The
only circumstance under which files with only zeroes are meaningful is
testing, and that's exactly when you don't want that optimisation.


In most cases, a guarantee that reading "uninitialized" file data will
return zeroes is a security promise, not an optimization. C doesn't
require this behavior, but POSIX does.

On FAT/FAT32, if you create a file, seek to a "large" offset, write a
byte, then read the uninitialized data from offset 0 up to the byte
just written, you get back whatever happened to be sitting on disk at
the locations now reserved for the file. That can include passwords,
other peoples' email, etc -- anything whatsoever that may have been
written to disk at some time in the disk's history. Security weenies
get upset at stuff like that ;-)
Jan 28 '06 #32
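A tiny sketch of that promise in action (the file name and sizes are arbitrary): seek past the end, write one byte, and on a POSIX filesystem the "uninitialized" region reads back as NUL bytes rather than stale disk contents:

f = file('hole_test.bin', 'wb')
f.seek(1000000 - 1)          # leave a ~1 MB "hole"
f.write('\x00')
f.close()

f = file('hole_test.bin', 'rb')
data = f.read(4096)          # read from inside the hole
f.close()
print data == '\x00' * 4096  # True: the hole reads back as zero bytes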
On Fri, 27 Jan 2006 12:30:49 -0800, Donn Cave <do**@u.washington.edu> wrote:
In article <dr**********@solaris.cc.vt.edu>,
rbt <rb*@athop1.ath.vt.edu> wrote:
Won't work!? It's absolutely fabulous! I just need something big, quick
and zeros work great.

How the heck does that make a 400 MB file that fast? It literally takes
a second or two while every other solution takes at least 2 - 5 minutes.
Awesome... thanks for the tip!!!


Because it isn't really writing the zeros. You can make these
files all day long and not run out of disk space, because this
kind of file doesn't take very many blocks. The blocks that
were never written are virtual blocks, inasmuch as read() at
that location will cause the filesystem to return a block of NULs.

I wonder if it will also "write" virtual blocks when it gets real
zero blocks to write from a user, or even with file system copy utils?

Regards,
Bengt Richter
Jan 28 '06 #33
On 2006-01-28, Bengt Richter <bo**@oz.net> wrote:
Because it isn't really writing the zeros. You can make these
files all day long and not run out of disk space, because this
kind of file doesn't take very many blocks. The blocks that
were never written are virtual blocks, inasmuch as read() at
that location will cause the filesystem to return a block of NULs.


I wonder if it will also "write" virtual blocks when it gets real
zero blocks to write from a user, or even with file system copy utils?


No, in my experience none of the Linux filesystems do that.
It's easy enough to test:

$ dd if=/dev/zero of=zeros bs=64k count=1024
1024+0 records in
1024+0 records out
$ ls -l zeros
-rw-r--r-- 1 grante users 67108864 Jan 28 14:49 zeros
$ du -h zeros
65M zeros
In my book that's 64MB not 65MB, but that's an argument for
another day.

--
Grant Edwards   grante at visi.com
Yow! It's the land of DONNY AND MARIE as promised in TV GUIDE!
Jan 28 '06 #34
Bengt Richter wrote:
How the heck does that make a 400 MB file that fast? It literally takes
a second or two while every other solution takes at least 2 - 5 minutes.
Awesome... thanks for the tip!!!


Because it isn't really writing the zeros. You can make these
files all day long and not run out of disk space, because this
kind of file doesn't take very many blocks. The blocks that
were never written are virtual blocks, inasmuch as read() at
that location will cause the filesystem to return a block of NULs.

I wonder if it will also "write" virtual blocks when it gets real
zero blocks to write from a user, or even with file system copy utils?


I've seen this behaviour on "big iron" Unix systems, in a benchmark that
repeatedly copied data from a memory mapped section to an output file.

but for the general case, I doubt that adding "is this block all zeros" or
"does this block match something we recently wrote to disk" checks will
speed things up, on average...

</F>

Jan 29 '06 #35
Grant Edwards <gr****@visi.com> writes:
$ dd if=/dev/zero of=zeros bs=64k count=1024
1024+0 records in
1024+0 records out
$ ls -l zeros
-rw-r--r-- 1 grante users 67108864 Jan 28 14:49 zeros
$ du -h zeros
65M zeros

In my book that's 64MB not 65MB, but that's an argument for
another day.


You should be aware that the size that 'du' and 'ls -s' reports,
include any indirect blocks needed to keep track of the data
blocks of the file. Thus, you get the amount of space that the
file actually uses in the file system, and would become free if
you removed it. That's why it is larger than 64 Mbyte. And 'du'
(at least GNU du) rounds upwards when you use -h.

Try for instance:

$ dd if=/dev/zero of=zeros bs=4k count=16367
16367+0 records in
16367+0 records out
$ ls -ls zeros
65536 -rw-rw-r-- 1 bellman bellman 67039232 Jan 29 13:57 zeros
$ du -h zeros
64M zeros

$ dd if=/dev/zero of=zeros bs=4k count=16368
16368+0 records in
16368+0 records out
$ ls -ls zeros
65540 -rw-rw-r-- 1 bellman bellman 67043328 Jan 29 13:58 zeros
$ du -h zeros
65M zeros

(You can infer from the above that my file system has a block
size of 4 Kbyte.)
--
Thomas Bellman, Lysator Computer Club, Linköping University, Sweden   (bellman @ lysator.liu.se)
"There are many causes worth dying for, but none worth killing for." -- Gandhi
Make Love -- Nicht Wahr!
Jan 29 '06 #36
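For completeness, the filesystem's block size can also be queried directly from Python on POSIX systems; a small sketch (the path '.' is just an example):

import os

vfs = os.statvfs('.')
print "preferred I/O block size:  ", vfs.f_bsize, "bytes"
print "fragment (allocation) size:", vfs.f_frsize, "bytes"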
