
Question about struct.unpack

Hi!
I have a really long binary file that I want to read.
The way I am doing it now is:

for i in xrange(N):  # N is about 10,000,000
    time = struct.unpack('=HHHH', infile.read(8))
    # do something
    tdc = struct.unpack('=LiLiLiLi', self.lmf.read(32))
    # do something

Each loop takes about 0.2 ms on my computer, which means the whole for loop
takes about 2000 seconds.
I would like it to run faster.
Do you have any suggestions?
Thank you very much.

OhKyu

Apr 30 '07 #1
4 Replies


On Mon, 30 Apr 2007 00:45:22 -0700, OhKyu Yoon wrote:
Hi!
I have a really long binary file that I want to read.
The way I am doing it now is:

for i in xrange(N):  # N is about 10,000,000
    time = struct.unpack('=HHHH', infile.read(8))
    # do something
    tdc = struct.unpack('=LiLiLiLi', self.lmf.read(32))
I assume that is supposed to be infile.read()

# do something

Each loop takes about 0.2 ms on my computer, which means the whole for loop
takes about 2000 seconds.
You're reading 400 million bytes, or 400MB, in about half an hour. Whether
that's fast or slow depends on what the "do something" lines are doing.

I would like it to run faster.
Do you have any suggestions?
Disk I/O is slow, so don't read from files in tiny little chunks. Read a
bunch of records into memory, then process them.

# UNTESTED!
rsize = 8 + 32  # record size
for i in xrange(N//1000):
    buffer = infile.read(rsize*1000)  # read 1000 records at once
    for j in xrange(1000):  # process each record
        offset = j*rsize
        time = struct.unpack('=HHHH', buffer[offset:offset+8])
        # do something
        tdc = struct.unpack('=LiLiLiLi', buffer[offset+8:offset+rsize])
        # do something
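A further twist on the same idea, untested and only available on Python 2.5
or later (where struct.Struct and unpack_from exist), is to precompile the
two formats and unpack straight out of the buffer, so no temporary slice
strings get built:

# UNTESTED! Requires Python 2.5+ for struct.Struct / unpack_from.
head = struct.Struct('=HHHH')      # 8-byte time record
body = struct.Struct('=LiLiLiLi')  # 32-byte tdc record
rsize = head.size + body.size      # 40 bytes per record
for i in xrange(N//1000):
    buffer = infile.read(rsize*1000)  # read 1000 records at once
    for offset in xrange(0, len(buffer), rsize):
        time = head.unpack_from(buffer, offset)
        # do something
        tdc = body.unpack_from(buffer, offset + head.size)
        # do something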
(Now I'm just waiting for somebody to tell me that file.read() already
buffers reads...)
--
Steven D'Aprano

Apr 30 '07 #2

eC
On Apr 30, 9:41 am, Steven D'Aprano <s...@REMOVEME.cybersource.com.au>
wrote:
(Now I'm just waiting for somebody to tell me that file.read() already
buffers reads...)
I think the file.read() already buffers reads... :)
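It does (CPython 2's file objects sit on top of C stdio, which buffers by
default), so most of the time here is the Python-level overhead of millions
of read() and unpack() calls rather than the disk itself. Just for
reference, an illustrative snippet (the filename and the 1 MB figure are
made up) showing that the buffer size can also be set explicitly through
open()'s third argument:

# Illustrative only: request roughly a 1 MB stdio buffer for this file.
# It does not remove the per-call cost of read()/struct.unpack().
infile = open('data.bin', 'rb', 1024*1024)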

May 1 '07 #3

On Tue, 01 May 2007 05:22:49 -0300, eC <er*********@gmail.com> wrote:
On Apr 30, 9:41 am, Steven D'Aprano <s...@REMOVEME.cybersource.com.au> wrote:
Disk I/O is slow, so don't read from files in tiny little chunks. Read a
bunch of records into memory, then process them.
(Now I'm just waiting for somebody to tell me that file.read() already
buffers reads...)
I think the file.read() already buffers reads... :)
Now we need someone to actually measure it, to confirm the expected
behavior... Done.

--- begin code ---
import struct, timeit, os

fn = r"c:\temp\delete.me"
fsize = 1000000
if not os.path.isfile(fn):
    f = open(fn, "wb")
    f.write("\0" * fsize)
    f.close()
    os.system("sync")

def smallreads(fn):
    rsize = 40
    N = fsize // rsize
    f = open(fn, "rb")
    for i in xrange(N):
        time = struct.unpack('=HHHH', f.read(8))
        tdc = struct.unpack('=LiLiLiLi', f.read(32))
    f.close()

def bigreads(fn):
    rsize = 40
    N = fsize // rsize
    f = open(fn, "rb")
    for i in xrange(N//1000):
        buffer = f.read(rsize*1000)  # read 1000 records at once
        for j in xrange(1000):       # process each record
            offset = j*rsize
            time = struct.unpack('=HHHH', buffer[offset:offset+8])
            tdc = struct.unpack('=LiLiLiLi', buffer[offset+8:offset+rsize])
    f.close()

print "smallreads", timeit.Timer("smallreads(fn)",
    "from __main__ import fn, smallreads, fsize").repeat(3, 1)
print "bigreads", timeit.Timer("bigreads(fn)",
    "from __main__ import fn, bigreads, fsize").repeat(3, 1)
--- end code ---

Output:
smallreads [4.2534193777646663, 4.126013885559789, 4.2389176672125458]
bigreads [1.2897319939456011, 1.3076018578892405, 1.2703250635695138]

So in this sample case, reading in big chunks is about 3 times faster than
reading many tiny pieces.
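If more speed is wanted, a file of fixed 40-byte records like this can also
be handed to NumPy in a single call, avoiding the Python-level loop
entirely. The following is only an untested sketch: it assumes NumPy is
installed and a little-endian machine, and for simplicity it reads the 'L'
fields as signed 32-bit integers (which only matters for values above 2**31):

--- begin code (untested sketch) ---
import numpy as np

# One 40-byte record: four unsigned shorts, then eight 4-byte integers.
rec = np.dtype([('time', '<u2', (4,)), ('tdc', '<i4', (8,))])

data = np.fromfile(open(fn, 'rb'), dtype=rec)   # all records in one call
times = data['time']    # array of shape (number_of_records, 4)
tdcs = data['tdc']      # array of shape (number_of_records, 8)
--- end code ---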

--
Gabriel Genellina
May 1 '07 #4

Wow, thank you all!

"Gabriel Genellina" <ga*******@yahoo.com.arwrote in message
news:op***************@furufufa-ec0e13.cpe.telecentro.com.ar...
En Tue, 01 May 2007 05:22:49 -0300, eC <er*********@gmail.comescribió:
>On Apr 30, 9:41 am, Steven D'Aprano <s...@REMOVEME.cybersource.com.au>
wrote:
>>On Mon, 30 Apr 2007 00:45:22 -0700, OhKyu Yoon wrote:
>I have a really long binary file that I want to read.
The way I am doing it now is:

for i in xrange(N): # N is about 10,000,000
time = struct.unpack('=HHHH', infile.read(8))
# do something
tdc = struct.unpack('=LiLiLiLi',self.lmf.read(32))

Disk I/O is slow, so don't read from files in tiny little chunks. Read a
bunch of records into memory, then process them.

# UNTESTED!
rsize = 8 + 32 # record size
for i in xrange(N//1000):
buffer = infile.read(rsize*1000) # read 1000 records at once
for j in xrange(1000): # process each record
offset = j*rsize
time = struct.unpack('=HHHH', buffer[offset:offset+8])
# do something
tdc = struct.unpack('=LiLiLiLi', buffer[offset+8:offset+rsize])
# do something

(Now I'm just waiting for somebody to tell me that file.read() already
buffers reads...)

I think the file.read() already buffers reads... :)

Now we need someone to actually measure it, to confirm the expected
behavior... Done.

--- begin code ---
import struct,timeit,os

fn = r"c:\temp\delete.me"
fsize = 1000000
if not os.path.isfile(fn):
f = open(fn, "wb")
f.write("\0" * fsize)
f.close()
os.system("sync")

def smallreads(fn):
rsize = 40
N = fsize // rsize
f = open(fn, "rb")
for i in xrange(N): # N is about 10,000,000
time = struct.unpack('=HHHH', f.read(8))
tdc = struct.unpack('=LiLiLiLi', f.read(32))
f.close()
def bigreads(fn):
rsize = 40
N = fsize // rsize
f = open(fn, "rb")
for i in xrange(N//1000):
buffer = f.read(rsize*1000) # read 1000 records at once
for j in xrange(1000): # process each record
offset = j*rsize
time = struct.unpack('=HHHH', buffer[offset:offset+8])
tdc = struct.unpack('=LiLiLiLi', buffer[offset+8:offset+rsize])
f.close()

print "smallreads", timeit.Timer("smallreads(fn)","from __main__ import
fn,smallreads,fsize").repeat(3,1)
print "bigreads", timeit.Timer("bigreads(fn)", "from __main__ import
fn,bigreads,fsize").repeat(3,1)
--- end code ---

Output:
smallreads [4.2534193777646663, 4.126013885559789, 4.2389176672125458]
bigreads [1.2897319939456011, 1.3076018578892405, 1.2703250635695138]

So in this sample case, reading in big chunks is about 3 times faster than
reading many tiny pieces.

--
Gabriel Genellina
May 1 '07 #5
