Hi!
I have a really long binary file that I want to read.
The way I am doing it now is:
for i in xrange(N):   # N is about 10,000,000
    time = struct.unpack('=HHHH', infile.read(8))
    # do something
    tdc = struct.unpack('=LiLiLiLi', self.lmf.read(32))
    # do something
Each loop takes about 0.2 ms on my computer, which means the whole for loop
takes about 2000 seconds.
I would like it to run faster.
Do you have any suggestions?
Thank you very much.
On Mon, 30 Apr 2007 00:45:22 -0700, OhKyu Yoon wrote:
> Hi!
> I have a really long binary file that I want to read.
> The way I am doing it now is:
>
> for i in xrange(N):   # N is about 10,000,000
>     time = struct.unpack('=HHHH', infile.read(8))
>     # do something
>     tdc = struct.unpack('=LiLiLiLi', self.lmf.read(32))
I assume that is supposed to be infile.read()
>     # do something
>
> Each loop takes about 0.2 ms on my computer, which means the whole for loop
> takes about 2000 seconds.
You're reading 400 million bytes, or 400MB, in about half an hour. Whether
that's fast or slow depends on what the "do something" lines are doing.
> I would like it to run faster.
> Do you have any suggestions?
Disk I/O is slow, so don't read from files in tiny little chunks. Read a
bunch of records into memory, then process them.
# UNTESTED!
rsize = 8 + 32   # record size
for i in xrange(N // 1000):
    buffer = infile.read(rsize * 1000)   # read 1000 records at once
    for j in xrange(1000):               # process each record
        offset = j * rsize
        time = struct.unpack('=HHHH', buffer[offset:offset+8])
        # do something
        tdc = struct.unpack('=LiLiLiLi', buffer[offset+8:offset+rsize])
        # do something
(Now I'm just waiting for somebody to tell me that file.read() already
buffers reads...)
--
Steven D'Aprano
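A further tweak along the same lines, offered only as an untested sketch (it is not in the post above): Python 2.5's struct.Struct lets the 40-byte format be compiled once, so the two unpacks per record become a single call on top of the chunked reads, and unpack_from avoids building a slice for every record.

import struct

# Untested sketch: one precompiled 40-byte format instead of two unpacks
# per record ('=HHHH' followed by '=LiLiLiLi'); needs Python 2.5+.
record = struct.Struct('=HHHHLiLiLiLi')   # record.size == 40
CHUNK = 1000                              # records per read, as above

def process(infile, N):
    for i in xrange(N // CHUNK):
        buf = infile.read(record.size * CHUNK)
        for j in xrange(CHUNK):
            fields = record.unpack_from(buf, j * record.size)
            time = fields[:4]   # the '=HHHH' values
            tdc = fields[4:]    # the '=LiLiLiLi' values
            # do something
    # (any leftover N % CHUNK records would need one final, smaller read)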
On Apr 30, 9:41 am, Steven D'Aprano <s...@REMOVEME.cybersource.com.au>
wrote:
> [...]
> (Now I'm just waiting for somebody to tell me that file.read() already
> buffers reads...)
I think the file.read() already buffers reads... :)
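It is, as far as it goes: CPython 2.x file objects sit on top of C stdio, so read() is buffered, and the buffer size can even be set with open()'s optional third argument. A minimal sketch (not from the thread; the file name is invented for illustration):

fn = "data.bin"                            # hypothetical file, for illustration only
open(fn, "wb").write("\0" * 40000)         # small dummy file so the sketch runs
f = open(fn, "rb", 1024 * 1024)            # third argument sets the stdio buffer size
data = f.read(40)                          # still goes through the ~1 MB buffer
f.close()

Buffering saves system calls, though; it does not remove the cost of ten million Python-level read()/unpack() calls, which is what the chunked version above avoids.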
On Tue, 01 May 2007 05:22:49 -0300, eC <er*********@gmail.com> wrote:
>> [...]
>> (Now I'm just waiting for somebody to tell me that file.read() already
>> buffers reads...)
> I think the file.read() already buffers reads... :)
Now we need someone to actually measure it, to confirm the expected
behavior... Done.
--- begin code ---
import struct, timeit, os

fn = r"c:\temp\delete.me"
fsize = 1000000

if not os.path.isfile(fn):
    f = open(fn, "wb")
    f.write("\0" * fsize)
    f.close()
os.system("sync")

def smallreads(fn):
    rsize = 40
    N = fsize // rsize
    f = open(fn, "rb")
    for i in xrange(N):
        time = struct.unpack('=HHHH', f.read(8))
        tdc = struct.unpack('=LiLiLiLi', f.read(32))
    f.close()

def bigreads(fn):
    rsize = 40
    N = fsize // rsize
    f = open(fn, "rb")
    for i in xrange(N // 1000):
        buffer = f.read(rsize * 1000)   # read 1000 records at once
        for j in xrange(1000):          # process each record
            offset = j * rsize
            time = struct.unpack('=HHHH', buffer[offset:offset+8])
            tdc = struct.unpack('=LiLiLiLi', buffer[offset+8:offset+rsize])
    f.close()

print "smallreads", timeit.Timer("smallreads(fn)",
    "from __main__ import fn,smallreads,fsize").repeat(3, 1)
print "bigreads", timeit.Timer("bigreads(fn)",
    "from __main__ import fn,bigreads,fsize").repeat(3, 1)
--- end code ---
Output:
smallreads [4.2534193777646663, 4.126013885559789, 4.2389176672125458]
bigreads [1.2897319939456011, 1.3076018578892405, 1.2703250635695138]
So in this sample case, reading in big chunks is about 3 times faster than
reading many tiny pieces.
--
Gabriel Genellina
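For completeness, a different route that nobody in the thread took (a sketch only: NumPy is assumed to be installed, and the file name and field names are invented for illustration): numpy.fromfile can load the whole file into a structured array in one call, with no per-record Python loop at all.

import numpy as np

# Record layout matching '=HHHH' + '=LiLiLiLi' in native byte order:
# four uint16 values, then four (uint32, int32) pairs; 40 bytes per record.
rec_dtype = np.dtype([('time', np.uint16, (4,)),
                      ('tdc', [('hi', np.uint32), ('lo', np.int32)], (4,))])
assert rec_dtype.itemsize == 40

open("data.bin", "wb").write("\0" * (40 * 1000))     # dummy file so the sketch runs
records = np.fromfile("data.bin", dtype=rec_dtype)   # (count= plus an open file object
                                                     #  can read a big file in slices)
times = records['time']    # (n, 4) uint16 array of the HHHH fields
tdcs = records['tdc']      # (n, 4) structured array of (hi, lo) pairs

Whether this is worth the extra dependency depends on what the "do something" steps are; if they can be written as array operations, the per-record loop disappears entirely.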
Wow, thank you all!
"Gabriel Genellina" <ga*******@yahoo.com.arwrote in message
news:op***************@furufufa-ec0e13.cpe.telecentro.com.ar...
> [...]