Bytes | Software Development & Data Engineering Community
Question about struct.unpack

Hi!
I have a really long binary file that I want to read.
The way I am doing it now is:

for i in xrange(N): # N is about 10,000,000
    time = struct.unpack('=HHHH', infile.read(8))
    # do something
    tdc = struct.unpack('=LiLiLiLi', self.lmf.read(32))
    # do something

Each iteration takes about 0.2 ms on my computer, which means the whole for loop
takes 2000 seconds.
I would like it to run faster.
Do you have any suggestions?
Thank you very much.
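[One cheap win, independent of any I/O changes: the two unpack calls per record can be merged into a single 40-byte format, halving the per-record call overhead. An untested sketch with made-up field values; the real layout comes from your file:]

```python
import struct

# Hypothetical sample values standing in for one record from the file.
values = (1, 2, 3, 4, 10, -1, 20, -2, 30, -3, 40, -4)
record = struct.pack('=HHHHLiLiLiLi', *values)  # 8 + 32 = 40 bytes

# One combined unpack replaces the two separate calls per record.
fields = struct.unpack('=HHHHLiLiLiLi', record)
time_part = fields[:4]   # the '=HHHH' half
tdc_part = fields[4:]    # the '=LiLiLiLi' half
```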

OhKyu

Apr 30 '07 #1
On Mon, 30 Apr 2007 00:45:22 -0700, OhKyu Yoon wrote:
Hi!
I have a really long binary file that I want to read.
The way I am doing it now is:

for i in xrange(N): # N is about 10,000,000
    time = struct.unpack('=HHHH', infile.read(8))
    # do something
    tdc = struct.unpack('=LiLiLiLi', self.lmf.read(32))
I assume that is supposed to be infile.read()

# do something

Each iteration takes about 0.2 ms on my computer, which means the whole for loop
takes 2000 seconds.
You're reading 400 million bytes, or 400MB, in about half an hour. Whether
that's fast or slow depends on what the "do something" lines are doing.

I would like it to run faster.
Do you have any suggestions?
Disk I/O is slow, so don't read from files in tiny little chunks. Read a
bunch of records into memory, then process them.

# UNTESTED!
rsize = 8 + 32  # record size
for i in xrange(N//1000):
    buffer = infile.read(rsize*1000)  # read 1000 records at once
    for j in xrange(1000):            # process each record
        offset = j*rsize
        time = struct.unpack('=HHHH', buffer[offset:offset+8])
        # do something
        tdc = struct.unpack('=LiLiLiLi', buffer[offset+8:offset+rsize])
        # do something
(Now I'm just waiting for somebody to tell me that file.read() already
buffers reads...)
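[A further tweak on the chunked approach above, assuming the same record layout: precompile the formats with struct.Struct (available since Python 2.5) and use unpack_from(), which parses the format string once and decodes at an offset without building an intermediate slice. Untested sketch with fabricated data standing in for the real file:]

```python
import struct

# Precompiled formats are parsed once instead of on every call.
head = struct.Struct('=HHHH')      # 8-byte time record
body = struct.Struct('=LiLiLiLi')  # 32-byte tdc record
rsize = head.size + body.size      # 40 bytes per record

# Stand-in for one big infile.read(rsize * 1000):
one = head.pack(1, 2, 3, 4) + body.pack(10, -1, 20, -2, 30, -3, 40, -4)
buffer = one * 1000

results = []
for offset in range(0, len(buffer), rsize):
    # unpack_from() reads in place; no buffer[offset:offset+8] slice copy.
    time = head.unpack_from(buffer, offset)
    tdc = body.unpack_from(buffer, offset + head.size)
    results.append((time, tdc))
```

[The saving is per record, so over ten million records the avoided slicing and format parsing adds up.]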
--
Steven D'Aprano

Apr 30 '07 #2
eC
On Apr 30, 9:41 am, Steven D'Aprano <s...@REMOVEME.cybersource.com.au>
wrote:
On Mon, 30 Apr 2007 00:45:22 -0700, OhKyu Yoon wrote:
[...]
(Now I'm just waiting for somebody to tell me that file.read() already
buffers reads...)
I think the file.read() already buffers reads... :)

May 1 '07 #3
On Tue, 01 May 2007 05:22:49 -0300, eC <er*********@gmail.com> wrote:
[...]
(Now I'm just waiting for somebody to tell me that file.read() already
buffers reads...)

I think the file.read() already buffers reads... :)
Now we need someone to actually measure it, to confirm the expected
behavior... Done.

--- begin code ---
import struct, timeit, os

fn = r"c:\temp\delete.me"
fsize = 1000000
if not os.path.isfile(fn):
    f = open(fn, "wb")
    f.write("\0" * fsize)
    f.close()
    os.system("sync")

def smallreads(fn):
    rsize = 40
    N = fsize // rsize
    f = open(fn, "rb")
    for i in xrange(N):
        time = struct.unpack('=HHHH', f.read(8))
        tdc = struct.unpack('=LiLiLiLi', f.read(32))
    f.close()

def bigreads(fn):
    rsize = 40
    N = fsize // rsize
    f = open(fn, "rb")
    for i in xrange(N//1000):
        buffer = f.read(rsize*1000)  # read 1000 records at once
        for j in xrange(1000):       # process each record
            offset = j*rsize
            time = struct.unpack('=HHHH', buffer[offset:offset+8])
            tdc = struct.unpack('=LiLiLiLi', buffer[offset+8:offset+rsize])
    f.close()

print "smallreads", timeit.Timer("smallreads(fn)",
        "from __main__ import fn,smallreads,fsize").repeat(3, 1)
print "bigreads", timeit.Timer("bigreads(fn)",
        "from __main__ import fn,bigreads,fsize").repeat(3, 1)
--- end code ---

Output:
smallreads [4.2534193777646663, 4.126013885559789, 4.2389176672125458]
bigreads [1.2897319939456011, 1.3076018578892405, 1.2703250635695138]

So in this sample case, reading in big chunks is about 3 times faster than
reading many tiny pieces.
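[For what it's worth, on modern Python 3 the inner offset loop can be collapsed entirely: struct.iter_unpack (3.4+) walks a buffer one record at a time with no slicing or offset arithmetic in user code. A sketch with fabricated data standing in for the file contents, assuming the two per-record formats are merged into one 40-byte format:]

```python
import struct

rec = struct.Struct('=HHHHLiLiLiLi')  # one 40-byte format per record
# Stand-in for data = infile.read() (the whole file, or one big chunk):
data = rec.pack(1, 2, 3, 4, 10, -1, 20, -2, 30, -3, 40, -4) * 1000

# iter_unpack steps through the buffer in rec.size increments,
# yielding one tuple of 12 fields per record.
records = list(rec.iter_unpack(data))
```

[iter_unpack raises an error if the buffer length is not a multiple of the format size, which also catches truncated files for free.]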

--
Gabriel Genellina
May 1 '07 #4
Wow, thank you all!

"Gabriel Genellina" <ga*******@yahoo.com.ar> wrote in message
news:op***************@furufufa-ec0e13.cpe.telecentro.com.ar...
[...]
So in this sample case, reading in big chunks is about 3 times faster than
reading many tiny pieces.
May 1 '07 #5
