Hi!
I have a really long binary file that I want to read.
The way I am doing it now is:
for i in xrange(N): # N is about 10,000,000
    time = struct.unpack('=HHHH', infile.read(8))
    # do something
    tdc = struct.unpack('=LiLiLiLi', self.lmf.read(32))
    # do something
Each loop takes about 0.2 ms on my computer, which means the whole for loop
takes 2000 seconds.
I would like it to run faster.
Do you have any suggestions?
Thank you very much.
OhKyu
On Mon, 30 Apr 2007 00:45:22 -0700, OhKyu Yoon wrote:
> Hi!
> I have a really long binary file that I want to read.
> The way I am doing it now is:
>
> for i in xrange(N): # N is about 10,000,000
>     time = struct.unpack('=HHHH', infile.read(8))
>     # do something
>     tdc = struct.unpack('=LiLiLiLi', self.lmf.read(32))
I assume that is supposed to be infile.read()
>     # do something
>
> Each loop takes about 0.2 ms on my computer, which means the whole for
> loop takes 2000 seconds.
You're reading 400 million bytes, or 400MB, in about half an hour. Whether
that's fast or slow depends on what the "do something" lines are doing.
> I would like it to run faster.
> Do you have any suggestions?
Disk I/O is slow, so don't read from files in tiny little chunks. Read a
bunch of records into memory, then process them.
# UNTESTED!
rsize = 8 + 32  # record size
for i in xrange(N//1000):
    buffer = infile.read(rsize*1000)  # read 1000 records at once
    for j in xrange(1000):  # process each record
        offset = j*rsize
        time = struct.unpack('=HHHH', buffer[offset:offset+8])
        # do something
        tdc = struct.unpack('=LiLiLiLi', buffer[offset+8:offset+rsize])
        # do something
(Now I'm just waiting for somebody to tell me that file.read() already
buffers reads...)
--
Steven D'Aprano
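(Editor's aside: a variant of the same chunked-read idea, sketched with
struct.Struct, which is in the standard library from Python 2.5 on.
Precompiling the two formats and indexing into the buffer with unpack_from
avoids re-parsing the format strings and building intermediate slices.
Untested, same caveats as above; process() and chunk are made-up names for
illustration.)

import struct

head = struct.Struct('=HHHH')      # 8-byte record header
body = struct.Struct('=LiLiLiLi')  # 32-byte record body
rsize = head.size + body.size      # 40 bytes per record

def process(infile, N, chunk=1000):
    for i in xrange(N // chunk):
        buf = infile.read(rsize * chunk)  # read `chunk` records at once
        for j in xrange(chunk):
            offset = j * rsize
            time = head.unpack_from(buf, offset)
            # do something
            tdc = body.unpack_from(buf, offset + head.size)
            # do something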
On Apr 30, 9:41 am, Steven D'Aprano <s...@REMOVEME.cybersource.com.au>
wrote:
On Mon, 30 Apr 2007 00:45:22 -0700, OhKyu Yoon wrote:
>> I have a really long binary file that I want to read.
>> Each loop takes about 0.2 ms on my computer, which means the whole for
>> loop takes 2000 seconds.
>
> Disk I/O is slow, so don't read from files in tiny little chunks. Read a
> bunch of records into memory, then process them.
> [code snipped]
> (Now I'm just waiting for somebody to tell me that file.read() already
> buffers reads...)
I think the file.read() already buffers reads... :)
On Tue, 01 May 2007 05:22:49 -0300, eC <er*********@gmail.com> wrote:
> On Apr 30, 9:41 am, Steven D'Aprano <s...@REMOVEME.cybersource.com.au>
> wrote:
>> On Mon, 30 Apr 2007 00:45:22 -0700, OhKyu Yoon wrote:
>>> I have a really long binary file that I want to read.
>>> The way I am doing it now is:
>>> for i in xrange(N): # N is about 10,000,000
>>>     time = struct.unpack('=HHHH', infile.read(8))
>>>     # do something
>>>     tdc = struct.unpack('=LiLiLiLi', self.lmf.read(32))
>>
>> Disk I/O is slow, so don't read from files in tiny little chunks. Read
>> a bunch of records into memory, then process them.
>> [code snipped]
>> (Now I'm just waiting for somebody to tell me that file.read() already
>> buffers reads...)
>
> I think the file.read() already buffers reads... :)
Now we need someone to actually measure it, to confirm the expected
behavior... Done.
--- begin code ---
import struct, timeit, os

fn = r"c:\temp\delete.me"
fsize = 1000000
if not os.path.isfile(fn):
    f = open(fn, "wb")
    f.write("\0" * fsize)
    f.close()
    os.system("sync")

def smallreads(fn):
    rsize = 40
    N = fsize // rsize
    f = open(fn, "rb")
    for i in xrange(N):
        time = struct.unpack('=HHHH', f.read(8))
        tdc = struct.unpack('=LiLiLiLi', f.read(32))
    f.close()

def bigreads(fn):
    rsize = 40
    N = fsize // rsize
    f = open(fn, "rb")
    for i in xrange(N//1000):
        buffer = f.read(rsize*1000)  # read 1000 records at once
        for j in xrange(1000):  # process each record
            offset = j*rsize
            time = struct.unpack('=HHHH', buffer[offset:offset+8])
            tdc = struct.unpack('=LiLiLiLi', buffer[offset+8:offset+rsize])
    f.close()

print "smallreads", timeit.Timer("smallreads(fn)",
    "from __main__ import fn,smallreads,fsize").repeat(3, 1)
print "bigreads", timeit.Timer("bigreads(fn)",
    "from __main__ import fn,bigreads,fsize").repeat(3, 1)
--- end code ---
Output:
smallreads [4.2534193777646663, 4.126013885559789, 4.2389176672125458]
bigreads [1.2897319939456011, 1.3076018578892405, 1.2703250635695138]
So in this sample case, reading in big chunks is about 3 times faster than
reading many tiny pieces.
--
Gabriel Genellina
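(Editor's aside: if NumPy is an option, the per-record Python loop can be
skipped entirely by mapping the whole file onto a structured array in one
call. A hedged, untested sketch, assuming the 40-byte record layout above,
native byte order, and a 4-byte 'L' as in the '=' struct formats;
'input.bin' is a placeholder filename.)

import numpy as np

# One 40-byte record: four unsigned shorts ('=HHHH'), then four
# (unsigned long, int) pairs ('=LiLiLiLi').
rec = np.dtype([('time', np.uint16, (4,)),
                ('tdc', [('l', np.uint32), ('i', np.int32)], (4,))])

data = np.fromfile('input.bin', dtype=rec)
times = data['time']  # (N, 4) array of the HHHH fields
tdcs = data['tdc']    # (N, 4) array of (L, i) pairs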
Wow, thank you all!
"Gabriel Genellina" <ga*******@yahoo.com.arwrote in message
news:op***************@furufufa-ec0e13.cpe.telecentro.com.ar...
En Tue, 01 May 2007 05:22:49 -0300, eC <er*********@gmail.comescribió:
>On Apr 30, 9:41 am, Steven D'Aprano <s...@REMOVEME.cybersource.com.au> wrote:
>>On Mon, 30 Apr 2007 00:45:22 -0700, OhKyu Yoon wrote:
>I have a really long binary file that I want to read. The way I am doing it now is:
for i in xrange(N): # N is about 10,000,000 time = struct.unpack('=HHHH', infile.read(8)) # do something tdc = struct.unpack('=LiLiLiLi',self.lmf.read(32))
Disk I/O is slow, so don't read from files in tiny little chunks. Read a bunch of records into memory, then process them.
# UNTESTED! rsize = 8 + 32 # record size for i in xrange(N//1000): buffer = infile.read(rsize*1000) # read 1000 records at once for j in xrange(1000): # process each record offset = j*rsize time = struct.unpack('=HHHH', buffer[offset:offset+8]) # do something tdc = struct.unpack('=LiLiLiLi', buffer[offset+8:offset+rsize]) # do something
(Now I'm just waiting for somebody to tell me that file.read() already buffers reads...)
I think the file.read() already buffers reads... :)
Now we need someone to actually measure it, to confirm the expected
behavior... Done.
--- begin code ---
import struct,timeit,os
fn = r"c:\temp\delete.me"
fsize = 1000000
if not os.path.isfile(fn):
f = open(fn, "wb")
f.write("\0" * fsize)
f.close()
os.system("sync")
def smallreads(fn):
rsize = 40
N = fsize // rsize
f = open(fn, "rb")
for i in xrange(N): # N is about 10,000,000
time = struct.unpack('=HHHH', f.read(8))
tdc = struct.unpack('=LiLiLiLi', f.read(32))
f.close()
def bigreads(fn):
rsize = 40
N = fsize // rsize
f = open(fn, "rb")
for i in xrange(N//1000):
buffer = f.read(rsize*1000) # read 1000 records at once
for j in xrange(1000): # process each record
offset = j*rsize
time = struct.unpack('=HHHH', buffer[offset:offset+8])
tdc = struct.unpack('=LiLiLiLi', buffer[offset+8:offset+rsize])
f.close()
print "smallreads", timeit.Timer("smallreads(fn)","from __main__ import
fn,smallreads,fsize").repeat(3,1)
print "bigreads", timeit.Timer("bigreads(fn)", "from __main__ import
fn,bigreads,fsize").repeat(3,1)
--- end code ---
Output:
smallreads [4.2534193777646663, 4.126013885559789, 4.2389176672125458]
bigreads [1.2897319939456011, 1.3076018578892405, 1.2703250635695138]
So in this sample case, reading in big chunks is about 3 times faster than
reading many tiny pieces.
--
Gabriel Genellina