Iterating over a binary file

Pardon the newbie question, but how can I iterate over blocks of data
from a binary file (i.e., I can't just iterate over lines, because
there may be no end-of-line delimiters at all). Essentially I want to
do this:

f = file(filename, 'rb')
data = f.read(1024)
while len(data) > 0:
    someobj.update(data)
    data = f.read(1024)
f.close()

The above code works, but I don't like making two read() calls. Any
way to avoid it, or a more elegant syntax? Thanks.
Jul 18 '05 #1
Derek wrote:
Pardon the newbie question, but how can I iterate over blocks of data
from a binary file (i.e., I can't just iterate over lines, because
there may be no end-of-line delimiters at all). Essentially I want to
do this:

f = file(filename, 'rb')
data = f.read(1024)
while len(data) > 0:
    someobj.update(data)
    data = f.read(1024)
f.close()

The above code works, but I don't like making two read() calls. Any
way to avoid it, or a more elegant syntax? Thanks.


You can tuck away the ugliness in a generator:

def blocks(infile, size=1024):
    while True:
        block = infile.read(size)
        if len(block) == 0:
            break
        yield block

#use it:
for data in blocks(f):
    someobj.update(data)

Peter
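
For anyone reading this thread on Python 3, here is a minimal sketch of the same generator idea. It assumes the file is opened in binary mode (so read() returns bytes, and an empty bytes object at end of file) and uses an in-memory io.BytesIO as a stand-in for a real file; the sample data and the name `sizes` are only for illustration.

```python
import io

def blocks(infile, size=1024):
    """Yield successive chunks of at most `size` bytes until end of file."""
    while True:
        block = infile.read(size)
        if not block:  # an empty result signals end of file
            break
        yield block

# Stand-in for a real file opened with open(path, 'rb')
sample = io.BytesIO(b"x" * 2500)
sizes = [len(chunk) for chunk in blocks(sample)]
print(sizes)  # [1024, 1024, 452]
```

The single read() call lives inside the generator, so the calling loop stays a plain for statement.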


Jul 18 '05 #2
"Derek" <no**@none.com> writes:
f = file(filename, 'rb')
data = f.read(1024)
while len(data) > 0:
    someobj.update(data)
    data = f.read(1024)
f.close()

The above code works, but I don't like making two read() calls. Any
way to avoid it, or a more elegant syntax? Thanks.


You can make it even uglier:

f = file(filename, 'rb')
while 1:
    data = f.read(1024)
    if len(data) <= 0:
        break
    someobj.update(data)
f.close()

There's been proposals around to add an assignment-expression operator
like in C, so you could say something like

f = file(filename, 'rb')
while len(data := f.read(1024)) > 0:
    someobj.update(data)
f.close()

but that's the subject of holy war around here too many times ;-). Don't
hold your breath waiting for it.
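
A note for later readers: assignment expressions did eventually arrive, in Python 3.8 (PEP 572), with the same `:=` spelling. The sketch below shows roughly how the loop looks there. The function name `feed` and its `consumer` parameter are hypothetical, chosen only to make the example self-contained, and io.BytesIO stands in for a real binary file.

```python
import io

def feed(infile, consumer, size=1024):
    """Pass each chunk read from infile to consumer; return total bytes read."""
    total = 0
    # The assignment expression binds `data` and tests it in a single step.
    while data := infile.read(size):
        consumer(data)
        total += len(data)
    return total

chunks = []
total = feed(io.BytesIO(b"abcdefghij"), chunks.append, size=4)
print(chunks, total)  # [b'abcd', b'efgh', b'ij'] 10
```

Testing the chunk's truthiness replaces the len(...) > 0 comparison, since an empty bytes object is false.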
Jul 18 '05 #3
Paul Rubin <http://ph****@NOSPAM.invalid> writes:
The above code works, but I don't like making two read() calls. Any
way to avoid it, or a more elegant syntax? Thanks.
You can make it even uglier:

f = file(filename, 'rb')
while 1:
    data = f.read(1024)
    if len(data) <= 0:
        break
    someobj.update(data)
f.close()

There's been proposals around to add an assignment-expression operator
like in C, so you could say something like

f = file(filename, 'rb')
while len(data := f.read(1024)) > 0:
    someobj.update(data)
f.close()


It's funny, but I find the first version much more readable than the
second one. Especially if I consciously forget the "do lots of stuff
in condition part of while" indoctrination from C. If there is lots of
stuff in while you have to stare at it a bit more, and it becomes
"idiomatic", something you learn, perhaps even cookbook stuff, instead
of obvious-as-such.
but that's the subject of holy war around here too many times ;-). Don't
hold your breath waiting for it.


Probably true. Instead of ":=", I wouldn't mind getting rid of
expressions/statements difference as a whole.

--
Ville Vainio http://www.students.tut.fi/~vainio24
Jul 18 '05 #4
On 06 Jan 2004 23:52:30 +0200, Ville Vainio wrote:
Paul Rubin <http://ph****@NOSPAM.invalid> writes:
There's been proposals around to add an assignment-expression operator
like in C, so you could say something like

It's funny, but I find the first version much more readable than the
second one. Especially if I consciously forget the "do lots of stuff
in condition part of while" indoctrination from C.

but that's the subject of holy war around here too many times ;-). Don't
hold your breath waiting for it.


Probably true. Instead of ":=", I wouldn't mind getting rid of
expressions/statements difference as a whole.


Uh-oh. Don't go there. If there was no difference, then you would be
able to perform assignment, even define a class, in the condition of a
while. I don't think you want that based on what you said above. (I
certainly don't want to have to read code with such complexity!)

-D

--
For society, it's probably a good thing that engineers value function
over appearance. For example, you wouldn't want engineers to build
nuclear power plants that only _look_ like they would keep all the
radiation inside.
(Scott Adams - The Dilbert principle)

www: http://dman13.dyndns.org/~dman/ jabber: dm**@dman13.dyndns.org
Jul 18 '05 #5
Ville Vainio <ville.vainio@spamster_tut_remove.fi> writes:
It's funny, but I find the first version much more readable than the
second one. Especially if I consciously forget the "do lots of stuff
in condition part of while" indoctrination from C. If there is lots of
stuff in while you have to stare at it a bit more, and it becomes
"idiomatic", something you learn, perhaps even cookbook stuff, instead
of obvious-as-such.


Idioms exist because they're useful, and there's already plenty of
them in Python, like ''.join(stringlist) or "for i in xrange(n)" etc.

Maybe the condition in the while statement makes that statement twice
as hard to read. However, the example as a whole can still be easier,
simply because it's shorter.

Version 1:

Statement                               Reading difficulty
=========                               ==================

f = file(filename, 'rb')                1
while 1:                                1
    data = f.read(1024)                 1
    if len(data) <= 0:                  1
        break                           1
    someobj.update(data)                1
f.close()                               1

Total reading difficulty:               7

Now the second version:

Statement                               Reading difficulty
=========                               ==================

f = file(filename, 'rb')                1
while len(data := f.read(1024)) > 0:    2
    someobj.update(data)                1
f.close()                               1

Total reading difficulty:               5

I got through college on a version of this reasoning. I was a math
major. I had friends studying history and literature who said "that's
a hard subject", but I thought they were crazy. But in a normal math
class, there's one textbook that you use for the whole semester, and
you cover maybe half the chapters in it. I was able to keep up. But
in a literature course, you usually have to read a different entire
book from cover to cover EVERY WEEK. I took a couple classes like
that and barely survived. Yes, it takes a lot more effort to read a
page of a math book than a page of a novel. When you compare the
total reading load though, math was a much easier major than
literature or history.

It's the same with programs. I'd rather read 5 lines of tight code
that each actually does something, than 3 pages of loose code (the
kind that's usually written in Java) that spastically meanders trying
to do the same thing, even if the individual loose lines are easier to
read than the tight lines.
Jul 18 '05 #6
Paul Rubin wrote:
It's the same with programs. I'd rather read 5 lines of tight code
that each actually does something, than 3 pages of loose code (the
kind that's usually written in Java) that spastically meanders trying
to do the same thing, even if the individual loose lines are easier to
read than the tight lines.


I would say that depends on the person's competency in a given language.
Naturally, once you are writing long/large programs it is better to have tight
code, but for a newbie it is too much to translate at once.
While I consider myself expert in "C", I am still learning "C++".

That does not mean a language has to lack the capability.
Then again how large a program can you or would you want to write with python?

Cheers, Sam.

Jul 18 '05 #7
Uh-oh. Don't go there. If there was no difference, then you would be
able to perform assignment, even define a class, in the condition of a
while. I don't think you want that based on what you said above. (I
certainly don't want to have to read code with such complexity!)

-D


I was able to create a simple text pager (like Unix's more) in some
nested list comprehensions. Just because I can do that doesn't mean
that real programs will be made like that. IMHO the difference between
statements and expressions doesn't really make sense, and it is one of
the few advantages Lisp/Scheme (and almost Lua) has over Python.

Daniel Ehrenberg
Jul 18 '05 #8
On Tue, Jan 06, 2004 at 03:25:11PM -0500, Derek wrote:
Pardon the newbie question, but how can I iterate over blocks of data
from a binary file (i.e., I can't just iterate over lines, because
there may be no end-of-line delimiters at all). Essentially I want to
do this:

f = file(filename, 'rb')
data = f.read(1024)
while len(data) > 0:
    someobj.update(data)
    data = f.read(1024)
f.close()

The above code works, but I don't like making two read() calls. Any
way to avoid it, or a more elegant syntax? Thanks.


f = file(filename, 'rb')
for data in iter(lambda: f.read(1024), ''):
    someobj.update(data)
f.close()

Jp
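
The two-argument form of iter() used above calls the given callable repeatedly until it returns the sentinel value. One caveat for Python 3 readers: a file opened in binary mode returns bytes, so the sentinel would need to be b'' rather than ''. A minimal sketch under that assumption, where the helper name `read_chunks` is hypothetical and functools.partial is swapped in for the lambda:

```python
import io
from functools import partial

def read_chunks(infile, size=1024):
    """Iterate over chunks using the two-argument form of iter()."""
    # iter(callable, sentinel) keeps calling the callable until the
    # returned value equals the sentinel; b"" marks end of file here.
    return iter(partial(infile.read, size), b"")

collected = list(read_chunks(io.BytesIO(b"0123456789"), size=3))
print(collected)  # [b'012', b'345', b'678', b'9']
```

With the wrong sentinel type the comparison never succeeds and the loop would spin forever on empty reads, so the b'' versus '' distinction matters.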

Jul 18 '05 #9
Paul Rubin <http://ph****@NOSPAM.invalid> wrote:
Statement                               Reading difficulty
=========                               ==================

f = file(filename, 'rb')                1
while len(data := f.read(1024)) > 0:    2
    someobj.update(data)                1
f.close()                               1

Total reading difficulty:               5


In Python it can be done even simpler than in C, by making the
"someobj.update" method return the length of the data:

#derek.py

class X:

    def update(self,data):
        #print a chunk and a space
        print data,
        return len(data)

def test():
    x = X()
    f = file('derek.py','rb')
    while x.update(f.read(1)):
        pass
    f.close()

if __name__=='__main__':
    test()

IMHO the generator solution proposed earlier is more natural to some
(all?) Python programmers.

Anton
Jul 18 '05 #10
On Tue, 6 Jan 2004, Derek wrote:
f = file(filename, 'rb')
data = f.read(1024)
while len(data) > 0:
    someobj.update(data)
    data = f.read(1024)
f.close()

The above code works, but I don't like making two read() calls. Any
way to avoid it, or a more elegant syntax? Thanks.


I believe the canonical form is:

f = file(filename, 'rb')
while 1:
    data = f.read(1024)
    if not data:
        break
    someobj.update(data)
f.close()

This was also the canonical form for text files, in the case where
f.readlines() wasn't appropriate, prior to the introduction of file
iterators and xreadlines().
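
As a concrete illustration of this loop shape, here is a sketch in Python 3 that assumes `someobj` is something like a hashlib hash object (the thread never says what it is). The function name `file_digest_hex` is hypothetical, and a with block replaces the explicit f.close() so the file is closed even if update() raises.

```python
import hashlib
import os
import tempfile

def file_digest_hex(path, size=1024):
    """Return the SHA-256 hex digest of a file, read in fixed-size chunks."""
    hasher = hashlib.sha256()
    with open(path, "rb") as f:  # closed automatically, even on error
        while True:
            data = f.read(size)
            if not data:  # empty bytes means end of file
                break
            hasher.update(data)
    return hasher.hexdigest()

# Usage: write a small temporary file, hash it, then clean up
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello world" * 500)
digest = file_digest_hex(tmp.name)
os.remove(tmp.name)
```

Feeding the data in chunks gives the same digest as hashing the whole content at once, without holding the file in memory.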

--
Andrew I MacIntyre "These thoughts are mine alone..."
E-mail: an*****@bullseye.apana.org.au (pref) | Snail: PO Box 370
an*****@pcug.org.au (alt) | Belconnen ACT 2616
Web: http://www.andymac.org/ | Australia

Jul 18 '05 #11
"Derek" <no**@none.com> wrote in message news:<bt************@ID-46268.news.uni-berlin.de>...
Pardon the newbie question, but how can I iterate over blocks of data
from a binary file (i.e., I can't just iterate over lines, because
there may be no end-of-line delimiters at all). Essentially I want to
do this:

f = file(filename, 'rb')
data = f.read(1024)
while len(data) > 0:
    someobj.update(data)
    data = f.read(1024)
f.close()

The above code works, but I don't like making two read() calls. Any
way to avoid it, or a more elegant syntax? Thanks.


There's an approach to mimic the following C-statements in Python:

while (result = f.read(1024))
{
    do_some_thing(result);
}

>>> def assign(val):
...     global result
...     result=val
...     return val
...
>>> f=file('README.txt','rb')
>>> while assign(f.read(1024)):
...     print len(result)
...
121
>>> f.close()


Regards
Peter
Jul 18 '05 #12
|Thus Spake Derek On the now historical date of Tue, 06 Jan 2004 15:25:11
-0500|
f = file(filename, 'rb')
data = f.read(1024)
while len(data) > 0:
    someobj.update(data)
    data = f.read(1024)
f.close()

The above code works, but I don't like making two read() calls. Any way
to avoid it, or a more elegant syntax? Thanks.


Sounds to me like what you're missing (as in "longing in the heart" not
"missed while reading the documentation") is a "do while" construct.

----- Not Real Python Code ------
f = file(filename, 'rb')
do:
    data = f.read(1024)
    if len(data) > 0:
        someobj.update(data)
while len(data) > 0
----- End of Fictional Python Code -----

Python doesn't have this construct. My understanding is that since
anything that can be done with a "do while" can be accomplished with a
"for" or "while" statement, that "do while" was not included. I'm
probably wrong, but that's my understanding.

Sometimes I miss the "do while" construct because it *can* make code more
legible, but I've mentally replaced it with many of the constructs
mentioned elsewhere in these threads. Generators can make a nice way of
hiding the complexity of code, but it's a judgment call of when your code
starts to become obtuse enough to hide bits of it elsewhere.
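
To make the "do while" point concrete, here is a rough Python 3 sketch of how the construct is usually emulated: a `while True` loop whose exit test comes after the body has run once. The function name `drain` and the `reads` counter are hypothetical, added only to show that the body executes at least once even on an empty file.

```python
import io

def drain(infile, size=1024):
    """Emulate do/while: the read runs once before the exit test."""
    reads = 0
    chunks = []
    while True:
        data = infile.read(size)  # body: always executes at least once
        reads += 1
        if not data:              # the "while" condition, tested after the body
            break
        chunks.append(data)
    return chunks, reads

chunks, reads = drain(io.BytesIO(b""))
print(chunks, reads)  # [] 1
```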

HTH

Sam Walters.

--
Never forget the halloween documents.
http://www.opensource.org/halloween/
""" Where will Microsoft try to drag you today?
Do you really want to go there?"""

Jul 18 '05 #13
"Samuel Walters" wrote:
Sounds to me like what you're missing (as in "longing in
the heart" not "missed while reading the documentation") is
a "do while" construct.

----- Not Real Python Code ------
f = file(filename, 'rb')
do:
    data = f.read(1024)
    if len(data) > 0:
        someobj.update(data)
while len(data) > 0
----- End of Fictional Python Code -----

Yup. Being the naive Python newbie that I am, your fictional code is
exactly what I wrote first without realizing Python has no do loop.

Python doesn't have this construct. My understanding is
that since anything that can be done with a "do while" can
be accomplished with a "for" or "while" statement, that "do
while" was not included. I'm probably wrong, but that's my
understanding.

Sometimes I miss the "do while" construct because it *can*
make code more legible, but I've mentally replaced it
with many of the constructs mentioned elsewhere in these
threads. Generators can make a nice way of hiding the
complexity of code, but it's a judgment call of when your
code starts to become obtuse enough to hide bits of it
elsewhere.

Generators seem like a powerful way to hide complexity. While in this
case my code is so simple that a generator would probably introduce
unnecessary obfuscation, it's a technique I'm sure I'll use a great
deal in the future. (And to think I didn't know generators even
existed when I asked this question.)

HTH

Sam Walters.

Jul 18 '05 #14
|Thus Spake Derek On the now historical date of Wed, 07 Jan 2004 16:01:52
-0500|
Yup. Being the naive Python newbie that I am, your fictional code is
exactly what I wrote first without realizing Python has no do loop.

I still do that all the time. Most languages have a do-while and I just
can't get it through my thick skull that python doesn't have that. At
least now I usually catch it as soon as I type "do."

Also take note that python has no select-case construct. The equivalent
construct is if-elif-elif-elif-else. If you think about it, this makes
sense. In languages with a select-case construct, you're usually
comparing a statically typed variable with a set of constant cases. In a
dynamically typed language, there's no guarantee that the variable being
passed into a select clause will come even close to being like the
constant cases. if-elif-else allows you to deal with each case in a much
more dynamic way via on-the-fly comparisons. select-case is a construct I
have not missed in the slightest because python's if-elif-else construct is
just as legible and doesn't have the danger of forgetting a "break" clause.

Generators seem like a powerful way to hide complexity. While in this
case my code is so simple that a generator would probably introduce
unnecessary obfuscation, it's a technique I'm sure I'll use a great deal
in the future. (And to think I didn't know generators even existed when I
asked this question.)


Generators are fairly new, and for the moment I've put them under the same
mental category as regular expressions, threads and thermonuclear
warheads. Sometimes they're exactly what you need, but most of the time
they unnecessarily complicate things and are probably not what you want to
use.

Sam Walters.

--
Never forget the halloween documents.
http://www.opensource.org/halloween/
""" Where will Microsoft try to drag you today?
Do you really want to go there?"""

Jul 18 '05 #15
Paul Rubin wrote:

Version 1:

Statement                               Reading difficulty
=========                               ==================

f = file(filename, 'rb')                1
while 1:                                1
    data = f.read(1024)                 1
    if len(data) <= 0:                  1
        break                           1
    someobj.update(data)                1
f.close()                               1

Total reading difficulty:               7

Now the second version:

Statement                               Reading difficulty
=========                               ==================

f = file(filename, 'rb')                1
while len(data := f.read(1024)) > 0:    2
    someobj.update(data)                1
f.close()                               1

Total reading difficulty:               5


Hmmm... why only "2" for that line? It combines two function
calls, an assignment, a comparison, and control flow. Sounds
a lot like a candidate for, say, a "4", or maybe higher... after
all, the reading difficulty surely grows in some exponential
fashion, not just linearly with line length or number of nested
parentheses.

(Obviously this is all subjective... that's my point. Many people
who have grown comfortable with Python would find the second
unacceptable, and would certainly re-write the first to use
a nice iterator or a nice function call or something anyway,
so the point is sort of moot for them.)

-Peter
Jul 18 '05 #16

"Samuel Walters" <sw*************@yahoo.com> wrote in message
news:pa****************************@yahoo.com...

Also take note that python has no select-case construct. The equivalent
construct is if-elif-elif-elif-else.


Dictionaries are also useful for some of the things people do with case
select.

tjr
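
A small sketch of the dictionary approach, for readers who have not seen it: each "case" value maps to a function, and dict.get() supplies the default branch. All names here (`HANDLERS`, `dispatch`, and the handler functions) are hypothetical and exist only for illustration.

```python
def handle_start():
    return "starting"

def handle_stop():
    return "stopping"

def handle_unknown():
    return "unknown command"

# Map each "case" value to the function that handles it.
HANDLERS = {
    "start": handle_start,
    "stop": handle_stop,
}

def dispatch(command):
    """Look up the handler for command, falling back to a default."""
    return HANDLERS.get(command, handle_unknown)()

print(dispatch("start"))   # starting
print(dispatch("reboot"))  # unknown command
```

This works well when every case compares the same value for equality; an if-elif chain is still the better fit when the branches test unrelated conditions.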
Jul 18 '05 #17
