Bytes IT Community

Graceful detection of EOF

How does one detect the EOF gracefully? Assuming I have a pickle file
containing an unknown number of objects, how can I read (i.e.,
pickle.load()) until the EOF is encountered without generating an EOF
exception?

Thanks for any assistance.
MickeyBob

Jul 18 '05 #1
20 Replies


Write a file-like object that can "look ahead" and provide a flag to
check in your unpickling loop, and which implements enough of the file
protocol ("read" and "readline", apparently) to please pickle. The
following worked for me.

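class PeekyFile:
    """File wrapper that can look one byte ahead to detect EOF."""

    def __init__(self, f):
        self.f = f
        self.peek = ""      # at most one byte read ahead of the caller

    def eofnext(self):
        # True if the wrapped file has no more data; otherwise the byte
        # just read is kept in self.peek for read()/readline() to return.
        if self.peek: return False
        try:
            self.peek = self.f.read(1)
        except EOFError:
            return True
        return not self.peek

    def read(self, n=None):
        # Hand back the look-ahead byte first, then the rest of the request.
        if n is None or n < 0:
            result = self.peek + self.f.read()
        elif n == 0:
            return ""
        else:
            result = self.peek + self.f.read(n - len(self.peek))
        self.peek = ""
        return result

    def readline(self):
        # A peeked newline is already a complete line.
        if self.peek == "\n":
            result = self.peek
        else:
            result = self.peek + self.f.readline()
        self.peek = ""
        return result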

import StringIO, pickle
o = StringIO.StringIO()
for x in range(5):
    pickle.dump(x, o)
i = PeekyFile(StringIO.StringIO(o.getvalue()))
while 1:
    if i.eofnext():
        break
    print pickle.load(i)
print "at the end"
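On Python 3 the look-ahead does not need a hand-written wrapper. A minimal sketch, assuming the pickles are read through an `io.BufferedReader` (which is what `open(path, "rb")` normally returns): its `peek()` method returns buffered bytes without advancing the stream, and an empty bytes object only at end of stream. `load_all` is a hypothetical helper name.

```python
import io
import pickle


def load_all(f):
    """Unpickle objects from a buffered binary stream until it is exhausted.

    Look-ahead version: peek() returns b"" only at end of stream, so the
    loop never needs to catch EOFError.
    """
    objs = []
    while f.peek(1):
        objs.append(pickle.load(f))
    return objs


# Five pickles written back to back into one in-memory "file".
buf = io.BytesIO()
for x in range(5):
    pickle.dump(x, buf)

# BufferedReader supplies peek(); a file opened with open(path, "rb") already is one.
reader = io.BufferedReader(io.BytesIO(buf.getvalue()))
print(load_all(reader))  # [0, 1, 2, 3, 4]
print("at the end")
```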


Jul 18 '05 #2

MickeyBob wrote:
> How does one detect the EOF gracefully? Assuming I have a pickle file
> containing an unknown number of objects, how can I read (i.e.,
> pickle.load()) until the EOF is encountered without generating an EOF
> exception?


Why isn't catching the exception graceful?

# UNTESTED CODE
import pickle

def load_pickle_iter(infile):
    while 1:
        try:
            yield pickle.load(infile)
        except EOFError:
            break

for obj in load_pickle_iter(open("mydata.pickle", "rb")):
    print obj

This is well in line with the normal Python idiom,
as compared to "look before you leap".
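A Python 3 rendering of the same EAFP generator, as a sketch: `print` becomes a function and `while 1` becomes `while True`. The usage below writes a few pickles to an in-memory stream in place of the hypothetical `mydata.pickle`.

```python
import io
import pickle


def load_pickle_iter(infile):
    """Yield successive pickled objects until pickle.load() signals EOF."""
    while True:
        try:
            yield pickle.load(infile)
        except EOFError:
            # Raised once the stream has no more data to unpickle.
            return


# Usage with an in-memory stream; a real file would be open("mydata.pickle", "rb").
stream = io.BytesIO()
for obj in ("spam", [1, 2], {"k": 3}):
    pickle.dump(obj, stream)
stream.seek(0)

for obj in load_pickle_iter(stream):
    print(obj)
```

With a real file, wrapping the loop in `with open("mydata.pickle", "rb") as infile:` makes sure the file is closed afterwards.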

Andrew
da***@dalkescientific.com
Jul 18 '05 #3

Andrew Dalke wrote:
> Why isn't catching the exception graceful?
> ...
> This is well in line with the normal Python idiom,
> as compared to "look before you leap".


So, what you're saying is that the Python way, in contradistinction to
"look before you leap", is "land in it, then wipe it off?" Can we get
that in the Zen of Python? :-)

Seriously, this is beautiful. I understand generators, but haven't
become accustomed to using them yet. That is just beautiful, which _is_
Zen.
Jeremy Jones

Jul 18 '05 #4

A file is too large to fit into memory.
The first line must receive a special treatment, because
it contains information about how to handle the rest of the file.

Of course it is not difficult to test if you are reading the first line
or another one, but it hurts my feelings to do a test which by definition
succeeds at the first record, and never afterwards.
Any suggestions?
egbert
--
Egbert Bouwman - Keizersgracht 197 II - 1016 DS Amsterdam - 020 6257991
========================================================================
Jul 18 '05 #5

Egbert Bouwman wrote:
> A file is too large to fit into memory.
> The first line must receive a special treatment, because
> it contains information about how to handle the rest of the file.
>
> Of course it is not difficult to test if you are reading the first line
> or another one, but it hurts my feelings to do a test which by definition
> succeeds at the first record, and never afterwards.

>>> lines = iter("abc")
>>> for first in lines:
...     print first
...     break
...
a
>>> for line in lines:
...     print line
...
b
c

Unless it hurts your feelings to unconditionally break out of a for-loop,
that is.

Peter
Jul 18 '05 #6

Peter Otten wrote:
> >>> lines = iter("abc")
> >>> for first in lines:
> ...     print first
> ...     break
> ...
> a
> >>> for line in lines:
> ...     print line
> ...
> b
> c
>
> Unless it hurts your feelings to unconditionally break out of a for-loop,
> that is.

How about:
>>> lines = iter("abc")
>>> first = lines.next()
>>> print first
a
>>> for line in lines:
...     print line
...
b
c

Would hurt less feeling I presume.

Gerrit.

--
Weather in Twenthe, Netherlands 08/10 11:25:
11.0°C Few clouds mostly cloudy wind 0.9 m/s None (57 m above NAP)
--
In the councils of government, we must guard against the acquisition of
unwarranted influence, whether sought or unsought, by the
military-industrial complex. The potential for the disastrous rise of
misplaced power exists and will persist.
-Dwight David Eisenhower, January 17, 1961
Jul 18 '05 #7

Jeremy Jones <za******@bellsouth.net> wrote:
> ...
>> This is well in line with the normal Python idiom,
>> as compared to "look before you leap".
>>
>> Andrew
>> da***@dalkescientific.com
>
> So, what you're saying is that the Python way, in contradistinction to
> "look before you leap", is "land in it, then wipe it off?" Can we get
> that in the Zen of Python? :-)


The "normal Python idiom" is often called, in honor and memory of
Admiral Grace Murray Hopper (arguably the most significant woman in the
history of programming languages to this time), "it's Easier to Ask
Forgiveness than Permission" (EAFP, vs the LBYL alternative). This
motto has been attributed to many, but Ms Hopper was undoubtedly the
first one reported to use it in our field.

In the general case, trying to ascertain that an operation will succeed
before attempting the operation has many problems. Often you end up
repeating the same steps between the ascertaining and the actual usage,
which offends the "Once and Only Once" principle as well as slowing
things down. Sometimes you cannot ensure that the ascertaining and the
operating pertain to exactly the same thing -- the world can have
changed in-between, or the code might present subtle differences between
the two cases.

In contrast, if a failed attempt can be guaranteed to not alter
persistent state and only result in an easily catchable exception, EAFP
can better deliver on its name. In terms of your analogy, there's
nothing to "wipe off" -- if the leap "misfires", no damage is done.
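Alex's point about the world changing in between can be made concrete with file access. A minimal Python 3 sketch (both function names are hypothetical): the LBYL version checks `os.path.exists()` and then opens, leaving a window in which another process could delete the file; the EAFP version simply tries and handles the one failure it expects.

```python
import os


def read_text_lbyl(path):
    # Look Before You Leap: two steps, and the file may vanish between
    # them, in which case open() raises anyway.
    if os.path.exists(path):
        with open(path) as f:
            return f.read()
    return None


def read_text_eafp(path):
    # Easier to Ask Forgiveness than Permission: one attempt, one handler.
    try:
        with open(path) as f:
            return f.read()
    except FileNotFoundError:
        return None
```

A failed `open()` leaves no persistent state behind, which is exactly the condition Alex names for EAFP to be safe.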
Alex

Jul 18 '05 #8

Egbert Bouwman <eg*********@hccnet.nl> wrote:
> A file is too large to fit into memory.
> The first line must receive a special treatment, because
> it contains information about how to handle the rest of the file.
>
> Of course it is not difficult to test if you are reading the first line
> or another one, but it hurts my feelings to do a test which by definition
> succeeds at the first record, and never afterwards.


option 1, the one I would use:

thefile = open('somehugefile.txt')
first_line = thefile.next()
deal_with_first(first_line)
for line in thefile:
    deal_with_other(line)

this requires Python 2.3 or better, so that thefile IS-AN iterator; in
2.2, get an iterator with foo=iter(thefile) and use .next and for on
that (better still, upgrade!).
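In Python 3, `thefile.next()` is spelled `next(thefile)`. One caveat the 2.x code did not have: inside a generator, a `StopIteration` escaping from `next()` is turned into a `RuntimeError` (PEP 479, the default since Python 3.7), so a lazy, line-at-a-time version of option 1 should catch it. A sketch, assuming a hypothetical format in which the first line holds the field separator for the rest of the file:

```python
import io


def records(thefile):
    """Lazily yield split records; the header line names the separator."""
    try:
        header = next(thefile)  # Python 3 spelling of thefile.next()
    except StopIteration:
        return  # empty file: yield nothing instead of raising RuntimeError
    sep = header.rstrip("\n")
    for line in thefile:  # resumes at the second line
        yield line.rstrip("\n").split(sep)


sample = io.StringIO(";\na;b\nc;d\n")
print(list(records(sample)))  # [['a', 'b'], ['c', 'd']]
```

Because `records` is a generator, only one line is held in memory at a time, which suits the "too large to fit into memory" premise.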

option 2, not unreasonable (not repeating the open & calls...):

first_line = thefile.readline()
for line in thefile: ...

option 3, a bit cutesy:

for first_line in thefile: break
for line in thefile: ...

(again, in 2.2 you'll need some foo=iter(thefile)).
I'm sure there are others, but 3 is at least 2 too many already,
so...;-)
Alex
Jul 18 '05 #9

Gerrit wrote:
> first = lines.next()
> [as opposed to 'for first in lines: break']
> Would hurt less feeling I presume.

>>> iter("").next()
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
StopIteration

I feel a little uneasy with that ...unless I'm sure I want to deal with the
StopIteration elsewhere.
Looking at it from another angle, the initial for-loop is just a peculiar
way to deal with an empty iterable. So the best (i.e. clear, robust and
general) approach is probably

items = iter(...)
try:
    first = items.next()
except StopIteration:
    # deal with empty iterator, e.g.:
    raise ValueError("need at least one item")
else:
    # process remaining data

part of which is indeed your suggestion.

Peter
Jul 18 '05 #10

> >>> lines = iter("abc")
> >>> first = lines.next()
> >>> print first
> a
> >>> for line in lines:
> ...     print line
> ...
> b
> c
>
> Would hurt less feeling I presume.


Unless it was empty, then you'd get the dreaded StopIteration!

IMO, unconditionally breaking out of a for loop is the nicer way of
handling things in this case, no exceptions to catch.
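There is also an option that needs neither an exception handler nor a `for`/`break`: since Python 2.6 the built-in `next()` accepts a default that is returned instead of raising `StopIteration`. A Python 3 sketch (`split_first` is a hypothetical name); a private sentinel object is used instead of `None` so that an iterable whose first item really is `None` stays distinguishable from an empty one.

```python
_MISSING = object()  # unique sentinel: no real item can be this object


def split_first(iterable):
    """Return (first, rest_iterator), or (None, None) if iterable is empty."""
    it = iter(iterable)
    first = next(it, _MISSING)
    if first is _MISSING:
        return None, None
    return first, it


first, rest = split_first("abc")
print(first, list(rest))  # a ['b', 'c']
```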

- Josiah

Jul 18 '05 #11

In article <ma**************************************@python.org>,
Egbert Bouwman <eg*********@hccnet.nl> wrote:
> A file is too large to fit into memory.
> The first line must receive a special treatment, because
> it contains information about how to handle the rest of the file.
>
> Of course it is not difficult to test if you are reading the first line
> or another one, but it hurts my feelings to do a test which by definition
> succeeds at the first record, and never afterwards.
> Any suggestions?


f = file("lines.txt", "rt")
first_line_processing(f.readline())
for line in f:
    line_processing(line)

ought to work.
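Mel's version works because `readline()` never raises at end of file: it returns an empty string, while a blank line comes back as `"\n"`, so an empty file can be detected with a plain truthiness check. A minimal Python 3 sketch, with the hypothetical name `split_header` and an `io.StringIO` standing in for the file:

```python
import io


def split_header(f):
    """Return (header, remaining_lines); header is None for an empty file."""
    header = f.readline()  # "" only at EOF; a blank line would be "\n"
    if not header:
        return None, []
    # Iterating the same file object continues right after the header line.
    return header.rstrip("\n"), [line.rstrip("\n") for line in f]


print(split_header(io.StringIO("columns: a b\n1 2\n3 4\n")))
# ('columns: a b', ['1 2', '3 4'])
```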

Regards. Mel.
Jul 18 '05 #12

Peter Otten <__*******@web.de> wrote:
> ...
> Looking at it from another angle, the initial for-loop is just a peculiar
> way to deal with an empty iterable. So the best (i.e. clear, robust and
> general) approach is probably
>
> items = iter(...)
> try:
>     first = items.next()
> except StopIteration:
>     # deal with empty iterator, e.g.:
>     raise ValueError("need at least one item")
> else:
>     # process remaining data


I think it can't be optimal, as coded, because it's more nested than it
needs to be (and "flat is better than nested"): since the exception
handler doesn't fall through, I would omit the try statement's else
clause and outdent the "process remaining data" part. The else clause
would be needed if the except clause could fall through, though.
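The flatter layout described here, sketched in Python 3 (`require_first` is a hypothetical name): because the handler always raises, the code that processes the remaining items can sit after the `try` statement with no `else`. `raise ... from None` (Python 3.3+) keeps the uninformative `StopIteration` out of the traceback.

```python
def require_first(items):
    """Return (first, rest_list); raise ValueError if items is empty."""
    it = iter(items)
    try:
        first = next(it)
    except StopIteration:
        # The handler never falls through, so no else clause is needed.
        raise ValueError("need at least one item") from None
    # Flat: processing of the remaining data follows the try statement.
    return first, list(it)
```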
Alex
Jul 18 '05 #13

Josiah Carlson <jcarlson <at> uci.edu> writes:
> IMO, unconditionally breaking out of a for loop is the nicer way of
> handling things in this case, no exceptions to catch.

There's still a NameError to catch if you haven't initialized line:

>>> for line in []:
...     break
...
>>> line
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
NameError: name 'line' is not defined

I don't much like the break out of a for loop, because it feels like a misuse
of a construct designed for iteration... But take your pick: StopIteration or
NameError. =)

Steve
Jul 18 '05 #14

Steven Bethard wrote:
> There's still a NameError to catch if you haven't initialized line:
> >>> for line in []:
> ...     break
> ...
> >>> line
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> NameError: name 'line' is not defined


No, you would put code specific to the first line into the loop before the
break.
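Python's `for ... else` can also cover the empty case Steven raises: the `else` clause runs only when the loop finishes without `break`, which in this idiom means the iterable was empty. A Python 3 sketch with the hypothetical name `header_and_rest`:

```python
def header_and_rest(lines):
    """Return (header, rest); header is None when lines is empty."""
    it = iter(lines)
    for header in it:
        header = header.strip()  # first-line handling goes before the break
        break
    else:
        # Reached only if the loop never hit break, i.e. it was empty.
        header = None
    return header, [line.strip() for line in it]


print(header_and_rest(["H\n", "a\n", "b\n"]))  # ('H', ['a', 'b'])
print(header_and_rest([]))                     # (None, [])
```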
> I don't much like the break out of a for loop, because it feels like a
> misuse


I can understand that.

Peter

Jul 18 '05 #15

Alex Martelli wrote:
> Peter Otten <__*******@web.de> wrote:
> ...
>> Looking at it from another angle, the initial for-loop is just a
>> peculiar way to deal with an empty iterable. So the best (i.e. clear,
>> robust and general) approach is probably
>>
>> items = iter(...)
>> try:
>>     first = items.next()
>> except StopIteration:
>>     # deal with empty iterator, e.g.:
>>     raise ValueError("need at least one item")
>> else:
>>     # process remaining data
>
> I think it can't be optimal, as coded, because it's more nested than it
> needs to be (and "flat is better than nested"): since the exception
> handler doesn't fall through, I would omit the try statement's else
> clause and outdent the "process remaining data" part. The else clause
> would be needed if the except clause could fall through, though.


I relied more on the two letters 'e.g.' than I should have, as there are two
different aspects I wanted to convey:

1. Don't let the StopIteration propagate:

items = iter(...)
try:
    first = items.next()
except StopIteration:
    raise MeaningfulException("clear indication of what caused the error")

2. General structure when handling the first item specially:

items = iter(...)
try:
    first = items.next()
except StopIteration:
    # handle error
else:
    # a. code relying on 'first'
# b. code independent of 'first' or relying on the error handler
#    defining a proper default.

where both (a) and (b) are optional.
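Variant 2 is where the `else` clause earns its place: if the handler supplies a default and falls through instead of raising, code that needs `first` has to go in `else`, while code that works with either outcome goes after the whole statement. A Python 3 sketch; `summarize` and the `"<empty>"` default are illustrative only.

```python
def summarize(items):
    it = iter(items)
    try:
        first = next(it)
    except StopIteration:
        # Falls through: define a default instead of raising.
        label = "<empty>"
    else:
        # (a) Runs only when a first item exists.
        label = f"starts with {first!r}"
    # (b) Runs in both cases and relies on label being set by either branch.
    remaining = sum(1 for _ in it)
    return f"{label}, {remaining} more"
```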

As we now have two variants, I have to drop the claim to generality.
Regarding the Zen incantation, "flat is better than nested", I tend to measure
nesting as max(indent level) rather than avg(), i.e. following my (perhaps
odd) notion the else clause would affect nesting only if it contained an
additional if, for, etc. Therefore I have no qualms about sometimes using else
where it doesn't affect control flow:

def whosAfraidOf(color):
    if color == red:
        return peopleAfraidOfRed
    else:
        # if it ain't red it must be yellow - nobody's afraid of blue
        return peopleAfraidOfYellow

as opposed to

def whosAfraidOf(color):
    if color == red:
        return peopleAfraidOfRed
    return peopleAfraidOfAnyOtherColor

That said, usually my programs have bigger problems than the above subtlety.

Peter
Jul 18 '05 #16

On Fri, Oct 08, 2004 at 11:59:32AM +0200, Alex Martelli wrote:

> option 3, a bit cutesy:
>
> for first_line in thefile: break
> for line in thefile: ...
>
> (again, in 2.2 you'll need some foo=iter(thefile)).

This technique depends on the file being positioned at line 2,
after the break.

However, in the Nutshell book, page 191, you write:
  Interrupting such a loop prematurely (e.g. with break)
  leaves the file's current position with an arbitrary value.

So the information about the current position is useless.

Have I discovered a contradiction?
egbert
Jul 18 '05 #17

Egbert Bouwman <eg*********@hccnet.nl> wrote:
> On Fri, Oct 08, 2004 at 11:59:32AM +0200, Alex Martelli wrote:
>
>> option 3, a bit cutesy:
>>
>> for first_line in thefile: break
>> for line in thefile: ...
>>
>> (again, in 2.2 you'll need some foo=iter(thefile)).
> This technique depends on the file being positioned at line 2,
> after the break.


Not exactly, if by "being positioned" you mean what's normally meant for
file objects (what will thefile.tell() respond, what next five bytes
will thefile.read(5) read, and so on). All it depends on is the
_iterator_ on the file being "positioned" in the sense in which
iterators are positioned (what item will come if you call next on the
iterator).

In 2.3 a file is-an iterator; in 2.2 you need to explicitly get an
iterator as indicated in the parenthesis you've also quoted.

> However, in the Nutshell book, page 191, you write:
>   Interrupting such a loop prematurely (e.g. with break)
>   leaves the file's current position with an arbitrary value.
>
> So the information about the current position is useless.
>
> Have I discovered a contradiction?


Nope -- the file's current position is (e.g.) what tell will respond if
you call it, and that IS arbitrary. In 2.2 (which is what the Nutshell
covers) you need to explicitly get an iterator to do anything else; in
2.3 you can rely on the fact that a file is its own iterator to make
your code simpler. But the iteration state is not connected with the
file's current position.
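Python 3 makes this distinction visible. In text mode, once iteration has pulled a line, CPython's `tell()` refuses to answer (it raises `OSError`, "telling position disabled by next() call") until the iterator reaches end of file, yet the iterator itself still resumes at the right line. A sketch using a throwaway temporary file; `first_then_rest` is a hypothetical name.

```python
import os
import tempfile


def first_then_rest(path):
    """Break out after one line, probe tell(), then resume iterating."""
    with open(path) as f:  # text mode
        for first in f:
            break  # the iterator now points at the second line
        try:
            f.tell()
            tell_refused = False
        except OSError:
            tell_refused = True  # raised while iteration is mid-file
        rest = list(f)  # iteration picks up exactly where it left off
    return first, rest, tell_refused


path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(path, "w") as out:
    out.write("header\nline 2\nline 3\n")
print(first_then_rest(path))
```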
Alex
Jul 18 '05 #18

On Sun, Oct 10, 2004 at 12:41:37AM +0200, Alex Martelli wrote:
> ...
> But the iteration state is not connected with the
> file's current position.

That is very useful information. Thanks.
egbert
Jul 18 '05 #19

Steven Bethard <st************@gmail.com> wrote:
...
> I don't much like the break out of a for loop, because it feels like a misuse
> of a construct designed for iteration... But take your pick: StopIteration or
> NameError. =)


Jacopini and Bohm have much to answer for...;-)

Alex
Jul 18 '05 #20

Egbert Bouwman <eg*********@hccnet.nl> wrote in message news:<ma**************************************@python.org>...
> Of course it is not difficult to test if you are reading the first line
> or another one, but it hurts my feelings to do a test which by definition
> succeeds at the first record, and never afterwards.
> Any suggestions?

An alternative approach (which I'm sure will offend just as many
sensibilities) is to use a function that replaces itself.

------ Pseudo-code ------
def process_firstline(line):
    # ...do something here...
    global processline
    processline = process_otherlines

def process_otherlines(line):
    # ...do something here...
    pass

processline = process_firstline

for line in thefile:    # thefile: an already-open file object
    result = processline(line)

----------------------------

If you read more than one file you'll need to reset processline at the
beginning of each file.
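One way to get that per-file reset without touching a global is to keep the swapped-in handler in a local variable. A Python 3 sketch (names hypothetical) that still avoids any "is this the first line?" test: the handler is simply rebound after every call, which is redundant after the first line but cheap.

```python
def process_lines(lines, first_handler, other_handler):
    """Apply first_handler to the first line and other_handler to the rest.

    The dispatcher lives in a local variable, so every call (every file)
    starts again with first_handler.
    """
    handler = first_handler
    results = []
    for line in lines:
        results.append(handler(line))
        handler = other_handler  # unconditional rebinding, no first-line test
    return results


print(process_lines(["Title", "Body", "More"], str.upper, str.lower))
# ['TITLE', 'body', 'more']
```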

I never said this was a *good* way, just one way. :-)

--Phil.
Jul 18 '05 #21
