Graceful detection of EOF

How does one detect the EOF gracefully? Assuming I have a pickle file
containing an unknown number of objects, how can I read (i.e.,
pickle.load()) until the EOF is encountered without generating an EOF
exception?

Thanks for any assistance.
MickeyBob

Jul 18 '05 #1
Write a file-like object that can "look ahead" and provide a flag to
check in your unpickling loop, and which implements enough of the file
protocol ("read" and "readline", apparently) to please pickle. The
following worked for me.

class PeekyFile:
    def __init__(self, f):
        self.f = f
        self.peek = ""

    def eofnext(self):
        # True if the next read would hit EOF; buffers one character.
        if self.peek: return False
        try:
            self.peek = self.f.read(1)
        except EOFError:
            return True
        return not self.peek

    def read(self, n=None):
        if n is not None:
            n = n - len(self.peek)
            result = self.peek + self.f.read(n)
        else:
            result = self.peek + self.f.read()
        self.peek = ""
        return result

    def readline(self):
        result = self.peek + self.f.readline()
        self.peek = ""
        return result

import StringIO, pickle
o = StringIO.StringIO()
for x in range(5):
    pickle.dump(x, o)
i = PeekyFile(StringIO.StringIO(o.getvalue()))
while 1:
    if i.eofnext():
        break
    print pickle.load(i)
print "at the end"


Jul 18 '05 #2
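For readers on Python 3, the look-ahead idea in the post above can lean on the standard library rather than a hand-written wrapper. The following is only a sketch, assuming a buffered binary stream (which is what open(path, "rb") returns); iter_pickles_peek is a hypothetical helper name, not something from the thread.

```python
import io
import pickle


def iter_pickles_peek(stream):
    """Yield objects from a buffered binary stream, checking for EOF first.

    Assumes `stream` has a peek() method, as io.BufferedReader does.
    peek() returns b"" only when no more data is available.
    """
    while stream.peek(1):
        yield pickle.load(stream)


# Demo: write five pickles to memory, then read them back.
buf = io.BytesIO()
for x in range(5):
    pickle.dump(x, buf)

# BytesIO has no peek(), so wrap it to mimic a file opened with "rb".
reader = io.BufferedReader(io.BytesIO(buf.getvalue()))
values = list(iter_pickles_peek(reader))
print(values)
```

The loop never sees an EOFError, because the peek() call plays the role of eofnext().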
MickeyBob wrote:
How does one detect the EOF gracefully? Assuming I have a pickle file
containing an unknown number of objects, how can I read (i.e.,
pickle.load()) until the EOF is encountered without generating an EOF
exception?


Why isn't catching the exception graceful?

# UNTESTED CODE

import pickle

def load_pickle_iter(infile):
    while 1:
        try:
            yield pickle.load(infile)
        except EOFError:
            break

for obj in load_pickle_iter(open("mydata.pickle", "rb")):
    print obj

This is well in line with the normal Python idiom,
as compared to "look before you leap".

Andrew
da***@dalkescientific.com
Jul 18 '05 #3
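The generator in the post above is marked as untested and uses Python 2 syntax. A Python 3 rendering might look like the sketch below; the in-memory stream and the sample items are assumptions made purely for the demonstration, standing in for a real pickle file.

```python
import io
import pickle


def load_pickle_iter(infile):
    """Yield unpickled objects until pickle.load() signals end of file."""
    while True:
        try:
            yield pickle.load(infile)
        except EOFError:
            return


# Round-trip demo on an in-memory stream (a stand-in for a real file).
stream = io.BytesIO()
for item in ("spam", 42, [1, 2, 3]):
    pickle.dump(item, stream)
stream.seek(0)

loaded = list(load_pickle_iter(stream))
print(loaded)
```

With a real file, passing the object returned by open("mydata.pickle", "rb") inside a with block would also take care of closing it.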
Andrew Dalke wrote:
MickeyBob wrote:
How does one detect the EOF gracefully? Assuming I have a pickle file
containing an unknown number of objects, how can I read (i.e.,
pickle.load()) until the EOF is encountered without generating an EOF
exception?

Why isn't catching the exception graceful?

# UNTESTED CODE

import pickle

def load_pickle_iter(infile):
    while 1:
        try:
            yield pickle.load(infile)
        except EOFError:
            break

for obj in load_pickle_iter(open("mydata.pickle", "rb")):
    print obj

This is well in line with the normal Python idiom,
as compared to "look before you leap".

Andrew
da***@dalkescientific.com


So, what you're saying is that the Python way, in contradistinction to
"look before you leap", is "land in it, then wipe it off?" Can we get
that in the Zen of Python? :-)

Seriously, this is beautiful. I understand generators, but haven't
become accustomed to using them yet. That is just beautiful, which _is_
Zen.
Jeremy Jones

Jul 18 '05 #4
A file is too large to fit into memory.
The first line must receive a special treatment, because
it contains information about how to handle the rest of the file.

Of course it is not difficult to test if you are reading the first line
or another one, but it hurts my feelings to do a test which by definition
succeeds at the first record, and never afterwards.
Any suggestions?
egbert
--
Egbert Bouwman - Keizersgracht 197 II - 1016 DS Amsterdam - 020 6257991
================================================== ======================
Jul 18 '05 #5
Egbert Bouwman wrote:
A file is too large to fit into memory.
The first line must receive a special treatment, because
it contains information about how to handle the rest of the file.

Of course it is not difficult to test if you are reading the first line
or another one, but it hurts my feelings to do a test which by definition
succeeds at the first record, and never afterwards.

>>> lines = iter("abc")
>>> for first in lines:
...     print first
...     break
...
a
>>> for line in lines:
...     print line
...
b
c

Unless it hurts your feelings to unconditionally break out of a for-loop,
that is.

Peter
Jul 18 '05 #6
Peter Otten wrote:
>>> lines = iter("abc")
>>> for first in lines:
...     print first
...     break
...
a
>>> for line in lines:
...     print line
...
b
c

Unless it hurts your feelings to unconditionally break out of a for-loop,
that is.


How about:
>>> lines = iter("abc")
>>> first = lines.next()
>>> print first
a
>>> for line in lines:
...     print line
...
b
c

Would hurt less feeling I presume.

Gerrit.

Jul 18 '05 #7
Jeremy Jones <za******@bellsouth.net> wrote:
...
This is well in line with the normal Python idiom,
as compared to "look before you leap".

Andrew
da***@dalkescientific.com


So, what you're saying is that the Python way, in contradistinction to
"look before you leap", is "land in it, then wipe it off?" Can we get
that in the Zen of Python? :-)


The "normal Python idiom" is often called, in honor and memory of
Rear Admiral Grace Murray Hopper (arguably the most significant woman in the
history of programming languages to this time), "it's Easier to Ask
Forgiveness than Permission" (EAFP, vs the LBYL alternative, "Look Before
You Leap"). This motto has been attributed to many, but Ms Hopper was
reportedly the first one to use it in our field.

In the general case, trying to ascertain that an operation will succeed
before attempting the operation has many problems. Often you end up
repeating the same steps between the ascertaining and the actual usage,
which offends the "Once and Only Once" principle as well as slowing
things down. Sometimes you cannot ensure that the ascertaining and the
operating pertain to exactly the same thing -- the world can have
changed in-between, or the code might present subtle differences between
the two cases.

In contrast, if a failed attempt can be guaranteed to not alter
persistent state and only result in an easily catchable exception, EAFP
can better deliver on its name. In terms of your analogy, there's
nothing to "wipe off" -- if the leap "misfires", no damage is done.
Alex

Jul 18 '05 #8
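To make the LBYL/EAFP contrast in the post above concrete, here is a small Python 3 sketch. The function names are hypothetical and the task (parsing an integer) is chosen only for illustration. It shows the "subtle differences between the two cases" problem: the check and the operation disagree about which inputs are valid.

```python
def parse_int_lbyl(text):
    """Look before you leap: test first, then convert."""
    # The test and int() do not agree: int() accepts "-5" and " 7 ",
    # but str.isdigit() rejects both, so valid input is thrown away.
    if text.isdigit():
        return int(text)
    return None


def parse_int_eafp(text):
    """Easier to ask forgiveness: just try, and catch the failure."""
    try:
        return int(text)
    except ValueError:
        return None


for sample in ("42", "-5", " 7 ", "abc"):
    print(repr(sample), parse_int_lbyl(sample), parse_int_eafp(sample))
```

A failed int() call changes no state, so in the terms used above there is nothing to "wipe off".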
Egbert Bouwman <eg*********@hccnet.nl> wrote:
A file is too large to fit into memory.
The first line must receive a special treatment, because
it contains information about how to handle the rest of the file.

Of course it is not difficult to test if you are reading the first line
or another one, but it hurts my feelings to do a test which by definition
succeeds at the first record, and never afterwards.


option 1, the one I would use:

thefile = open('somehugefile.txt')
first_line = thefile.next()
deal_with_first(first_line)
for line in thefile:
    deal_with_other(line)

this requires Python 2.3 or better, so that thefile IS-AN iterator; in
2.2, get an iterator with foo=iter(thefile) and use .next and for on
that (better still, upgrade!).

option 2, not unreasonable (not repeating the open & calls...):

first_line = thefile.readline()
for line in thefile: ...

option 3, a bit cutesy:

for first_line in thefile: break
for line in thefile: ...

(again, in 2.2 you'll need some foo=iter(thefile)).
I'm sure there are others, but 3 is at least 2 too many already,
so...;-)
Alex
Jul 18 '05 #9
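Option 1 from the post above carries over to Python 3 almost unchanged, except that thefile.next() is spelled next(thefile). The sketch below assumes a comma-separated layout whose first line names the columns; that layout, the sample data and the name read_with_header are all invented for the example.

```python
import io


def read_with_header(thefile):
    """Treat the first line as column names, then stream the remaining lines."""
    first_line = next(thefile)  # Python 3 spelling of thefile.next()
    columns = first_line.rstrip("\n").split(",")
    records = []
    for line in thefile:  # continues with the second line
        values = line.rstrip("\n").split(",")
        records.append(dict(zip(columns, values)))
    return records


# io.StringIO stands in for a large text file opened with open().
sample = io.StringIO("name,age\nann,31\nbob,27\n")
records = read_with_header(sample)
print(records)
```

No per-line "is this the first line?" test is needed, which was the original concern. For a file that really does not fit into memory, the records would be processed one at a time rather than collected in a list.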
Gerrit wrote:
first = lines.next()
[as opposed to 'for first in lines: break']
Would hurt less feeling I presume.

>>> iter("").next()
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
StopIteration

I feel a little uneasy with that ...unless I'm sure I want to deal with the
StopIteration elsewhere.
Looking at it from another angle, the initial for-loop is just a peculiar
way to deal with an empty iterable. So the best (i. e. clear, robust and
general) approach is probably

items = iter(...)
try:
    first = items.next()
except StopIteration:
    # deal with empty iterator, e. g.:
    raise ValueError("need at least one item")
else:
    # process remaining data

part of which is indeed your suggestion.

Peter
Jul 18 '05 #10
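In Python 3 the built-in next() accepts a default value, which gives another way to keep a StopIteration from escaping. This is a sketch rather than a recommendation from the thread; first_and_rest and the _MISSING sentinel are hypothetical names.

```python
_MISSING = object()  # unique sentinel, so that None remains a legal first item


def first_and_rest(iterable):
    """Return (first item, iterator over the remaining items).

    Raises ValueError instead of letting StopIteration propagate.
    """
    items = iter(iterable)
    first = next(items, _MISSING)
    if first is _MISSING:
        raise ValueError("need at least one item")
    return first, items


first, rest = first_and_rest("abc")
remaining = list(rest)
print(first, remaining)
```

The caller sees a meaningful exception for empty input, which is the point made in the post above.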
>>> lines = iter("abc")
>>> first = lines.next()
>>> print first
a
>>> for line in lines:
...     print line
...
b
c

Would hurt less feeling I presume.


Unless it was empty, then you'd get the dreaded StopIteration!

IMO, unconditionally breaking out of a for loop is the nicer way of
handling things in this case, no exceptions to catch.

- Josiah

Jul 18 '05 #11
In article <ma**************************************@python.org>,
Egbert Bouwman <eg*********@hccnet.nl> wrote:
A file is too large to fit into memory.
The first line must receive a special treatment, because
it contains information about how to handle the rest of the file.

Of course it is not difficult to test if you are reading the first line
or another one, but it hurts my feelings to do a test which by definition
succeeds at the first record, and never afterwards.
Any suggestions?


f = file("lines.txt", "rt")
first_line_processing (f.readline())
for line in f:
    line_processing (line)

ought to work.

Regards. Mel.
Jul 18 '05 #12
Peter Otten <__*******@web.de> wrote:
...
Looking at it from another angle, the initial for-loop is just a peculiar
way to deal with an empty iterable. So the best (i. e. clear, robust and
general) approach is probably

items = iter(...)
try:
    first = items.next()
except StopIteration:
    # deal with empty iterator, e. g.:
    raise ValueError("need at least one item")
else:
    # process remaining data


I think it can't be optimal, as coded, because it's more nested than it
needs to be (and "flat is better than nested"): since the exception
handler doesn't fall through, I would omit the try statement's else
clause and outdent the "process remaining data" part. The else clause
would be needed if the except clause could fall through, though.
Alex
Jul 18 '05 #13
Josiah Carlson <jcarlson <at> uci.edu> writes:
IMO, unconditionally breaking out of a for loop is the nicer way of
handling things in this case, no exceptions to catch.


There's still a NameError to catch if you haven't initialized line:
>>> for line in []:
...     break
...
>>> line
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
NameError: name 'line' is not defined

I don't much like the break out of a for loop, because it feels like a misuse
of a construct designed for iteration... But take your pick: StopIteration or
NameError. =)

Steve
Jul 18 '05 #14
Steven Bethard wrote:
There's still a NameError to catch if you haven't initialized line:
>>> for line in []:
...     break
...
>>> line
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
NameError: name 'line' is not defined


No, you would put code specific to the first line into the loop before the
break.
I don't much like the break out of a for loop, because it feels like a
misuse


I can understand that.

Peter

Jul 18 '05 #15
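Putting the first-line code inside the loop before the break, as suggested above, can be combined with the else clause of a for statement, which runs only when the loop finishes without a break (and so also when the iterable is empty). The sketch below is one possible Python 3 arrangement; split_header and the upper-casing step are made up for illustration.

```python
def split_header(lines):
    """Handle the first item inside the loop; detect empty input with for/else."""
    it = iter(lines)
    for first in it:
        header = first.upper()  # first-item-specific work happens before the break
        break
    else:
        # Reached only if the loop body never ran, i.e. the iterable was empty.
        raise ValueError("need at least one item")
    return header, list(it)


header, rest = split_header(["a", "b", "c"])
print(header, rest)
```

With this arrangement neither a StopIteration nor a NameError can leak out.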
Alex Martelli wrote:
Peter Otten <__*******@web.de> wrote:
...
Looking at it from another angle, the initial for-loop is just a
peculiar way to deal with an empty iterable. So the best (i. e. clear,
robust and general) approach is probably

items = iter(...)
try:
    first = items.next()
except StopIteration:
    # deal with empty iterator, e. g.:
    raise ValueError("need at least one item")
else:
    # process remaining data


I think it can't be optimal, as coded, because it's more nested than it
needs to be (and "flat is better than nested"): since the exception
handler doesn't fall through, I would omit the try statement's else
clause and outdent the "process remaining data" part. The else clause
would be needed if the except clause could fall through, though.


I relied more on the two letters 'e. g.' than I should have as there are two
different aspects I wanted to convey:

1. Don't let the StopIteration propagate:

items = iter(...)
try:
    first = items.next()
except StopIteration:
    raise MeaningfulException("clear indication of what caused the error")

2. General structure when handling the first item specially:

items = iter(...)
try:
    first = items.next()
except StopIteration:
    # handle error
else:
    # a. code relying on 'first'
# b. code independent of 'first' or relying on the error handler
#    defining a proper default.

where both (a) and (b) are optional.

As we have now two variants, I have to drop the claim to generality.
Regarding the Zen incantation, "flat is better than nested", I tend to measure
nesting as max(indent level) rather than avg(), i. e. following my (perhaps
odd) notion the else clause would affect nesting only if it contained an
additional if, for, etc. Therefore I have no qualms about sometimes using else
where it doesn't affect control flow:

def whosAfraidOf(color):
    if color == red:
        return peopleAfraidOfRed
    else:
        # if it ain't red it must be yellow - nobody's afraid of blue
        return peopleAfraidOfYellow

as opposed to

def whosAfraidOf(color):
    if color == red:
        return peopleAfraidOfRed
    return peopleAfraidOfAnyOtherColor

That said, usually my programs have bigger problems than the above subtlety.

Peter
Jul 18 '05 #16
On Fri, Oct 08, 2004 at 11:59:32AM +0200, Alex Martelli wrote:

option 3, a bit cutesy:

for first_line in thefile: break
for line in thefile: ...

(again, in 2.2 you'll need some foo=iter(thefile)).

This technique depends on the file being positioned at line 2,
after the break.

However, in the Nutshell book, page 191, you write:
Interrupting such a loop prematurely (e.g. with break)
leaves the file's current position with an arbitrary value.


So the information about the current position is useless.

Do I discover a contradiction?
egbert
Jul 18 '05 #17
Egbert Bouwman <eg*********@hccnet.nl> wrote:
On Fri, Oct 08, 2004 at 11:59:32AM +0200, Alex Martelli wrote:

option 3, a bit cutesy:

for first_line in thefile: break
for line in thefile: ...

(again, in 2.2 you'll need some foo=iter(thefile)).
This technique depends on the file being positioned at line 2,
after the break.


Not exactly, if by "being positioned" you mean what's normally meant for
file objects (what will thefile.tell() respond, what next five bytes
will thefile.read(5) read, and so on). All it depends on is the
_iterator_ on the file being "positioned" in the sense in which
iterators are positioned (what item will come if you call next on the
iterator).

In 2.3 a file is-an iterator; in 2.2 you need to explicitly get an
iterator as indicated in the parenthesis you've also quoted.

However, In the Nutshell book, page 191, you write:
Interrupting such a loop prematurely (e.g. with break)
leaves the file's current position with an arbitrary value.


So the information about the current position is useless.

Do I discover a contradiction?


Nope -- the file's current position is (e.g.) what tell will respond if
you call it, and that IS arbitrary. In 2.2 (which is what the Nutshell
covers) you need to explicitly get an iterator to do anything else; in
2.3 you can rely on the fact that a file is its own iterator to make
your code simpler. But the iteration state is not connected with the
file's current position.
Alex
Jul 18 '05 #18
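The separation between iteration state and file position described above is, in Python 3 (not the 2.2/2.3 versions under discussion), enforced quite visibly for text files: once iteration has started, CPython's text layer refuses to answer tell() until the iterator is exhausted. The sketch below assumes that CPython behaviour; the temporary file and its contents are invented for the demonstration.

```python
import os
import tempfile

# Create a small throwaway text file to iterate over.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("header\nrow 1\nrow 2\n")
    path = tmp.name

try:
    with open(path) as thefile:
        for first_line in thefile:
            break  # "option 3": leave the loop after the first line

        # The iterator knows the next item is "row 1", but the file's
        # byte position is not something the text layer will report here.
        try:
            thefile.tell()
            tell_allowed = True
        except OSError:
            tell_allowed = False

        rest = [line.rstrip("\n") for line in thefile]
finally:
    os.remove(path)

print(repr(first_line), tell_allowed, rest)
```

The second loop still resumes at the right line, which is all the "cutesy" technique relies on.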
On Sun, Oct 10, 2004 at 12:41:37AM +0200, Alex Martelli wrote:
....
But the iteration state is not connected with the
file's current position.

That is very useful information. Thanks.
egbert
Jul 18 '05 #19
Steven Bethard <st************@gmail.com> wrote:
...
I don't much like the break out of a for loop, because it feels like a misuse
of a construct designed for iteration... But take your pick: StopIteration or
NameError. =)


Jacopini and Bohm have much to answer for...;-)

Alex
Jul 18 '05 #20
Egbert Bouwman <eg*********@hccnet.nl> wrote in message news:<ma**************************************@python.org>...
Of course it is not difficult to test if you are reading the first line
or another one, but it hurts my feelings to do a test which by definition
succeeds at the first record, and never afterwards.
Any suggestions?

An alternative approach (which I'm sure will offend just as many
sensibilities) is to use a function that replaces itself.

------ Pseudo-code ------
def process_firstline(...):
    # ...do something here...
    global processline
    processline = process_otherlines

def process_otherlines(...):
    # ...do something here...

# start out pointing at the first-line handler
processline = process_firstline

for line in file:
    result = processline(line)

----------------------------

If you read more than one file you'll need to reset processline at the
beginning of each file.

I never said this was a *good* way, just one way. :-)

--Phil.
Jul 18 '05 #21
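The self-replacing function above depends on a module-level global that has to be reset for every file. One way to keep the same trick while confining the state is to hold the current handler on an object, as in this Python 3 sketch. LineProcessor, its attributes and the sample lines are hypothetical, not part of the original post.

```python
class LineProcessor:
    """Dispatch each line to a handler that swaps itself out after the first call."""

    def __init__(self):
        self.handle = self._first  # current handler; starts as the first-line one
        self.header = None
        self.rows = []

    def _first(self, line):
        self.header = line
        self.handle = self._other  # every later line goes to _other

    def _other(self, line):
        self.rows.append(line)


proc = LineProcessor()  # a fresh instance per file replaces the manual reset
for line in ["name,age", "ann,31", "bob,27"]:
    proc.handle(line)

print(proc.header, proc.rows)
```

Whether this is any *better* than the simpler options earlier in the thread is debatable; it only removes the need for the global.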
