eof

On 2007-11-21, braver <de*********@gmail.com> wrote:

I'd like to check, for a filehandle f, that EOF has been reached on
it. What's the way to do it? I don't want to try/except on EOF, I
want to check, after I read a line, that now we're in the EOF state.
In Ruby it's f.eof:

In Ruby:

>> f = File.open("jopa")
=> #<File:jopa>
>> f.read()
=> "jopa\n"
>> f.eof
=> true

Is there a Python analog?

Yes.

>>> f = file('jopa')
>>> f.read()
'jopa\n'

...and in both Ruby and Python you are at EOF by definition.
There's no need to check.

--
Neil Cerutti

Nov 22 '07 #4

I V

On Wed, 21 Nov 2007 17:06:15 -0800, braver wrote:

Why do I have to count sizes of lines read and compare it to some
filesize or do other weird tricks just to see, in a way not changing my
input stream, whether it's at the, well, EOF?

Because you can't, generally, tell whether or not a stream is at the end
of the file without reading from it; the C standard library doesn't
guarantee that the EOF status is set until _after_ you've tried to read
past the end of the file. I don't know, but I suspect that may be why
python doesn't support querying files for EOF status - the fact that EOF
can be false when you've read all the data in the file is a source of
confusion to lots of C programmers.

It looks like ruby internally buffers the stream itself, which is how
come it can support this. According to the docs:

"Note that IO#eof? reads data to a input buffer."

http://www.ruby-doc.org/core/classes/IO.html#M002309

I'd imagine that's inefficient in a lot of cases; and I don't think it
provides much benefit. What's the hardship in checking the return values
of your IO calls? Or iterating over the file, which is a more elegant
solution in many cases.
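As a small illustration of "checking the return values of your IO calls", here is a minimal sketch, written for Python 3 rather than the Python 2 used elsewhere in this thread. It assumes an in-memory io.StringIO as a stand-in for a real file. The point it shows is that end of file is reported by the read call itself, which returns an empty string once nothing is left.

```python
import io

# io.StringIO stands in for a real file here (an assumption for the sketch).
stream = io.StringIO("jopa\n")

first = stream.readline()   # the only line in the stream
second = stream.readline()  # nothing left: returns the empty string
third = stream.readline()   # still the empty string, and no exception

print(repr(first), repr(second), repr(third))
```

No separate EOF query is involved; the empty string is the signal.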

Nov 22 '07 #5

Steven D'Aprano

On Wed, 21 Nov 2007 15:17:14 -0800, braver wrote:

I'd like to check, for a filehandle f, that EOF has been reached on it.
What's the way to do it? I don't want to try/except on EOF, I want to
check, after I read a line, that now we're in the EOF state.

Why? For some file-like objects, the OS can't tell if you're at EOF until
you actually try to read.

The usual way to deal with EOF in Python is not to bother.

fp = open("filename", "r")
for line in fp:
    do_something_with(line)
# When we exit the loop, we're at EOF.

What are you doing that needs more hands-on control?

--
Steven.

Nov 22 '07 #6

Hendrik van Rooyen

"braver" <de*********@gmail.com> wrote:

Well folks compare scripting languages all the time, and surely Ruby
is closer to Python than C++. Since Ruby can do f.eof, which is
easily found in its references, and Python can't, or its EOF can't
easily be found -- the one *equivalent* to a semantically clear
Ruby's, or Pascal's IIRC, f.eof -- something's missing here...

Why do I have to count sizes of lines read and compare it to some
filesize or do other weird tricks just to see, in a way not changing
my input stream, whether it's at the, well, EOF?

The man has a valid point -

In general, Python allows you, for an object f,
to write:

f.SomethingNew = 'Whatever the hell you like'

However, when f is a builtin file object,
it barfs on an AttributeError.

So he can't even help himself by setting his
own EOF attribute to False initially, and
to True when he sees an empty string.

Is there a reason for this Bondage style?

- Hendrik

Nov 22 '07 #7

braver

On Nov 22, 10:37 am, Wayne Brehaut <wbreh...@mcsnet.ca> wrote:

As others have already pointed out, "because it's seldom necessary in Python".

You know what? I've read this many times, and it's a lot of
self-congratulation. There are a lot of things which can be useful in Python.

This lack of EOF is inconvenient, there's nothing "Python way" about
it, as a simple search on "EOF" or "eof" in this group demonstrates.
Just a few threads:

http://groups.google.com/group/comp....25388487b3ac7b

Here folks fight the same problem in one way uglier than another.
Iterators and whatnot are thrown at a poor li'l EOF:

http://groups.google.com/group/comp....9d612e99f67561

The recurrence of the question simply shows Python lacks an f.eof()
method. That's all!

Why might we need it? The last thread shows clearly -- because the
last line processing is often a special case. After I couldn't find
the f.eof() in Python docs, I've quickly changed the logic to this
(extracts):

filesize = os.stat(filename)[6]
cur_size = 0

def eof():
    return cur_size == filesize

def add_line(line):
    self.cur_size += len(line)
    ...
    if eof():
        print >>sys.stderr, "* * * eof * * *"

Basically, it allows me to write what I want, and it's natural to ask
for eof() in this case -- and for a language to provide it. Ruby has
all iterators-schmiterators you want and a kitchen sink, yet provides
the said eof() without much fuss and 100 threads on c.l.*. :)

"Python is not Ruby."

Both Python and Ruby happily agree here:

>> "Python" == "Ruby"
=> false

In [11]: "Python" == "Ruby"
Out[11]: False

There's nothing special about Python except indentation, which gets
screwed up between editors all the time. (It's much easier to
flip-flop between TextMate and Emacs with Ruby than with Python, without
setting your tabs and spaces pedantically.) It's faster than Ruby;
otherwise they're similar. When Ruby X.Y gets faster, it'll be a
tough call between 'em. I use Python to accomplish things I know
how to do, with algorithms which work and proven control logic, so it is
reasonable to ask about certain control flow equivalents. And
comparisons will always be good. :)

Cheers,
Alexy

Nov 22 '07 #8

braver

On Nov 22, 5:08 am, I V <ivle...@gmail.com> wrote:

On Wed, 21 Nov 2007 17:06:15 -0800, braver wrote:
It looks like ruby internally buffers the stream itself, which is how
come it can support this. According to the docs:

"Note that IO#eof? reads data to a input buffer."

http://www.ruby-doc.org/core/classes/IO.html#M002309

I'd imagine that's inefficient in a lot of cases; and I don't think it
provides much benefit. What's the hardship in checking the return values
of your IO calls? Or iterating over the file, which is a more elegant
solution in many cases.

Exactly. But in many cases, built-in look-ahead is convenient. The
threads I cite in reply to Wayne are essentially trying to reimplement
the look-ahead with iterators -- and boy, it doesn't look pretty.

In many cases, you want to do this:

for line in f:
    <do something with the line, setup counts and things>
    if line % 1000 == 0 or f.eof():  # eof() doesn't exist in Python yet!
        <use the setup variables and things to process the chunk>

My control logic summarizes every 1000 lines of a file. I have to
issue the summary after each 1000 lines, or whatever incomplete tail
chunk remains. If I do it after the for loop, I have to refactor my
logic into a procedure to call it twice. Now I want to avoid the
overhead of the procedure call, and generally for a script to keep it
simple. (Well I guess f.eof() is an overhead itself if it's
inefficiently done in Python due to Python's implementation of IO --
see below.) Having eof() in this case is natural, and there's no
legitimate reason for Python not to provide it -- except if its IO is
implemented so that it's hard to provide such an eof()! So in that
case, instead of ideological arguments about ways Python and not
Python, it's interesting to look at why Python implements its IO so
that giving eof() is hard, while Ruby gladly and cheerfully does it!

Cheers,
Alexy

Nov 22 '07 #9

Duncan Booth

braver <de*********@gmail.com> wrote:

In many cases, you want to do this:

for line in f:
    <do something with the line, setup counts and things>
    if line % 1000 == 0 or f.eof():  # eof() doesn't exist in Python yet!
        <use the setup variables and things to process the chunk>

My control logic summarizes every 1000 lines of a file. I have to
issue the summary after each 1000 lines, or whatever incomplete tail
chunk remains. If I do it after the for loop, I have to refactor my
logic into a procedure to call it twice. Now I want to avoid the
overhead of the procedure call, and generally for a script to keep it
simple.

This sounds like a case for writing a generator. Try this one:

----- begin chunks.py -------
import itertools
def chunks(f, size):
    iterator = iter(f)
    def onechunk(line):
        yield line
        for line in itertools.islice(iterator, size-1):
            yield line
    for line in iterator:
        yield onechunk(line)

for chunk in chunks(open('chunks.py'), 3):
    for n, line in enumerate(chunk):
        print "%d:%s" % (n,line.rstrip())
    print "---------------"
print "done"
#eof
------ end chunks.py --------

The output when you run this is:

C:\Temp>chunks.py
0:import itertools
1:def chunks(f, size):
2:    iterator = iter(f)
---------------
0:    def onechunk(line):
1:        yield line
2:        for line in itertools.islice(iterator, size-1):
---------------
0:            yield line
1:    for line in iterator:
2:        yield onechunk(line)
---------------
0:
1:for chunk in chunks(open('chunks.py'), 3):
2:    for n, line in enumerate(chunk):
---------------
0:        print "%d:%s" % (n,line.rstrip())
1:    print "---------------"
2:print "done"
---------------
0:#eof
---------------
done

Or change it to do:

for chunk in chunks(enumerate(open('chunks.py')), 3):
    for n, line in chunk:

and you get all lines numbered from 0 to 15 instead of resetting the
count each chunk.
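For readers on Python 3, the same chunking idea can be sketched as below. This is a variation under stated assumptions, not Duncan's code: it collects each chunk into a list instead of yielding nested generators, so a caller that abandons a chunk halfway cannot leave the underlying iterator in an odd position. The sample input list is made up for illustration.

```python
import itertools

def chunks(iterable, size):
    """Yield successive lists of up to `size` items from `iterable`."""
    iterator = iter(iterable)
    while True:
        chunk = list(itertools.islice(iterator, size))
        if not chunk:
            return  # the source is exhausted
        yield chunk

# Hypothetical sample data; an open text file would work the same way.
for chunk in chunks(["a\n", "b\n", "c\n", "d\n"], 3):
    print(len(chunk), "lines in this chunk")
```

The trade-off is memory: each chunk is held in full, which is harmless for 1000 lines but matters for very large chunk sizes.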

Nov 22 '07 #10

braver

On Nov 22, 3:26 pm, Duncan Booth <duncan.bo...@invalid.invalid> wrote:

This sounds like a case for writing a generator. Try this one: [...]

Thanks, Duncan! Really cool & useful. And yield is the Ruby way,
too! (Wayne -- :P).
Cheers,
Alexy

Nov 22 '07 #11

Duncan Booth wrote:

import itertools
def chunks(f, size):
    iterator = iter(f)
    def onechunk(line):
        yield line
        for line in itertools.islice(iterator, size-1):
            yield line
    for line in iterator:
        yield onechunk(line)

Quite simpler, and provides chunk# as well :)

def chunked(chunksize,f) :
    from itertools import count,groupby
    counter=count(chunksize).next
    return groupby(f,lambda _ : counter()/chunksize)

Nov 22 '07 #12

def chunked(chunksize,f) :
    from itertools import count,groupby
    counter=count(chunksize).next
    return groupby(f,lambda _ : counter()/chunksize)

And more to the point, no "yield" for Alexy to mock :)

Nov 22 '07 #13

Duncan Booth

Boris Borcic <bb*****@gmail.com> wrote:

Duncan Booth wrote:

import itertools
def chunks(f, size):
    iterator = iter(f)
    def onechunk(line):
        yield line
        for line in itertools.islice(iterator, size-1):
            yield line
    for line in iterator:
        yield onechunk(line)

Quite simpler, and provides chunk# as well :)

def chunked(chunksize,f) :
    from itertools import count,groupby
    counter=count(chunksize).next
    return groupby(f,lambda _ : counter()/chunksize)

Nice, thank you. But why 'count(chunksize)' rather than just 'count()'?
Does it make a difference anywhere? And I'd recommend using // rather than
/ otherwise it breaks if you do 'from __future__ import division':

def chunked(chunksize,f) :
    from itertools import count,groupby
    counter=count().next
    return groupby(f,lambda _ : counter()//chunksize)
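The groupby version relies on the Python 2 bound method count().next, which no longer exists in Python 3. A minimal Python 3 sketch of the same idea, using the built-in next() and floor division, might look like this; the sample string is only an illustration.

```python
from itertools import count, groupby

def chunked(chunksize, iterable):
    """Group consecutive items into runs of `chunksize`, keyed by chunk number."""
    counter = count()
    # The key ignores the item itself and depends only on how many
    # items have been seen so far, so it changes every `chunksize` items.
    return groupby(iterable, lambda _: next(counter) // chunksize)

result = [(number, list(group)) for number, group in chunked(3, "abcdefg")]
print(result)
```

As with any groupby result, each group has to be consumed before moving on to the next one, which the list() call above takes care of.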

Nov 22 '07 #14

On 2007-11-22, braver <de*********@gmail.com> wrote:

On Nov 22, 10:37 am, Wayne Brehaut <wbreh...@mcsnet.ca> wrote:
As others have already pointed out, "because it's seldom
necessary in Python".

You know what? I've read this many times, and it's a lot of
self-congratulation. There are a lot of things which can be
useful in Python.

But not all useful things should be in a library. Your use case
for eof may yield to a more general solution.

This lack of EOF is inconvenient, there's nothing "Python way"
about it, as a simple search on "EOF" or "eof" in this group
demonstrates. Just a few threads:

http://groups.google.com/group/comp....25388487b3ac7b

That person simply didn't know Python's idioms for processing
files.

Here folks fight the same problem in one way uglier than
another. Iterators and whatnot are thrown at a poor li'l EOF:

http://groups.google.com/group/comp....9d612e99f67561

The recurrence of the question simply shows Python lacks an
f.eof() method. That's all!

Why might we need it? The last thread shows clearly -- because the
last line processing is often a special case. After I couldn't
find the f.eof() in Python docs, I've quickly changed the logic
to this (extracts):

You'll be better off with a general-purpose generator that allows
special handling for the last item.

http://groups.google.com/group/comp....4?dmode=source
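One possible shape for such a generator is sketched below in Python 3. It is not the code from the linked post; the name with_last_flag is hypothetical. It keeps one item of look-ahead so that each item can be paired with a flag saying whether it is the final one.

```python
def with_last_flag(iterable):
    """Yield (item, is_last) pairs, using one item of look-ahead."""
    iterator = iter(iterable)
    try:
        previous = next(iterator)
    except StopIteration:
        return  # empty input: nothing to yield
    for current in iterator:
        yield previous, False
        previous = current
    yield previous, True

# Hypothetical usage: treat the final line specially.
for line, is_last in with_last_flag(["one\n", "two\n", "three\n"]):
    print(line.rstrip(), "(last)" if is_last else "")
```

Because it works on any iterable, the same helper applies to pipes and sockets as well as plain files, at the cost of always reading one item ahead.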

filesize = os.stat(filename)[6]
cur_size = 0

def eof():
    return cur_size == filesize

def add_line(line):
    self.cur_size += len(line)
    ...
    if eof():
        print >>sys.stderr, "* * * eof * * *"

For text mode files, the number of characters is not always equal
to the file's size in bytes.
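A quick way to see the mismatch is sketched below in Python 3. It uses a multi-byte character to make the counts differ; newline translation on some platforms is another cause. The temporary file and the sample text are assumptions made for the demonstration.

```python
import os
import tempfile

text = "na\u00efve\n"  # 6 characters; the i-diaeresis takes 2 bytes in UTF-8

# Write the sample text to a temporary file without newline translation.
with tempfile.NamedTemporaryFile("w", encoding="utf-8", newline="\n",
                                 suffix=".txt", delete=False) as handle:
    handle.write(text)
    path = handle.name

byte_size = os.path.getsize(path)

# Sum the lengths of the lines as a text-mode reader sees them.
with open(path, encoding="utf-8", newline="\n") as handle:
    char_count = sum(len(line) for line in handle)

os.remove(path)
print(char_count, byte_size)
```

An eof() built on comparing these two numbers would never fire for this file.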

There's nothing special about Python except indentation, which
gets screwed up between editors all the time. (It's much
easier to flip-flop between TextMate and Emacs with Ruby than
with Python, without setting your tabs and spaces
pedantically.)

That horse is dead, buried, decayed, and there's a fig tree
growing out of the gravesite. Have a fig.

It's faster than Ruby, otherwise they're similar. When Ruby
X.Y gets faster, it'll be a tough call between 'em. I use
Python to accomplish things I know how, with algorithms which
work and proven control logic, so this is reasonable to ask
about certain control flow equivalents. And comparisons will
always be good. :)

Language comparisons are sometimes good. They are best when
they are free of FUD.

--
Neil Cerutti

Nov 22 '07 #15

Duncan Booth wrote:

Nice, thank you.

Welcome.

But why 'count(chunksize)' rather than just 'count()'?

To number the first chunk 1, and so on; using count() starts with 0. Matter of taste.

Does it make a difference anywhere? And I'd recommend using // rather than
/ otherwise it breaks if you do 'from __future__ import division':

Right.

def chunked(chunksize,f) :
    from itertools import count,groupby
    counter=count().next
    return groupby(f,lambda _ : counter()//chunksize)

Nov 22 '07 #16

braver

On Nov 22, 5:32 pm, Neil Cerutti <horp...@yahoo.com> wrote:

There's nothing special about Python except indentation, which
gets screwed up between editors all the time. (It's much
easier to flip-flop between TextMate and Emacs with Ruby than
with Python, without setting your tabs and spaces
pedantically.)

That horse is dead, buried, decayed, and there's a fig tree
growing out of the gravesite. Have a fig.

(Well, TextMate is pretty new, and I've just got a brand new Carbon
Emacs-devel from ports. And tabs don't match in a Python bundle and
the Python mode. Have to fix'em tabs. Chews a fig, mumbles to
himself... :)

Language comparisons are sometimes good. They are best when
they are free of FUD.

So why Python's IO cannot yield f.eof() as easily as Ruby's can? :)

Nov 22 '07 #17

J. Clifford Dyer

On Thu, Nov 22, 2007 at 06:53:59AM -0800, braver wrote regarding Re: eof:

Language comparisons are sometimes good. They are best when
they are free of FUD.

So why Python's IO cannot yield f.eof() as easily as Ruby's can? :)

Because that's not how you compare languages. You compare languages by stating what you are actually trying to do, and figuring out the most natural solution in each language. Not "I can do this in x--how come I can't do it in y?"

Cheers,
Cliff

Nov 22 '07 #18

Diez B. Roggisch

braver schrieb:

On Nov 22, 5:32 pm, Neil Cerutti <horp...@yahoo.com> wrote:

There's nothing special about Python except indentation, which
gets screwed up between editors all the time. (It's much
easier to flip-flop between TextMate and Emacs with Ruby than
with Python, without setting your tabs and spaces
pedantically.)

That horse is dead, buried, decayed, and there's a fig tree
growing out of the gravesite. Have a fig.

(Well, TextMate is pretty new, and I've just got a brand new Carbon
Emacs-devel from ports. And tabs don't match in a Python bundle and
the Python mode. Have to fix'em tabs. Chews a fig, mumbles to
himself... :)

Which is the reason one should use spaces.

Language comparisons are sometimes good. They are best when
they are free of FUD.

So why Python's IO cannot yield f.eof() as easily as Ruby's can? :)

Because that requires buffering, something that affects speed. Are you
willing to sacrifice the speed for _all_ use cases just for the _few_
that would actually benefit from the eof()? I myself have seldom found
the need for eof() - but constantly use the generator style of
line production that files implement.

Considering your own repeated remarks about "I'd only use ruby if it
wasn't slower than Python", I'd think you could value that.

And you have been shown clear, concise solutions to your problem. Which
add the benefit of working in general stream scenarios, not only with
actual files. Granted, they aren't part of the stdlib - but then, lots
of things aren't.

Diez

Nov 22 '07 #19

braver

On Nov 22, 6:08 pm, "J. Clifford Dyer" <j...@sdf.lonestar.org> wrote:

So why Python's IO cannot yield f.eof() as easily as Ruby's can? :)

Because that's not how you compare languages. You compare languages by stating what you are actually trying to do, and figuring out the most natural solution in each language. Not "I can do this in x--how come I can't do it in y?"

Python doesn't have f.eof() because it doesn't compare to Ruby? Or
because I'm trying to compare them? :) That's giving up to Ruby too
early!

Ruby has iterators and generators too, but it also has my good ol'
f.eof(). I challenge the assumption here of some majestically
Python-wayist spirit forbidding Python to have f.eof(), while Ruby, which has
all the same features, has it. Saying "it's not the Python way" is
not a valid argument.

The suspicion lurking in the thread above was that this has to do with
Python IO buffering, that it somehow can't tell the f.eof() with
automatic look-ahead/push-back/simulate read, as transparently and
effectively as (in practice) Ruby does without much fuss. The reason
why such a useful feature -- useful not in Ruby or Perl or Pascal, but
algorithmically -- is not present in Python is a recurrent mystery,
evidenced in this group recurrently.

Cheers,
Alexy

Nov 22 '07 #20

braver

On Nov 22, 6:10 pm, "Diez B. Roggisch" <de...@nospam.web.de> wrote:

Granted, they aren't part of the stdlib - but then, lots
of things aren't.

As Hendrik noticed, I can't even add my own f.eof() if I want to have
buffering -- is that right? The tradeoff between speed and
convenience is something I'd rather determine and enable myself, if I
have the right tools.

Cheers,
Alexy

Nov 22 '07 #21

Hrvoje Niksic

"Diez B. Roggisch" <de***@nospam.web.de> writes:

Language comparisons are sometimes good. They are best when
they are free of FUD.

So why Python's IO cannot yield f.eof() as easily as Ruby's can? :)

Because that requires buffering, something that affects speed.

I don't get it, Python's files are implemented on top of stdio FILE
objects, which do buffering and provide EOF checking (of the sort
where you can check if a previous read hit the EOF, but still). Why
not export that functionality?

Considering your own repeated remarks about "I'd only use ruby if it
wasn't slower than Python", I'd think you could value that.

I see no reason why exposing the EOF check would slow things down.

Nov 22 '07 #22

J. Clifford Dyer

On Thu, Nov 22, 2007 at 07:17:41AM -0800, braver wrote regarding Re: eof:

On Nov 22, 6:08 pm, "J. Clifford Dyer" <j...@sdf.lonestar.org> wrote:

So why Python's IO cannot yield f.eof() as easily as Ruby's can? :)

Because that's not how you compare languages. You compare languages by stating what you are actually trying to do, and figuring out the most natural solution in each language. Not "I can do this in x--how come I can't do it in y?"

Python doesn't have f.eof() because it doesn't compare to Ruby? Or
because I'm trying to compare them? :) That's giving up to Ruby too
early!

No and no, to your two questions. I'm not giving up on Ruby at all. In fact, I've never tried Ruby. My point isn't that the languages don't compare. My point is that your question shouldn't be "why doesn't python have an eof method on its file objects?" Your question should be "I want to do something different with the last line of a file that I iterate over. How do I do that best in python?" You've been given a couple solutions, and a very valid reason (performance) why buffered file objects are not the default. You may also consider trying subclassing file with a buffered file object that provides self.eof. (I recommend making it an attribute rather than a method. Set it when you hit eof.) That way you have the fast version, and the robust version.

You may find something of interest in the for/else construction as well:

for line in file:
    pass
else:
    # This gets processed at the end unless you break out of the for loop.
    pass
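To make the for/else behaviour concrete, here is a small Python 3 sketch. The function scan and its stop_word parameter are hypothetical names used only to show when the else branch runs.

```python
def scan(lines, stop_word=None):
    """Record what happens while looping, including whether else ran."""
    events = []
    for line in lines:
        if line == stop_word:
            events.append("stopped early")
            break  # skips the else branch below
        events.append("saw " + line)
    else:
        # Runs only when the loop finished without hitting break,
        # that is, once the input is exhausted.
        events.append("reached the end")
    return events

print(scan(["a", "b"]))
print(scan(["a", "b"], stop_word="b"))
```

The else branch is therefore a natural place for tail processing such as a final summary.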

Ruby has iterators and generators too, but it also has my good ol'
f.eof(). I challenge the assumption here of some majestically
Python-wayist spirit forbidding Python to have f.eof(), while Ruby, which has
all the same features, has it. Saying "it's not the Python way" is
not a valid argument.

No, but showing a different python way is more valid, and if you were more forthcoming about your use case from the get-go, you would have gotten fewer vague answers.

The suspicion lurking in the thread above was that this has to do with
Python IO buffering, that it somehow can't tell the f.eof() with
automatic look-ahead/push-back/simulate read, as transparently and
effectively as (in practice) Ruby does without much fuss. The reason
why such a useful feature -- useful not in Ruby or Perl or Pascal, but
algorithmically -- is not present in Python is a recurrent mystery,
evidenced in this group recurrently.

A mystery which has been answered a couple times in this thread--it causes a performance hit, and python is designed so that you don't suffer that performance hit, unless you want it, so you have to program for it yourself.

You yourself said that performance is a complaint of yours regarding Ruby, so why claim that Ruby's way is clearly better in a case where it causes a known performance hit?

Cheers,
Alexy

Cheers,
Cliff

Nov 22 '07 #23

Hrvoje Niksic

braver <de*********@gmail.com> writes:

On Nov 22, 6:10 pm, "Diez B. Roggisch" <de...@nospam.web.de> wrote:
Granted, they aren't part of the stdlib - but then, lots
of things aren't.

As Hendrik noticed, I can't even add my own f.eof() if I want to
have buffering -- is that right?

You can, you just need to inherit from built-in file type. Then
instances of your class get the __dict__ and with it the ability to
attach arbitrary information to any instance. For example:

class MyFile(file):
    def __init__(self, *args, **kwds):
        file.__init__(self, *args, **kwds)
        self.eof = False

    def read(self, size=None):
        if size is None:
            val = file.read(self)
            self.eof = True
        else:
            val = file.read(self, size)
            if len(val) < size:
                self.eof = True
        return val

    def readline(self, size=None):
        if size is None:
            val = file.readline(self)
        else:
            val = file.readline(self, size)
        if len(val) == 0:
            self.eof = True
        return val

The code needed to support iteration is left as an exercise for the
reader.
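One way the iteration part might be filled in is sketched below. It is written for Python 3, where the built-in file type no longer exists, so it wraps a file-like object instead of subclassing. The class name EOFTrackingFile is hypothetical, and io.StringIO stands in for a real file.

```python
import io

class EOFTrackingFile:
    """Wrap a text file-like object and record when a read hits EOF."""

    def __init__(self, fileobj):
        self._f = fileobj
        self.eof = False

    def readline(self):
        line = self._f.readline()
        if line == "":
            self.eof = True  # set only after a read came back empty
        return line

    def __iter__(self):
        return self

    def __next__(self):
        line = self.readline()
        if line == "":
            raise StopIteration
        return line

f = EOFTrackingFile(io.StringIO("a\nb\n"))
print(f.readline(), f.readline(), f.eof)  # eof is still False here
print(repr(f.readline()), f.eof)          # the empty read flips the flag
```

Note that the flag behaves like C's feof: it turns true only after a read has already failed, not when the last line has just been returned.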

Nov 22 '07 #24

braver

On Nov 22, 6:40 pm, "J. Clifford Dyer" <j...@sdf.lonestar.org> wrote:

You yourself said that performance is a complaint of yours regarding Ruby, so why claim that Ruby's way is clearly better in a case where it causes a known performance hit?

See Hrvoje's remark above -- we can have EOF and eat it too! Perhaps
it was just somehow omitted from the standard Python library because
it was disliked by some folks or just forgotten. Is there a history of
the eof()'s fall from grace? Was it ever considered for inclusion?

Cheers,
Alexy

Nov 22 '07 #25

Thomas Bellman

Hrvoje Niksic <hn*****@xemacs.org> wrote:

I don't get it, Python's files are implemented on top of stdio FILE
objects, which do buffering and provide EOF checking (of the sort
where you can check if a previous read hit the EOF, but still). Why
not export that functionality?

Alexy wants to check if a file is about to hit EOF, not if it has
already hit EOF. Stdio does not support that. Nor does Unix.
The only way to check if the file is about to hit EOF, is to
actually perform a read and see if it returns zero bytes. That
is usually OK to do on plain files, but on pipes, sockets or
terminals, you would have major problems, since suddenly calling
the eof() method would block the process. Probably not what you
were expecting.
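The "perform a read to find out" approach can be sketched in Python 3 with io.BufferedReader.peek, which looks at buffered bytes without consuming them. The helper name at_eof is hypothetical, and io.BytesIO stands in for a plain file. On a pipe, socket, or terminal the peek call may block waiting for data, which is exactly the problem described above.

```python
import io

def at_eof(reader):
    """Return True if a blocking BufferedReader has no more bytes.

    peek() may need to read from the underlying stream to fill its
    buffer, so this is not a passive check.
    """
    return reader.peek(1) == b""

# io.BytesIO stands in for a plain file opened in binary mode.
reader = io.BufferedReader(io.BytesIO(b"x"))
print(at_eof(reader))  # one byte is still available
reader.read(1)
print(at_eof(reader))  # nothing left to read
```

For a regular file this is cheap, since the buffer is usually already filled; the cost and the blocking risk show up only on interactive or network streams.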
--
Thomas Bellman, Lysator Computer Club, Linköping University, Sweden
"Life IS pain, highness. Anyone who tells ! bellman @ lysator.liu.se
differently is selling something." ! Make Love -- Nicht Wahr!

Nov 22 '07 #26

On 2007-11-22, Hrvoje Niksic <hn*****@xemacs.org> wrote:

"Diez B. Roggisch" <de***@nospam.web.de> writes:

Language comparisons are sometimes good. They are best when
they are free of FUD.

So why Python's IO cannot yield f.eof() as easily as Ruby's can? :)

Because that requires buffering, something that affects speed.

I don't get it, Python's files are implemented on top of stdio
FILE objects, which do buffering and provide EOF checking (of
the sort where you can check if a previous read hit the EOF,
but still). Why not export that functionality?

You have to make a failed read attempt before feof returns true.

Considering your own repeated remarks about "I'd only use ruby
if it wasn't slower than Python", I'd think you could value
that.

I see no reason why exposing the EOF check would slow things down.

I think it's too low level, and so doesn't do what naive users
expect. It's really only useful, even in C, as part of the
forensic study of a stream in an error state, yet naive C
programmers often write code like:

while (!feof(f)) {
    /* Read a line and process it. */
}

...and are flummoxed by the way it fails to work.

I think Python is well rid of such a seldom useful source of
confusion.

--
Neil Cerutti

Nov 22 '07 #27

greg

Hendrik van Rooyen wrote:

So he can't even help himself by setting his
own EOF attribute to False initially, and
to True when he sees an empty string.

Is there a reason for this Bondage style?

There's a fair amount of overhead associated with providing
the ability to set arbitrary attributes on an object, which
is almost never wanted for built-in types, so it's not
provided by default.

You can easily get it if you want it by defining a Python
subclass of the type concerned.
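The difference between a built-in type and a Python subclass of it can be seen in a few lines. This sketch uses Python 3 and the list type, since the Python 2 file type discussed here is gone in Python 3; the class name Flagged is hypothetical.

```python
class Flagged(list):
    """A do-nothing subclass; its instances get a __dict__."""

plain = []
try:
    plain.eof = False  # instances of the built-in have no __dict__
    builtin_accepts_attribute = True
except AttributeError:
    builtin_accepts_attribute = False

flagged = Flagged()
flagged.eof = False  # works: the subclass instance can hold new attributes

print(builtin_accepts_attribute, flagged.eof)
```

The subclass pays for this flexibility with a per-instance dictionary, which is the overhead mentioned above.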

--
Greg

Nov 23 '07 #28

On Nov 22, 11:04 am, Neil Cerutti <horp...@yahoo.com> wrote:

I think it's too low level, and so doesn't do what naive users
expect. It's really only useful, even in C, as part of the
forensic study of a stream in an error state, [...]

Indeed. I just wrote a little implementation of an IPS patcher for the
ips patches used on many old game roms (snes, genesis) for doing fan
translations from Japanese to other languages. The basic format of a
patch is the ascii header "PATCH", followed by 3 bytes giving the offset
into the data file at which to apply the patch chunk, 2 bytes giving the chunk size, n
bytes of chunk, repeated, with final ascii "EOF" footer. As I was
using Haskell, the function was recursive, and it was useful to check
that "EOF" were the final bytes read and that no more bytes had been
read between the last data chunk and eof. In other words, on the
corner case that all the data in the patch was structurally valid,
except up to two bytes after the last chunk and before the "EOF",
checking that the absolute position in the file was eof gave me the
ability to differentiate the error states of the patch lacking the
closing ascii "EOF", or including extra data between the last chunk
and the "EOF." Without checking eof (or doing something more complex),
I would have only been able to detect the error as a missing footer.

Regards,
Jordan

Nov 23 '07 #29

braver

On Nov 22, 8:04 pm, Neil Cerutti <horp...@yahoo.com> wrote:

I think Python is well rid of such a seldom useful source of
confusion.

So all that code folks wrote in Algol-like languages, -- e.g. this
works in Ada, --

while not End_of_File(f) loop
    -- ...
end loop;

-- are confusing? Why not interpret it as reading until ^D on a
pipe? And plain files work fine. (What would Ruby do?:)
Historically, is it possible to trace the eof-related design decision
in stdlib? Most languages have the look-ahead eof, so when Python was
codified, there should have been some decisions made.

Can we say that f.eof() in fact can check for EOF right after we've
read all characters from a file, but before a failed attempt to read
beyond? In Python's idiom,

for line in file:
    # look at a line
# we can tell eof occurs right here after the last line

After the last line, we've read all bytes but didn't try a new line
yet -- is it the semantics of the for line in file:? I assume it'll
do the right thing if our file ends in \n. What if the last line is
not \n-terminated?

Cheers,
Alexy

Nov 24 '07 #30

On 2007-11-23, braver <de*********@gmail.com> wrote:

Can we say that f.eof() in fact can check for EOF right after
we've read all characters from a file, but before a failed
attempt to read beyond? In Python's idiom,

for line in file:
    # look at a line
# we can tell eof occurs right here after the last line

After the last line, we've read all bytes but didn't try a new
line yet -- is it the semantics of the for line in file:?

Yes. After the above construction, there's no need to check for
eof.

I assume it'll do the right thing if our file ends in \n. What
if the last line is not \n-terminated?

Nothing bad happens as far as I know, but it may depend on the
underlying clib.

--
Neil Cerutti

Nov 24 '07 #31

greg

braver wrote:

Historically, is it possible to trace the eof-related design decision
in stdlib?

You seem to be assuming that someone started out with a design
that included an eof() of the kind you want, and then decided
to remove it.

But I doubt that such a method was ever considered in the first
place. Someone used to programming with the C stdlib doesn't
think in terms of testing for EOF separately from reading,
because the C stdlib doesn't work that way.

Pascal started out with an eof() function because the earliest
implementations only worked with disk files. Later, when people
tried to make Pascal programs work interactively, they found
out that it was a mistake, as it provides opportunities such
as the following classic wrong way to read interactive input
in Pascal:

while not eof(input) do begin
    write('Enter some data: ');
    readln(input, line);
end

which stops and waits for input before printing the first
prompt.

By not providing an eof() function, C -- and Python -- make
it clear that testing for eof is not a passive operation.
It's always obvious what's going on, and it's much harder to
make mistakes like the above.

when Python was
codified, there should have been some decisions made.

Some decisions were made when the C stdlib was designed, and
I think they were the right ones. Python wisely followed them.

for line in file:
    # look at a line
# we can tell eof occurs right here after the last line

No, if the line you just read ends in "\n", you don't know whether
eof has been reached until the for-loop tries to read another line.
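
If the loop body really needs to know that it is on the last line, one way is to do the lookahead yourself. The following is a minimal sketch, not anything from the stdlib: with_last_flag is a hypothetical helper name, and io.StringIO stands in for a real file. It reads one line ahead and yields each line together with a flag.

```python
import io

def with_last_flag(lines):
    """Yield (line, is_last) pairs by reading one line ahead."""
    it = iter(lines)
    try:
        current = next(it)
    except StopIteration:
        return  # empty input: nothing to yield
    for upcoming in it:
        # Another line exists, so `current` is not the last one.
        yield current, False
        current = upcoming
    # The iterator is exhausted, so `current` was the last line.
    yield current, True

f = io.StringIO("a\nb\nc\n")
pairs = list(with_last_flag(f))
for line, is_last in pairs:
    print(repr(line), is_last)
```

Note that the flag for a line only becomes known after the next read has been attempted, which is exactly the point made above.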

--
Greg

Nov 24 '07 #32

On Nov 23, 6:56 pm, greg <g...@cosc.canterbury.ac.nz> wrote:

By not providing an eof() function, C -- and Python -- make
it clear that testing for eof is not a passive operation.
It's always obvious what's going on, and it's much harder to
make mistakes like the above.

err...C has feof() in stdio (see "man 3 ferror").

Regards,
Jordan

Nov 24 '07 #33

hdante

On Nov 22, 1:17 pm, braver <delivera...@gmail.com> wrote:

Ruby has iterators and generators too, but it also has my good ol'
f.eof(). I challenge the assumption here of some majectically Python-

Ruby doesn't have the good ol' eof. Good old eof tests a single flag
and requires a pre read(). Ruby's eof blocks and does buffering (and
this is a very strong technical statement). I find it ok that ruby
subverts the good old eof, but it's unacceptable for python to do so.

Besides, it's probable that your code could work with the following
construct.

def try_read(f):
    line = f.readline()
    eof = (line == '')
    return (line, eof)

def xor(a, b):
    return a and not b or b and not a

count = 0
while True:
    next_line, eof = try_read(f)
    if not eof:
        count += 1
        line = next_line
        process(line)
    if xor(count % 1000 == 0, eof):
        summarize(count, line)
    if eof:
        break

wayist spirit forbidding Python to have f.eof(), while Ruby, which has
all the same features, has it. Saying "it's not the Python way" is
not a valid argument.

Yes, it is.

Cheers,
Alexy

Nov 24 '07 #34

On Nov 23, 10:00 pm, "hda...@gmail.com" <hda...@gmail.com> wrote:

Ruby doesn't have the good ol' eof. Good old eof tests a single flag
and requires a pre read(). Ruby's eof blocks and does buffering (and
this is a very strong technical statement).

Actually, to be a bit more technical, IO#eof acts like standard C eof
for File objects, it only blocks / requires a previous read() on
character devices and pipes and such. For files, it's the same as
checking the absolute position of the file stream: f.tell ==
File.size(f.path).

Of course, the same can be done in python quite easily (and probably
better implemented):

f.tell() == os.stat(f.name).st_size

I don't honestly see what the big deal is about including or excluding
an eof function / method in python. If you need it, it is easy to
implement, and like yourself and others have shown, you usually don't
need it.
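
Spelled out as a runnable sketch for a current Python 3 (the thread itself predates it): at_eof is a hypothetical helper name, I use os.fstat on the open descriptor instead of os.stat(f.name), and I assume a regular file opened in binary mode. It will not give meaningful answers for pipes or terminals.

```python
import os
import tempfile

def at_eof(f):
    """True if f's position equals its size on disk (regular files only)."""
    return f.tell() == os.fstat(f.fileno()).st_size

with tempfile.TemporaryFile() as f:  # opened in binary mode by default
    f.write(b"jopa\n")
    f.flush()          # make sure the size on disk is up to date
    f.seek(0)
    before = at_eof(f)  # nothing read yet
    f.read()
    after = at_eof(f)   # everything consumed

print(before, after)
```

This never touches the stream itself, but it also says nothing about whether a later read would still find data if another process appends to the file.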

Regards,
Jordan

Nov 24 '07 #35

hdante

On Nov 24, 2:24 am, MonkeeSage <MonkeeS...@gmail.com> wrote:

Actually, to be a bit more technical, IO#eof acts like standard C eof
for File objects, it only blocks / requires a previous read() on
character devices and pipes and such. For files, it's the same as
checking the absolute position of the file stream: f.tell ==
File.size(f.path).

This is not the same as ISO C. f.tell could be equal to
File.size(f.path) and eof could be false. An extra read() is required.

Regards,
Jordan

Nov 24 '07 #36

On Nov 23, 10:43 pm, "hda...@gmail.com" <hda...@gmail.com> wrote:

This is not the same as ISO C. f.tell could be equal to
File.size(f.path) and eof could be false. An extra read() is required.

My bad. As you might have surmised, I'm not a genius when it comes to
C. I thought that the eof flag was set when the pointer into the
stream was the same as the length of the stream, but I guess it makes
sense that as an error flag, it would only be set after an attempted
read past the end of the stream (in which case, I guess you'd get a
NULL from the read, equivalent to python's empty string?).

Ps. braver, if you really want a ruby-like eof method on file objects
in python, how about overriding the open() alias with a file subclass
including eof?

import os

class open(file):
    def __init__(self, name):
        self.size = os.stat(name).st_size
        file.__init__(self, name)
    def eof(self):
        return self.tell() == self.size

f = open('tmp.py')
print f.eof() # False
f.read()
print f.eof() # True
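
For anyone reading this on Python 3, where the file builtin no longer exists, a rough equivalent can be sketched with the peek() method of io.BufferedReader. This is an illustration under assumptions, not a drop-in replacement: eof is a hypothetical helper name, the stream must be a buffered binary reader, and io.BytesIO stands in for a real file. Unlike the tell()/size comparison, peek() may perform a read to fill the buffer, so on pipes and terminals it can block.

```python
import io

def eof(f):
    """True if no more bytes can be read from the buffered binary reader f.

    peek() returns buffered bytes without advancing the position and
    returns an empty bytes object only when the stream is exhausted.
    """
    return f.peek(1) == b""

f = io.BufferedReader(io.BytesIO(b"jopa\n"))
first = eof(f)   # data is still available
f.read()
second = eof(f)  # everything has been consumed
print(first, second)
```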

Regards,
Jordan

Nov 24 '07 #37

greg

Dennis Lee Bieber wrote:

Pascal I/O worked with a "one element preread", where what we'd
consider a read operation was performed by the open operation -- which
made console I/O a royal pain

Yep. Later implementations reduced the pain somewhat by
using a "lazy" scheme which deferred the read until
you tried to do something with the buffer. But they
couldn't completely hide the fact that testing for eof
requires a lookahead.

Original Pascal uses

f = open(somefile)
do something with f^
read(f)

Actually, I think it was get(f) to read the next record
into the buffer. Read(f, x) was a higher-level procedure
equivalent to something like

x = f^;
get(f)

Plus for text files read() and write() did various other
fancy things.

--
Greg

Nov 24 '07 #38

samwyse

On Nov 23, 2:06 am, greg <g...@cosc.canterbury.ac.nz> wrote:

There's a fair amount of overhead associated with providing
the ability to set arbitrary attributes on an object, which
is almost never wanted for built-in types, so it's not
provided by default.

You can easily get it if you want it by defining a Python
subclass of the type concerned.

Speaking of which, I've got a big file:

>>> input = open('LoTR.iso')

I'd like to get the md5 hash of the file:

>>> import md5
>>> m = md5.new()

I've also got this nifty standard module which will allow me, among
other things, to copy an arbitrary file:

>>> import shutil

I'd like to say copy the file to my object, but it doesn't quite work:

>>> shutil.copyfileobj(input, m)

Traceback (most recent call last):
  File "<pyshell#20>", line 1, in <module>
    shutil.copyfileobj(source, m)
  File "C:\Python25\lib\shutil.py", line 24, in copyfileobj
    fdst.write(buf)
AttributeError: '_hashlib.HASH' object has no attribute 'write'

No problem, I'll just add an attribute:

>>> setattr(m, 'write', m.update)

Traceback (most recent call last):
  File "<pyshell#15>", line 1, in <module>
    setattr(m, 'write', m.update)
AttributeError: '_hashlib.HASH' object has no attribute 'write'

Anyone have an example of how to efficiently do this? Thanks!
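
One possible approach, offered as a sketch rather than the definitive answer: since the hash object won't accept new attributes, wrap it in a small Python class that supplies write(). HashWriter is a hypothetical name of mine, the example uses hashlib rather than the older md5 module, and io.BytesIO stands in for the big file so that it runs anywhere.

```python
import hashlib
import io
import shutil

class HashWriter:
    """Minimal file-like sink: write() feeds the data into a hash object."""

    def __init__(self, hash_obj):
        self.hash_obj = hash_obj

    def write(self, data):
        self.hash_obj.update(data)

payload = b"stand-in for the contents of a big file"
source = io.BytesIO(payload)
sink = HashWriter(hashlib.md5())

# copyfileobj reads the source in chunks and calls sink.write() on each one,
# so the whole file never has to be held in memory at once.
shutil.copyfileobj(source, sink)
digest = sink.hash_obj.hexdigest()
print(digest)
```

With a real file, the source would be opened in binary mode so that the bytes being hashed are exactly the bytes on disk.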

Nov 24 '07 #39

hd****@gmail.com wrote:

def xor(a, b):
    return a and not b or b and not a

>>> from operator import xor
>>> help(xor)

Help on built-in function xor in module operator:

xor(...)
    xor(a, b) -- Same as a ^ b.

Nov 26 '07 #40

ZeD

Grant Edwards wrote:

The user-defined xor operates on "logical" boolean values.
The one in the operator module is a bitwise operator.

def xor(a, b):
    return bool(a) ^ bool(b)

seems more explicit to me.
Maybe, to make it "more" explicit (too much, honestly...):

from operator import xor as bitwise_xor

def logical_xor(a, b):
    return bitwise_xor(bool(a), bool(b))
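
To make the difference concrete, here is a small sketch comparing the two on the same arguments. The names follow the snippet above; the sample values are my own.

```python
from operator import xor as bitwise_xor

def logical_xor(a, b):
    """True when exactly one of a and b is truthy."""
    return bool(a) ^ bool(b)

# Bitwise: combines the bits of the two integers.
bitwise_result = bitwise_xor(1, 2)

# Logical: both arguments are truthy, so the result is False.
logical_result = logical_xor(1, 2)

# Logical xor also accepts non-integers, which the bitwise one rejects.
mixed_result = logical_xor("", "x")

print(bitwise_result, logical_result, mixed_result)
```

So passing two truthy integers to operator.xor in hdante's loop condition would not give the intended truth value unless both arguments are already booleans.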

--
Under construction

Nov 26 '07 #41