Checking for EOF in stream

Hi!

Classic situation - I have to process an input stream of unknown length
until I reach its end (EOF, End Of File). How do I check for EOF? The
input stream can be anything from an open file through sys.stdin to a
network socket. And it's binary and potentially huge (gigabytes), so
"for line in stream.readlines()" isn't really the way to go.

For now I have roughly:

    stream = sys.stdin
    while True:
        data = stream.read(1024)
        process_data(data)
        if len(data) < 1024:  ## (*)
            break

I smell a fragile point at (*) because, as far as I know, network
socket streams (for example) may return less data than requested even
when the socket is still open.
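To make the concern at (*) concrete, here is a minimal sketch, assuming Python 3 and a platform where `socket.socketpair()` is available. The helper name `short_read_demo` is made up for illustration. It shows a read that returns fewer bytes than requested while the peer is still open, followed by the empty read that actually marks EOF.

```python
import socket

def short_read_demo():
    """Show that a short read does not mean EOF on a stream socket."""
    # socketpair() returns two connected stream sockets in one process.
    writer, reader = socket.socketpair()
    try:
        writer.sendall(b"abc")
        first = reader.recv(1024)    # fewer than 1024 bytes, peer still open
        writer.sendall(b"defg")
        second = reader.recv(1024)   # more data arrives after the short read
        writer.close()               # only now is the stream finished
        at_eof = reader.recv(1024)   # an empty result signals EOF
        return first, second, at_eof
    finally:
        writer.close()
        reader.close()

first, second, at_eof = short_read_demo()
print(len(first) < 1024, second, at_eof)
```

A loop that stopped at the first short read would have missed the second message, which is exactly the fragility suspected above.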

I'd prefer something like:

    while not stream.eof():
        ...

but there is no eof() method :-(

This is probably a trivial problem but I haven't found a decent solution.

Any hints?

Thanks!

GiBo
Feb 20 '07 #1
On 2007-02-19, GiBo <gi**@gentlemail.com> wrote:
> For now I have roughly:
>
> stream = sys.stdin
> while True:
>     data = stream.read(1024)
      if len(data) == 0:
          break  # EOF
>     process_data(data)
--
Grant Edwards <grante at visi.com>
Yow! CALIFORNIA is where people from IOWA or NEW YORK go to subscribe to CABLE TELEVISION!!
Feb 20 '07 #2
Grant Edwards wrote:
> stream = sys.stdin
> while True:
>     data = stream.read(1024)
>     if len(data) == 0:
>         break  # EOF
>     process_data(data)

Right, not a big difference though. Isn't there a cleaner / more
intuitive way? Like using some wrapper objects around the streams or
something?

GiBo

Feb 20 '07 #3
On Mon, 19 Feb 2007 21:50:11 -0300, GiBo <gi**@gentlemail.com> wrote:
> Right, not a big difference though. Isn't there a cleaner / more
> intuitive way? Like using some wrapper objects around the streams or
> something?

Read the documentation... For a true file object:

    read([size]) ... An empty string is returned when EOF is encountered
    immediately.

All the other "file-like" objects (like StringIO, socket.makefile, etc.)
maintain this behavior.
So this is the way to check for EOF. If you don't like how it was spelled,
try this:

    if data == "": break

If your data is made of lines of text, you can use the file as its own
iterator, yielding lines:

    for line in stream:
        process_line(line)
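A small sketch of the documented behaviour, assuming Python 3 and using `io.BytesIO` as a stand-in for a real file or a `socket.makefile()` object. One caveat for current Python: a binary stream returns `bytes`, so the empty result is `b""` and a comparison against `""` would never be true.

```python
import io

stream = io.BytesIO(b"x" * 2500)   # stand-in for a file-like binary stream

sizes = []
while True:
    data = stream.read(1024)
    if data == b"":                # empty result: EOF reached
        break
    sizes.append(len(data))

# Reading again at EOF keeps returning an empty bytes object.
still_empty = stream.read(1024)
print(sizes, still_empty)
```

The last chunk is shorter than the others, yet the loop ends only on the empty read, not on the short one.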

--
Gabriel Genellina

Feb 20 '07 #4
On 2007-02-20, GiBo <gi**@gentlemail.com> wrote:
>> stream = sys.stdin
>> while True:
>>     data = stream.read(1024)
>>     if len(data) == 0:
>>         break  # EOF
>>     process_data(data)
>
> Right, not a big difference though. Isn't there a cleaner /
> more intuitive way?

A file is at EOF when read() returns ''. The above is the
cleanest, simplest, most direct way to do what you specified.
Everybody does it that way, and everybody recognizes what's
being done.

It's also the "standard, Pythonic" way to do it.

> Like using some wrapper objects around the streams or
> something?

You can do that, but then you're mostly just obfuscating
things.

--
Grant Edwards <grante at visi.com>
Yow! Vote for ME -- I'm well-tapered, half-cocked, ill-conceived and TAX-DEFERRED!
Feb 20 '07 #5
In article <ma***************************************@python.org>, Gabriel Genellina wrote:
> So this is the way to check for EOF. If you don't like how it was spelled,
> try this:
>
>     if data == "": break

How about:

    if not data: break

? ;-)
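One practical argument for the truthiness test, sketched here under the assumption of Python 3: the same loop then works for text streams (which return `""` at EOF) and binary streams (which return `b""`). The function name `read_chunks` is made up for this example.

```python
import io

def read_chunks(stream, bufsize=1024):
    """Collect chunks until read() returns an empty (falsy) result."""
    chunks = []
    while True:
        data = stream.read(bufsize)
        if not data:          # true for both "" and b""
            break
        chunks.append(data)
    return chunks

binary_chunks = read_chunks(io.BytesIO(b"abcdef"), bufsize=4)
text_chunks = read_chunks(io.StringIO("abcdef"), bufsize=4)
print(binary_chunks, text_chunks)
```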
Feb 20 '07 #6
On 2/20/07, Nathan <ne******@gmail.com> wrote:
> Not to beat a dead horse, but I often do this:
>
>     data = f.read(bufsize)
>     while data:
>         # ... process data.
>         data = f.read(bufsize)
>
> The only annoying bit is the duplicated line. I find I often follow
> this pattern, and I realize Python doesn't plan to have any sort of
> do-while construct, but even still I prefer this idiom. What's the
> consensus here?

What about creating a standard binary-file iterator:

    def blocks_of(infile, bufsize = 1024):
        data = infile.read(bufsize)
        if data:
            yield data

The use would look like this:

    for block in blocks_of(myfile, bufsize = 2**16):
        process_data(block) # len(block) <= bufsize...

(ahem), make that iterator something that works, like:

    def blocks_of(infile, bufsize = 1024):
        data = infile.read(bufsize)
        while data:
            yield data
            data = infile.read(bufsize)
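For comparison, a hedged sketch (assuming Python 3) that runs the corrected generator next to a built-in alternative: the two-argument form `iter(callable, sentinel)`, which keeps calling `read` until it returns the sentinel. The sample data and the names `source`, `from_generator`, and `from_iter` are illustrative only.

```python
import io
from functools import partial

def blocks_of(infile, bufsize=1024):
    """Yield successive blocks from infile until read() returns nothing."""
    data = infile.read(bufsize)
    while data:
        yield data
        data = infile.read(bufsize)

source = b"0123456789"

# The generator from the post above.
from_generator = list(blocks_of(io.BytesIO(source), bufsize=4))

# Built-in alternative: iter(callable, sentinel) calls read(4) repeatedly
# and stops as soon as it returns the sentinel b"".
from_iter = list(iter(partial(io.BytesIO(source).read, 4), b""))

print(from_generator, from_iter)
```

Both forms remove the duplicated `read` line from the calling code. The sentinel must match the stream type (`b""` for binary, `""` for text), whereas the generator's truthiness test handles either.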
Feb 20 '07 #7
On 2/19/07, Gabriel Genellina <ga******@yahoo.com.ar> wrote:
> So this is the way to check for EOF. If you don't like how it was spelled,
> try this:
>
>     if data == "": break
>
> If your data is made of lines of text, you can use the file as its own
> iterator, yielding lines:
>
>     for line in stream:
>         process_line(line)

Not to beat a dead horse, but I often do this:

    data = f.read(bufsize)
    while data:
        # ... process data.
        data = f.read(bufsize)

The only annoying bit is the duplicated line. I find I often follow
this pattern, and I realize Python doesn't plan to have any sort of
do-while construct, but even still I prefer this idiom. What's the
consensus here?

What about creating a standard binary-file iterator:

    def blocks_of(infile, bufsize = 1024):
        data = infile.read(bufsize)
        if data:
            yield data

The use would look like this:

    for block in blocks_of(myfile, bufsize = 2**16):
        process_data(block) # len(block) <= bufsize...
Feb 20 '07 #8
On Feb 19, 6:58 pm, GiBo <g...@gentlemail.com> wrote:
> Hi!
>
> Classic situation - I have to process an input stream of unknown length
> until I reach its end (EOF, End Of File). How do I check for EOF? The
> input stream can be anything from opened file through sys.stdin to a
> network socket. And it's binary and potentially huge (gigabytes), thus
> "for line in stream.readlines()" isn't really a way to go.

Could you use xreadlines()? It's a lazily-evaluated stream reader.

> For now I have roughly:
>
> stream = sys.stdin
> while True:
>     data = stream.read(1024)
>     process_data(data)
>     if len(data) < 1024:  ## (*)
>         break
>
> I smell a fragile point at (*) because as far as I know e.g. network
> sockets streams may return less data than requested even when the socket
> is still open.

Well, it depends on a lot of things. Is the stream blocking or
non-blocking (on sockets and some other sorts of streams, you can pick
this yourself)? What are the underlying semantics (reliable-and-blocking
TCP or dropping-and-unordered UDP)? Unfortunately, you really need to
just know what you're working with. There's really no better solution:
trying to hide the underlying semantics under a prescribed, overlaid
set of semantics can only lead to badness in the long run.
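As an illustration of why the blocking mode matters, here is a rough sketch, assuming Python 3 and `socket.socketpair()`. On a non-blocking socket, "no data yet" shows up as a `BlockingIOError`, while EOF is still an empty result. The function name `classify_reads` and the one-second timeout are arbitrary choices for the example.

```python
import select
import socket

def classify_reads():
    """Distinguish 'no data yet' from EOF on a non-blocking socket."""
    writer, reader = socket.socketpair()
    reader.setblocking(False)
    results = []
    try:
        try:
            reader.recv(1024)
        except BlockingIOError:
            # Nothing has been sent yet; the stream is still open.
            results.append("no data yet")

        writer.close()
        # Wait (up to 1 s) until the close is visible to the reader.
        select.select([reader], [], [], 1.0)
        if reader.recv(1024) == b"":
            results.append("EOF")
        return results
    finally:
        writer.close()
        reader.close()

results = classify_reads()
print(results)
```

So the empty-read rule for EOF still holds here; what changes with the stream's semantics is how the "nothing available right now" case is reported.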

> I'd better like something like:
>
>     while not stream.eof():
>         ...
>
> but there is not eof() method :-(
>
> This is probably a trivial problem but I haven't found a decent solution.

For your case, it's not so hard:
http://pyref.infogami.com/EOFError says "read() and readline() methods
of file objects return an empty string when they hit EOF", so you
should assume that anything claiming to be a file-like object will
work this way.

> Any hints?

So:

    stream = sys.stdin
    while True:
        data = stream.read(1024)
        if data == "":
            break
        process_data(data)
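For readers on current Python, a possible modern spelling of the same loop, assuming Python 3.8 or later for the `:=` operator. The names `process_stream` and `received` are hypothetical. Note that binary data on standard input is read from `sys.stdin.buffer` in Python 3, since `sys.stdin` itself is a text stream.

```python
import io

def process_stream(stream, process_data, bufsize=1024):
    """Feed every chunk of a binary stream to process_data until EOF."""
    while chunk := stream.read(bufsize):   # an empty result ends the loop
        process_data(chunk)

# In a real script (after "import sys") the binary view of stdin
# would be passed in:
#     process_stream(sys.stdin.buffer, handle_chunk)
received = []
process_stream(io.BytesIO(b"a" * 3000), received.append)
print([len(chunk) for chunk in received])
```

This keeps the read-then-test logic from the answers above and only removes the explicit `break`.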

Feb 27 '07 #9
