
Reading from file up to arbitrary byte.

Hello!

Is there a simple way of reading from a file object up to a specific
byte value? I would like to do this without reading one character at a
time, or reading in chunks and holding a remainder over.

Failing that, as a lightheartedly proposed extension, what do people
think of:

my_file.readline(line_terminator = "\x7f")
or
my_file.line_terminator = "\x7f"

Does anyone know of a publicly available subclass of file that would
allow such, or similar, behaviour?

Thanks,
David.

Jul 18 '05 #1
3 Replies


"David M. Wilson" wrote:

Is there a simple way of reading from a file object up to a specific
byte value? I would like to do this without reading one character at a
time, or reading in chunks and holding a remainder over.


I'd say the simple way to do it would be to read in chunks and hold a
remainder over. It's not complicated: why don't you like that way?
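
For readers following along, the chunk-and-remainder approach can be sketched roughly as below. This is only an illustration in Python 3, not code from the thread: the function name `read_until`, the default terminator `b"\x7f"`, and the `chunk_size` value are all assumptions.

```python
import io


def read_until(fileobj, terminator=b"\x7f", chunk_size=65536):
    """Yield records from a binary file object, split on `terminator`.

    Hypothetical helper: the terminator itself is not included in the
    yielded records, and a trailing terminator does not produce an
    extra empty record.
    """
    remainder = b""
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break
        # Prepend whatever was left over from the previous chunk, then split.
        parts = (remainder + chunk).split(terminator)
        # The last piece may be incomplete, so hold it over for the next pass.
        remainder = parts.pop()
        yield from parts
    if remainder:
        yield remainder


# Small in-memory demonstration with a deliberately tiny chunk size.
records = list(read_until(io.BytesIO(b"ab\x7fcd\x7fef"), chunk_size=2))
```

Because the held-over piece is joined to the next chunk before splitting, a terminator longer than one byte that straddles a chunk boundary should still be found.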

Or even simpler: just use read()[:chunksize] and not worry about the
fact that you're reading all the data and throwing some away.
Performance-wise, this probably beats the pants off most alternatives,
if performance is what concerns you, and unless your file is really
big and chunksize is small, who cares about the memory that is wasted
for a few microseconds?
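
Applied to the original question (stop at a byte value rather than at a fixed length), the read-everything idea might look like the sketch below. It swaps the slice for `bytes.partition`, which is an adaptation and not what was literally suggested; the name `read_head` and the default terminator are assumptions, and it only makes sense when the whole file fits comfortably in memory.

```python
import io


def read_head(fileobj, terminator=b"\x7f"):
    """Return everything before the first `terminator` in a binary file.

    Hypothetical helper: reads the entire file into memory first, then
    discards the part after the terminator.
    """
    data = fileobj.read()
    head, _sep, _tail = data.partition(terminator)
    return head


# If the terminator never appears, the whole content is returned.
head = read_head(io.BytesIO(b"header\x7fbody\x7fmore"))
```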

It might also help respondents if you describe the reason for wanting
to read the first part of the file like that. Maybe there's a more
suitable approach.

-Peter
Jul 18 '05 #2

Peter Hansen wrote:
I'd say the simple way to do it would be to read in chunks and hold a
remainder over. It's not complicated: why don't you like that way?

Or even simpler: just use read()[:chunksize] and not worry about the
fact that you're reading all the data and throwing some away.
Performance-wise, this probably beats the pants off most alternatives,
if performance is what concerns you, and unless your file is really
big and chunksize is small, who cares about the memory that is wasted
for a few microseconds?

It might also help respondents if you describe the reason for wanting
to read the first part of the file like that. Maybe there's a more
suitable approach.


Hi Peter, thanks for your reply.

I wanted to avoid keeping a remainder as I would have thought the
underlying implementation would have to do this anyway when doing
readline(). The tool I am working on reads the UK Postal Address File
(1.5 GB of data), to be deployed on a small 800 MHz VIA C3 server.
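
A wrapper that keeps the remainder internally, so that callers see something like the proposed `readline(line_terminator=...)`, could be sketched as follows. This is an assumption-laden illustration in Python 3: `TerminatedReader` is a hypothetical class, not part of the standard library or of the tool described here, and unlike the generator-style approach it returns the terminator as part of each line, mirroring how `readline()` keeps the newline.

```python
import io


class TerminatedReader:
    """Hypothetical wrapper around a binary file object.

    Buffers chunks internally so the caller never handles a remainder.
    """

    def __init__(self, fileobj, chunk_size=65536):
        self._file = fileobj
        self._chunk_size = chunk_size
        self._buffer = b""

    def readline(self, line_terminator=b"\n"):
        """Return bytes up to and including `line_terminator`.

        At end of file, returns whatever is left (possibly b"").
        """
        while True:
            index = self._buffer.find(line_terminator)
            if index != -1:
                end = index + len(line_terminator)
                line, self._buffer = self._buffer[:end], self._buffer[end:]
                return line
            chunk = self._file.read(self._chunk_size)
            if not chunk:
                # End of file: hand back the unterminated tail.
                line, self._buffer = self._buffer, b""
                return line
            self._buffer += chunk


reader = TerminatedReader(io.BytesIO(b"ab\x7fcd\x7fef"), chunk_size=2)
first = reader.readline(line_terminator=b"\x7f")
```

Searching the whole buffer on every pass keeps the sketch short; for very long records it would be cheaper to remember how far the previous search got.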

I have created a minimalist module for reading in the tabular data, in a
way that is as close to 'wire speed' as possible. Previously I have used
the Python 2.3 CSV module, and a C implementation of a CSV reader I
found on the web, however the data set I am dealing with has a very
basic structure, and I found the two CSV modules overly complicated for
the task.

I failed to produce something that is clean, but it does exactly what it
says on the tin and that's all I need. If you care for a nosey:

http://botanicus.net/dw/IDTDR.py.txt

David.

Jul 18 '05 #3

"David M. Wilson" wrote:

I failed to produce something that is clean, but it does exactly what it
says on the tin and that's all I need. If you care for a nosey:

http://botanicus.net/dw/IDTDR.py.txt


This gives a "file not found" error.
Jul 18 '05 #4
