Bytes | Software Development & Data Engineering Community

Making a copy (not reference) of a file handle, or starting stdin over at line 0

I wrote a script which will convert a tab-delimited file to a
fixed-width file, or a fixed-width file into a tab-delimited one. It
reads a config file which defines the field lengths, and uses it to
convert either way.

Here's an example of the config file:

1:6,7:1,8:9,17:15,32:10

This converts a fixed-width file to a tab-delimited one where the first
field is the first six characters of each line, the second is the
seventh character, and so on. Conversely, it converts a tab-delimited
file to a file where the first six characters are the first tab field,
right-padded with spaces, and so on.
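For readers following along, here is a minimal sketch of the two conversions, assuming each config entry is a 1-based start column and a field width (which matches the example: 1:6 is columns 1-6, 7:1 is column 7, 8:9 starts right after it). The names `parse_config`, `fw_to_tab` and `tab_to_fw` are hypothetical, not Shawn's actual functions, and stripping the padding when converting back to tab-delimited (and truncating over-long values) are assumptions.

```python
def parse_config(spec):
    """Turn "1:6,7:1,..." into a list of (start, width) pairs.

    Assumption: start is a 1-based column, width is the field length.
    """
    fields = []
    for entry in spec.strip().split(","):
        start, width = entry.split(":")
        fields.append((int(start), int(width)))
    return fields


def fw_to_tab(line, fields):
    """Slice a fixed-width line into tab-separated fields.

    Assumption: trailing padding spaces are stripped from each field.
    """
    parts = [line[start - 1:start - 1 + width].rstrip()
             for start, width in fields]
    return "\t".join(parts)


def tab_to_fw(line, fields):
    """Right-pad each tab-separated value with spaces to its field width.

    Assumption: values longer than their width are truncated.
    """
    values = line.rstrip("\n").split("\t")
    return "".join(value[:width].ljust(width)
                   for value, (_, width) in zip(values, fields))


fields = parse_config("1:6,7:1,8:9,17:15,32:10")
fixed = tab_to_fw("ABC\tX\tfoo\tbar\tbaz\n", fields)
print(len(fixed))                      # 41
print(repr(fw_to_tab(fixed, fields)))  # 'ABC\tX\tfoo\tbar\tbaz'
```

Because `zip()` stops at the shorter input, a tab-delimited line with fewer values than config entries produces a shorter fixed-width record; a real script may want to pad missing fields instead.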

What I want to do is look at the file and decide whether to run the
function to convert the file to tab or FW. Here is what works
(mostly):

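x = inputFile.readline().split("\t")
inputFile.seek(0)  # rewind; this is the call that fails on a pipe

# more than one tab-separated field means the input is tab-delimited
if len(x) > 1:
    toFW(inputFile)
else:
    toTab(inputFile)
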
The problem is that my script accepts the input file either via stdin
(a pipe) or as an argument to the script. If I pass the filename as an
argument, everything works perfectly.

If I pipe the input file into the script, seek() fails on it. I tried
making a copy of inputFile and calling readline() on the copy, but
since the copy is just another reference to the same file object, it
makes no difference.
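A short illustration of why the "copy" did not help: assigning a file object to a new name creates a second reference, not a second stream, so reading through either name advances the same position. This sketch uses `io.StringIO` as a stand-in for the input; `show_aliasing` and `stdin_is_seekable` are hypothetical helper names.

```python
import io
import sys


def show_aliasing():
    """Return (is_same_object, line read through the original name)."""
    source = io.StringIO("first\nsecond\nthird\n")  # stand-in for the input
    alias = source        # binds a second name; no new stream is created
    alias.readline()      # consumes "first\n" from the one shared stream
    return alias is source, source.readline()


def stdin_is_seekable():
    """False when stdin is a pipe, True for a file redirected with `<`."""
    return sys.stdin.seekable()


same_object, next_line = show_aliasing()
print(same_object)      # True
print(repr(next_line))  # 'second\n'
```

Calling `stdin_is_seekable()` inside the script is one way to tell whether `seek(0)` is even an option before trying it.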

How can I check a line (or two) from my input file (or stdin stream)
and still be able to process all the records with my function?

Thanks,
Shawn
Aug 17 '07 #1
Shawn Milochik wrote:
> How can I check a line (or two) from my input file (or stdin stream)
> and still be able to process all the records with my function?

One way:

from itertools import chain

firstline = next(instream)
head = [firstline]

# loop over entire file
for line in chain(head, instream):
    process(line)

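A self-contained version of the chain approach, assuming the format test from the original post (more than one tab-separated field means tab-delimited). `peek_format` is a hypothetical name and `io.StringIO` stands in for `sys.stdin`; the built-in `next()` is used rather than the Python 2 `.next()` method so it also runs on Python 3.

```python
import io
from itertools import chain


def peek_format(stream):
    """Read the first line, then return (is_tab_delimited, all_lines).

    all_lines yields the peeked line again followed by the rest of the
    stream, so nothing is lost even though the stream is never rewound.
    """
    first = next(stream, "")           # "" if the input is empty
    is_tab = len(first.split("\t")) > 1
    head = [first] if first else []    # avoid yielding a bogus empty line
    return is_tab, chain(head, stream)


data = io.StringIO("a\tb\tc\nd\te\tf\n")
is_tab, lines = peek_format(data)
print(is_tab)       # True
print(list(lines))  # ['a\tb\tc\n', 'd\te\tf\n']
```

The script would then pass `lines` (not the original stream) to the conversion function, since the first line has already been taken from the stream itself.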
You can of course read more than one line as long as you append it to the
head list. Here's an alternative:

from itertools import tee

a, b = tee(instream)

for line in a:
    # determine file format,
    # break when done
    break

# this is crucial for memory efficiency
# but may have no effect in implementations
# other than CPython
del a

# loop over entire file
for line in b:
    process(line)

Peter

Aug 17 '07 #2
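The tee variant can be packaged the same way, here sampling up to two lines as the question asks. This is a sketch: `sniff_with_tee` and its `sample_size` parameter are hypothetical, and treating any tab in the sample as evidence of tab-delimited input is an assumption. After calling `tee()`, only the returned iterator should be read; reading the original stream directly would skip the sampled lines.

```python
import io
from itertools import islice, tee


def sniff_with_tee(stream, sample_size=2):
    """Look at up to sample_size lines, then return (is_tab, all_lines)."""
    probe, rest = tee(stream)
    sample = list(islice(probe, sample_size))
    # In CPython, probe is freed when this function returns, so tee stops
    # buffering lines on its behalf while `rest` is being consumed.
    is_tab = any("\t" in line for line in sample)
    return is_tab, rest


data = io.StringIO("ABC   X\nDEF   Y\nGHI   Z\n")
is_tab, lines = sniff_with_tee(data)
print(is_tab)       # False
print(list(lines))  # ['ABC   X\n', 'DEF   Y\n', 'GHI   Z\n']
```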
