Bytes IT Community

Making a copy (not reference) of a file handle, or starting stdin over at line 0

I wrote a script which converts a tab-delimited file to a fixed-width
file, or a fixed-width file to a tab-delimited one. It reads a config
file which defines the field positions and lengths, and uses it to
convert either way.

Here's an example of the config file:

1:6,7:1,8:9,17:15,32:10

This converts a fixed-width file to a tab-delimited where the first
field is the first six characters of the file, the second is the
seventh, etc. Conversely, it converts a tab-delimited file to a file
where the first six characters are the first tab field, right-padded
with spaces, and so on.
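
For illustration, here is a minimal sketch of how such a config line might be applied. It assumes each entry is start:length with a 1-based start, which is consistent with the example above but is not spelled out in the post. The names parse_config, fixed_to_tab and tab_to_fixed are hypothetical, and stripping trailing padding or truncating over-long values are choices made for the sketch, not something the original script is known to do.

```python
def parse_config(text):
    """Parse '1:6,7:1,...' into a list of (start, length) pairs (start is 1-based)."""
    fields = []
    for item in text.strip().split(","):
        start, length = item.split(":")
        fields.append((int(start), int(length)))
    return fields


def fixed_to_tab(line, fields):
    """Slice a fixed-width line into its fields and join them with tabs."""
    line = line.rstrip("\n")
    # Assumption: trailing pad spaces are dropped from each field.
    values = [line[start - 1:start - 1 + length].rstrip() for start, length in fields]
    return "\t".join(values)


def tab_to_fixed(line, fields):
    """Right-pad each tab-separated value with spaces to its configured width."""
    values = line.rstrip("\n").split("\t")
    # Assumption: values longer than the field width are truncated.
    return "".join(value.ljust(length)[:length] for value, (_, length) in zip(values, fields))


fields = parse_config("1:6,7:1,8:9,17:15,32:10")
```

With this layout, a fixed-width record is 41 characters wide (6 + 1 + 9 + 15 + 10).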

What I want to do is look at the file and decide whether to run the
function to convert the file to tab or FW. Here is what works
(mostly):

x = inputFile.readline().split("\t")
inputFile.seek(0)

if len(x) > 1:
    toFW(inputFile)
else:
    toTab(inputFile)

The problem is that my script accepts the input file via stdin (pipe) or
as an argument to the script. If I send the filename as an argument,
everything works perfectly.

If I pipe the input file into the script, it is unable to seek() it. I
tried making a copy of inputFile and doing a readline() from it, but
being a reference, it makes no difference.
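
Both symptoms can be reproduced in a few lines. This is only a sketch in Python 3 syntax: io.StringIO stands in for the input file, and os.pipe() stands in for piped stdin. The variable names are made up for the demonstration.

```python
import io
import os

# Assigning a file object to another name does not copy it:
# both names refer to one object with one read position.
stream = io.StringIO("a\tb\nc\td\n")
alias = stream
alias.readline()             # consumes the first line through the alias
second = stream.readline()   # the original name now yields the second line

# A pipe, which is what stdin is when input is piped in, cannot be rewound.
read_fd, write_fd = os.pipe()
os.write(write_fd, b"a\tb\nc\td\n")
os.close(write_fd)
with os.fdopen(read_fd) as piped:
    can_seek = piped.seekable()   # reports whether seek() is supported
    first = piped.readline()      # once read, this line is gone from the pipe
```

Checking seekable() before calling seek(0) is one way to tell the two cases apart, but it does not recover the line already read from a pipe.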

How can I check a line (or two) from my input file (or stdin stream)
and still be able to process all the records with my function?

Thanks,
Shawn
Aug 17 '07 #1


Shawn Milochik wrote:
> How can I check a line (or two) from my input file (or stdin stream)
> and still be able to process all the records with my function?

One way:

from itertools import chain

# next(instream) needs Python 2.6+; on older versions use instream.next()
firstline = next(instream)
head = [firstline]

# loop over entire file
for line in chain(head, instream):
    process(line)

You can of course read more than one line, as long as you append each one to
the head list. Here's an alternative:

from itertools import tee

# a and b are independent iterators over the same underlying stream
a, b = tee(instream)

for line in a:
    # determine file format,
    # break when done
    break

# this is crucial for memory efficiency
# but may have no effect in implementations
# other than CPython
del a

# loop over entire file
for line in b:
    process(line)
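
As a rough sketch, the chain approach above could be wired into the format check from the original question like this. The function name detect_and_convert and the to_fw/to_tab parameters are hypothetical stand-ins for the poster's toFW and toTab, whose real signatures are not shown in the thread.

```python
import io
from itertools import chain


def detect_and_convert(instream, to_fw, to_tab):
    """Peek at the first line, then pass every line to the matching converter."""
    try:
        firstline = next(instream)
    except StopIteration:
        return None  # empty input: nothing to convert
    # Put the consumed line back in front of the remaining lines.
    lines = chain([firstline], instream)
    # Same test as in the question: more than one tab field means tab-delimited.
    if len(firstline.split("\t")) > 1:
        return to_fw(lines)
    return to_tab(lines)


# Stand-in converters that only collect the lines they receive.
result = detect_and_convert(
    io.StringIO("a\tb\nc\td\n"),
    to_fw=lambda lines: ("fw", list(lines)),
    to_tab=lambda lines: ("tab", list(lines)),
)
```

Note that the converters receive a plain iterator, not a file object, so they would have to loop over the lines rather than call readline() or seek(). In a script, instream could be sys.stdin or an opened file; both are iterated the same way.
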
Peter

Aug 17 '07 #2
