Bytes IT Community

file tell in a for-loop

I was trying to map various locations in a file to a dictionary. At
first I read through the file using a for-loop, but tell() gave back
weird results, so I switched to while, then it worked.

The for-loop version was something like:

    d = {}
    for line in f:
        if line.startswith('>'): d[line] = f.tell()

And the while version was:

    d = {}
    while 1:
        line = f.readline()
        if len(line) == 0: break
        if line.startswith('>'): d[line] = f.tell()

In the for-loop version, f.tell() would sometimes return the same
result multiple times consecutively, even though the for-loop
apparently progressed the file descriptor. I don't have a clue why
this happened, but I switched to while loop and then it worked.

Does anyone have any ideas as to why this is so?

Thanks,
Magdoll
Nov 18 '08 #1
3 Replies


On Nov 19, 7:00 am, Magdoll <magd...@gmail.com> wrote:
Does anyone have any ideas as to why this is so?

Got bitten by that too a while back. The "for line in f" loop
reads ahead, so your f.tell() would not be the position of the
end of the line. I had to use a while True loop instead, too.
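
For later readers: a minimal sketch of one way to keep the for-loop and
skip tell() altogether, by opening the file in binary mode and adding up
the line lengths by hand. This is written for Python 3 (an assumption;
the thread itself is from the Python 2 era), and the name index_headers
and the sample data are made up for illustration.

```python
import os
import tempfile


def index_headers(path):
    """Map each line starting with '>' to the byte offset where it begins."""
    offsets = {}
    pos = 0  # running byte offset, maintained by hand instead of f.tell()
    with open(path, 'rb') as f:
        for line in f:
            if line.startswith(b'>'):
                offsets[line.rstrip(b'\r\n')] = pos
            pos += len(line)  # in binary mode len(line) is the exact byte count
    return offsets


# Tiny made-up sample file, just to exercise the function.
with tempfile.NamedTemporaryFile('wb', suffix='.txt', delete=False) as tmp:
    tmp.write(b'>seq1\nACGT\n>seq2\nTTGA\n')
    sample_path = tmp.name

headers = index_headers(sample_path)
os.remove(sample_path)
```

Because the offset is computed from the lines the loop actually handed
out, any read-ahead the iterator does internally no longer matters.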
Nov 19 '08 #2

Magdoll wrote:
Does anyone have any ideas as to why this is so?

I suspect that at least the iterator version uses internal
buffering, so the tell() call returns how far the read-ahead
buffer has been filled, not the position just after the line the
loop handed you. I've also had problems with tell() returning
bogus results while reading through large non-binary files (in
this case about a 530 meg text file) once the file offset passed
some point I wasn't able to identify. It may have to do with
newline translation, as this was Python 2.4 on Win32. Switching
to "b"inary mode resolved the issue for me.
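
Side note for readers on Python 3 (an assumption; the posts here are
about Python 2.4): CPython 3's text-mode files refuse to answer tell()
while a for-loop is iterating over them, instead of returning a
misleading number. A small sketch with a made-up sample file:

```python
import os
import tempfile

# Made-up sample data, written to a temporary file for the demonstration.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('>seq1\nACGT\n')
    sample_path = tmp.name

tell_failed = False
with open(sample_path, 'r') as f:
    for line in f:
        try:
            f.tell()
        except OSError:
            # Text-mode iteration disables tell() while the loop is running.
            tell_failed = True
        break
os.remove(sample_path)
```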

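I created the following generator to make my life a little easier:

    def offset_iter(fp):
        # binary mode avoids the newline-translation problem above
        assert 'b' in fp.mode.lower(), \
            "offset_iter must have a binary file"
        while True:
            addr = fp.tell()      # offset of the start of the next line
            line = fp.readline()
            if not line: break    # empty string means end of file
            yield (addr, line.rstrip('\n\r'))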

That way, I can just use

    d = {}
    f = open('foo.txt', 'rb')
    for offset, line in offset_iter(f):
        if line.startswith('>'): d[line] = offset

This bookmarks the *beginning* (I think your code notes the
*end*) of each line that starts with ">".

-tkc
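
A sketch of the same generator for Python 3 (a port, not the code as
posted above): in binary mode the lines come back as bytes, so the
prefix and the stripped characters have to be bytes literals too. The
sample file contents are invented for illustration. Seeking back to a
stored offset is a quick check that the bookmark points at the start of
the line.

```python
import os
import tempfile


def offset_iter(fp):
    """Yield (offset, line) pairs; fp must be opened in binary mode."""
    if 'b' not in fp.mode:
        raise ValueError("offset_iter must have a binary file")
    while True:
        addr = fp.tell()  # position before reading = start of the line
        line = fp.readline()
        if not line:
            break
        yield addr, line.rstrip(b'\r\n')


# Made-up sample with Windows-style line endings.
with tempfile.NamedTemporaryFile('wb', suffix='.txt', delete=False) as tmp:
    tmp.write(b'>first\r\nAAAA\r\n>second\r\nCCCC\r\n')
    sample_path = tmp.name

d = {}
with open(sample_path, 'rb') as f:
    for offset, line in offset_iter(f):
        if line.startswith(b'>'):
            d[line] = offset
    # Jump back to a bookmark and re-read the line found there.
    f.seek(d[b'>second'])
    reread = f.readline()
os.remove(sample_path)
```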

Nov 19 '08 #3

Gotcha. Thanks!

Magdoll

Nov 19 '08 #4
