enumerate overflow

Hello all,

in python2.4, i read lines from a file with

for lineNum, line in enumerate(f): ...

However, lineNum soon overflows and starts counting backwards. How do
i force enumerate to return long integer?

Cheers.

Oct 3 '07 #1
cr**@post.cz wrote:
Hello all,

in python2.4, i read lines from a file with

for lineNum, line in enumerate(f): ...

However, lineNum soon overflows and starts counting backwards. How do
i force enumerate to return long integer?
Most probably you can't, because I presume it is a function written in C.

But as Python 2.4 has generators, it's easy to create an enumerate yourself:

def lenumerate(f):
    i = 0
    for line in f:
        yield i, line
        i += 1
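
For illustration, here is a self-contained version of the generator above with a small usage example. The `start` parameter and the sample list are additions for this sketch, not part of the original suggestion. The counter is a plain Python integer, so it grows past the 32-bit limit instead of wrapping.

```python
def lenumerate(iterable, start=0):
    """Yield (index, item) pairs using a plain Python integer counter.

    `start` is a hypothetical convenience parameter added for this sketch.
    """
    i = start
    for item in iterable:
        yield i, item
        i += 1


# Start just below the 32-bit signed limit to show the counter keeps growing.
pairs = list(lenumerate(["a", "b", "c"], start=2**31 - 2))
print(pairs)
```

Starting near the limit avoids having to iterate two billion times just to see the behaviour.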

Diez
Oct 3 '07 #2
cr**@post.cz wrote:
Hello all,

in python2.4, i read lines from a file with

for lineNum, line in enumerate(f): ...

However, lineNum soon overflows and starts counting backwards. How do
i force enumerate to return long integer?
Just how "soon" exactly do you read sys.maxint lines from a file? I
should have thought that it would take a significant amount of time to
read 2,147,483,647 lines ...

But it is true that Python 2.5 uses an enumobject representation that
limits the index to a (C) long:

typedef struct {
    PyObject_HEAD
    long en_index;        /* current index of enumeration */
    PyObject* en_sit;     /* secondary iterator of enumeration */
    PyObject* en_result;  /* result tuple */
} enumobject;
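
To see why a fixed-width index appears to go negative, the following sketch simulates a signed 32-bit counter in pure Python. It assumes a 32-bit C long and two's-complement wrap-around, which is what typically happens in practice even though signed overflow is formally undefined in C. The helper name `wrap_int32` is made up for this example.

```python
def wrap_int32(n):
    """Interpret n modulo 2**32 as a signed 32-bit two's-complement value."""
    n &= 0xFFFFFFFF
    if n >= 2**31:
        return n - 2**32
    return n


last_good = 2**31 - 1                # largest value a signed 32-bit long holds
print(wrap_int32(last_good))         # 2147483647
print(wrap_int32(last_good + 1))     # -2147483648
```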

regards
Steve
--
Steve Holden +1 571 484 6266 +1 800 494 3119
Holden Web LLC/Ltd http://www.holdenweb.com
Skype: holdenweb http://del.icio.us/steve.holden

Sorry, the dog ate my .sigline

Oct 3 '07 #3
for lineNum, line in enumerate(f): ...

However, lineNum soon overflows and starts counting backwards. How do
i force enumerate to return long integer?
Just how "soon" exactly do you read sys.maxint lines from a file? I
should have thought that it would take a significant amount of time to
read 2,147,483,647 lines ...
A modestly (but not overwhelmingly) long time:

(defining our own xrange-ish generator that can handle things
larger than longs)

>>> def xxrange(x):
...     i = 0
...     while i < x:
...         yield i
...         i += 1
...
>>> for i,j in enumerate(xxrange(2**33)): assert i==j
...
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
AssertionError

It took me about 60-90 minutes to hit the assertion on a
dual-core 2.8 GHz machine under otherwise light load. When
batch-processing lengthy log files or other large data sets such as
genetic data, it's entirely possible to hit this limit, as the OP
discovered.
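
As a rough back-of-the-envelope check, the timing above implies the following iteration rates. This is only arithmetic on the 60-90 minute figure and assumes the assertion fired right at 2**31 iterations.

```python
LIMIT = 2**31  # iterations needed to pass sys.maxint with a 32-bit long

# Map each reported duration (in minutes) to the implied iterations per second.
rates = {}
for minutes in (60, 90):
    rates[minutes] = LIMIT / (minutes * 60.0)
    print("%d minutes -> about %d iterations per second" % (minutes, rates[minutes]))
```

So a loop doing a few hundred thousand iterations per second reaches the limit in an hour or so.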

-tkc

Oct 3 '07 #4
Tim Chase wrote:
for lineNum, line in enumerate(f): ...

However, lineNum soon overflows and starts counting backwards. How do
i force enumerate to return long integer?
Just how "soon" exactly do you read sys.maxint lines from a file? I
should have thought that it would take a significant amount of time to
read 2,147,483,647 lines ...

A modestly (but not overwhelmingly) long time:

(defining our own xrange-ish generator that can handle things larger
than longs)

>>> def xxrange(x):
...     i = 0
...     while i < x:
...         yield i
...         i += 1
...
>>> for i,j in enumerate(xxrange(2**33)): assert i==j
...
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
AssertionError

It took me about 60-90 minutes to hit the assertion on a dual-core
2.8 GHz machine under otherwise light load. When batch-processing lengthy
log files or other large data sets such as genetic data, it's entirely
possible to hit this limit, as the OP discovered.

I wouldn't dream of suggesting it's impossible. I just regard "soon" as
less than an hour in commuter's terms, I suppose.

regards
Steve
--
Steve Holden +1 571 484 6266 +1 800 494 3119
Holden Web LLC/Ltd http://www.holdenweb.com
Skype: holdenweb http://del.icio.us/steve.holden

Sorry, the dog ate my .sigline
Oct 3 '07 #5
Steve Holden wrote:
I wouldn't dream of suggesting it's impossible.
I just regard "soon" as less than an hour in
commuter's terms, I suppose.
Sadly, speaking as a Londoner, an hour is indeed
"soon" in commuter terms.

TJG

Oct 3 '07 #6
[Paul Rubin]
I hope in 3.0 there's a real fix, i.e. the count should promote to
long.
In Py2.6, I will most likely put in an automatic promotion to long
for both enumerate() and count(). It took a while to figure out how
to do this without killing the performance for normal cases (ones used
in real programs, not examples contrived to say, "omg, see what
*could* happen").
Raymond

Oct 3 '07 #7
Raymond Hettinger <py****@rcn.com> writes:
In Py2.6, I will most likely put in an automatic promotion to long
for both enumerate() and count(). It took a while to figure out how
to do this without killing the performance for normal cases (ones used
in real programs, not examples contrived to say, "omg, see what
*could* happen").
Great, this is good to hear. I think it's ok if the enumeration slows
down after fixnum overflow is reached. So it's just a matter of
replacing the overflow signal with consing up a long. The fixnum case
would be the same as it is now. To be fancy, the count could be
stored in two C ints (or a gcc long long) so it would go up to 64 bits
but I don't think it's worth it, especially for itertools.count which
should be able to take arbitrary (i.e. larger than 64 bits) initializers.

As for real programs, well, the Y2038 bug is slowly creeping up on us.
That's when Unix timestamps overflow a signed 32-bit counter. It's
already caused an actual system failure, in 2006:

http://worsethanfailure.com/Articles...he_Epoch_.aspx
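
For reference, the exact moment a signed 32-bit Unix timestamp runs out can be computed directly. This is a small sketch that uses only the standard library and treats the epoch as a naive UTC datetime.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)   # Unix epoch, taken as UTC
MAX_INT32 = 2**31 - 1          # largest signed 32-bit value

# Last second representable before the counter overflows.
last_moment = EPOCH + timedelta(seconds=MAX_INT32)
print(last_moment)             # 2038-01-19 03:14:07
```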

Really, the whole idea of int/long unification is so we can stop
worrying about "omg, that could happen". We want to write programs
without special consideration or "omg" about those possibilities, and
still have them keep working smoothly if that DOES happen. Just about
all of us these days have hundreds of gigabytes or more of disk space
on our systems, and files with over 2**32 bytes or lines are not even
slightly unreasonable. We shouldn't have to write special generators
to deal with them; the library should instead just do the right thing.
Oct 3 '07 #8
Raymond Hettinger <py****@rcn.com> writes:
[Paul Rubin]
I hope in 3.0 there's a real fix, i.e. the count should promote to
long.

In Py2.6, I will most likely put in an automatic promotion to long
for both enumerate() and count(). It took a while to figure out how
to do this without killing the performance for normal cases (ones
used in real programs, not examples contrived to say, "omg, see what
*could* happen").
Using PY_LONG_LONG for the counter, and PyLong_FromLongLong to create
the Python number should work well for huge sequences without
(visibly) slowing down the normal case.
Oct 3 '07 #9
On Oct 3, 7:22 pm, Raymond Hettinger <pyt...@rcn.com> wrote:
In Py2.6, I will most likely put in an automatic promotion to long
for both enumerate() and count(). It took a while to figure out how
to do this without killing the performance for normal cases (ones used
in real programs, not examples contrived to say, "omg, see what
*could* happen").

Raymond

Thanks everybody for the replies and suggestions. I'm glad to see the
issue has already been discovered, discussed, and almost resolved.

By the way, I do not consider my programs in any way 'unreal'.

Oct 3 '07 #10
On Oct 3, 12:52 pm, koara <ko...@atlas.cz> wrote:
Thanks everybody for the replies and suggestions. I'm glad to see the
issue has already been discovered, discussed, and almost resolved.
The new code is checked in. In Py2.6, enumerate() will no longer
raise an OverflowError and it will automatically shift from ints to
longs. Will check in something similar for itertools.count() when I
get a chance.
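
On an interpreter that has this change (2.6 or later), the promotion can be checked without iterating two billion times. The sketch below relies on the optional `start` argument of enumerate(), which is not mentioned in this thread but was also added in 2.6. The sample list is arbitrary.

```python
# Begin counting at the largest signed 32-bit value and step past it.
start = 2**31 - 1
pairs = list(enumerate(["x", "y", "z"], start))
print(pairs)
```

The indices simply continue past 2**31 - 1 rather than wrapping or raising.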
Raymond
Oct 3 '07 #11
On Wed, 03 Oct 2007 08:46:31 -0300, <cr**@post.cz> wrote:
in python2.4, i read lines from a file with

for lineNum, line in enumerate(f): ...

However, lineNum soon overflows and starts counting backwards. How do
i force enumerate to return long integer?
(What kind of files are you using? enumerate overflows after more than two
billion lines... is that "soon" for you?)

I'm afraid neither enumerate nor itertools.count will generate a long
integer; upgrading to Python 2.5 won't help. I think the only way is to
roll your own counter:

lineNum = 0
for line in f:
    ...
    lineNum += 1
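
A runnable sketch of the same hand-rolled counter pattern follows, using an in-memory file so it can be tried without a large input. The function name `blank_line_numbers` and the sample text are made up for the example; the point is only that the counter is an ordinary Python integer maintained by the loop itself.

```python
import io


def blank_line_numbers(f):
    """Return the 0-based numbers of blank lines, counting by hand."""
    found = []
    line_num = 0
    for line in f:
        if not line.strip():
            found.append(line_num)
        line_num += 1
    return found


sample = io.StringIO(u"alpha\n\nbeta\n\n")
print(blank_line_numbers(sample))  # [1, 3]
```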

--
Gabriel Genellina

Oct 4 '07 #12
