Bytes IT Community

preallocate list

Jim
Hi all

Is this the best way to preallocate a list of integers?
listName = range(0,length)

What about non integers?

I've just claimed in the newsgroup above that pre-allocating helps but I
might be getting confused with matlab ;)

If I have a file with a floating point number on each line, what is the
best way of reading them into a list (or other ordered structure)?

I was iterating with readline and appending to a list but it is taking ages.

Jim
Jul 18 '05 #1
20 Replies


rbt
Jim wrote:
If I have a file with a floating point number on each line, what is the
best way of reading them into a list (or other ordered structure)?

I was iterating with readline and appending to a list but it is taking
ages.


Perhaps you should use readlines (notice the s) instead of readline.
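For a file with one float per line, you can also iterate over the file object directly, which avoids both the explicit readline() loop and the full list of strings that readlines() builds. Below is a minimal Python 3 sketch. The helper name read_floats and the in-memory sample data are assumptions for illustration only.

```python
import io


def read_floats(f):
    """Return one float per non-blank line of the file object f.

    Hypothetical helper: iterating over f yields lines lazily, so neither
    readline() nor readlines() is needed. float() ignores the trailing newline.
    """
    return [float(line) for line in f if line.strip()]


# An in-memory file stands in for open("data.txt") in this example
sample = io.StringIO("1.5\n2.25\n\n-3e2\n")
values = read_floats(sample)
print(values)  # [1.5, 2.25, -300.0]
```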
Jul 18 '05 #2

On 4/13/05, Jim <jb*@cannedham.ee.ed.ac.uk> wrote:
Hi all

Is this the best way to preallocate a list of integers?
listName = range(0,length)

the 0 is unnecessary; range(length) does the same thing.
What about non integers?

arr = [myobject() for i in range(length)]
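In current Python 3 the same idioms still apply, with one change: range() returns a lazy range object, so you need list(range(n)) to get an actual list. The sketch below also shows why a comprehension is used for objects rather than repetition with *. MyObject is a hypothetical stand-in for myobject.

```python
class MyObject:
    """Hypothetical stand-in for the myobject class mentioned above."""

    def __init__(self):
        self.items = []


length = 5

ints = list(range(length))       # Python 3: range() is lazy, so wrap it in list()
placeholders = [None] * length   # every slot refers to the single None object
objects = [MyObject() for _ in range(length)]  # a distinct object per slot

# Pitfall: repetition copies the reference, not the object
shared = [MyObject()] * length
shared[0].items.append(1)
print(len(shared[4].items))  # 1: all five slots are the same object
```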
I've just claimed in the newsgroup above that pre-allocating helps but I
might be getting confused with matlab ;)

If I have a file with a floating point number on each line, what is the
best way of reading them into a list (or other ordered structure)?

I was iterating with readline and appending to a list but it is taking ages.


I would profile your app to check whether it's really your append that is taking
ages, but preallocating a list of strings would look like:

["This is an average length string" for i in range(approx_length)]

My guess is that it won't help to preallocate, but time it and let us
know. A test to back my guess:

import timeit, math

def test1():
    lst = [0 for i in range(100000)]
    for i in xrange(100000):
        lst[i] = math.sin(i) * i

def test2():
    lst = []
    for i in xrange(100000):
        lst.append(math.sin(i) * i)

t1 = timeit.Timer('test1()', 'from __main__ import test1')
t2 = timeit.Timer('test2()', 'from __main__ import test2')
print "time1: %f" % t1.timeit(100)
print "time2: %f" % t2.timeit(100)

09:09 AM ~$ python test.py
time1: 12.435000
time2: 12.385000
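Here is a Python 3 port of the benchmark above, in case you want to rerun it on a current interpreter. xrange becomes range, print becomes a function, and timeit.timeit can take the function object directly. Both functions return their lists so you can check that they compute the same values. Timings depend on the machine and the interpreter version, so none are claimed here.

```python
import math
import timeit

N = 100_000


def test1():
    """Assign into a preallocated list by index, as in the original test1."""
    lst = [0 for _ in range(N)]
    for i in range(N):
        lst[i] = math.sin(i) * i
    return lst


def test2():
    """Grow the list with append, as in the original test2."""
    lst = []
    for i in range(N):
        lst.append(math.sin(i) * i)
    return lst


if __name__ == "__main__":
    # timeit.timeit accepts a callable, so no setup string is needed;
    # number=10 instead of 100 just keeps the run short
    for fn in (test1, test2):
        print("%s: %f" % (fn.__name__, timeit.timeit(fn, number=10)))
```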

Peace
Bill Mill
bill.mill at gmail.com
Jul 18 '05 #3

Jim
rbt wrote:
Jim wrote:
If I have a file with a floating point number on each line, what is
the best way of reading them into a list (or other ordered structure)?

I was iterating with readline and appending to a list but it is taking
ages.

Perhaps you should use readlines (notice the s) instead of readline.


I don't know if I thought of that, but I'm tokenizing each line before
adding to a list of lists.

for line in f:
    factor = []
    tokens = line.split()
    for i in tokens:
        factor.append(float(i))
    factors.append(factor)

Is this nasty?

Jim
Jul 18 '05 #4

Just a correction:

<snip>
I would profile your app to check whether it's really your append that is taking
ages, but preallocating a list of strings would look like:

["This is an average length string" for i in range(approx_length)]

My guess is that it won't help to preallocate, but time it and let us
know. A test to back my guess:

import timeit, math

def test1():
    lst = [0 for i in range(100000)]
    for i in xrange(100000):
        lst[i] = math.sin(i) * i

def test2():
    lst = []
    for i in xrange(100000):
        lst.append(math.sin(i) * i)

t1 = timeit.Timer('test1()', 'from __main__ import test1')
t2 = timeit.Timer('test2()', 'from __main__ import test2')
print "time1: %f" % t1.timeit(100)
print "time2: %f" % t2.timeit(100)


The results change slightly when I actually insert an integer, instead
of a float, with lst[i] = i and lst.append(i):

09:14 AM ~$ python test.py
time1: 3.352000
time2: 3.672000

The preallocated list is slightly faster in most of my tests, but I
still don't think it'll bring a large performance benefit with it
unless you're making a truly huge list.

I need to wake up before pressing "send".

Peace
Bill Mill
Jul 18 '05 #5

Jim
Thanks for the suggestions. I guess I must ensure that this is my
bottleneck.
<code>
def readFactorsIntoList(self,filename,numberLoads):
    factors = []
    f = open(self.basedir + filename,'r')
    line = f.readline()
    tokens = line.split()
    columns = len(tokens)
    if int(columns) == number:
        for line in f:
            factor = []
            tokens = line.split()
            for i in tokens:
                factor.append(float(i))
            factors.append(loadFactor)
    else:
        for line in f:
            tokens = line.split()
            factors.append([float(tokens[0])] * number)
    return factors
</code>

OK. I've just tried with 4 lines and the code works. With 11000 lines it
uses all CPU for at least 30 secs. There must be a better way.
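Before rewriting anything, it helps to confirm where the time actually goes. Below is a minimal sketch using the standard-library cProfile and pstats modules. parse_rows and the synthetic 11000-line input are hypothetical stand-ins for the real function and data file.

```python
import cProfile
import io
import pstats


def parse_rows(lines):
    """Hypothetical stand-in for readFactorsIntoList: split and convert each line."""
    factors = []
    for line in lines:
        factor = []
        for token in line.split():
            factor.append(float(token))
        factors.append(factor)
    return factors


# Synthetic input about the size mentioned above: 11000 lines of 3 columns
lines = ["1.0 2.0 3.0\n"] * 11000

profiler = cProfile.Profile()
profiler.enable()
result = parse_rows(lines)
profiler.disable()

# Collect the ten most expensive entries, sorted by cumulative time
report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(10)
print(report.getvalue())
```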

Jim
Jul 18 '05 #6

Jim wrote:
Thanks for the suggestions. I guess I must ensure that this is my
bottleneck.
....
for line in f:
    factor = []
    tokens = line.split()
    for i in tokens:
        factor.append(float(i))
    factors.append(loadFactor)

....

You might try:

factors = [ [float(item) for item in line.split()] for line in f ]

avoiding the extra statements for appending to the lists. Also might try:

factors = [ map(float, line.split()) for line in f ]

though it uses the out-of-favour functional form for the mapping.
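Both one-liners still work in Python 3, with one caveat: map() now returns a lazy iterator, so the second form needs list() around it to produce a list of lists. In the sketch below an in-memory file (io.StringIO) stands in for the open file f.

```python
import io

# An in-memory file stands in for the open file object f
sample = io.StringIO("1 2 3\n4.5 5.5 6.5\n")

# Nested list comprehension: unchanged from the suggestion above
factors = [[float(item) for item in line.split()] for line in sample]

sample.seek(0)  # rewind so the file can be read a second time
# map() form: Python 3's map() is lazy, so list() is needed for a list of lists
factors_map = [list(map(float, line.split())) for line in sample]

print(factors)  # [[1.0, 2.0, 3.0], [4.5, 5.5, 6.5]]
```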

Good luck,
Mike

________________________________________________
Mike C. Fletcher
Designer, VR Plumber, Coder
http://www.vrplumber.com
http://blog.vrplumber.com

Jul 18 '05 #7


what about :

factors = [map(float, line.split()) for line in file]

should be a hell of a lot faster and nicer.
for line in f:
    factor = []
    tokens = line.split()
    for i in tokens:
        factor.append(float(i))
    factors.append(factor)

Is this nasty?

Jim


Jul 18 '05 #8

Jim wrote:
Thanks for the suggestions. I guess I must ensure that this is my
bottleneck.
<code>
def readFactorsIntoList(self,filename,numberLoads):
    factors = []
    f = open(self.basedir + filename,'r')
    line = f.readline()
    tokens = line.split()
    columns = len(tokens)
    if int(columns) == number:
        for line in f:
            factor = []
            tokens = line.split()
            for i in tokens:
                factor.append(float(i))
            factors.append(loadFactor)
    else:
        for line in f:
            tokens = line.split()
            factors.append([float(tokens[0])] * number)
    return factors
</code>

OK. I've just tried with 4 lines and the code works. With 11000 lines it
uses all CPU for at least 30 secs. There must be a better way.


Was your test on *just* this function? Or were you doing something with
the list produced by this function as well?

STeVe
Jul 18 '05 #9

Jim
Steven Bethard wrote:
Jim wrote:
Thanks for the suggestions. I guess I must ensure that this is my
bottleneck.
<code>
def readFactorsIntoList(self,filename,numberLoads):
    factors = []
    f = open(self.basedir + filename,'r')
    line = f.readline()
    tokens = line.split()
    columns = len(tokens)
    if int(columns) == number:
        for line in f:
            factor = []
            tokens = line.split()
            for i in tokens:
                factor.append(float(i))
            factors.append(loadFactor)
    else:
        for line in f:
            tokens = line.split()
            factors.append([float(tokens[0])] * number)
    return factors
</code>

OK. I've just tried with 4 lines and the code works. With 11000 lines
it uses all CPU for at least 30 secs. There must be a better way.

Was your test on *just* this function? Or were you doing something with
the list produced by this function as well?


Just this. I had a breakpoint on the return.

I'm going to try peufeu's line of code and I'll report back.

Jim
Jul 18 '05 #10

Jim
pe****@free.fr wrote:

what about :

factors = [map(float, line.split()) for line in file]

should be a hell of a lot faster and nicer.
for line in f:
    factor = []
    tokens = line.split()
    for i in tokens:
        factor.append(float(i))
    factors.append(factor)

Is this nasty?

Jim


Oh the relief :)

Of course, line.split() is already a list.

Couple of seconds for the 10000 line file.

Thanks.

What I really want is a Numeric array but I don't think Numeric supports
importing files.

Jim
Jul 18 '05 #11

Jim
Steven Bethard wrote:
Jim wrote:

...
OK. I've just tried with 4 lines and the code works. With 11000 lines
it uses all CPU for at least 30 secs. There must be a better way.

Was your test on *just* this function? Or were you doing something with
the list produced by this function as well?

STeVe


Well it's fast enough now. Thanks for having a look.

Jim
Jul 18 '05 #12

Jim wrote:
What I really want is a Numeric array but I don't think Numeric supports
importing files.


Hmmm... Maybe the scipy package?

I think scipy.io.read_array might help, but I've never used it.

STeVe
Jul 18 '05 #13

On Wed, 13 Apr 2005 16:46:53 +0100, Jim wrote:

What I really want is a Numeric array but I don't think Numeric supports
importing files.

Numeric arrays can be serialized to/from files through pickles:
import Numeric as N
help(N.load)
help(N.dump)
(and it is space efficient)
Jim
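Numeric has since been superseded, but the same pickle-based idea works with the standard-library pickle module on plain lists. Below is a Python 3 sketch. The example data and the temporary file path are assumptions chosen only for illustration.

```python
import os
import pickle
import tempfile

# Example data; in practice this would be the parsed list of lists
factors = [[1.0, 2.0, 3.0], [4.5, 5.5, 6.5]]

# A throwaway directory keeps the example independent of any real path
path = os.path.join(tempfile.mkdtemp(), "factors.pkl")

with open(path, "wb") as out:  # pickle files must be opened in binary mode
    pickle.dump(factors, out)

with open(path, "rb") as inp:
    loaded = pickle.load(inp)

print(loaded == factors)  # True
```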

Jul 18 '05 #14

Jim wrote:
Hi all

Is this the best way to preallocate a list of integers?
listName = range(0,length)


For serious numerical work you should use Numeric or Numarray, as
others suggested. When I do allocate lists the initial values 0:n-1 are
rarely what I want, so I use

ivec = n*[None]

so that if I use a list element before initializing it, for example

ivec[0] += 1

I get an error message

  File "xxnone.py", line 2, in ?
    ivec[0] += 1
TypeError: unsupported operand type(s) for +=: 'NoneType' and 'int'

This is in the same spirit as Python's (welcome) termination of a
program when one tries to use an uninitialized scalar variable.
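The same behaviour holds in Python 3. Below is a short sketch of the None-placeholder idea. The list size and the values are arbitrary.

```python
n = 4
ivec = n * [None]  # every slot starts as None rather than a plausible number

ivec[1] = 10  # slot 1 is initialised explicitly...
ivec[1] += 1  # ...so updating it works: 10 -> 11

error_message = None
try:
    ivec[0] += 1  # slot 0 was never initialised
except TypeError as exc:
    error_message = str(exc)
    print("caught:", error_message)
```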

Jul 18 '05 #15

Jim
F. Petitjean wrote:
On Wed, 13 Apr 2005 16:46:53 +0100, Jim wrote:
What I really want is a Numeric array but I don't think Numeric supports
importing files.


Numeric arrays can be serialized from/to files through pickles :
import Numeric as N
help(N.load)
help(N.dump)
(and it is space efficient)
Jim

Yeah thanks. I'm generating them using Matlab though so I'd have to get
the format the same. I use Matlab because I get the results I want. When
I get to know Python + scipy etc. better I might remove that step.

Thanks again

Jim
Jul 18 '05 #16

Jim
Steven Bethard wrote:
Jim wrote:
What I really want is a Numeric array but I don't think Numeric
supports importing files.

Hmmm... Maybe the scipy package?

I think scipy.io.read_array might help, but I've never used it.

STeVe

Sounds promising.

I only got Numeric because I wanted scipy but I've hardly explored it as
I kept running into problems even with the complicated examples cut and
pasted into a file ;)

Oh yeah, I wanted to explore the GA module but no docs :( and I got busy
doing other stuff.

Thanks

Jim
Jul 18 '05 #17

Jim
ivec = n*[None]

so that if I use a list element before initializing it, for example

ivec[0] += 1

I get an error message

  File "xxnone.py", line 2, in ?
    ivec[0] += 1
TypeError: unsupported operand type(s) for +=: 'NoneType' and 'int'

This is in the same spirit as Python's (welcome) termination of a
program when one tries to use an uninitialized scalar variable.


I feel foolish that I forgot about *. I've just started with Python then
took 2 weeks off. I'll explore pre-allocation when I'm back up to speed.

Yep, I use None a lot.

Thanks

Jim
Jul 18 '05 #18

Bill Mill <bi*******@gmail.com> writes:
I would profile your app to check whether it's really your append that is taking
ages, but preallocating a list of strings would look like:

["This is an average length string" for i in range(approx_length)]
I don't think there's any point putting strings into the preallocated
list. A list is just an array of pointers to objects, so any object
will do fine for preallocation, no matter what the list will be used for.
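Dan's point can be checked directly in CPython. sys.getsizeof reports the size of the list object (its header plus one pointer slot per element), not the size of the objects it refers to. Below is a sketch. The exact byte counts are a CPython implementation detail, so they are not shown.

```python
import sys

n = 1000
with_none = [None] * n
with_strings = ["This is an average length string"] * n

# sys.getsizeof measures the list object only: its header plus one pointer
# per slot. The referenced objects are not counted, so both sizes match.
print(sys.getsizeof(with_none), sys.getsizeof(with_strings))
```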
My guess is that it won't help to preallocate, but time it and let us
know. A test to back my guess:

import timeit, math

def test1():
    lst = [0 for i in range(100000)]
    for i in xrange(100000):
        lst[i] = math.sin(i) * i

def test2():
    lst = []
    for i in xrange(100000):
        lst.append(math.sin(i) * i)

....
The results change slightly when I actually insert an integer, instead
of a float, with lst[i] = i and lst.append(i):

09:14 AM ~$ python test.py
time1: 3.352000
time2: 3.672000


If you use

lst = range(100000)

or even better

lst = [None]*100000

then test1 is more than twice as fast as test2:

time1: 2.437730
time2: 5.308054

(using python 2.4).

Your code

lst = [0 for i in range(100000)]

made python do an extra 100000-iteration loop.
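To isolate the construction cost Dan describes, you can time the three ways of building the placeholder list on their own. Below is a Python 3 sketch. The function names are hypothetical, and timings depend on the machine and interpreter, so no numbers are claimed.

```python
import timeit

N = 100_000


def by_comprehension():
    return [0 for _ in range(N)]  # runs a Python-level loop, as in test1


def by_repetition():
    return [None] * N  # in CPython, one allocation filled in C


def by_range():
    return list(range(N))  # Python 3 spelling of Python 2's range(N)


if __name__ == "__main__":
    for fn in (by_comprehension, by_repetition, by_range):
        print("%s: %f" % (fn.__name__, timeit.timeit(fn, number=100)))
```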

Dan
Jul 18 '05 #19

On Wed, 13 Apr 2005 14:28:51 +0100, Jim <jb*@cannedham.ee.ed.ac.uk>
wrote:
Thanks for the suggestions. I guess I must ensure that this is my bottle
neck.
<code>
def readFactorsIntoList(self,filename,numberLoads):

1. "numberLoads" is not used.

    factors = []
    f = open(self.basedir + filename,'r')
    line = f.readline()
    tokens = line.split()
    columns = len(tokens)
    if int(columns) == number:

2. "columns" is already an int (unless of course you've redefined
"len"!). Doing int(columns) is pointless.
3. What is "number"? Same as "numberLoads"?
4. Please explain in general what is the layout of your file and in
particular, what is the significance of the first line of the file and
of the above "if" test.

        for line in f:
            factor = []
            tokens = line.split()
            for i in tokens:
                factor.append(float(i))

5. "factor" is built and then not used any more??

            factors.append(loadFactor)

6. What is "loadFactor"? Same as "factor"?

    else:
        for line in f:
            tokens = line.split()
            factors.append([float(tokens[0])] * number)

7. You throw away any tokens in the line after the first??

    return factors
</code>

OK. I've just tried with 4 lines and the code works.
Which code works? The code you posted? Please define "works".

With 11000 lines it
uses all CPU for at least 30 secs. There must be a better way.


Perhaps after you post the code that you've actually run, and
explained what your file layout is, and what you are trying to
achieve, then we can give you some meaningful help.

Cheers,

John

Jul 18 '05 #20

Jim
John Machin wrote:
On Wed, 13 Apr 2005 14:28:51 +0100, Jim <jb*@cannedham.ee.ed.ac.uk>
wrote:

Thanks for the suggestions. I guess I must ensure that this is my bottle
neck.
<code>
def readFactorsIntoList(self,filename,numberLoads):

1. "numberLoads" is not used.

    factors = []
    f = open(self.basedir + filename,'r')
    line = f.readline()
    tokens = line.split()
    columns = len(tokens)
    if int(columns) == number:

2. "columns" is already an int (unless of course you've redefined
"len"!). Doing int(columns) is pointless.
3. What is "number"? Same as "numberLoads"?
4. Please explain in general what is the layout of your file and in
particular, what is the significance of the first line of the file and
of the above "if" test.

        for line in f:
            factor = []
            tokens = line.split()
            for i in tokens:
                factor.append(float(i))

5. "factor" is built and then not used any more??

            factors.append(loadFactor)

6. What is "loadFactor"? Same as "factor"?

    else:
        for line in f:
            tokens = line.split()
            factors.append([float(tokens[0])] * number)

7. You throw away any tokens in the line after the first??

    return factors
</code>

OK. I've just tried with 4 lines and the code works.

Which code works? The code you posted? Please define "works".
With 11000 lines it
uses all CPU for at least 30 secs. There must be a better way.

Perhaps after you post the code that you've actually run, and
explained what your file layout is, and what you are trying to
achieve, then we can give you some meaningful help.

Cheers,

John


Thanks for looking John. For that I should take a little time to explain.

I tried to rename the variables, some of them were four words long. I
got a couple of the renames wrong. Sorry.
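If we take the explanation above at face value, here is a sketch of what the function was presumably meant to do. It rests on three assumptions the thread does not confirm: "number" is "numberLoads", "loadFactor" is "factor", and the first line only sets the column count, so, as in the posted code, it is not added to the result. The sketch is a plain Python 3 function that takes the base directory as a parameter and uses os.path.join instead of string concatenation. Like the original, it raises IndexError on a blank line in the single-column branch.

```python
import os
import tempfile


def read_factors_into_list(basedir, filename, number_loads):
    """Hypothetical rewrite of readFactorsIntoList under the assumptions above."""
    with open(os.path.join(basedir, filename)) as f:
        # As in the posted code, the first line only decides the layout
        columns = len(f.readline().split())
        if columns == number_loads:
            # Every remaining line already holds one value per load
            return [[float(tok) for tok in line.split()] for line in f]
        # Otherwise repeat each line's first value number_loads times
        return [[float(line.split()[0])] * number_loads for line in f]


# Usage with a throwaway file; directory and contents are made up
demo_dir = tempfile.mkdtemp()
with open(os.path.join(demo_dir, "loads.txt"), "w") as out:
    out.write("1 2 3\n4 5 6\n7 8 9\n")

print(read_factors_into_list(demo_dir, "loads.txt", 3))  # [[4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
```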

Regarding 'works'. I meant that with a text file of four lines the code
completed. With my desired size 11000 lines it didn't complete within
the limits of my patience. I didn't try any other size.

Also, I perhaps wrongly used the newsgroup's threading by restarting my
query with extra information (which turned out to be a little faulty).

Luckily the other branches yielded fruit.

Thanks again
Jim

Jul 18 '05 #21
