Memory problem

Hi,

I need to read a large amount of data into a list, so I am trying to
see if I'll have any memory problems. When I do
x=range(2700*2700*3) I get the following message:

Traceback (most recent call last):
  File "<stdin>", line 1, in ?
MemoryError

Is there any way to get around this problem? I have a machine with 4 GB
of memory. The total number of data points (floats) that I need to read
is on the order of 200-300 million.

Thanks.

Aug 14 '06 #1
13 Replies


Yi Xing wrote:
> I need to read a large amount of data into a list, so I am trying to
> see if I'll have any memory problems. When I do
> x=range(2700*2700*3) I get the following message:
>
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> MemoryError
>
> Is there any way to get around this problem? I have a machine with 4 GB
> of memory. The total number of data points (floats) that I need to read
> is on the order of 200-300 million.

If you know that you need floats only, then you can use a typed array
(an array.array) instead of an untyped array (a Python list):

import array
a = array.array("f")

You can also try with a numerical library like scipy, it may support up
to 2 GB long arrays.
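
A rough sketch of the typed-array suggestion, assuming Python 3 (which postdates this thread) and a hypothetical helper named load_floats. It uses typecode "d" (C double) rather than "f" so that no precision is lost; that choice is an assumption for illustration, not part of the suggestion above.

```python
import array

def load_floats(values, typecode="d"):
    """Collect an iterable of numbers into a compact typed array.

    load_floats is a hypothetical helper used only for illustration.
    """
    data = array.array(typecode)
    for v in values:
        data.append(float(v))  # stored as raw C values, not Python objects
    return data

data = load_floats(["1.5", "2.25", "3.0"])
print(data.typecode, data.itemsize, len(data))
```

Because the array holds raw machine values, each element costs only itemsize bytes regardless of how many values are loaded.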

Bye,
bearophile

Aug 14 '06 #2

Yi Xing wrote:
> Hi,
>
> I need to read a large amount of data into a list, so I am trying to
> see if I'll have any memory problems. When I do
> x=range(2700*2700*3) I get the following message:
>
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> MemoryError
>
> Is there any way to get around this problem? I have a machine with 4 GB
> of memory. The total number of data points (floats) that I need to read
> is on the order of 200-300 million.

2700*2700*3 is only 21M. Your computer shouldn't have raised a sweat,
let alone MemoryError. Ten times that got me a MemoryError on a 1GB
machine.

A raw Python float takes up 8 bytes. On a 32-bit machine, a float object
carries another 8 bytes of overhead (type pointer and refcount). Instead
of a list, you probably need to use an array.array (which works on
homogeneous contents, so it costs 8 bytes per float, not 16), or perhaps
numeric/numpy/scipy/...
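
A minimal sketch of that comparison, assuming Python 3. The exact byte counts depend on the interpreter build (the figures above are for a 32-bit build, and a 64-bit build has larger object headers), so no specific output is claimed here; the variable names are illustrative.

```python
import array
import sys

n = 100000
as_list = [float(i) for i in range(n)]   # n distinct float objects
as_array = array.array("d", as_list)     # n raw 8-byte doubles

# A list stores one pointer per element, and each float object
# carries its own header in addition to the 8-byte value.
list_bytes = sys.getsizeof(as_list) + sum(sys.getsizeof(v) for v in as_list)

# An array stores only the raw values plus a small fixed header.
array_bytes = sys.getsizeof(as_array)

print("list  :", list_bytes, "bytes")
print("array :", array_bytes, "bytes")
```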

HTH,
John

Aug 14 '06 #3

be************@lycos.com wrote:
> If you know that you need floats only, then you can use a typed array
> (an array.array) instead of an untyped array (a Python list):
>
> import array
> a = array.array("f")

Clarification: typecode 'f' stores a Python float (64 bits, equivalent
to a C double) as a 32-bit FP number (equivalent to a C float). Apart
from the obvious loss of precision, a little extra time is required to
convert to and fro. You may consider the trade-off worthwhile.
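
The precision loss can be seen with a short sketch, assuming Python 3 and a platform where a C float is 32 bits and a C double is 64 bits (true of common platforms); the variable names are illustrative.

```python
import array

single = array.array("f", [0.1])  # stored as a C float
double = array.array("d", [0.1])  # stored as a C double

print(single.itemsize, double.itemsize)
print(single[0] == 0.1)   # False: the value was rounded to single precision
print(double[0] == 0.1)   # True: a double round-trips exactly
```

Typecode 'f' halves the memory per element, at the cost of roughly 7 significant decimal digits instead of about 16.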

Cheers,
John

Aug 14 '06 #4

Yi Xing wrote:
> Hi,
>
> I need to read a large amount of data into a list, so I am trying to
> see if I'll have any memory problems. When I do
> x=range(2700*2700*3) I get the following message:
>
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> MemoryError
>
> Is there any way to get around this problem? I have a machine with 4 GB
> of memory. The total number of data points (floats) that I need to read
> is on the order of 200-300 million.
>
> Thanks.

On my 1 GB machine this worked just fine, with no memory error.

-Larry Bates
Aug 14 '06 #5

On a related question: how do I initialize a list or an array with a
pre-specified number of elements, something like
int p[100] in C? I can do append() for 100 times but this looks silly...

Thanks.

Yi Xing

Aug 14 '06 #6

Yi Xing wrote:
> On a related question: how do I initialize a list or an array with a
> pre-specified number of elements, something like
> int p[100] in C? I can do append() for 100 times but this looks silly...
>
> Thanks.
>
> Yi Xing

You seldom need to do that in python, but it's easy enough:

new_list = [0 for notused in xrange(100)]

or if you already have a list:

my_list.extend(0 for notused in xrange(100))

HTH,
~Simon

Aug 14 '06 #7

Yi Xing wrote:
> On a related question: how do I initialize a list or an array with a
> pre-specified number of elements, something like
> int p[100] in C? I can do append() for 100 times but this looks silly...
>
> Thanks.
>
> Yi Xing

Use [0]*100 for a list.
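
As a small sketch, assuming Python 3, the same repetition operator also works for an array.array. The names zeros_list and zeros_array are illustrative.

```python
import array

zeros_list = [0] * 100                        # list of 100 ints
zeros_array = array.array("d", [0.0]) * 100   # array of 100 doubles

# Elements can then be assigned by index, as with int p[100] in C.
zeros_list[5] = 42
zeros_array[5] = 4.2
print(len(zeros_list), len(zeros_array))
```

Repeating a list of immutable values such as 0 is safe, since assigning to one index rebinds only that slot.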

THN

Aug 14 '06 #8

Yi Xing wrote:
> On a related question: how do I initialize a list or an array with a
> pre-specified number of elements, something like
> int p[100] in C? I can do append() for 100 times but this looks silly...
>
> Thanks.
>
> Yi Xing

Unlike other languages this is seldom done in Python. I think you should
probably be looking at http://numeric.scipy.org/ if you want to have
"traditional" arrays of floats.

-Larry
Aug 14 '06 #9

Thanks! I just found that I have no problem with
x=[[10.0]*2560*2560]*500, but x=range(1*2560*2560*30) doesn't work.

-Yi
Aug 14 '06 #10

Yi Xing wrote:
> On a related question: how do I initialize a list or an array with a
> pre-specified number of elements, something like
> int p[100] in C? I can do append() for 100 times but this looks silly...
>
> Thanks.
>
> Yi Xing

In the case of an array, you may wish to consider the fromfile()
method.
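
A minimal sketch of fromfile(), assuming Python 3 and a binary file of raw doubles written on the same machine. The temporary file and sample values are made up for illustration; real data would come from an existing file.

```python
import array
import os
import tempfile

# Write some sample doubles to a temporary binary file (illustrative data).
source = array.array("d", [1.0, 2.5, -3.75, 4.0])
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    source.tofile(tmp)
    path = tmp.name

# Read them back: fromfile(f, n) appends n items read as machine values.
loaded = array.array("d")
with open(path, "rb") as f:
    count = os.path.getsize(path) // loaded.itemsize
    loaded.fromfile(f, count)

os.remove(path)
print(loaded.tolist())
```

fromfile() raises EOFError if fewer than n items are available, so the count is derived from the file size here.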

Cheers,
John

Aug 14 '06 #11

Yi Xing wrote:
> Thanks! I just found that I have no problem with
> x=[[10.0]*2560*2560]*500, but x=range(1*2560*2560*30) doesn't work.

That's no surprise. In the first case, try

x[0][0] = 20.0
print x[1][0]

You have the very same (identical) list of 2560*2560 values in x
500 times.

To create such a structure correctly, do

x = [None] * 500
for i in range(500):
    x[i] = [10.0]*2560*2560
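
The difference between the two structures can be checked at a small scale. This is a sketch assuming Python 3, with rows and cols as small stand-ins for 500 and 2560*2560.

```python
rows, cols = 3, 4   # small stand-ins for 500 and 2560*2560

shared = [[10.0] * cols] * rows                   # one inner list, referenced 3 times
separate = [[10.0] * cols for _ in range(rows)]   # three independent inner lists

shared[0][0] = 20.0
separate[0][0] = 20.0

print(shared[1][0])     # 20.0, because every row is the same object
print(separate[1][0])   # 10.0
print(shared[0] is shared[1], separate[0] is separate[1])   # True False
```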

In any case, check ulimit(1).
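
The same per-process limits can also be read from inside Python. This is a sketch assuming Python 3 on a Unix-like system where the resource module and RLIMIT_AS are available; the helper describe is hypothetical.

```python
import resource  # Unix only

def describe(limit):
    """Format a limit value; describe is a hypothetical helper."""
    if limit == resource.RLIM_INFINITY:
        return "unlimited"
    return "%d bytes" % limit

# RLIMIT_AS is the maximum address space the process may use.
soft, hard = resource.getrlimit(resource.RLIMIT_AS)
print("address space soft limit:", describe(soft))
print("address space hard limit:", describe(hard))
```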

Regards,
Martin
Aug 14 '06 #12

Yi Xing wrote:
> Thanks! I just found that I have no problem with
> x=[[10.0]*2560*2560]*500, but x=range(1*2560*2560*30) doesn't work.

range(1*2560*2560*30) is creating a list of 196M *unique* ints.
Assuming 32-bit ints and pointers: that's 4 bytes each for the value, 4
for the type pointer, 4 for the refcount and 4 for the actual list
element (a pointer to the 12-byte object). So that's one chunk of
4 x 196M = 786MB of contiguous list, plus 196M chunks, each of whatever
size gets allocated for a request of 12 bytes. Let's guess at 16. So the
total memory you need is about 196M x (4 + 16) = 3920MB.

Now let's look at [[10.0]*2560*2560]*500.
Firstly, that creates a tiny list [10.0]. Then you create a list that
contains 2560*2560 = 6.5M references to that *one* object containing
10.0. That's 26MB. Then you make a list of 500 references to that big
list. This new list costs you 2000 bytes. Total required: about 26.2MB.
The minute you start having non-unique numbers instead of 10.0, this
all falls apart.
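
The arithmetic for both cases can be reproduced with a short sketch, assuming Python 3 and the 32-bit sizes used above. POINTER and INT_ALLOC are illustrative names, and the 16-byte allocation is the guess made above, not a measured value.

```python
POINTER = 4      # bytes per list slot on a 32-bit build
INT_ALLOC = 16   # guessed allocation size for a 12-byte int object

# Case 1: range(2560*2560*30), every element a distinct int object.
n = 2560 * 2560 * 30
list_slots = POINTER * n
int_objects = INT_ALLOC * n
print("range():  %.0f MB" % ((list_slots + int_objects) / 1e6))

# Case 2: [[10.0]*2560*2560]*500, references to shared objects only.
inner = POINTER * 2560 * 2560   # references to one shared float
outer = POINTER * 500           # references to one shared inner list
print("shared:   %.1f MB" % ((inner + outer) / 1e6))
```

Without rounding n down to 196M first, the first total comes out near 3932 MB rather than 3920 MB; the conclusion is the same.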

In any case, your above comparison is nothing at all to do with the
solution that you need, which as already explained will involve
array.array or numpy.

What you now need to do is answer the questions about your pagefile
etc.

Cheers,
John

Aug 14 '06 #13

I used the array module and loaded all the data into an array.
Everything works fine now.
Aug 15 '06 #14
