Memory problem

Hi,

I need to read a large amount of data into a list. So I am trying to
see if I'll have any memory problem. When I do
x=range(2700*2700*3) I got the following message:

Traceback (most recent call last):
File "<stdin>", line 1, in ?
MemoryError

Is there any way to get around this problem? My machine has 4 GB of memory. The
total number of data points (floats) that I need to read is on the order
of 200-300 million.

Thanks.

Aug 14 '06 #1
Yi Xing wrote:
I need to read a large amount of data into a list. So I am trying to
see if I'll have any memory problem. When I do
x=range(2700*2700*3) I got the following message:
Traceback (most recent call last):
File "<stdin>", line 1, in ?
MemoryError
Any way to get around this problem? I have a machine of 4G memory. The
total number of data points (float) that I need to read is in the order
of 200-300 millions.
If you know that you need floats only, then you can use a typed array
(an array.array) instead of an untyped array (a Python list):

import array
a = array.array("f")

You can also try a numerical library such as scipy; it may support
arrays up to 2 GB long.

Bye,
bearophile

Aug 14 '06 #2
Yi Xing wrote:
Hi,

I need to read a large amount of data into a list. So I am trying to
see if I'll have any memory problem. When I do
x=range(2700*2700*3) I got the following message:

Traceback (most recent call last):
File "<stdin>", line 1, in ?
MemoryError

Any way to get around this problem? I have a machine of 4G memory. The
total number of data points (float) that I need to read is in the order
of 200-300 millions.
2700*2700*3 is only 21M. Your computer shouldn't have raised a sweat,
let alone MemoryError. Ten times that got me a MemoryError on a 1GB
machine.

A raw Python float takes up 8 bytes. On a 32-bit machine a float object
will have another 8 bytes of (type, refcount). Instead of a list, you
probably need to use an array.array (which works on homogeneous
contents, so with typecode 'd' it costs 8 bytes per float, not 16), or
perhaps numeric/numpy/scipy/...
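
To see the difference on your own machine, here is a minimal sketch. It assumes Python 3 syntax (the thread itself predates it) and CPython's sys.getsizeof; the names as_list and as_array are only illustrative. On a 64-bit build the per-object overhead is larger than the 32-bit figures quoted above, so the exact numbers will differ.

```python
import sys
from array import array

n = 1000
as_list = [float(i) for i in range(n)]   # n separate float objects
as_array = array("d", as_list)           # n packed C doubles

# Payload per element in the typed array (8 on mainstream platforms).
print(as_array.itemsize)

# The list costs one pointer per slot plus one float object per value.
list_bytes = sys.getsizeof(as_list) + sum(sys.getsizeof(v) for v in as_list)
# The array reports its header plus the packed buffer.
array_bytes = sys.getsizeof(as_array)
print(list_bytes, array_bytes)
```

The printed totals are implementation details of the interpreter, so treat them as a rough guide, not a specification.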

HTH,
John

Aug 14 '06 #3

be************@lycos.com wrote:
If you know that you need floats only, then you can use a typed array
(an array.array) instead of an untyped array (a Python list):

import array
a = array.array("f")
Clarification: typecode 'f' stores a Python float (64 bits, equivalent
to a C double) as a 32-bit FP number (equivalent to a C float). Apart
from the obvious loss of precision, a little extra time is required
to convert to and fro. You may consider the trade-off
worthwhile.
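
A small sketch of that trade-off, assuming Python 3 syntax and a platform where a C float is 4 bytes and a C double is 8 (true on mainstream systems). The names single and double are just for illustration.

```python
from array import array

single = array("f", [0.1])   # stored as a 32-bit C float
double = array("d", [0.1])   # stored as a 64-bit C double

print(single.itemsize, double.itemsize)

# Reading back converts to a Python float again; the 'f' value has
# been rounded to single precision on the way in.
print(single[0] == 0.1)   # False
print(double[0] == 0.1)   # True
```

So 'f' halves the memory, at the price of keeping only about 7 significant decimal digits.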

Cheers,
John

Aug 14 '06 #4
Yi Xing wrote:
Hi,

I need to read a large amount of data into a list. So I am trying to see
if I'll have any memory problem. When I do
x=range(2700*2700*3) I got the following message:

Traceback (most recent call last):
File "<stdin>", line 1, in ?
MemoryError

Any way to get around this problem? I have a machine of 4G memory. The
total number of data points (float) that I need to read is in the order
of 200-300 millions.

Thanks.
On my 1 GB machine this worked just fine, no memory error.

-Larry Bates
Aug 14 '06 #5
On a related question: how do I initialize a list or an array with a
pre-specified number of elements, something like
int p[100] in C? I can do append() for 100 times but this looks silly...

Thanks.

Yi Xing

Aug 14 '06 #6

Yi Xing wrote:
On a related question: how do I initialize a list or an array with a
pre-specified number of elements, something like
int p[100] in C? I can do append() for 100 times but this looks silly...

Thanks.

Yi Xing
You seldom need to do that in python, but it's easy enough:

new_list = [0 for notused in xrange(100)]

or if you already have a list:

my_list.extend(0 for notused in xrange(100))

HTH,
~Simon

Aug 14 '06 #7

Yi Xing wrote:
On a related question: how do I initialize a list or an array with a
pre-specified number of elements, something like
int p[100] in C? I can do append() for 100 times but this looks silly...

Thanks.

Yi Xing
Use [0]*100 for a list.
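
The same repetition idiom also works for array.array. A minimal sketch, assuming Python 3 syntax; the variable names are hypothetical:

```python
from array import array

n = 100

zeros_list = [0] * n                  # list of 100 ints
zeros_array = array("d", [0.0]) * n   # 100 packed C doubles
zeros_ints = array("i", [0]) * n      # closest match to C's int p[100]

# Elements can then be assigned by index, as with a C array.
zeros_array[42] = 3.5
print(len(zeros_list), len(zeros_array), len(zeros_ints))
```

Repetition is safe here because ints and floats are immutable; repeating a list of lists behaves differently.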

THN

Aug 14 '06 #8
Yi Xing wrote:
On a related question: how do I initialize a list or an array with a
pre-specified number of elements, something like
int p[100] in C? I can do append() for 100 times but this looks silly...

Thanks.

Yi Xing
Unlike other languages this is seldom done in Python. I think you should
probably be looking at http://numeric.scipy.org/ if you want to have
"traditional" arrays of floats.

-Larry
Aug 14 '06 #9
Thanks! I just found that I have no problem with
x=[[10.0]*2560*2560]*500, but x=range(1*2560*2560*30) doesn't work.

-Yi
Aug 14 '06 #10

Yi Xing wrote:
On a related question: how do I initialize a list or an array with a
pre-specified number of elements, something like
int p[100] in C? I can do append() for 100 times but this looks silly...

Thanks.

Yi Xing
In the case of an array, you may wish to consider the fromfile()
method.
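
A minimal sketch of fromfile(), assuming Python 3 syntax and a file that holds raw machine-format doubles (for example, one written earlier with tofile()). The temporary file and the names values and loaded are made up for the example; a text file of numbers would need parsing instead.

```python
import os
import tempfile
from array import array

values = array("d", [1.5, 2.5, 3.5, 4.5])

# Write the raw bytes to a scratch file so there is something to read.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    values.tofile(tmp)
    path = tmp.name

# Read them back: fromfile() needs the number of items, not bytes.
loaded = array("d")
with open(path, "rb") as f:
    count = os.path.getsize(path) // loaded.itemsize
    loaded.fromfile(f, count)

os.remove(path)
print(loaded)
```

fromfile() appends straight into the array's buffer, so no intermediate list of float objects is built.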

Cheers,
John

Aug 14 '06 #11
Yi Xing wrote:
Thanks! I just found that I have no problem with
x=[[10.0]*2560*2560]*500, but x=range(1*2560*2560*30) doesn't work.
That's no surprise. In the first case, try

x[0][0] = 20.0
print x[1][0]

You have the very same (identical) list of 2560*2560 values in x
500 times.

To create such a structure correctly, do

x = [None] * 500
for i in range(500):
    x[i] = [10.0]*2560*2560
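
A scaled-down sketch of the same effect, assuming Python 3 syntax and tiny sizes (3 rows of 4 values) so it runs instantly. The names aliased and independent are only illustrative.

```python
rows, cols = 3, 4

# Outer repetition copies the reference, not the inner list.
aliased = [[10.0] * cols] * rows
aliased[0][0] = 20.0
print(aliased[1][0])           # 20.0: every row is the same list
print(aliased[0] is aliased[1])

# Building each row separately gives independent lists.
independent = [[10.0] * cols for _ in range(rows)]
independent[0][0] = 20.0
print(independent[1][0])       # 10.0
```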

In any case, check ulimit(1).

Regards,
Martin
Aug 14 '06 #12
Yi Xing wrote:
Thanks! I just found that I have no problem with
x=[[10.0]*2560*2560]*500, but x=range(1*2560*2560*30) doesn't work.
range(1*2560*2560*30) is creating a list of 196M *unique* ints.
Assuming 32-bit ints and pointers: that's 4 bytes each for the value, 4
for the type pointer, 4 for the refcount and 4 for the actual list
element (a pointer to the 12-byte object). So that's one chunk of
4 x 196M = 786MB of contiguous list, plus 196M chunks, each of whatever
size gets allocated for a request of 12 bytes. Let's guess at 16. So the
total memory you need is about 20 x 196M = 3920MB.
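
The estimate above can be reproduced as plain arithmetic. This is only a sketch of the reasoning, assuming the same 32-bit sizes and the guessed 16-byte allocation per int object; none of these constants are measured.

```python
n = 1 * 2560 * 2560 * 30      # 196,608,000 elements

pointer_bytes = 4             # assumed 32-bit list slot
int_object_bytes = 16         # guessed allocation for a 12-byte object

list_chunk = n * pointer_bytes        # the contiguous list itself
int_objects = n * int_object_bytes    # one small object per element
total = list_chunk + int_objects

# In MB (10**6 bytes): about 786 for the list, about 3932 in total.
# The post's 3920 comes from rounding n down to 196M first.
print(list_chunk / 1e6, total / 1e6)
```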

Now let's look at [[10.0]*2560*2560]*500.
Firstly, that creates a tiny list [10.0]. Then you create a list that
contains 2560*2560 = 6.5M references to that *one* object containing
10.0. That's 26MB. Then you make a list of 500 references to that big
list. This new list costs you 2000 bytes. Total required: about 26.2MB.
The minute you start having non-unique numbers instead of 10.0, this
all falls apart.
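
A short sketch of why this case is so cheap, assuming Python 3 syntax and the same 4-byte pointers as in the estimate. The list named small is a scaled-down stand-in for the 6.5M-element one.

```python
n = 2560 * 2560               # 6,553,600 references
inner_bytes = n * 4           # assumed 4-byte pointers, about 26.2MB
outer_bytes = 500 * 4         # 500 references to the one big list

# Repeating [10.0] repeats the reference, so only one float exists.
small = [10.0] * 1000
distinct_objects = len({id(v) for v in small})

print(inner_bytes, outer_bytes, distinct_objects)
```

With real data every element would be its own float object, which is where the per-object overhead returns.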

In any case, your comparison above has nothing at all to do with the
solution that you need, which as already explained will involve
array.array or numpy.

What you now need to do is answer the questions about your pagefile
etc.

Cheers,
John

Aug 14 '06 #13
I used the array module and loaded all the data into an array.
Everything works fine now.
Aug 15 '06 #14
