Possible to set cpython heap size?

Andy Watson

I have an application that scans and processes a bunch of text files.
The content I'm pulling out and holding in memory is at least 200MB.

I'd love to be able to tell the CPython virtual machine that I need a
heap of, say 300MB up front rather than have it grow as needed. I've
had a scan through the archives of comp.lang.python and the Python
docs but cannot find a way to do this. Is it possible to configure
the PVM this way?

Much appreciated,
Andy
--

Feb 22 '07 #1

Diez B. Roggisch

Andy Watson wrote:

> I have an application that scans and processes a bunch of text files.
> The content I'm pulling out and holding in memory is at least 200MB.
>
> I'd love to be able to tell the CPython virtual machine that I need a
> heap of, say 300MB up front rather than have it grow as needed. I've
> had a scan through the archives of comp.lang.python and the Python
> docs but cannot find a way to do this. Is it possible to configure
> the PVM this way?

Why do you want that? And no, it is not possible. And to be honest: I have
no idea why e.g. the JVM allows for this.

Diez

Feb 22 '07 #2

Andy Watson

> Why do you want that? And no, it is not possible. And to be honest: I have
> no idea why e.g. the JVM allows for this.
>
> Diez

The reason why is simply that I know roughly how much memory I'm going
to need, and cpython seems to be taking a fair amount of time
extending its heap as I read in content incrementally.

Ta,
Andy
--

Feb 22 '07 #3

Diez B. Roggisch

Andy Watson wrote:

>> Why do you want that? And no, it is not possible. And to be honest: I have
>> no idea why e.g. the JVM allows for this.
>
> The reason why is simply that I know roughly how much memory I'm going
> to need, and cpython seems to be taking a fair amount of time
> extending its heap as I read in content incrementally.

I'm not an expert in Python malloc schemes. I know that _some_ things are
heavily optimized, but I'm not aware that it does any clever
self-management of the heap in the general case, which would be complicated in
the presence of arbitrary C extensions anyway.
However, I'm having doubts that your observation is correct. A simple

python -m timeit -n 1 -r 1 "range(50000000)"
1 loops, best of 1: 2.38 sec per loop

will create a Python process of half a gig of RAM - for a split second - and I
don't consider 2.38 seconds a fair amount of time for heap allocation.

When I used a 4 times larger argument, my machine began swapping. THEN
things became ugly - but I don't see how preallocation will help there...
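Diez's command relies on Python 2's `range()` returning a list. On Python 3 `range()` is lazy, so the closest equivalent is `list(range(n))`. The sketch below times that with the standard `timeit` module; the helper name `time_list_build` and the sizes are arbitrary illustrative choices, and the timings you get will depend entirely on your machine:

```python
import timeit


def time_list_build(n: int) -> float:
    """Seconds taken to build a list of n ints once (a single run, no repeats)."""
    # On Python 3, range() is lazy, so list(range(n)) is the closest
    # analogue of Python 2's range(n) in the timeit command above.
    return timeit.timeit(lambda: list(range(n)), number=1)


if __name__ == "__main__":
    for n in (500_000, 1_000_000, 2_000_000):
        print(f"{n:>10,} ints: {time_list_build(n):.3f} s")
```

Time that grows roughly in proportion to `n` would be consistent with Diez's point that growing the heap is not a separate, dominant cost.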

Diez

Feb 22 '07 #4

Irmen de Jong

Andy Watson wrote:

>> Why do you want that? And no, it is not possible. And to be honest: I have
>> no idea why e.g. the JVM allows for this.
>>
>> Diez
>
> The reason why is simply that I know roughly how much memory I'm going
> to need, and cpython seems to be taking a fair amount of time
                       ^^^^^
> extending its heap as I read in content incrementally.

First make sure this is really the case.
It may be that you are just using an inefficient algorithm.
In my experience, allocating extra heap memory is hardly ever
noticeable, unless your system is out of physical RAM and has
to swap.
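One way to follow Irmen's advice and check where the time actually goes is to profile the loading code. A minimal sketch with the standard-library `cProfile` and `pstats` modules; `parse_lines` is a hypothetical stand-in for the real scanning code, and `profile_call` is an illustrative helper name. A profiler reports time per Python function, so allocation cost shows up only indirectly, inside the functions that allocate:

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=10):
    """Run func(*args) under cProfile; return (result, report_text).

    The report lists the `top` entries sorted by cumulative time.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    buf = io.StringIO()
    pstats.Stats(profiler, stream=buf).sort_stats("cumulative").print_stats(top)
    return result, buf.getvalue()


def parse_lines(n):
    """Hypothetical stand-in for the poster's scanning code."""
    records = []
    for i in range(n):
        line = f"{i},name{i},{i * 2}"
        records.append(tuple(line.split(",")))
    return records


if __name__ == "__main__":
    records, report = profile_call(parse_lines, 200_000)
    print(len(records))
    print(report)
```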

--Irmen

Feb 22 '07 #5

Chris Mellon

On 22 Feb 2007 09:52:49 -0800, Andy Watson <al********@gmail.com> wrote:

>> Why do you want that? And no, it is not possible. And to be honest: I have
>> no idea why e.g. the JVM allows for this.
>>
>> Diez
>
> The reason why is simply that I know roughly how much memory I'm going
> to need, and cpython seems to be taking a fair amount of time
> extending its heap as I read in content incrementally.

To my knowledge, no modern OS actually commits any memory at all to a
process until it is written to. Pre-extending the heap would either a)
do nothing, because it would essentially be a no-op, or b) take at
least as long as doing it incrementally (because Python would need to
fill up all that space with objects), without giving you any actual
performance gain when you fill the object space "for real".
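Chris's point about the OS not committing memory until it is written can be observed directly. The sketch below is Linux-only by assumption: it reads the resident set size from `/proc/self/statm`, maps an anonymous 64 MiB region with the standard `mmap` module, and then writes one byte per page. `resident_bytes` and `demo` are hypothetical helper names, and exact numbers will vary with the kernel configuration:

```python
import mmap

SIZE = 64 * 1024 * 1024  # 64 MiB; an arbitrary size chosen for the demo


def resident_bytes() -> int:
    """Current resident set size, read from /proc/self/statm (Linux only)."""
    with open("/proc/self/statm") as f:
        resident_pages = int(f.read().split()[1])
    return resident_pages * mmap.PAGESIZE


def demo(size: int = SIZE):
    """Return RSS growth in bytes after mapping `size` bytes, then after touching them."""
    base = resident_bytes()
    m = mmap.mmap(-1, size, flags=mmap.MAP_PRIVATE)  # anonymous private mapping
    after_map = resident_bytes() - base
    for offset in range(0, size, mmap.PAGESIZE):
        m[offset] = 1  # write one byte per page so the kernel must back it
    after_touch = resident_bytes() - base
    m.close()
    return after_map, after_touch


if __name__ == "__main__":
    mapped, touched = demo()
    print(f"RSS growth after mmap:  {mapped / 2**20:.1f} MiB")
    print(f"RSS growth after touch: {touched / 2**20:.1f} MiB")
```

On a typical Linux system the resident size barely moves after the mapping is created and grows by roughly the whole region only once the pages are written, which is why reserving a heap up front would mostly be a no-op.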

In Java, as I understand it, having a fixed-size heap allows some
optimizations in the garbage collector. Python's GC model is different
and, as far as I know, is unlikely to benefit from this.

Feb 22 '07 #6

Jussi Salmela

Andy Watson wrote:

> I have an application that scans and processes a bunch of text files.
> The content I'm pulling out and holding in memory is at least 200MB.
>
> I'd love to be able to tell the CPython virtual machine that I need a
> heap of, say 300MB up front rather than have it grow as needed. I've
> had a scan through the archives of comp.lang.python and the Python
> docs but cannot find a way to do this. Is it possible to configure
> the PVM this way?
>
> Much appreciated,
> Andy
> --

Others have already suggested swap as a possible cause of slowness. I've
been experimenting on my laptop (dual-core Intel T2300 @ 1.66 GHz; 1 GB of RAM;
Win XP; PyScripter IDE) using the following code:

#========================
# Python 2 code (print statements, raw_input)
import datetime

'''
# Create 10 files with sizes 1MB, ..., 10MB
for i in range(1,11):
    print 'Writing: ' + 'Bytes_' + str(i*1000000)
    f = open('Bytes_' + str(i*1000000), 'w')
    f.write(str(i-1)*i*1000000)
    f.close()
'''

# Read the files 5 times concatenating the contents
# to one HUGE string
now_1 = datetime.datetime.now()
s = ''
for count in range(5):
    for i in range(1,11):
        print 'Reading: ' + 'Bytes_' + str(i*1000000)
        f = open('Bytes_' + str(i*1000000), 'r')
        s = s + f.read()
        f.close()
        print 'Size of s is', len(s)
print 's[274999999] = ' + s[274999999]
now_2 = datetime.datetime.now()
print now_1
print now_2
# Pause so the process size can be inspected in Task Manager
raw_input('???')
#========================

The part at the start that is commented out is the part I used to create
the 10 files. The second part prints the following output (abbreviated):

Reading: Bytes_1000000
Size of s is 1000000
Reading: Bytes_2000000
Size of s is 3000000
Reading: Bytes_3000000
Size of s is 6000000
Reading: Bytes_4000000
Size of s is 10000000
Reading: Bytes_5000000
Size of s is 15000000
Reading: Bytes_6000000
Size of s is 21000000
Reading: Bytes_7000000
Size of s is 28000000
Reading: Bytes_8000000
Size of s is 36000000
Reading: Bytes_9000000
Size of s is 45000000
Reading: Bytes_10000000
Size of s is 55000000
<snip>
Reading: Bytes_9000000
Size of s is 265000000
Reading: Bytes_10000000
Size of s is 275000000
s[274999999] = 9
2007-02-22 20:23:09.984000
2007-02-22 20:23:21.515000

As can be seen, creating a 275 MB string by reading the parts from the
files took less than 12 seconds. I think this is fast enough, but others
might disagree! ;)
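A Python 3 sketch of Jussi's experiment, scaled down by default so it runs quickly in a temporary directory. It keeps his file-naming scheme (`Bytes_N`), but the helper names `build_files` and `read_all` and the `unit` parameter are illustrative assumptions. It also swaps the repeated `s = s + f.read()` for collecting chunks in a list and calling `''.join()` once: the Python documentation notes that building a sequence by repeated concatenation can have quadratic cost, and CPython's in-place optimization for that pattern is an implementation detail rather than a guarantee.

```python
import os
import tempfile
import time


def build_files(directory, count=10, unit=100_000):
    """Write `count` files of unit, 2*unit, ..., count*unit characters,
    named Bytes_N as in the post (scaled down by default)."""
    paths = []
    for i in range(1, count + 1):
        path = os.path.join(directory, f"Bytes_{i * unit}")
        with open(path, "w") as f:
            f.write(str(i - 1) * (i * unit))
        paths.append(path)
    return paths


def read_all(paths, passes=5):
    """Read every file `passes` times and join the contents once at the end."""
    chunks = []
    for _ in range(passes):
        for path in paths:
            with open(path) as f:
                chunks.append(f.read())
    return "".join(chunks)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        paths = build_files(d)
        start = time.perf_counter()
        s = read_all(paths)
        elapsed = time.perf_counter() - start
    print(f"len(s) = {len(s)}, last char = {s[-1]!r}, {elapsed:.3f} s")
```

With the default `unit=100_000` the final string has 27.5 million characters; `unit=1_000_000` reproduces the 275 MB size from the post.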

Using the Windows Task Manager I can see the process grow to a little
less than 282 MB when it reaches the raw_input call, and drop to less
than 13 MB shortly after I've given some input, apparently as a result
of PyScripter doing a GC.
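Task Manager shows the size of the whole process. From inside Python, the standard `tracemalloc` module can show the same grow-then-release pattern for memory allocated by Python itself. A minimal sketch; `build_and_release` is a hypothetical helper and the chunk counts are arbitrary. In CPython the strings are freed by reference counting as soon as `data` is deleted, so no explicit GC call is needed here; whether freed memory is handed back to the operating system is up to the allocator, so Task Manager may not mirror these numbers exactly:

```python
import tracemalloc


def build_and_release(n_chunks: int, chunk_size: int):
    """Return (peak_bytes, current_bytes_after_release) as traced by tracemalloc."""
    tracemalloc.start()
    try:
        # Hold n_chunks distinct strings of chunk_size characters each.
        data = ["x" * chunk_size for _ in range(n_chunks)]
        _, peak = tracemalloc.get_traced_memory()
        del data  # last reference dropped: CPython frees the strings right away
        current, _ = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak, current


if __name__ == "__main__":
    peak, current = build_and_release(50, 1_000_000)
    print(f"peak: {peak / 2**20:.1f} MiB, after del: {current / 2**20:.1f} MiB")
```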

Your situation (hardware, file sizes, etc.) may differ enough that my
experiment does not correspond to it, but this was my 2 cents' worth!

HTH,
Jussi

Feb 22 '07 #7

Andy Watson

On Feb 22, 10:53 am, a bunch of folks wrote:

> Memory is basically free.

This is true if you are simply scanning a file into memory. However,
I'm storing the contents in some in-memory data structures and doing
some data manipulation. This is my speculation:

Several small objects per scanned line get allocated, and then
unreferenced. If the heap is relatively small, GC has to do some work
in order to make space for subsequent scan results. At some point, it
realises it cannot keep up and has to extend the heap. At this point,
VM and physical memory is committed, since it needs to be used. And
this keeps going on. At some point, GC will take a good deal of time
to compact the heap, since I am loading in so much data and creating
a lot of smaller objects.

If I could have a heap that is larger and does not need to be
dynamically extended, then the Python GC could work more efficiently.
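This hypothesis can be tested rather than guessed at. In CPython the cyclic garbage collector is triggered by counts of container-object allocations (see `gc.get_threshold()`), not by the heap filling up, and `gc.callbacks` lets you count how often it runs. A minimal sketch; `count_collections` and the `load_records` loader are hypothetical names standing in for the real code:

```python
import gc
from collections import Counter


def count_collections(build, *args):
    """Call build(*args) and count cyclic-GC runs per generation meanwhile.

    gc.callbacks (Python 3.3+) invokes each callback with phase "start"
    and again with "stop" around every collection.
    """
    counts = Counter()

    def on_gc(phase, info):
        if phase == "start":
            counts[info["generation"]] += 1

    gc.callbacks.append(on_gc)
    try:
        result = build(*args)
    finally:
        gc.callbacks.remove(on_gc)
    return result, dict(counts)


def load_records(n):
    """Hypothetical loader: one small list per 'scanned line', all kept."""
    return [[i, str(i)] for i in range(n)]


if __name__ == "__main__":
    records, runs = count_collections(load_records, 500_000)
    print(len(records), runs)
```

A large number of oldest-generation collections during loading would point at the cycle collector, rather than at heap growth, as the place where time is spent.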

Interesting discussion.
Cheers,
Andy
--

Feb 22 '07 #8

Chris Mellon

On 22 Feb 2007 11:28:52 -0800, Andy Watson <al********@gmail.com> wrote:

> On Feb 22, 10:53 am, a bunch of folks wrote:
>
>> Memory is basically free.
>
> This is true if you are simply scanning a file into memory. However,
> I'm storing the contents in some in-memory data structures and doing
> some data manipulation. This is my speculation:
>
> Several small objects per scanned line get allocated, and then
> unreferenced. If the heap is relatively small, GC has to do some work
> in order to make space for subsequent scan results. At some point, it
> realises it cannot keep up and has to extend the heap. At this point,
> VM and physical memory is committed, since it needs to be used. And
> this keeps going on. At some point, GC will take a good deal of time
> to compact the heap, since I am loading in so much data and creating
> a lot of smaller objects.
>
> If I could have a heap that is larger and does not need to be
> dynamically extended, then the Python GC could work more efficiently.

I haven't even looked at Python memory management internals since 2.3,
and not in detail then, so I'm sure someone will correct me if I'm
wrong.

However, I believe that this is almost exactly how CPython GC does not
work. CPython is refcounted with a generational GC for cycle
detection. There's a memory pool that is used for object allocation
(more than one, I think, for different types of objects) and those can
be extended but they are not, to my knowledge, compacted.
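The model described here (reference counting, plus a generational collector that exists only to find cycles) can be observed with `weakref.finalize`, which runs a callback when an object is reclaimed. A minimal sketch assuming CPython; implementations such as PyPy do not use reference counting, so the first function may behave differently there. `Node`, `freed_without_gc` and `cycle_needs_gc` are hypothetical names:

```python
import gc
import weakref


class Node:
    """Small example object; `other` lets us build a reference cycle."""

    def __init__(self):
        self.other = None


def freed_without_gc() -> bool:
    """An acyclic object is reclaimed by refcounting as soon as it is unreferenced."""
    freed = []
    obj = Node()
    weakref.finalize(obj, freed.append, True)
    del obj
    return bool(freed)


def cycle_needs_gc():
    """A self-referencing object survives `del` until the cycle collector runs.

    Returns (freed_after_del, freed_after_collect).
    """
    freed = []
    was_enabled = gc.isenabled()
    gc.disable()  # keep an automatic collection from running mid-demo
    try:
        obj = Node()
        obj.other = obj  # reference cycle: refcount never drops to zero
        weakref.finalize(obj, freed.append, True)
        del obj
        after_del = bool(freed)
        gc.collect()
        after_collect = bool(freed)
    finally:
        if was_enabled:
            gc.enable()
    return after_del, after_collect
```

Nothing here compacts or moves objects; the collector only finds and frees unreachable cycles.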

If you're creating the same small objects for each scanned line, and
especially if they are tuples or new-style objects with __slots__,
then the memory use for those objects should be more or less constant.
Your memory growth is probably related to the information you're
saving, not to your scanned objects, and since those are long-lived
objects I simply don't see how heap pre-allocation could be helpful
there.
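To illustrate the remark that tuples and objects with `__slots__` are compact, the sketch below compares the per-instance allocation of a plain class and a `__slots__` class using `tracemalloc`. The class and helper names are hypothetical, and the exact byte counts differ between Python versions, so only the relative comparison is meaningful:

```python
import tracemalloc


class PlainRecord:
    def __init__(self, key, value):
        self.key = key
        self.value = value


class SlottedRecord:
    __slots__ = ("key", "value")  # no per-instance __dict__

    def __init__(self, key, value):
        self.key = key
        self.value = value


def bytes_per_instance(cls, n=10_000):
    """Approximate bytes allocated per instance of cls, measured with tracemalloc."""
    tracemalloc.start()
    try:
        before, _ = tracemalloc.get_traced_memory()
        # None values so that only the instances themselves are measured
        items = [cls(None, None) for _ in range(n)]
        after, _ = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    del items
    return (after - before) / n


if __name__ == "__main__":
    for cls in (PlainRecord, SlottedRecord):
        print(f"{cls.__name__}: ~{bytes_per_instance(cls):.0f} bytes each")
```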

Feb 22 '07 #9

Martin v. Löwis

Andy Watson wrote:

> I have an application that scans and processes a bunch of text files.
> The content I'm pulling out and holding in memory is at least 200MB.
>
> I'd love to be able to tell the CPython virtual machine that I need a
> heap of, say 300MB up front rather than have it grow as needed. I've
> had a scan through the archives of comp.lang.python and the Python
> docs but cannot find a way to do this. Is it possible to configure
> the PVM this way?

You can configure your operating system. On Unix, do 'ulimit -m 200000'.
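This suggestion caps memory rather than reserving it: like any `ulimit`, it sets a ceiling beyond which allocation fails. In bash, `ulimit -m` takes kilobytes, so 200000 is roughly 200 MB; according to the Linux `setrlimit(2)` man page, the RSS limit it sets has had no effect since kernel 2.4.30. The same limits can be read and set from Python with the Unix-only standard `resource` module. A minimal sketch; `show_limits` and `cap_address_space` are hypothetical helper names:

```python
import resource

# Limits roughly corresponding to bash's `ulimit -m` (RSS),
# `ulimit -v` (address space) and `ulimit -d` (data segment).
LIMITS = {
    "RLIMIT_RSS": resource.RLIMIT_RSS,
    "RLIMIT_AS": resource.RLIMIT_AS,
    "RLIMIT_DATA": resource.RLIMIT_DATA,
}


def show_limits():
    """Return {name: (soft, hard)}; resource.RLIM_INFINITY means unlimited."""
    return {name: resource.getrlimit(res) for name, res in LIMITS.items()}


def cap_address_space(nbytes: int) -> None:
    """Lower this process's soft RLIMIT_AS to nbytes (never above the hard limit).

    Like ulimit, this only sets a ceiling; allocations beyond it typically
    fail with MemoryError. It does not reserve any memory up front.
    """
    _soft, hard = resource.getrlimit(resource.RLIMIT_AS)
    if hard != resource.RLIM_INFINITY:
        nbytes = min(nbytes, hard)
    resource.setrlimit(resource.RLIMIT_AS, (nbytes, hard))


if __name__ == "__main__":
    for name, (soft, hard) in show_limits().items():
        print(name, soft, hard)
```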

Regards,
Martin

Feb 22 '07 #10
