python gc performance in large apps


Hey guys (thus begins a book of a post :),

I'm in the process of writing a commercial VoIP call monitoring and
recording application suite in python and pyrex. Basically, this
software sits in a VoIP callcenter-type environment (complete with agent
phones and VoIP servers), sniffs voice data off of the network, and
allows users to listen into calls. It can record calls as well. The
project is about a year and 3 months in the making, and lately the
codebase has stabilized enough that it can be used by some of our
clients. The entire project has about 37,000 lines of python and pyrex
code (along with 1-2K lines of unrelated java code).

Now, some disjointed rambling about the architecture of this software.
This software has two long-running server-type components. One
component, the "director" application, is written in pure python and
makes use of the twisted, nevow, and kinterbasdb libraries (which I
realize link to some C extensions). The other component, the
"harvester" , is a mixture of python and pyrex, and makes use of the
twisted library, along with using the C libs libpcap and glib on the
pyrex end. Basically, the director is the "master" component. A single
director process interacts with users of the system through a web and/or
pygtk client application interface and can coordinate 1 to n harvesters
spread about the world. The harvester is the "heavy lifter" component
that sniffs the network traffic and sifts out the voice and signalling
data. It then updates the director of call status changes, and can
provide users of the system access to the data. It records the data to
disk as well. The scalability of this thing is really cool: given a
single director sitting somewhere coordinating the list of agents,
multiple harvesters can be placed anywhere there is voice traffic. A user
who logs into the director can end up seeing the activity of all of
these separate voice networks presented like a single giant mesh.

Overall, I have been very pleased with python and the 3rd party
libraries that I use (twisted, nevow, kinterbasdb and pygtk). It is a
joy to program with, and I think the python community has done a fine
job. However, as I have been running the software lately and profiling
it, the one and only Big Problem I have seen is memory usage. Ideally,
the server application(s) should be able to run indefinitely, but from
the results I'm seeing I will end up exhausting the memory on a 2 GB
machine in 2 to 3 days of heavy load.

Now normally I would not raise an issue like this on this list, but
based on the conversations held on this list lately, and the work done
by Evan Jones (http://evanjones.ca/python-memory.html), I am led to
believe that this memory usage -- while partially due to some probable
leaks in my program -- is largely due to the current python gc. I have
some graphs I made to show the extent of this memory usage growth:

http://public.robbyd.fastmail.fm/iq-graph1.gif

http://public.robbyd.fastmail.fm/iq-...rector-rss.gif

http://public.robbyd.fastmail.fm/iq-graph-harv-rss.gif

The preceding three diagrams are the result of running one director
process and one harvester process on the same machine for about 48 hours.
This is the most basic configuration of this software. I was running
this application through /usr/bin/python (CPython) on a Debian 'testing'
box running Linux 2.4 with 2GB of memory and Python version 2.3.5.
During that time, I gathered the resident and virtual memory size of
each component at 120 second intervals. I then imported this data into
MINITAB and did some plots. The first one is a graph of the resident
(RSS) and virtual memory usage of the two applications. The second one
is a zoomed-in graph of the director's resident memory usage (complete
with a best fit quadratic), and the third one is a zoomed-in graph of the
harvester's resident memory usage.
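
For anyone who wants to gather the same kind of data, here is a minimal sketch of the sampling step. It is not the script used for the graphs above. It assumes a Linux /proc filesystem, and the names parse_status, sample, monitor and the mem.csv output file are made up for illustration.

```python
import time


def parse_status(text):
    """Return {'VmSize': kB, 'VmRSS': kB} parsed from /proc/<pid>/status text."""
    sizes = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        if key in ("VmSize", "VmRSS"):
            # Lines look like "VmRSS:     12345 kB".
            sizes[key] = int(rest.split()[0])
    return sizes


def sample(pid):
    """Read the current virtual and resident size of a process, in kB."""
    with open("/proc/%d/status" % pid) as f:
        return parse_status(f.read())


def monitor(pid, interval=120, out_path="mem.csv"):
    """Append one 'timestamp,VmSize,VmRSS' row every `interval` seconds."""
    while True:
        sizes = sample(pid)
        with open(out_path, "a") as out:
            out.write("%d,%d,%d\n" % (time.time(), sizes["VmSize"], sizes["VmRSS"]))
        time.sleep(interval)
```

Calling monitor(pid) from a separate process with the pid of the director or the harvester would produce a CSV file that any plotting or statistics package can import.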

To give you an idea of the network load these apps were undergoing
during this sampling time, by the time 48 hours had passed, the
harvester had gathered and parsed about 900 million packets. During the
day there will be 50-70 agents talking. This number goes to 10-30 at night.

In the diagrams above, one can see the night-day separation clearly. At
night, the memory usage growth seemed to all but stop, but with the
increased call volume of the day, it started shooting off again. When I
first started gathering this data, I was hoping for a logarithmic curve,
but at least after 48 hours, it looks like the usage increase is almost
linear. (Although logarithmic may still be the case after it exceeds a
gig or two of used memory. :) I'm not sure if this is something that I
should expect from the current gc, and when it would stop.
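
If the growth really is close to linear, a straight-line fit gives a rough estimate of how long a process can run before it hits a memory ceiling. The sketch below is only an illustration of that arithmetic: the sample values are made up and are not the measurements from the graphs, and fit_line and hours_until are hypothetical helper names.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / float(n)
    mean_y = sum(ys) / float(n)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def hours_until(limit_mb, slope, intercept):
    """Hours from t = 0 until the fitted line reaches limit_mb.

    Returns None when the fitted usage is flat or shrinking.
    """
    if slope <= 0:
        return None
    return (limit_mb - intercept) / slope


# Made-up samples: hours elapsed and RSS in MB. NOT the data from the post.
hours = [0, 12, 24, 36, 48]
rss_mb = [100, 160, 220, 280, 340]

slope, intercept = fit_line(hours, rss_mb)
print("growth: %.1f MB/hour" % slope)
print("2048 MB reached after %.1f hours" % hours_until(2048, slope, intercept))
```

A straight line ignores the day/night pattern, so fitting over a whole number of days should give a steadier slope than fitting over an arbitrary window.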

Now, as I stated above, I am certain that at least some of this
increased memory usage is due to either un-collectable objects in the
python code, or memory leaks in the pyrex code (where I make some use of
malloc/free). I am working on finding and removing these issues, but
from what I've seen with the help of gc UNCOLLECTABLE traces, there are
not many un-collectable reference issues at least. Yes, there are some,
but definitely not enough to justify growth like I am seeing. The pyrex
side should not be leaking too much; I'm very good about freeing what I
allocate in pyrex/C land. I will be running that code linked against a
memory-leak-finding library in the next few days. Past the code reviews I've done,
what makes me think that I don't have any *wild* leaks going on at least
with the pyrex code is that I am seeing the same type of growth patterns
in both apps, and I don't use any pyrex with the director. Yes, the
harvester is consuming much more memory, but it also does the majority
of the heavy lifting.
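
One way to check the python side is to ask the gc module what it could not free, and to compare a census of live objects taken at two points in time. The following is a minimal sketch of that idea, not the tracing setup used above. It is written for a current Python 3 interpreter rather than 2.3.5, and the function names are made up. Note that since Python 3.4 reference cycles containing __del__ methods can be collected, so gc.garbage is normally empty there; on older interpreters such cycles end up in that list.

```python
import gc
from collections import Counter


def uncollectable_summary():
    """Force a full collection and count, by type, what the collector left in gc.garbage."""
    gc.collect()
    return Counter(type(obj).__name__ for obj in gc.garbage)


def live_object_counts():
    """Count the objects currently tracked by the collector, by type name."""
    return Counter(type(obj).__name__ for obj in gc.get_objects())


def growth(before, after, top=10):
    """Return up to `top` (type name, increase) pairs for types whose count grew."""
    delta = Counter(after)
    delta.subtract(before)
    return [(name, n) for name, n in delta.most_common(top) if n > 0]
```

Taking live_object_counts() once at a quiet point and again a few hours into heavy load, then passing both to growth(), shows which types are accumulating. If nothing on the python side grows while the RSS still climbs, that points at the allocator or at C-level allocations rather than at leaked python objects.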

I am alright with the app not freeing all the memory it can between high
and low activity times, but what puzzles me is how the memory usage just
keeps on growing and growing. Will it ever stop?

What I would like to know is whether others on this list have had similar
problems with python's gc in long running, larger python applications.
Am I crazy or is this a real problem with python's gc itself? If it's a
python gc issue, then it's my opinion that we will need to enhance the
gc before python can really gain leverage as a language suitable for
"enterprise-class" applications. I have surprised many other programmers
by writing an application like this in python/pyrex that works just as
well as, and even more efficiently than, the C/C++/Java competitors.
The only thing I have left to show is that the app lasts as long between
restarts. ;)
Robby
Oct 21 '05 #1