Bytes | Software Development & Data Engineering Community
import data.py using massive amounts of memory

I've been dumping a database as Python source code (for use with
Python on an S60 mobile phone, actually) and I've noticed that
importing it uses far more memory than the data structure actually
needs once it is loaded.

The programs below create a file (z.py) containing a data structure
that looks like this:

-- z.py ----------------------------------------------------
z = {
0 : (0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19),
1 : (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20),
2 : (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21),
[snip]
998 : (998, 999, 1000, 1001, 1002, ..., 1012, 1013, 1014, 1015, 1016, 1017),
999 : (999, 1000, 1001, 1002, 1003, ..., 1013, 1014, 1015, 1016, 1017, 1018),
}
------------------------------------------------------------

Under python2.2-python2.4 "import z" uses 8 MB, whereas loading a
pickled dump of the file only takes 450kB. This has been improved in
python2.5 so it only takes 2.2 MB.

$ python2.5 memory_usage.py
Memory used to import is 2284 kB
Total size of repr(z) is 105215
Memory used to unpickle is 424 kB
Total size of repr(z) is 105215

$ python2.4 memory_usage.py
Memory used to import is 8360 kB
Total size of repr(z) is 105215
Memory used to unpickle is 456 kB
Total size of repr(z) is 105215

$ python2.3 memory_usage.py
Memory used to import is 8436 kB
Total size of repr(z) is 105215
Memory used to unpickle is 456 kB
Total size of repr(z) is 105215

$ python2.2 memory_usage.py
Memory used to import is 8568 kB
Total size of repr(z) is 105215
Memory used to unpickle is 392 kB
Total size of repr(z) is 105215

$ python2.1 memory_usage.py
Memory used to import is 10756 kB
Total size of repr(z) is 105215
Memory used to unpickle is 384 kB
Total size of repr(z) is 105215

Why does it take so much memory? Is it some consequence of the way
the datastructure is parsed?

Note that once the .pyc file has been created, subsequent runs take
even less memory than the cPickle load.
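
If the .pyc route is attractive, the file can be compiled ahead of time so that the import never has to parse z.py. This is only a sketch, not something from the original programs: the temporary directory and file paths are made up for illustration, and since .pyc files are tied to the interpreter version that wrote them, the compile step would have to run under the same Python version as the one doing the import.

```python
import os
import py_compile
import tempfile

# Hypothetical working directory; any writable location would do
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "z.py")
pyc = os.path.join(workdir, "z.pyc")

# Write the same kind of data file as write_file() in memory_usage.py
f = open(src, "w")
f.write("z = {\n")
for i in range(1000):
    f.write("    %d : %r,\n" % (i, tuple(range(i, i + 20))))
f.write("}\n")
f.close()

# Compile once, up front; doraise=True raises on a compile error
# instead of just printing it
py_compile.compile(src, cfile=pyc, doraise=True)
```

Only the resulting z.pyc would then need to be shipped, provided the interpreter versions match.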

S60 python is version 2.2.1. It doesn't have pickle unfortunately, but
it does have marshal and the datastructures I need are marshal-able so
that provides a good solution to my actual problem.
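
A minimal sketch of the marshal approach, for anyone who wants to try it. The helper names save_marshal and load_marshal and the file name z.marshal are invented for this example. Keep in mind that the marshal format can change between Python versions, so the file should be written by an interpreter whose format the reading interpreter understands.

```python
import marshal
import os
import tempfile

def save_marshal(data, path):
    """Write data to path using marshal (core built-in types only)."""
    f = open(path, "wb")
    try:
        marshal.dump(data, f)
    finally:
        f.close()

def load_marshal(path):
    """Read back a structure written by save_marshal()."""
    f = open(path, "rb")
    try:
        return marshal.load(f)
    finally:
        f.close()

# Same shape as the data in z.py: 1000 keys, each mapping to a 20-tuple
z = {}
for i in range(1000):
    z[i] = tuple(range(i, i + 20))

# Hypothetical file location, chosen only for the example
path = os.path.join(tempfile.gettempdir(), "z.marshal")
save_marshal(z, path)
restored = load_marshal(path)
```

Like unpickling, this skips the parse and compile step entirely, which is where the import seems to spend its memory.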

Save the two programs below with the names given to demonstrate the
problem. Note that these use some linux-isms to measure the memory
used by the current process which will need to be adapted if you don't
run it on linux!
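
On Unix systems without /proc, one possible adaptation is the standard resource module. This is an assumption on my part rather than something the programs below use, and it measures something slightly different: ru_maxrss is the peak RSS so far, not the current RSS, so a difference only shows up when the process grows past its previous peak. The function name peak_memory_kb is made up for the example, and the units should be checked against your platform's getrusage documentation.

```python
import resource
import sys

def peak_memory_kb():
    """Return the peak resident set size of this process in kB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    if sys.platform == "darwin":
        # macOS reports ru_maxrss in bytes; Linux reports kilobytes
        peak = peak // 1024
    return peak

before = peak_memory_kb()
data = [tuple(range(i, i + 20)) for i in range(1000)]
after = peak_memory_kb()
grew_by = after - before  # growth of the peak, in kB
```

The resource module is not available on Windows, so that platform would need a different approach again.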

-- memory_usage.py -----------------------------------------

import os
import sys
import re
from cPickle import dump

def memory():
    """Returns memory used (RSS) in kB"""
    status = open("/proc/self/status").read()
    match = re.search(r"(?m)^VmRSS:\s+(\d+)", status)
    memory = 0
    if match:
        memory = int(match.group(1))
    return memory

def write_file():
    """Write the file to be imported"""
    fd = open("z.py", "w")
    fd.write("z = {\n")
    for i in xrange(1000):
        fd.write(" %d : %r,\n" % (i, tuple(range(i, i+20))))
    fd.write("}\n")
    fd.close()

def main():
    write_file()
    before = memory()
    from z import z
    after = memory()
    print "Memory used to import is %s kB" % (after-before)
    print "Total size of repr(z) is ", len(repr(z))

    # Save a pickled copy for later
    dump(z, open("z.bin", "wb"))

    # Run the next part
    os.system("%s memory_usage1.py" % sys.executable)

if __name__ == "__main__":
    main()

-- memory_usage1.py ----------------------------------------

from memory_usage import memory
from cPickle import load

before = memory()
z = load(open("z.bin", "rb"))
after = memory()
print "Memory used to unpickle is %s kB" % (after-before)
print "Total size of repr(z) is ", len(repr(z))

------------------------------------------------------------

--
Nick Craig-Wood <ni**@craig-wood.com> -- http://www.craig-wood.com/nick
Jun 27 '07 #1
> Note that once it has made the .pyc file the subsequent runs take even
> less memory than the cpickle import.

Could that be the compiler compiling?

I don't know much about the details of that process, but from 2.4 to
2.5 the compiler was completely replaced, going from <whatever> to AST.
That would explain the drop from 8 MB to 2.2 MB.

Harald

Jun 27 '07 #2
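
One way to check the "compiler compiling" idea is to measure the parse/compile step separately from running the compiled code. The sketch below is not from the thread. It assumes a modern Python 3 with the standard tracemalloc module (the 2.x interpreters discussed above don't have it), and the helper names build_source and measure are made up for illustration. Absolute numbers will not match the 2.x figures quoted above.

```python
import tracemalloc

def build_source(n=1000, width=20):
    """Build the text of z.py in memory instead of writing a file."""
    lines = ["z = {"]
    for i in range(n):
        lines.append("    %d : %r," % (i, tuple(range(i, i + width))))
    lines.append("}")
    return "\n".join(lines) + "\n"

def measure(func):
    """Run func() and return (result, retained_bytes, peak_bytes)."""
    tracemalloc.start()
    try:
        result = func()
        retained, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, retained, peak

source = build_source()

# Stage 1: parse and compile only, which "import z" has to do
# when no z.pyc exists yet
code, compile_retained, compile_peak = measure(
    lambda: compile(source, "z.py", "exec"))

# Stage 2: run the already-compiled code, roughly what happens
# when z.pyc is loaded instead
namespace = {}
_, exec_retained, exec_peak = measure(lambda: exec(code, namespace))

print("compile: peak %d kB, retained %d kB"
      % (compile_peak // 1024, compile_retained // 1024))
print("exec:    peak %d kB, retained %d kB"
      % (exec_peak // 1024, exec_retained // 1024))
```

If the compile stage shows a peak well above what it retains, that points at temporary parser and compiler data rather than the final data structure.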
