Performance of cPickle module

sh
Hi guys,

Well, I have a (maybe dumb) question.

I want to write my own little blog using Python (a fairly small but doable
project, to learn Python more deeply in a web context).

For now I don't want to use a database as a backend; I'd prefer to use XML,
which is enough for the small amount of data the blog will have to deal with.

My problem is that, as HTTP is stateless, I can't keep objects alive across
multiple requests, for instance an instance of a Users class, which is an
interface for managing my users.

Let's say, for example, that a user wants to access a resource (a simple web
page). My code would go through an instance of that Users class and call
something like getUser(self, login), which returns an instance of a UserData
class providing all the details of that user (name, email, etc.).

As I said, I want to save my users not in a database but in an XML file on
the server.

As HTTP is stateless, I believe I will have to create the Users object again
and again for every request.

I don't want to parse the XML file each time; instead, I want to save the
Users object (which keeps a map of all my UserData objects) to a file using
the cPickle module.
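A minimal sketch of the design described above, under several assumptions: it uses Python 3, where the standard pickle module automatically uses the C accelerator that cPickle provided in Python 2; the XML layout (`<user login="..." name="..." email="..."/>`) and the file paths are hypothetical; and the pickle is treated purely as a cache that is rebuilt whenever the XML file is newer than it.

```python
import os
import pickle
import xml.etree.ElementTree as ET


class UserData:
    def __init__(self, name, email):
        self.name = name
        self.email = email


class Users:
    """Hypothetical loader: the XML file is the source of truth, the pickle a cache."""

    def __init__(self, xml_path, cache_path):
        self.xml_path = xml_path
        self.cache_path = cache_path
        self.users = {}

    def _cache_is_fresh(self):
        # The cache is only trusted if it exists and is at least as new as the XML.
        return (os.path.exists(self.cache_path)
                and os.path.getmtime(self.cache_path) >= os.path.getmtime(self.xml_path))

    def _parse_xml(self):
        # Assumed layout: <users><user login="..." name="..." email="..."/></users>
        users = {}
        for elem in ET.parse(self.xml_path).getroot().iter("user"):
            users[elem.get("login")] = UserData(elem.get("name"), elem.get("email"))
        return users

    def load(self):
        if self._cache_is_fresh():
            with open(self.cache_path, "rb") as f:
                self.users = pickle.load(f)
        else:
            # Slow path: parse the XML once, then refresh the pickle cache.
            self.users = self._parse_xml()
            with open(self.cache_path, "wb") as f:
                pickle.dump(self.users, f, pickle.HIGHEST_PROTOCOL)
        return self

    def getUser(self, login):
        # Returns None for unknown logins.
        return self.users.get(login)
```

Comparing modification times is a cheap freshness check, but an XML edit made within the filesystem's timestamp resolution of the last cache write could go unnoticed, so treat it as a heuristic.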

So my question is: is my architecture efficient enough?

If I used a database, the database would keep track of my users and I
would only need to run an SQL statement. Would cPickle be more efficient
than a database in my case?
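For comparison, here is roughly what the database route could look like with the sqlite3 module from the Python 3 standard library (an embedded database, so there is no server to run). The table layout and function names are illustrative assumptions, not part of the original design. Each lookup reads a single row instead of loading every user into memory:

```python
import sqlite3


def open_db(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS users ("
                 "login TEXT PRIMARY KEY, name TEXT, email TEXT)")
    return conn


def add_user(conn, login, name, email):
    # Using the connection as a context manager commits on success
    # and rolls back if the INSERT raises.
    with conn:
        conn.execute("INSERT INTO users (login, name, email) VALUES (?, ?, ?)",
                     (login, name, email))


def get_user(conn, login):
    # Returns a (name, email) tuple, or None if the login is unknown.
    return conn.execute("SELECT name, email FROM users WHERE login = ?",
                        (login,)).fetchone()
```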

To give a bit of code, let's say I have something like this:

import cPickle

class UserData:
    def __init__(self, name, email):
        self.name = name
        self.email = email

class Users:
    def __init__(self):
        self.users = {}
        self.hasChanged = False

    def _deserialize(self):
        if not self.hasChanged:
            # cPickle.load() expects an open file object, not a filename
            f = open('users.dat', 'rb')
            try:
                self.users = cPickle.load(f)
            finally:
                f.close()
        else:
            # parse the xml file...
            pass

Is this an efficient approach?

Thanks
- Sylvain

Jul 18 '05 #1
sh@defuze.org wrote:
> [...]
> If I used a database, the database would keep track of my users and I
> would only need to run an SQL statement. Would cPickle be more efficient
> than a database in my case?

Hi,

this may be interesting for others, too.
I modified the example given above a little, entered
1000 users and saved and loaded the Users object
1000 times using cPickle on an Athlon 1GHz.
The results are:

[...]
995
996
997
998
999

real 5m26.115s
user 3m59.570s
sys 0m5.060s

That's 0.326 s per save/load round trip.

-rw-r--r-- 1 holger users 173972 2004-05-11 15:23 test.pickle

For 10 users:

[...]
995
996
997
998
999

real 0m5.148s
user 0m2.710s
sys 0m0.740s

That's 0.005 s per round trip.

-rw-r--r-- 1 holger users 1708 2004-05-11 15:25 test.pickle

That should be fast enough for a weblog application.
The HTTP/CGI overhead and concurrent access to the
pickled objects when writing them will probably be
the harder problems.
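One common way to handle the write side of the concurrency problem mentioned here is to write the pickle to a temporary file and then rename it over the old one, so readers see either the old or the new version, never a half-written file. A minimal Python 3 sketch; the function names are hypothetical, and `os.replace` is atomic on POSIX systems only when both paths are on the same filesystem (hence the temporary file is created in the target directory):

```python
import os
import pickle
import tempfile


def save_atomically(obj, path):
    """Pickle obj to path so that readers never see a partially written file."""
    directory = os.path.dirname(os.path.abspath(path))
    # Same directory => same filesystem, which os.replace needs to be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            pickle.dump(obj, f, pickle.HIGHEST_PROTOCOL)
            f.flush()
            os.fsync(f.fileno())  # make sure the bytes hit the disk before the rename
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)  # don't leave stray temp files behind on failure
        raise


def load(path):
    with open(path, "rb") as f:
        return pickle.load(f)
```

This only protects readers. Two writers that each load, modify and save can still overwrite each other's changes; that needs a lock or a real database.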

Greetings,

Holger
Here's the program:
#!/usr/bin/python

import string, random, cPickle

def randString(l):
    # l letters, each chosen at random from the whole alphabet
    return "".join([random.choice(string.letters) for i in range(l)])

class UserData:
    def __init__(self, name, email):
        self.name = name
        self.email = email

class Users:
    def __init__(self):
        self.users = {}

u = Users()

for a in range(1000):
    u.users[randString(20)] = UserData(randString(40), randString(40))

for a in range(1000):
    print a

    # save the whole Users object...
    f = open("test.pickle", "wb")
    p = cPickle.Pickler(f)
    p.dump(u)
    f.close()

    # ...and load it back
    f = open("test.pickle", "rb")
    p = cPickle.Unpickler(f)
    u = p.load()
    f.close()
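For anyone repeating the measurement today, a rough Python 3 port of the benchmark (a sketch, not Holger's program). It times in-memory `pickle.dumps`/`pickle.loads` round trips, so disk I/O is excluded and the numbers are not directly comparable to the timings above. It also uses the highest binary protocol, whereas `cPickle.Pickler(f)` in Python 2 defaults to the text-based protocol 0, which is typically slower and produces larger output:

```python
import pickle
import random
import string
import time


class UserData:
    def __init__(self, name, email):
        self.name = name
        self.email = email


class Users:
    def __init__(self):
        self.users = {}


def rand_string(n):
    return "".join(random.choice(string.ascii_letters) for _ in range(n))


def make_users(count):
    u = Users()
    for _ in range(count):
        u.users[rand_string(20)] = UserData(rand_string(40), rand_string(40))
    return u


def roundtrip_seconds(u, rounds=100, protocol=pickle.HIGHEST_PROTOCOL):
    """Return (average seconds per dumps+loads round trip, pickle size in bytes)."""
    start = time.perf_counter()
    for _ in range(rounds):
        data = pickle.dumps(u, protocol)
        u = pickle.loads(data)
    return (time.perf_counter() - start) / rounds, len(data)


if __name__ == "__main__":
    for count in (10, 1000):
        avg, size = roundtrip_seconds(make_users(count))
        print(count, "users:", avg, "s per round trip,", size, "bytes")
```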
Jul 18 '05 #2
sh@defuze.org writes:
> If I used a database, the database would keep track of my users and I
> would only need to run an SQL statement. Would cPickle be more efficient
> than a database in my case?


Not if you had more than a few users. Why don't you look at the dbm
or shelve modules? The dbm module lets you store strings (including
pickles) in a disk file that works like a hash table (much less hassle
than messing with an SQL server). The shelve module uses dbm and
handles the pickling automatically. Note that all these approaches
have a terrible pitfall: what happens if the web page needs to update
the database, say because you want to let people create their own user
accounts through the site? If two people try to update the dbm file
(or an XML file) simultaneously, things can get completely screwed up
unless you're careful. The idea of a real database is to take care of
those issues for you.
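A minimal sketch of the shelve approach with a guard against the simultaneous-update problem described above (the shelve documentation itself notes that it does not support concurrent read/write access). Assumptions: Python 3.4+ (for `with shelve.open(...)`), a Unix system (`fcntl.flock` is not available on Windows), hypothetical function and file names, and that every process touching the file goes through these helpers, since flock locks are only advisory:

```python
import fcntl
import shelve


class UserData:
    def __init__(self, name, email):
        self.name = name
        self.email = email


def add_user(db_path, lock_path, login, name, email):
    """Create an account; return False if the login is already taken."""
    # An exclusive lock on a separate lock file serializes writers, so two
    # simultaneous sign-ups cannot interleave their writes to the dbm file.
    with open(lock_path, "a") as lock:
        fcntl.flock(lock, fcntl.LOCK_EX)
        try:
            with shelve.open(db_path) as db:
                if login in db:
                    return False
                db[login] = UserData(name, email)  # shelve pickles the value
                return True
        finally:
            fcntl.flock(lock, fcntl.LOCK_UN)


def get_user(db_path, lock_path, login):
    """Return the UserData for login, or None if it does not exist."""
    # A shared lock lets many readers in at once but waits for any writer.
    with open(lock_path, "a") as lock:
        fcntl.flock(lock, fcntl.LOCK_SH)
        try:
            with shelve.open(db_path, flag="r") as db:
                return db.get(login)
        finally:
            fcntl.flock(lock, fcntl.LOCK_UN)
```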

Another thing you could do is put the session state in a browser
cookie. Be careful when you do that, though, since a malicious user
could concoct a cookie that lets him seize some other user's session,
or even take over your server if you unpickle the cookie. The best
way to handle that is to encrypt the cookies. See

http://www.nightsong.com/phr/crypto/p3.py

for a simple encryption function that should be sufficient for this
purpose.
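The suggestion above is encryption; the sketch below swaps in a related but different technique, HMAC signing with Python 3's standard hmac module. A signature stops users from forging or altering a cookie but does not hide its contents, so it suits session identifiers rather than secrets. It also stores JSON instead of a pickle, so nothing supplied by the client is ever unpickled. `SECRET_KEY` and the function names are placeholders:

```python
import base64
import hashlib
import hmac
import json

SECRET_KEY = b"change-me"  # placeholder: use a long random secret kept out of the code


def _sign(payload):
    return hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()


def make_cookie(session):
    # JSON, not pickle: unpickling attacker-controlled data can run arbitrary code.
    payload = base64.urlsafe_b64encode(json.dumps(session, sort_keys=True).encode())
    return payload.decode() + "." + _sign(payload)


def read_cookie(value):
    """Return the session dict, or None if the cookie is malformed or tampered with."""
    try:
        payload, sig = value.rsplit(".", 1)
    except ValueError:
        return None
    expected = _sign(payload.encode())
    # Constant-time comparison, done on bytes so non-ASCII input cannot raise.
    if not hmac.compare_digest(sig.encode(), expected.encode()):
        return None
    return json.loads(base64.urlsafe_b64decode(payload.encode()).decode())
```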
Jul 18 '05 #3