
Performance of cPickle module

sh
Hi guys,

Well, I have a (maybe dumb) question.

I want to write my own little blog using Python (as a fairly small but doable
project for myself, to learn Python more deeply in a web context).

For now I don't want to use a database as a backend; I'd prefer to use XML, which
is enough for the small amount of data the blog will have to deal with.

My problem is that, as HTTP is stateless, I can't keep objects alive across
multiple requests: for instance, an instance of a Users class, which is an
interface for managing my users.

Let's say, for example, that a user wants to access a resource (a simple web
page). My code would go through an instance of that Users class and call
something like getUser(self, login), which returns an instance of a UserData
class providing all the details of that user (name, email, etc.).

As I said, I want to save my users not in a database but in an XML file on the
server.
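Reading such a file needs nothing beyond the standard library. A minimal sketch using xml.etree.ElementTree, assuming a hypothetical layout like <users><user login="..." name="..." email="..."/></users> (the post doesn't specify a format); the function name load_users_from_xml is also illustrative:

```python
import xml.etree.ElementTree as ET


class UserData:
    def __init__(self, name, email):
        self.name = name
        self.email = email


def load_users_from_xml(path):
    """Parse a hypothetical users.xml into a dict mapping login -> UserData."""
    users = {}
    root = ET.parse(path).getroot()
    # Each <user> element carries its fields as attributes (assumed layout).
    for elem in root.iter("user"):
        users[elem.get("login")] = UserData(elem.get("name"), elem.get("email"))
    return users
```

This full parse is exactly the per-request cost the caching idea below tries to avoid.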

As HTTP is stateless, I believe I will have to create the Users object again
and again for every request.

I don't want to parse the XML file each time; instead, I want to save the Users
object (which keeps a map of all my UserData objects) to a file using the
cPickle module.
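One way to sketch that idea: treat the pickle as a cache of the XML file and re-parse only when the XML is newer than the cache. This is a minimal sketch, not Sylvain's code; it assumes Python 3 (where pickle uses the C implementation that Python 2 shipped as cPickle), the same hypothetical XML layout as above, and hypothetical function names. Comparing modification times is a simplification: on filesystems with coarse timestamps, an XML edit made within the same second as the cache write could go unnoticed.

```python
import os
import pickle  # Python 3: uses the C accelerator (the old cPickle) automatically
import xml.etree.ElementTree as ET


class UserData:
    def __init__(self, name, email):
        self.name = name
        self.email = email


def parse_users_xml(xml_path):
    """Slow path: build {login: UserData} from a hypothetical users.xml layout."""
    root = ET.parse(xml_path).getroot()
    return {u.get("login"): UserData(u.get("name"), u.get("email"))
            for u in root.iter("user")}


def load_users(xml_path, cache_path):
    """Return the users dict, re-parsing the XML only when it is newer than the cache."""
    if (os.path.exists(cache_path)
            and os.path.getmtime(cache_path) >= os.path.getmtime(xml_path)):
        with open(cache_path, "rb") as f:
            return pickle.load(f)          # fast path: unpickle the cached dict
    users = parse_users_xml(xml_path)      # XML changed (or no cache yet)
    with open(cache_path, "wb") as f:
        pickle.dump(users, f, pickle.HIGHEST_PROTOCOL)
    return users
```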

My question, therefore, is: is my architecture efficient enough?

If I used a database, the database would keep track of my users and I would
only need to run an SQL statement. Would cPickle be more efficient than a
database in my case?
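For comparison, "only an SQL statement" can be quite literal without running a database server: the sqlite3 module (in the standard library since Python 2.5) keeps the whole database in one file. A minimal sketch, assuming a hypothetical users table keyed by login; the lookup touches one row instead of loading every user:

```python
import sqlite3


def open_db(path=":memory:"):
    """Open (or create) a database with a hypothetical users table."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS users "
                 "(login TEXT PRIMARY KEY, name TEXT, email TEXT)")
    return conn


def get_user(conn, login):
    """Return (name, email) for one login, or None if there is no such user."""
    return conn.execute("SELECT name, email FROM users WHERE login = ?",
                        (login,)).fetchone()


if __name__ == "__main__":
    conn = open_db()
    with conn:  # commits the insert on success
        conn.execute("INSERT INTO users VALUES (?, ?, ?)",
                     ("sylvain", "Sylvain", "sh@example.org"))
    print(get_user(conn, "sylvain"))  # ('Sylvain', 'sh@example.org')
```

Passing a file name instead of ":memory:" makes the table persist between requests.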

To give a bit of code, let's say that I have something like:

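try:
    import cPickle as pickle  # Python 2: the fast C implementation
except ImportError:
    import pickle             # Python 3: pickle uses its C accelerator itself

class UserData:
    def __init__(self, name, email):
        self.name = name
        self.email = email

class Users:
    def __init__(self):
        self.users = {}          # login -> UserData
        self.hasChanged = False  # True when the XML data must be re-parsed

    def _deserialize(self):
        if not self.hasChanged:
            # pickle.load() needs an open binary file, not a file name
            with open('users.dat', 'rb') as f:
                self.users = pickle.load(f)
        else:
            # parse the XML file... (not shown)
            pass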

Is this an efficient approach?

Thanks
- Sylvain

Jul 18 '05 #1
sh@defuze.org wrote:
[...]


Hi,

this may be interesting for others, too.
I modified the example given above a little, entered
1000 users, and saved and loaded the Users object
1000 times using cPickle on a 1 GHz Athlon.
The results are:

[...]
995
996
997
998
999

real 5m26.115s
user 3m59.570s
sys 0m5.060s

That is 0.326 s per save/load round trip.

-rw-r--r-- 1 holger users 173972 2004-05-11 15:23 test.pickle

For 10 users:

[...]
995
996
997
998
999

real 0m5.148s
user 0m2.710s
sys 0m0.740s

That is 0.005 s per round trip.

-rw-r--r-- 1 holger users 1708 2004-05-11 15:25 test.pickle

That should be fast enough for a weblog application.
The HTTP/CGI overhead and concurrent access to the
pickled objects when writing them will probably be
the harder problems.
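Part of the concurrent-write problem can be softened by never writing the pickle in place. A minimal sketch, assuming Python 3.3+ for os.replace and a hypothetical function name: the data goes to a temporary file in the same directory, which is then renamed over the target, so on POSIX systems a reader sees either the old file or the new one, never a half-written pickle. It does not prevent lost updates: two writers that each load, modify, and save can still overwrite each other's changes, which needs a lock or a real database.

```python
import os
import pickle
import tempfile


def save_pickle_atomically(obj, path):
    """Write obj to path so readers never observe a partially written file."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must be on the same filesystem for the rename to be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            pickle.dump(obj, f, pickle.HIGHEST_PROTOCOL)
            f.flush()
            os.fsync(f.fileno())  # make sure the bytes hit the disk first
        os.replace(tmp_path, path)  # swap the new file into place
    except BaseException:
        os.remove(tmp_path)  # don't leave stray temp files behind on failure
        raise
```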

Greetings,

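Holger

Here's the program:

#!/usr/bin/python

import random, string
try:
    import cPickle as pickle  # Python 2 (used for the timings above)
except ImportError:
    import pickle             # Python 3 has no separate cPickle module

def randString(l):
    # l random letters drawn from all of string.ascii_letters
    return "".join([random.choice(string.ascii_letters) for i in range(l)])

class UserData:
    def __init__(self, name, email):
        self.name = name
        self.email = email

class Users:
    def __init__(self):
        self.users = {}

u = Users()

# 1000 random users (10 for the second measurement)
for a in range(1000):
    u.users[randString(20)] = UserData(randString(40), randString(40))

# 1000 save/load round trips through a file on disk
for a in range(1000):
    print(a)

    f = open("test.pickle", "wb")  # pickles are binary data
    p = pickle.Pickler(f)
    p.dump(u)
    f.close()

    f = open("test.pickle", "rb")
    p = pickle.Unpickler(f)
    u = p.load()
    f.close()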
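To repeat a measurement like this on a current Python, here is a minimal sketch for Python 3, where pickle uses its C implementation automatically. Assumptions that differ from the program above: it times in-memory pickle.dumps/pickle.loads with timeit instead of file I/O, and uses pickle.HIGHEST_PROTOCOL (a compact binary format, unlike protocol 0, which was the Python 2 default). The helper names are hypothetical, and no timings are claimed here; results depend on the machine.

```python
import pickle
import random
import string
import timeit


class UserData:
    def __init__(self, name, email):
        self.name = name
        self.email = email


def rand_string(length):
    """Random string of ASCII letters with the given length."""
    return "".join(random.choice(string.ascii_letters) for _ in range(length))


def make_users(n):
    """Build {login: UserData} with n random users, like the program above."""
    return {rand_string(20): UserData(rand_string(40), rand_string(40))
            for _ in range(n)}


def roundtrip_seconds(users, repeats=100, protocol=pickle.HIGHEST_PROTOCOL):
    """Average seconds for one in-memory dumps + loads round trip."""
    total = timeit.timeit(lambda: pickle.loads(pickle.dumps(users, protocol)),
                          number=repeats)
    return total / repeats


if __name__ == "__main__":
    for n in (10, 1000):
        print(n, "users:", roundtrip_seconds(make_users(n)), "s per round trip")
```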
Jul 18 '05 #2
sh@defuze.org writes:
If I used a database, the database would keep track of my users and I would
only need to run an SQL statement. Would cPickle be more efficient than a
database in my case?


Not if you had more than a few users. Why don't you look at the dbm
or shelve modules? The dbm module lets you store strings (including
pickles) in a disk file that works like a hash table (much less hassle
than messing with an SQL server). The shelve module uses dbm and
handles the pickling automatically. Note that all these approaches
have a terrible pitfall: what happens if the web page needs to update
the database, say, if you want to let people automatically create
their own user accounts through the site? If two people try to
update the dbm file (or an XML file) simultaneously, things can get
completely screwed up unless you're careful. The idea of a real
database is to take care of those issues for you.
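A minimal sketch of the shelve approach in Python 3 (shelve.open works as a context manager from Python 3.4). Each user is pickled and stored under its own key, so a request loads one user instead of the whole table. The function names are hypothetical. As described above, this does nothing about simultaneous writers: the shelve documentation states that concurrent read/write access is not supported.

```python
import shelve


class UserData:
    def __init__(self, name, email):
        self.name = name
        self.email = email


def add_user(db_path, login, user):
    """Store one UserData under its login; shelve pickles the value itself."""
    with shelve.open(db_path) as db:  # flag "c": create the file if needed
        db[login] = user


def get_user(db_path, login):
    """Load a single user (or None) without unpickling every other user."""
    with shelve.open(db_path, flag="r") as db:  # read-only access
        return db.get(login)
```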

Another thing you could do is put the session state in a browser
cookie. Be careful when you do that, though, since a malicious user
could concoct a cookie that lets him seize some other user's session,
or even take over your server if you unpickle the cookie. The best
way to handle that is to encrypt the cookies. See

http://www.nightsong.com/phr/crypto/p3.py

for a simple encryption function that should be sufficient for this
purpose.
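A related, widely used technique (swapped in here, not the encryption approach from the link above) is to sign the cookie with an HMAC so any tampering is detected, and to store JSON instead of a pickle so the server never unpickles client-supplied data. Signing does not hide the cookie's contents, and this minimal sketch has no expiry. SECRET_KEY, the "payload.signature" format, and the function names are all hypothetical.

```python
import base64
import hashlib
import hmac
import json

SECRET_KEY = b"replace-with-a-long-random-server-secret"  # hypothetical; keep private


def make_cookie(session):
    """Serialize a session dict to 'payload.signature' (JSON, never pickle)."""
    payload = base64.urlsafe_b64encode(json.dumps(session).encode("utf-8"))
    sig = hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()
    return payload.decode("ascii") + "." + sig


def read_cookie(value):
    """Return the session dict, or None if the cookie is malformed or tampered with."""
    try:
        payload, sig = value.rsplit(".", 1)
        payload_bytes = payload.encode("ascii")
        expected = hmac.new(SECRET_KEY, payload_bytes, hashlib.sha256).hexdigest()
        if not hmac.compare_digest(sig, expected):  # constant-time comparison
            return None
        return json.loads(base64.urlsafe_b64decode(payload_bytes))
    except (ValueError, TypeError):
        # wrong shape, non-ASCII characters, or undecodable data
        return None
```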
Jul 18 '05 #3
