Performance of cPickle module

Hi guys,

Well, I have a (maybe dumb) question.

I want to write my own little blog using Python (as a fairly small but doable
project for myself to learn Python more deeply in a web context).

For now I don't want to use a database as a backend; I'd prefer to use XML,
which is enough for the small amount of data the blog would have to deal with.

My problem is that, as HTTP is stateless, I can't keep objects alive across
multiple requests, for instance an instance of a Users class which is an
interface to manage my users.

Let's say, for example, that a user wants to access a resource (a simple web
page). My code would call that Users class through an instance of it and would
call something like getUser(self, login), returning an instance of a UserData
class which would provide me with all the details of that user (name, email,
etc.).

I want to save my users not in a database like I said but in an XML file on the
server.

As HTTP is stateless, I believe that I will have to create the Users object
again and again for every request.

I don't want to parse the XML file each time; instead I want to save the Users
object (which keeps a map of all my UserData objects) into a file using the
cPickle module.

My question therefore is, is my architecture efficient enough ?

If I had to use a database, the database would keep track of my users and I
would only need to run an SQL statement. Would cPickle be more efficient in my
case than a database?

To give a bit of code let's say that I have something like :

import cPickle

class UserData:
    def __init__(self, name, email):
        self.name = name
        self.email = email

class Users:
    def __init__(self):
        self.users = {}
        self.hasChanged = False

    def _deserialize(self):
        if not self.hasChanged:
            # cPickle.load() expects an open file object, not a file name
            f = open('users.dat', 'rb')
            self.users = cPickle.load(f)
            f.close()
        else:
            pass  # parse the xml file...

Is it an efficient method ?
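
For reference, the cache-or-parse idea above could be fleshed out roughly as follows. This is only a sketch under several assumptions: it is written for Python 3 (where cPickle has been folded into pickle), the XML layout (`<users><user login=... name=... email=.../></users>`) and the `load_users` helper are made up for illustration, plain dicts stand in for UserData to keep it short, and a file modification time comparison plays the role of the hasChanged flag.

```python
import os
import pickle
import xml.etree.ElementTree as ET


def load_users(xml_path, cache_path):
    """Return {login: {"name": ..., "email": ...}}, preferring the pickle cache."""
    # Trust the cache only if it exists and is at least as new as the XML file.
    if (os.path.exists(cache_path)
            and os.path.getmtime(cache_path) >= os.path.getmtime(xml_path)):
        with open(cache_path, "rb") as f:
            return pickle.load(f)

    # Cache missing or stale: parse the XML (hypothetical layout, see above).
    users = {}
    for node in ET.parse(xml_path).getroot().iter("user"):
        users[node.get("login")] = {
            "name": node.get("name"),
            "email": node.get("email"),
        }

    # Rewrite the cache so the next request can skip the XML parsing.
    with open(cache_path, "wb") as f:
        pickle.dump(users, f, pickle.HIGHEST_PROTOCOL)
    return users
```

Each request would call `load_users("users.xml", "users.dat")` and only pay the parsing cost after the XML file has been edited.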

Thanks
- Sylvain

Jul 18 '05 #1


Holger Türk

sh@defuze.org wrote:

[...]
If I had to use a database, the database would keep track of my users and I
would only need to run an SQL statement. Would cPickle be more efficient in my
case than a database?

To give a bit of code let's say that I have something like :

import cPickle

class UserData:
    def __init__(self, name, email):
        self.name = name
        self.email = email

class Users:
    def __init__(self):
        self.users = {}
        self.hasChanged = False

    def _deserialize(self):
        if not self.hasChanged:
            # cPickle.load() expects an open file object, not a file name
            f = open('users.dat', 'rb')
            self.users = cPickle.load(f)
            f.close()
        else:
            pass  # parse the xml file...

Is it an efficient method ?

Thanks
- Sylvain

Hi,

this may be interesting for others, too.
I modified the example given above a little, entered
1000 users and saved and loaded the Users object
1000 times using cPickle on an Athlon 1GHz.
The results are:

[...]
995
996
997
998
999

real 5m26.115s
user 3m59.570s
sys 0m5.060s

That is 0.326s per save/load roundtrip.

-rw-r--r-- 1 holger users 173972 2004-05-11 15:23 test.pickle

For 10 users:

[...]
995
996
997
998
999

real 0m5.148s
user 0m2.710s
sys 0m0.740s

0.005s per roundtrip.

-rw-r--r-- 1 holger users 1708 2004-05-11 15:25 test.pickle

That should be fast enough for a weblog application.
The HTTP/CGI overhead and concurrent access to the
pickled objects when writing them will probably be
the harder problems.
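
One common way to soften the concurrent-write problem is to write the pickle to a temporary file and then rename it over the old one, so a reader never sees a half-written file. The following is a minimal sketch for Python 3, not part of the benchmark above; `save_atomically` is a name chosen for illustration.

```python
import os
import pickle
import tempfile


def save_atomically(obj, path):
    """Pickle obj to path so that readers never see a half-written file."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file must live in the same directory (same filesystem)
    # for the final rename to be a single step.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            pickle.dump(obj, f, pickle.HIGHEST_PROTOCOL)
            f.flush()
            os.fsync(f.fileno())
        # os.replace swaps the new file in; this is atomic on POSIX systems.
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temporary file if anything went wrong before the swap.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

This only protects readers from torn files. Two writers that each load, modify and save can still overwrite each other's changes, so some form of locking (or a real database) is still needed for updates.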

Greetings,

Holger

Here's the program:

#!/usr/bin/python

import string, random, cPickle

def randString (l):
    # l random letters, drawn from the first l entries of string.letters
    return "".join ([string.letters [random.randrange (l)] for i in range (l)])

class UserData:
    def __init__(self,name,email):
        self.name = name
        self.email = email

class Users:
    def __init__(self):
        self.users = {}

u = Users ()

# fill in 1000 users with random logins, names and emails
for a in range (1000):
    u.users [randString (20)] = (UserData (randString (40), randString (40)))

# 1000 save/load roundtrips
for a in range (1000):
    print a

    f = open ("test.pickle", "w")
    p = cPickle.Pickler (f)
    p.dump (u)
    f.close ()

    f = open ("test.pickle", "r")
    p = cPickle.Unpickler (f)
    u = p.load ()
    f.close ()
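
For anyone repeating the measurement on Python 3, where cPickle no longer exists as a separate module and pickle uses the C implementation automatically, a rough equivalent might look like the sketch below. The names `rand_string` and `roundtrip_seconds` are made up here, tuples stand in for the UserData objects, and the timings it produces are not comparable to the Athlon numbers above.

```python
import pickle
import random
import string
import time


def rand_string(n):
    # n random ASCII letters (string.letters became string.ascii_letters).
    return "".join(random.choice(string.ascii_letters) for _ in range(n))


def roundtrip_seconds(n_users, n_rounds, path):
    """Average seconds for one dump-to-file plus load-from-file roundtrip."""
    users = {rand_string(20): (rand_string(40), rand_string(40))
             for _ in range(n_users)}
    start = time.perf_counter()
    for _ in range(n_rounds):
        # Pickle files must be opened in binary mode on Python 3.
        with open(path, "wb") as f:
            pickle.dump(users, f, pickle.HIGHEST_PROTOCOL)
        with open(path, "rb") as f:
            users = pickle.load(f)
    return (time.perf_counter() - start) / n_rounds
```

Calling `roundtrip_seconds(1000, 1000, "test.pickle")` mirrors the 1000 users, 1000 roundtrips case.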

Jul 18 '05 #2

Paul Rubin

sh@defuze.org writes:

If I had to use a database, the database would keep track of my users and I
would only need to run an SQL statement. Would cPickle be more efficient in my
case than a database?

Not if you had more than a few users. Why don't you look at the dbm
or shelve modules? The dbm module lets you store strings (including
pickles) in a disk file that works like a hash table (much less hassle
than messing with an SQL server). The shelve module uses dbm and
handles the pickling automatically. Note that all these approaches
have a terrible pitfall: what happens if the web page needs
to update the database, say because you want to let people
create their own user accounts through the site? If two people try to
update the dbm file (or an XML file) simultaneously, things can get
completely screwed up unless you're careful. The idea of a real
database is to take care of those issues for you.
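
As a rough illustration of the shelve approach, here is a minimal sketch for Python 3. The helper names `add_user` and `get_user` and the record layout are assumptions made for the example, and it does no locking, so the simultaneous-update pitfall described above still applies.

```python
import shelve


def add_user(db_path, login, name, email):
    # shelve pickles the value and stores it under the string key.
    with shelve.open(db_path) as db:
        db[login] = {"name": name, "email": email}


def get_user(db_path, login):
    # Only the requested record is unpickled, not the whole user table.
    with shelve.open(db_path) as db:
        return db.get(login)
```

Compared with pickling one big Users object, a lookup here reads a single record, so the cost per request does not grow with the number of users in the same way.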

Another thing you could do is put the session state in a browser
cookie. Be careful when you do that though, since a malicious user
could concoct a cookie that lets him seize some other user's session,
or even take over your server if you unpickle the cookie. The best
way to handle that is to encrypt the cookies. See

http://www.nightsong.com/phr/crypto/p3.py

for a simple encryption function that should be sufficient for this
purpose.
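
A related technique, shown here only as a sketch, is to sign the cookie rather than encrypt it, and to serialize with JSON so that nothing from the browser is ever unpickled. This is not the p3.py function linked above: it uses HMAC-SHA256 from the Python 3 standard library, the cookie contents stay readable to the user, and the names `make_cookie` and `read_cookie` are made up for the example.

```python
import base64
import hashlib
import hmac
import json


def make_cookie(data, secret):
    """Serialize data as JSON and append an HMAC-SHA256 tag."""
    payload = base64.urlsafe_b64encode(json.dumps(data).encode("utf-8"))
    tag = hmac.new(secret, payload, hashlib.sha256).hexdigest().encode("ascii")
    return (payload + b"." + tag).decode("ascii")


def read_cookie(cookie, secret):
    """Return the decoded data if the tag verifies, otherwise None."""
    try:
        payload, tag = cookie.encode("ascii").rsplit(b".", 1)
    except (UnicodeEncodeError, ValueError):
        return None  # malformed cookie
    expected = hmac.new(secret, payload, hashlib.sha256).hexdigest().encode("ascii")
    # compare_digest avoids leaking the tag through timing differences.
    if not hmac.compare_digest(tag, expected):
        return None
    return json.loads(base64.urlsafe_b64decode(payload))
```

The `secret` must be a bytes value known only to the server. A cookie altered by the user fails the tag check and is rejected before any of its contents are parsed.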

Jul 18 '05 #3
