cPickle alternative?

Hello,
I have a huge problem loading a very simple structure into memory.
It is a list of tuples; the pickle file is 6 MB and holds 100000 elements.

import cPickle

plik = open("mealy", "r")
mealy = cPickle.load(plik)
plik.close()

This takes about 30 seconds!
How can I speed it up?

Thanks in advance.

Jul 18 '05 #1
What protocol did you pickle your data with? The default (protocol 0,
ASCII text) is the slowest. I suggest you upgrade to Python 2.3 and
save your data with the new protocol 2 -- it's likely to be fastest.
Alex
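
For readers on Python 3, where `cPickle` no longer exists and the plain `pickle` module uses its C implementation automatically, the advice above might look like the sketch below. The helper names `save` and `load`, the file name and the sample values are illustrative assumptions, not something from the thread.

```python
import os
import pickle
import tempfile

def save(obj, path, protocol=2):
    # Binary protocols (1 and up) need a file opened in binary mode.
    with open(path, "wb") as f:
        pickle.dump(obj, f, protocol)

def load(path):
    # The protocol is detected from the data, so load() takes no protocol.
    with open(path, "rb") as f:
        return pickle.load(f)

# Made-up sample: a small list of tuples.
mealy = [(("k", 5, 0),), (("a", 4, 0), ("o", 2, 0))]

path = os.path.join(tempfile.gettempdir(), "mealy.pkl")
save(mealy, path)
restored = load(path)
os.remove(path)
```

The same pattern applies on Python 2.3 with `cPickle` in place of `pickle`; only the `with` statement would have to be replaced by explicit `close()` calls there.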

Jul 18 '05 #2
Hi,

I have no idea! I used a similar scheme the other day and ran some
benchmarks (I *like* benchmarks!)

About 6 MB took 4 seconds to dump, and as long to load, on an 800 MHz P3 laptop.
With binary mode it went down to about 1.5 seconds (and the size to 2 MB).

This is OK, because I generally have trouble going faster than 1 MB/sec
with my 2" drive, processor and Python ;-)

Python 2.3 seems to have an even more efficient "protocol mode 2".

Maybe your structures are *very* complex?

Kindly
Michael P

Jul 18 '05 #3
Drochom wrote:
Thanks :)
I'm using the default protocol. I'm not sure I can upgrade so simply, because
I'm using many modules for Python 2.2.


Then use protocol 1 instead -- that has been the binary pickle protocol
for a long time, and works perfectly on Python 2.2.x :-)
(and it's much faster than protocol 0 -- the text protocol)

--Irmen
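
One way to see the difference between the text and binary protocols is to compare the size of the pickled bytes directly. This is only a sketch for Python 3 with randomly generated sample data (an assumption; the real lexicon is not available), and it measures size rather than speed.

```python
import pickle
from random import randrange, seed

seed(0)  # reproducible sample data

# Made-up sample: 10000 (char, int, int) tuples.
data = [(chr(randrange(33, 127)), randrange(100000), randrange(50))
        for _ in range(10000)]

sizes = {}
for protocol in (0, 1, 2):
    blob = pickle.dumps(data, protocol)
    # Every protocol must give back the same data.
    assert pickle.loads(blob) == data
    sizes[protocol] = len(blob)

for protocol, size in sorted(sizes.items()):
    print("protocol %d: %d bytes" % (protocol, size))
```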

Jul 18 '05 #4
Drochom wrote:
Thanks for the help :)
Here is a simple example.
Frankly speaking, it's a graph with 100000 nodes.
STRUCTURE:
[(('k', 5, 0),), (('*', 0, 0),), (('t', 1, 1),), (('o', 2, 0),), (('t', 3, 0),), (('a', 4, 0), ('o', 2, 0))]


Perhaps this matches your spec:

from random import randrange
import pickle, cPickle, time

# 100000 random (char, int, int) tuples
source = [(chr(randrange(33, 127)), randrange(100000), randrange(i+50))
          for i in range(100000)]

def timed(module, flag, name='file.tmp'):
    # Returns (dump seconds, load seconds) for the given module and protocol.
    start = time.time()
    dest = file(name, 'wb')
    module.dump(source, dest, flag)
    dest.close()
    mid = time.time()
    dest = file(name, 'rb')
    result = module.load(dest)
    dest.close()
    stop = time.time()
    assert source == result
    return mid-start, stop-mid

On 2.2:
timed(pickle, 0): (7.8, 5.5)
timed(pickle, 1): (9.5, 6.2)
timed(cPickle, 0): (0.41, 4.9)
timed(cPickle, 1): (0.15, .53)

On 2.3:
timed(pickle, 0): (6.2, 5.3)
timed(pickle, 1): (6.6, 5.4)
timed(pickle, 2): (6.5, 3.9)

timed(cPickle, 0): (6.2, 5.3)
timed(cPickle, 1): (.88, .69)
timed(cPickle, 2): (.80, .67)

(Not tightly controlled -- I'd guess 1.5 digits)

-Scott David Daniels
Sc***********@Acm.Org

Jul 18 '05 #5

"Michael Peuser" <mp*****@web.de> wrote in message
news:bh*************@news.t-online.com...
OK, I modified my test program; let it run on your machine.
It took 1.5 seconds. I made it 1 million records to get to 2 MB.
Kindly
Michael
------------------
import cPickle as Pickle
from time import clock

# generate 1,000,000 records
r = [(('k', 5, 0),), (('*', 0, 0),), (('t', 1, 1),), (('o', 2, 0),), (('t', 3, 0),), (('a', 4, 0), ('o', 2, 0))]

x = []

for i in xrange(1000000):
    x.append(r)

print len(x), "records"

t0 = clock()
f = open("test", "wb")  # protocol 1 is binary, so open the file in binary mode
Pickle.dump(x, f, 1)
f.close()
print "out=", clock()-t0

t0 = clock()
f = open("test", "rb")
x = Pickle.load(f)
f.close()
print "in=", clock()-t0
---------------------


Hi, I'm really grateful for your help.
I've modified your code a bit; check your times and tell me what they are.

TRY THIS:

import cPickle as Pickle
from time import clock
from random import randrange

x = []

# 20000 nodes, each a tuple of 2 to 24 (char, int, int) triples
for i in xrange(20000):
    c = []
    for j in xrange(randrange(2, 25)):
        c.append((chr(randrange(33, 120)), randrange(1, 100000), randrange(1, 3)))
    c = tuple(c)
    x.append(c)
    if i % 1000 == 0: print i  # it will help you to survive waiting...
print len(x), "records"

t0 = clock()
f = open("test", "w")
Pickle.dump(x, f, 0)  # protocol 0, the text protocol
f.close()
print "out=", clock()-t0

t0 = clock()
f = open("test")
x = Pickle.load(f)
f.close()
print "in=", clock()-t0

Thanks once again:)

Jul 18 '05 #6
Hello,

> If speed is important, you may want to do different things depending on,
> e.g., what is in those tuples, and whether they are all the same length, etc.
> E.g., if they were all fixed-length tuples of integers, you could do hugely
> better than store the data as a list of tuples.

Those tuples do have different lengths.

> You could store the whole thing in a mmap image, with a length-prefixed
> pickle string in the front representing index info.

If only I knew how to do it... :-)

> Find a way to avoid doing it? Or doing much of it?
> What are your access needs once the data is accessible?
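
Even with tuples of different lengths, the data can be flattened into a few typed arrays plus an offset table, so that far fewer Python objects have to be created on load. The sketch below is one possible layout for Python 3, not something proposed in the thread; the function names `flatten` and `node_at` are made up, and it assumes every character code fits in an unsigned 16-bit value.

```python
from array import array

def flatten(nodes):
    """Flatten a list of tuples of (char, target, flag) triples into arrays."""
    offsets = array("l", [0])  # node i owns entries offsets[i]:offsets[i+1]
    chars = array("H")         # character codes (assumed to be below 65536)
    targets = array("l")
    flags = array("B")
    for node in nodes:
        for ch, target, flag in node:
            chars.append(ord(ch))
            targets.append(target)
            flags.append(flag)
        offsets.append(len(chars))
    return offsets, chars, targets, flags

def node_at(i, offsets, chars, targets, flags):
    """Rebuild node i as a tuple of (char, target, flag) triples."""
    lo, hi = offsets[i], offsets[i + 1]
    return tuple((chr(chars[k]), targets[k], flags[k]) for k in range(lo, hi))

# The sample structure posted earlier in the thread.
nodes = [(("k", 5, 0),), (("*", 0, 0),), (("t", 1, 1),), (("o", 2, 0),),
         (("t", 3, 0),), (("a", 4, 0), ("o", 2, 0))]
flat = flatten(nodes)
rebuilt = [node_at(i, *flat) for i in range(len(nodes))]
```

The four arrays can be pickled with a binary protocol or written with `array.tofile`; whether this beats a plain binary pickle for the real lexicon would need measuring.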

My structure stores a finite state automaton with a Polish dictionary (a lexicon,
to be more precise), and it should be loaded
once, but fast!

Thx
Regards,
Przemo Drochomirecki
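
The mmap suggestion quoted above could be sketched roughly as follows in Python 3. The file layout (an 8-byte length, then a pickled index of offsets, then one pickled blob per record) and the names `write_image` and `Image` are assumptions made for illustration; the thread does not specify a format.

```python
import mmap
import os
import pickle
import struct
import tempfile

HEADER = struct.Struct("<Q")  # 8-byte little-endian length of the pickled index

def write_image(path, records):
    """Write [index length][pickled index][record blobs] to path."""
    blobs = [pickle.dumps(r, 2) for r in records]
    index, pos = [], 0
    for b in blobs:
        index.append((pos, len(b)))  # offset relative to the start of the blobs
        pos += len(b)
    index_blob = pickle.dumps(index, 2)
    with open(path, "wb") as f:
        f.write(HEADER.pack(len(index_blob)))
        f.write(index_blob)
        for b in blobs:
            f.write(b)

class Image:
    """Read-only view that unpickles one record at a time, on demand."""
    def __init__(self, path):
        self._file = open(path, "rb")
        self._map = mmap.mmap(self._file.fileno(), 0, access=mmap.ACCESS_READ)
        (n,) = HEADER.unpack(self._map[:HEADER.size])
        start = HEADER.size
        self._index = pickle.loads(self._map[start:start + n])
        self._data = start + n  # file position where the record blobs begin

    def __len__(self):
        return len(self._index)

    def __getitem__(self, i):
        offset, length = self._index[i]
        begin = self._data + offset
        return pickle.loads(self._map[begin:begin + length])

    def close(self):
        self._map.close()
        self._file.close()

# The sample structure posted earlier in the thread.
nodes = [(("k", 5, 0),), (("*", 0, 0),), (("t", 1, 1),), (("o", 2, 0),),
         (("t", 3, 0),), (("a", 4, 0), ("o", 2, 0))]
path = os.path.join(tempfile.gettempdir(), "mealy.img")
write_image(path, nodes)
image = Image(path)
first, last, count = image[0], image[5], len(image)
image.close()
os.remove(path)
```

This only pays off when a lookup touches a small part of the data. If the whole automaton is needed in memory at once, a single binary pickle is probably simpler and no slower.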

Jul 18 '05 #7
I forgot to explain why I use tuples instead of lists:
I was squeezing a lexicon => minimization of the automaton => using a
dictionary => using hashable objects => using tuples (lists aren't hashable).
Regards,
Przemo Drochomirecki
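
A short Python 3 illustration of that last step: a tuple of transitions can serve as a dictionary key, while the equivalent list raises `TypeError`. The `registry` dictionary is a hypothetical stand-in for whatever the minimization code actually uses.

```python
# One node's transitions, taken from the sample structure in the thread.
state = (("a", 4, 0), ("o", 2, 0))

# Hypothetical registry mapping a node's transitions to its node number.
registry = {state: 5}

try:
    {list(state): 5}
    list_is_hashable = True
except TypeError:
    list_is_hashable = False  # lists are mutable, so they cannot be keys

# An equal tuple built separately finds the same entry, which is what
# lets equivalent states be detected and merged.
same = registry[tuple([("a", 4, 0), ("o", 2, 0)])]
```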
Jul 18 '05 #8

Hello, and Thanks, your code was extremely helpful:)

Regards
Przemo Drochomirecki
Jul 18 '05 #9
On Sat, 16 Aug 2003 00:41:42 +0200, "Drochom" <pe******@gazeta.pl> wrote:
My structure stores a finite state automaton with a Polish dictionary (a lexicon,
to be more precise), and it should be loaded
once, but fast!

I wonder how much space it would take to store the complete Polish word
list with one entry each in a Python dictionary. 300k words of 6-7 characters on average?
Say 2 MB plus the dict hash overhead. I bet it would be fast.

Is that in effect what you are doing, except sort of like a regex state machine
that matches words character by character?

Regards,
Bengt Richter
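
Matching a word character by character against the posted structure might look like the Python 3 sketch below. The meaning of the three fields is a guess: the thread never says what they stand for, so reading each triple as (character, next node, final flag) is an assumption, and `accepts` is a made-up name.

```python
# The sample structure from the thread: node i is a tuple of transitions.
# ASSUMPTION: each transition is (char, next_node, is_final).
NODES = [(("k", 5, 0),), (("*", 0, 0),), (("t", 1, 1),), (("o", 2, 0),),
         (("t", 3, 0),), (("a", 4, 0), ("o", 2, 0))]

def accepts(word, nodes=NODES, start=0):
    """Follow one transition per character; accept if the last one is final."""
    state, final = start, False
    for ch in word:
        for label, target, flag in nodes[state]:
            if label == ch:
                state, final = target, bool(flag)
                break
        else:
            return False  # no transition for this character
    return final

results = {w: accepts(w) for w in ("kot", "ko", "kat", "katot", "dog")}
```

Under that reading, the sample accepts "kot" and "katot" and rejects the other three words. A plain `set` of words would answer the same membership question with a single hash lookup, at the cost of storing every word in full.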
Jul 18 '05 #10
