how to build a dict containing a large number of records

hi everyone
i'm a newbie to python :)
i have a binary file named test.dat containing 9600000 records.
the record format is int a + int b + int c + int d
i want to build a dict like this: key=(int a, int b), value=(int c, int d)
i chose bsddb and it takes about 140 seconds to build the dict.
what can i do if i want to make my program run faster?
or is there another way i can choose?
Thanks in advance.
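
The record layout described above can be sketched with the struct module. This is a minimal Python 3 illustration, assuming each int is a 4-byte unsigned value in native byte order (the "IIII" format used in the posted code); the sample values and the helper name parse_record are made up for the example.

```python
import struct

# Four unsigned ints per record, native byte order (assumed layout).
RECORD = struct.Struct("IIII")

def parse_record(raw):
    """Split one raw record into a (key, value) pair of tuples."""
    a, b, c, d = RECORD.unpack(raw)
    return (a, b), (c, d)

# Hypothetical sample record, packed the same way the file is assumed to be.
sample = RECORD.pack(1, 2, 30, 40)
key, value = parse_record(sample)
print(RECORD.size, key, value)
```

Tuples work directly as dict keys, so the '%d_%d' string formatting is only needed when the store requires string keys, as bsddb does.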

My Code:
-----------------------------------------------------------------------------------
import struct
import bsddb

my_file = open('test.dat', 'rb')
content = my_file.read()
record_number = len(content) // 16   # each record is four 4-byte ints

db = bsddb.btopen('test.dat.db', 'n', cachesize=500000000)
for i in range(record_number):
    a = struct.unpack("IIII", content[i*16:i*16+16])
    # key is "a_b", value is "c_d"
    db['%d_%d' % (a[0], a[1])] = '%d_%d' % (a[2], a[3])

db.close()
my_file.close()
Jan 4 '08 #1
On Jan 4, 3:57 pm, wanzathe <wanza...@gmail.com> wrote:
> what can i do if i want to make my program run faster?
> or is there another way i can choose?

import struct
import bsddb

my_file = open('test.dat', 'rb')
db = bsddb.btopen('test.dat.db', 'n', cachesize=500000000)
content = my_file.read(16)   # read one 16-byte record at a time
while content:
    a = struct.unpack('IIII', content)
    db['%d_%d' % (a[0], a[1])] = '%d_%d' % (a[2], a[3])
    content = my_file.read(16)

db.close()
my_file.close()

That would be more memory efficient, since only one record is held in
memory at a time. As for speed, you would need to time it on your side.
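
One way to do that timing is sketched below. It is written for Python 3 and only measures the reading and unpacking step, so the bsddb insert is left out; iter_records, records_per_chunk and the temporary sample file are names invented for this example. Reading many records per call keeps memory bounded, like the 16-byte loop, while making far fewer read() calls.

```python
import os
import struct
import tempfile
import time

RECORD = struct.Struct("IIII")  # same record format as in the thread

def iter_records(path, records_per_chunk=4096):
    """Yield (a, b, c, d) tuples, reading whole records in blocks."""
    chunk_size = RECORD.size * records_per_chunk
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            # Drop a trailing partial record so iter_unpack does not raise.
            usable = len(chunk) - len(chunk) % RECORD.size
            yield from RECORD.iter_unpack(chunk[:usable])

# Small made-up sample file so the sketch can be run as-is.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    for i in range(10000):
        tmp.write(RECORD.pack(i, i + 1, i + 2, i + 3))
    sample_path = tmp.name

start = time.perf_counter()
count = sum(1 for _ in iter_records(sample_path))
elapsed = time.perf_counter() - start
os.remove(sample_path)
print("%d records in %.4f s" % (count, elapsed))
```

Because iter_records is a generator, the caller decides where each record goes (a dict, a database, and so on), and different values of records_per_chunk can be timed against each other.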
Jan 4 '08 #2
wanzathe wrote:
> i have a binary file named test.dat containing 9600000 records.
> the record format is int a + int b + int c + int d
> i want to build a dict like this: key=(int a, int b), value=(int c, int d)
> i chose bsddb and it takes about 140 seconds to build the dict.

you're not building a dict, you're populating a persistent database.
storing ~70000 records per second isn't that bad, really...

> what can i do if i want to make my program run faster?
> or is there another way i can choose?

why not just use a real Python dictionary, and the marshal module for
serialization?

</F>
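
The suggestion above could look roughly like the following in Python 3. This is a sketch, not code from the thread: build_table, save_table, load_table and the file name are invented here, and it assumes the whole table fits in memory. Note that marshal's file format is specific to the Python version, and marshal should not be used to load untrusted data.

```python
import marshal
import os
import struct
import tempfile

RECORD = struct.Struct("IIII")  # record format taken from the thread

def build_table(raw):
    """Map (a, b) -> (c, d) for every 16-byte record in raw."""
    # iter_unpack requires len(raw) to be a multiple of RECORD.size.
    return {(a, b): (c, d) for a, b, c, d in RECORD.iter_unpack(raw)}

def save_table(table, path):
    """Serialize the dict to disk with marshal."""
    with open(path, "wb") as f:
        marshal.dump(table, f)

def load_table(path):
    """Read back a dict written by save_table."""
    with open(path, "rb") as f:
        return marshal.load(f)

# Made-up data standing in for the contents of test.dat.
raw = b"".join(RECORD.pack(i, i * 2, i * 3, i * 4) for i in range(1000))
table = build_table(raw)

path = os.path.join(tempfile.mkdtemp(), "table.marshal")
save_table(table, path)
restored = load_table(path)
print(len(restored), restored[(10, 20)])
```

A later run can then call load_table() instead of re-parsing the binary file.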

Jan 4 '08 #3
On Jan 4, 10:17 pm, Fredrik Lundh <fred...@pythonware.com> wrote:
> you're not building a dict, you're populating a persistent database.
> why not just use a real Python dictionary, and the marshal module for
> serialization?

hi, Fredrik Lundh
you are right, i'm populating a persistent database.
at first i planned to use a real Python dictionary with cPickle for
serialization, but it did not work because the number of records is
too large.
Thanks
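
If memory is what breaks the in-memory approach, one thing that may help is storing each pair as a single integer instead of a tuple or a formatted string. The sketch below (Python 3) assumes every field fits in 32 bits unsigned, as the "IIII" format implies; pack_pair, unpack_pair and build_packed_table are hypothetical helpers, and whether this is enough for 9600000 records would need to be measured.

```python
import struct

RECORD = struct.Struct("IIII")  # record format taken from the thread
MASK32 = 0xFFFFFFFF

def pack_pair(high, low):
    """Combine two 32-bit unsigned values into one 64-bit integer."""
    return (high << 32) | low

def unpack_pair(packed):
    """Inverse of pack_pair: recover the two 32-bit values."""
    return packed >> 32, packed & MASK32

def build_packed_table(raw):
    """Map pack_pair(a, b) -> pack_pair(c, d) for each record in raw."""
    return {pack_pair(a, b): pack_pair(c, d)
            for a, b, c, d in RECORD.iter_unpack(raw)}

# Two made-up records for illustration.
raw = RECORD.pack(7, 8, 9, 10) + RECORD.pack(1, 2, 3, 4)
table = build_packed_table(raw)
print(unpack_pair(table[pack_pair(7, 8)]))
```

Each entry then holds one integer key and one integer value, which avoids creating a tuple or string object per key and per value.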
Jan 4 '08 #4
