Hello,
(first i'm sorry for my bad english...)
for a program, i need to have some kind of dictionary which will
contain a huge amount of keys and values. i have tried 'to do it
simple' using a normal dict but when available memory was exhausted,
my computer started to be really slow as it swap-ed.
so i wondered if i can not use some kind of cache, i googled and found
information on LRU, LFU, and LFU interested me : keep only most
frequently used items inside dict (and memory) and writing others on
disk for a possible future reuse.
here is a silly attempt to do it, it still missed some features but
before to continue i would like to have your opinions. i'm interested
in any propositions or refactorisations :) and btw i may have missed
others already/tuned/better solutions...
best.
---
import fileinput, pickle, random, shutil, sys
class sLFU:
TMP = """%s.bak"""
""" a simple kind of Least Frequently Used container"""
def __init__(self, size, ratio, filename = None):
""" filename = a temporary file which will could be deleted"""
self.d = {} # container
self.size, self.ratio = size, ratio
self.freqs, self.fmin = {}, 0
self.filename = filename
if self.filename: open(self.filename, 'w')
def __setitem__(self, obj, v):
self.d[obj] = v
self.freqs[obj] = self.freqs.get(obj, self.fmin) + 1
if len(self.d) > self.size:
""" lfu = { (size / ratio) least frequently used
objects... }"""
freqs = [(f, obj) for (obj, f) in self.freqs.items()]
# maybe should i use a generator () like
# ((f, obj) for (obj, f) in self.freqs.items())
freqs.sort()
lfu = {}
# and here enumerate(sorted(freqs))
for i, (f, obj) in enumerate(freqs):
if i > self.size / self.ratio:
break
lfu[obj] = self.d[obj]
""" next minimal frequency will be max frequency of the
lfu
(maybe it would be better to substract this from others in
self.freqs)"""
self.fmin = f
""" and now delete theses objects..."""
for obj in lfu:
del self.freqs[obj]
del self.d[obj]
if self.filename:
""" now must save lfu to disk...
as i do not managed to do otherwise, copy file to a
tmp
file and read it line by line, updating when
necessary...
there must be plenty rooms for improvement here :("""
shutil.copy(self.filename, self.TMP % self.filename)
fhs = open(self.TMP % self.filename)
fh = open(self.filename, 'w')
""" flag to ignore 'value' line"""
ignoreNext = False
for i, line in enumerate(fhs):
""" first copy old lfu which were already set,
updating
them if necessary..."""
if ignoreNext:
ignoreNext = False
continue
line = line.rstrip()
if i % 2 == 0:
obj = self.loads(line)
if obj not in lfu and obj in self.d:
""" obj available from memory, do not to
keep it on disk..."""
ignoreNext = True
continue
elif obj in lfu:
""" obj was available from memory, but as
a lfu,
was removed and must be updated on disk"""
fh.write("%s\n" % line)
fh.write("%s\n" % self.dumps(lfu[obj]))
del lfu[obj]
ignoreNext = True
continue
""" default behaviour : copy line..."""
fh.write("%s\n" % line)
""" from now lfu should contain only unseen lfu
objects,
add them to file..."""
fh = open(self.filename, 'a')
for obj in lfu:
fh.write("%s\n" % self.dumps(obj))
fh.write("%s\n" % self.dumps(lfu[obj]))
def __getitem__(self, obj):
if obj in self.d:
""" from memory :)"""
return self.d[obj]
if self.filename:
""" if a filename was provided, search inside file if
such an obj is present, then return it ; else raise
IndexErrror."""
fh = open(self.filename)
found = False
for i, line in enumerate(fh):
line = line.rstrip()
if found:
v = self.loads(line)
self.d[obj] = v
self.freqs[obj] = self.fmin + 1
return v
if obj == self.loads(line):
found = True
raise KeyError(obj)
""" maybe class methods would have been better here,
actually i would have liked static + class...
totally arbitrary format, haved choosed to use pickle,
odd line contain 'obj'
(next) even line contain 'value'
"""
def dumps(self, s):
return pickle.dumps(s).replace("\n", "\t")
staticmethod(dumps)
def loads(self, s):
return pickle.loads(s.replace("\t", "\n"))
staticmethod(loads)
lfu = sLFU(2, 2, 'slfu.txt')
lfu[1]
8
= {'1':
1fd2
'a'}
lfu[2] = []
lfu[3] = '3'
lfu[2] = [1,]
lfu[2] = [1,2]
lfu[4] = 4
lfu[5] = '5'
print "#d", len(lfu.d)
print lfu.d
print "".join(open(lfu.filename).readlines()) 2 1838
Joh> so i wondered if i can not use some kind of cache, i googled and
Joh> found information on LRU, LFU, and LFU interested me : keep only
Joh> most frequently used items inside dict (and memory) and writing
Joh> others on disk for a possible future reuse.
I have a Cache class that you might subclass to provide different behavior: http://www.musi-cal.com/~skip/python/Cache.py
What it does now is throw out keys that match the desired criteria. You
could subclass it to stuff those items in a DB file (probably using the
anydbm module) and read them from the DB file if an attempt to access a key
in the cache fails.
Skip
Sounds like you want a database (e.g. on disk storage
of keys and values that get accessed via the key).
Take a look at one of the databases for storing your
key/value pairs.
Larry Bates
Joh wrote: Hello,
(first i'm sorry for my bad english...)
for a program, i need to have some kind of dictionary which will contain a huge amount of keys and values. i have tried 'to do it simple' using a normal dict but when available memory was exhausted, my computer started to be really slow as it swap-ed.
so i wondered if i can not use some kind of cache, i googled and found information on LRU, LFU, and LFU interested me : keep only most frequently used items inside dict (and memory) and writing others on disk for a possible future reuse.
here is a silly attempt to do it, it still missed some features but before to continue i would like to have your opinions. i'm interested in any propositions or refactorisations :) and btw i may have missed others already/tuned/better solutions...
best.
---
import fileinput, pickle, random, shutil, sys
class sLFU: TMP = """%s.bak""" """ a simple kind of Least Frequently Used container""" def __init__(self, size, ratio, filename = None): """ filename = a temporary file which will could be deleted""" self.d = {} # container self.size, self.ratio = size, ratio self.freqs, self.fmin = {}, 0 self.filename = filename if self.filename: open(self.filename, 'w') def __setitem__(self, obj, v): self.d[obj] = v self.freqs[obj] = self.freqs.get(obj, self.fmin) + 1 if len(self.d) > self.size: """ lfu = { (size / ratio) least frequently used objects... }""" freqs = [(f, obj) for (obj, f) in self.freqs.items()] # maybe should i use a generator () like # ((f, obj) for (obj, f) in self.freqs.items()) freqs.sort() lfu = {} # and here enumerate(sorted(freqs)) for i, (f, obj) in enumerate(freqs): if i > self.size / self.ratio: break lfu[obj] = self.d[obj] """ next minimal frequency will be max frequency of the lfu (maybe it would be better to substract this from others in self.freqs)""" self.fmin = f """ and now delete theses objects...""" for obj in lfu: del self.freqs[obj] del self.d[obj] if self.filename: """ now must save lfu to disk... as i do not managed to do otherwise, copy file to a tmp file and read it line by line, updating when necessary... there must be plenty rooms for improvement here :(""" shutil.copy(self.filename, self.TMP % self.filename) fhs = open(self.TMP % self.filename) fh = open(self.filename, 'w') """ flag to ignore 'value' line""" ignoreNext = False for i, line in enumerate(fhs): """ first copy old lfu which were already set, updating them if necessary...""" if ignoreNext: ignoreNext = False continue line = line.rstrip() if i % 2 == 0: obj = self.loads(line) if obj not in lfu and obj in self.d: """ obj available from memory, do not to keep it on disk...""" ignoreNext = True continue elif obj in lfu: """ obj was available from memory, but as a lfu, was removed and must be updated on disk""" fh.write("%s\n" % line) fh.write("%s\n" % self.dumps(lfu[obj])) del lfu[obj] ignoreNext = True continue """ default behaviour : copy line...""" fh.write("%s\n" % line) """ from now lfu should contain only unseen lfu objects, add them to file...""" fh = open(self.filename, 'a') for obj in lfu: fh.write("%s\n" % self.dumps(obj)) fh.write("%s\n" % self.dumps(lfu[obj])) def __getitem__(self, obj): if obj in self.d: """ from memory :)""" return self.d[obj] if self.filename: """ if a filename was provided, search inside file if such an obj is present, then return it ; else raise IndexErrror.""" fh = open(self.filename) found = False for i, line in enumerate(fh): line = line.rstrip() if found: v = self.loads(line) self.d[obj] = v self.freqs[obj] = self.fmin + 1 return v if obj == self.loads(line): found = True raise KeyError(obj) """ maybe class methods would have been better here, actually i would have liked static + class... totally arbitrary format, haved choosed to use pickle, odd line contain 'obj' (next) even line contain 'value' """ def dumps(self, s): return pickle.dumps(s).replace("\n", "\t") staticmethod(dumps) def loads(self, s): return pickle.loads(s.replace("\t", "\n")) staticmethod(loads) lfu = sLFU(2, 2, 'slfu.txt') lfu[1] 8 = {'1': 1fd2 'a'} lfu[2] = [] lfu[3] = '3' lfu[2] = [1,] lfu[2] = [1,2] lfu[4] = 4 lfu[5] = '5'
print "#d", len(lfu.d) print lfu.d print "".join(open(lfu.filename).readlines()) This thread has been closed and replies have been disabled. Please start a new discussion. Similar topics |
by: svilen |
last post by:
hello again.
i'm now into using python instead of another language(s) for
describing structures of data, including names, structure,
type-checks, conversions, value-validations, metadata etc....
|
by: bearophile |
last post by:
Ville Vainio:
>It's highly typical for the newbies to suggest improvements to the
>language. They will usually learn that they are wrong, but the
>discussion that ensues can be fruitfull anyway...
|
by: fdsl ysnh |
last post by:
--- python-list-request@python.orgдµÀ:
> Send Python-list mailing list submissions to
> python-list@python.org
>
> To subscribe or unsubscribe via the World Wide Web,
> visit
>...
|
by: Steven T. Hatton |
last post by:
There are probably a zillion ways to do this, but I'm wondering of there's a
C++ convention. I want to have a fixed mapping between keys and values.
It's basically the same thing as a std::map<>,...
|
by: Bengt Richter |
last post by:
Has anyone found a way besides not deriving from dict?
Shouldn't there be a way?
TIA
(need this for what I hope is an improvement on the Larosa/Foord OrderedDict ;-)
I guess I can just document...
| |
by: sandravandale |
last post by:
I can think of several messy ways of making a dict that sets a flag if
it's been altered, but I have a hunch that experienced python
programmers would probably have an easier (well maybe more...
|
by: George Sakkis |
last post by:
Although I consider dict(**kwds) as one of the few unfortunate design
choices in python since it prevents the future addition of useful
keyword arguments (e.g a default value or an orderby...
|
by: jeremito |
last post by:
Please excuse me if this is obvious to others, but I can't figure it
out. I am subclassing dict, but want to prevent direct changing of
some key/value pairs. For this I thought I should override...
|
by: Giampaolo Rodola' |
last post by:
Hi,
for an FTP server I wrote I'd need to group the FTP commands in one
table that defines the command itself, the syntax string, required
permission, whether it requires authorization, whether it...
|
by: marktang |
last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However,...
|
by: Hystou |
last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can...
| |
by: Oralloy |
last post by:
Hello folks,
I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>".
The problem is that using the GNU compilers,...
|
by: jinu1996 |
last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven...
|
by: conductexam |
last post by:
I have .net C# application in which I am extracting data from word file and save it in database particularly. To store word all data as it is I am converting the whole word file firstly in HTML and...
|
by: adsilva |
last post by:
A Windows Forms form does not have the event Unload, like VB6. What one acts like?
|
by: 6302768590 |
last post by:
Hai team
i want code for transfer the data from one system to another through IP address by using C# our system has to for every 5mins then we have to update the data what the data is updated ...
|
by: muto222 |
last post by:
How can i add a mobile payment intergratation into php mysql website.
| |
by: bsmnconsultancy |
last post by:
In today's digital era, a well-designed website is crucial for businesses looking to succeed. Whether you're a small business owner or a large corporation in Toronto, having a strong online presence...
| |