473,406 Members | 2,336 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,406 software developers and data experts.

persistent deque (continued)

Some time ago, I was asking about the feasibility of a persistent
deque, a double-ended queue.

It runs into the typical space allocation problems. If you're storing
a pickle, you have to allocate and fragment the file you've opened,
since pickles can be variable-length strings; i.e. if the new data is
too long, blank out its entry, and grow the file. If you're storing a
data-type, you lose Python's dynamic-type advantages, as well as its
long integers, as they can be any length. If you change the object in
the deque, such as when using any mutable type, you have to update the
container too.

Does anyone have any experience storing pickles (I am aware of the
similarities to shelf) to a database?
Can the file system facilitate the variable-length string problem?
How feasible is a length-of-length - length - data solution to the
unbounded integer problem?
Is there any alternative to completely re-pickling a large (say 1k
pickled) object you only change slightly?
What other issues are there?
Is a hybrid-storage type possible, that stores the contents of its
Python-allocated memory block to disk, including reference count, even
if it's a lot slower? The object could not contain any references to
objects not allocated on disk.

A first approach is for the file to look like this:

00 data 01 data 02
01 data 03 data 04
02 data 05 data 06

Append would add:

03 data 07 data 08

AppendLeft would add:

-01 data 09 data 0a

Pop would remove 03, PopLeft would remove -01. You would need a
length-and-head index to make 'rotate' available. Remove would run a
worst-case risk of renumbering half of the indices stored, plus a
rotate.
Jul 21 '08 #1
4 2379
castironpi wrote:
Some time ago, I was asking about the feasibility of a persistent
deque, a double-ended queue.

It runs into the typical space allocation problems. If you're storing
a pickle, you have to allocate and fragment the file you've opened,
since pickles can be variable-length strings; i.e. if the new data is
too long, blank out its entry, and grow the file. If you're storing a
data-type, you lose Python's dynamic-type advantages, as well as its
long integers, as they can be any length. If you change the object in
the deque, such as when using any mutable type, you have to update the
container too.

Does anyone have any experience storing pickles (I am aware of the
similarities to shelf) to a database?
Can the file system facilitate the variable-length string problem?
How feasible is a length-of-length - length - data solution to the
unbounded integer problem?
Is there any alternative to completely re-pickling a large (say 1k
pickled) object you only change slightly?
What other issues are there?
Is a hybrid-storage type possible, that stores the contents of its
Python-allocated memory block to disk, including reference count, even
if it's a lot slower? The object could not contain any references to
objects not allocated on disk.

A first approach is for the file to look like this:

00 data 01 data 02
01 data 03 data 04
02 data 05 data 06

Append would add:

03 data 07 data 08

AppendLeft would add:

-01 data 09 data 0a

Pop would remove 03, PopLeft would remove -01. You would need a
length-and-head index to make 'rotate' available. Remove would run a
worst-case risk of renumbering half of the indices stored, plus a
rotate.
It is so much easier to implement this using a database table that IMHO most
people would go that route.

-Larry
Jul 21 '08 #2
On 2008-07-21 21:08, castironpi wrote:
Some time ago, I was asking about the feasibility of a persistent
deque, a double-ended queue.
You might want to have a look at mxBeeBase:

http://www.egenix.com/products/python/mxBase/mxBeeBase/

Using the integer index you could probably write an on-disk
dequeue.

The B*Tree indexes available in mxBeeBase keep the indexes sorted,
so you'd only have to keep incrementing the index as you add new
values and keep a record of the highest and lowest index currently
in use.

Details are left as exercise for the interested reader ;-)
It runs into the typical space allocation problems. If you're storing
a pickle, you have to allocate and fragment the file you've opened,
since pickles can be variable-length strings; i.e. if the new data is
too long, blank out its entry, and grow the file. If you're storing a
data-type, you lose Python's dynamic-type advantages, as well as its
long integers, as they can be any length. If you change the object in
the deque, such as when using any mutable type, you have to update the
container too.

Does anyone have any experience storing pickles (I am aware of the
similarities to shelf) to a database?
Can the file system facilitate the variable-length string problem?
How feasible is a length-of-length - length - data solution to the
unbounded integer problem?
Is there any alternative to completely re-pickling a large (say 1k
pickled) object you only change slightly?
What other issues are there?
Is a hybrid-storage type possible, that stores the contents of its
Python-allocated memory block to disk, including reference count, even
if it's a lot slower? The object could not contain any references to
objects not allocated on disk.

A first approach is for the file to look like this:

00 data 01 data 02
01 data 03 data 04
02 data 05 data 06

Append would add:

03 data 07 data 08

AppendLeft would add:

-01 data 09 data 0a

Pop would remove 03, PopLeft would remove -01. You would need a
length-and-head index to make 'rotate' available. Remove would run a
worst-case risk of renumbering half of the indices stored, plus a
rotate.
--
http://mail.python.org/mailman/listinfo/python-list
--
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source (#1, Jul 21 2008)
>>Python/Zope Consulting and Support ... http://www.egenix.com/
mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/
mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/
__________________________________________________ ______________________

:::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,MacOSX for free ! ::::
eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48
D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
Registered at Amtsgericht Duesseldorf: HRB 46611
Jul 21 '08 #3
On Jul 21, 12:08*pm, castironpi <castiro...@gmail.comwrote:
Some time ago, I was asking about the feasibility of a persistent
deque, a double-ended queue.

It runs into the typical space allocation problems. *
Try starting with a dict-based implementation of a double-ended queue
( http://aspn.activestate.com/ASPN/Coo.../Recipe/259179 )
and then replace the dict with a shelf. Presto, you've got a
persistent deque.

Raymond
Jul 21 '08 #4
On Jul 21, 5:20*pm, Raymond Hettinger <pyt...@rcn.comwrote:
On Jul 21, 12:08*pm, castironpi <castiro...@gmail.comwrote:
Some time ago, I was asking about the feasibility of a persistent
deque, a double-ended queue.
It runs into the typical space allocation problems. *

Try starting with a dict-based implementation of a double-ended queue
(http://aspn.activestate.com/ASPN/Coo.../Recipe/259179)
and then replace the dict with a shelf. *Presto, you've got a
persistent deque.

Raymond
We have the cookbook recipe Raymond brings up, and an automatic
pickler that inhahe brought up in May.

I've initiated the approach to the solution I have in mind, but the
dilemma comes early and isn't pleasant. I can either (i) recompile
the whole interpreter, replacing malloc with a custom function that
allocates from disk; or (ii) duplicate every class I want to be
persistable, making every change to the allocations by hand.

Disk-resident objects can clearly not contain references to any
objects not on disk (arguably at finalization-stage only), and
references to ones that are need to be wrapped by offset, and 'made
live' at initialization-time, so that references to the mapped file
resume succeeding after termination and restart--- the mapped memory
segment won't be the same from run to run.

Further, if I can't hijack the existing arena-pool model native in
Python, it would take writing a dedicated memory manager.

Assuming the whole memory isn't moved to disk, which is feasible, the
usage would look like this, to ensure that 'disk-resident' objects
don't refer to 'live' ones.

listA= perst( [ perst( 0 ), perst( 1 ), perst( "abc" ) ] )

to create the list [ 0, 1, "abc" ]. Modifications, e.g. append "def",
would look like this:

listA.append( perst( "def" ) )

and would execute the addition directly to mapped memory. The memory
file would then have five objects: 0, 1, "abc", "def", and this list.

It is not clear that normal assignments would still work, self.b= c,
even for disk-resident c, since on a later run, 'self.b' would refer
to a different place in memory. It might take either trapping updates
to self.__dict__, or something deeper at the level of the ast and
Assm't statements. Or an extra level of indirection in accessing it.
>>self.b
<C instance at index 12 in mem-mapped file 'data.dat'>

The indirection would consist of an offset-finder and a delegation
layer.
>>self.b.append( perst( set( ) ) )
It would first have to look-up 'self.b', as opposed to accessing it
directly by memory address.

What I want from the group is to know if I've assessed the options for
a model of persistence accurately. Or if I haven't depicted what I'm
going for clearly. It seems enormous and if it is this hard, I doubt
I'll produce anything other than a few correct persistent types.
Jul 27 '08 #5

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

16
by: newsock | last post by:
What differences between queue, deque and priority_queue? And under what situations choose one of them to use? Thanks for your help!
7
by: Jenny | last post by:
Hi, I have a class foo which will construct some objects in my code. some of the objects store int values into the data deque, while others store float values to the deque. template <class...
9
by: R.Z. | last post by:
i was wondering whether it pays off in terms of memory use to maintain lots of empty deques (it would be convenient for my algorithms but memory use is more important). and does the size of a deque...
7
by: Dan Trowbridge | last post by:
He everyone, I am just getting started with .NET and I am having a porting problem. I get and error in code that lookssomething like this (really stripped down but you get the idea)... class...
5
by: Russell Warren | last post by:
Does anyone have an easier/faster/better way of popping from the middle of a deque than this? class mydeque(deque): def popmiddle(self, pos): self.rotate(-pos) ret = self.popleft()...
5
by: yancheng.cheok | last post by:
after reading http://www.codeproject.com/vcpp/stl/vector_vs_deque.asp, i realize that deque has its own speed advantage over vector in all the aspect (except for memory deallocation). does it...
1
by: Homeworkboy | last post by:
Can anyone help me with this program? Look at the bottom of this program for help methods: /*1. Make a program that uses numbers from 1 to 100 including each ends which puts the even...
15
by: Juha Nieminen | last post by:
I'm sure this is not a new idea, but I have never heard about it before. I'm wondering if this could work: Assume that you have a common base class and a bunch of classes derived from it, and...
8
by: castironpi | last post by:
I'd like a persistent deque, such that the instance operations all commit atomicly to file system.
0
BarryA
by: BarryA | last post by:
What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...
1
by: nemocccc | last post by:
hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?
0
by: Hystou | last post by:
There are some requirements for setting up RAID: 1. The motherboard and BIOS support RAID configuration. 2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...
0
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However,...
0
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can...
0
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven...
0
by: Hystou | last post by:
Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows...
0
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each...
0
isladogs
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome a new...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.