473,625 Members | 2,687 Online
Bytes | Software Development & Data Engineering Community
+ Post

Home Posts Topics Members FAQ

Builtin classes list, set, dict reimplemented via B-trees

barnesc at engr.orst.edu wrote:
So my question is: are there any other *practical* applications of a
B-tree based list/set/dict ? In other words, is this module totally
worth coding, or is it just academic wankery and theoretical flim
flam ? :)
B-trees are specifically designed for disk storage. Seeking to a
page takes much longer than reading a page of several kilobytes.
B-trees read the keys for a many-way comparison in one seek.

For in-memory comparison-based dictionaries, Red-Black trees,
AVL trees, 2-3 trees, or skip-lists, are a better choice.
--
--Bryan


---------------------------------------------------------------------
Memory Usage
---------------------------------------------------------------------

Here overhead is compared to a C array of > 1 million PyObject *s.

The B-tree is assumed to have a branch factor of ~1024.

A B-tree has an average of 1.5x overhead (2x at worst, 1x at best,
using malloc). This assumes that B-tree nodes contain a fixed
size C array for storing values (which will be half-full on
average), and that the child array is only allocated for internal
nodes as needed.
Any balanced binary tree with left and right child pointers has
a 6x overhead (using malloc, measured via a gcc win32 test program),
or with a free list, a 3x overhead (though memory will not
be returned until the data structure's destructor is called).
I haven't tried using PyObject_Malloc (), but it should be in
between these two factors.
A skip list has n values and n next pointers on the bottom level.
In the ideal case (with respect to memory usage), we make the
probability sufficiently small so that the higher levels take up a
negligible amount of memory. Thus we have a 4x overhead (using
malloc, measured via a gcc win32 test program), or with a free list,
an overhead slightly greater than 2x (again, memory will only be
returned when the destructor is called). I didn't test
PyObject_Malloc (), but it should give an overhead between 2x and 4x.

Thus, on average, a > 1 million element B-tree uses 25% less memory
than any other balanced data structure which I am aware of, and 50%
more memory than a raw C array.
---------------------------------------------------------------------
Speed
---------------------------------------------------------------------

I have no idea how B-trees compare to skip lists (the likely
contender) in terms of speed.

To do this, one would need two high performance C implementations .
From this, the Python performance can presumably be deduced (although

caching causes unpredictable effects, the faster C algorithm should be
faster in Python).

You could start with optimized parallel C implementations , such as
the ones by John-Mark Gurney:

* http://resnet.uoregon.edu/~gurney_j/jmpc/btree.html
* http://resnet.uoregon.edu/~gurney_j/jmpc/skiplist.html

He indicates that the skip list is slower than the B-tree, but doesn't
give any solid numbers. Of course you'd have to check his code and
optimize it to death (Perhaps representing a sorted set of integers
using each respective data structure would be a good test).

Also note that my B-tree code currently has optimized special cases
for block insertions, deletions, and retrievals, since this is easy
(er, well, relatively speaking) to do in the B-tree case. In the skip
list case, block retrievals are easy (though not as fast as B-tree
block retrievals due to pointer indirection and cache problems). I
have no idea how to do a fast block insert or delete on a skip list.

----------------------------------------------------------------------

Well, enough ivory tower silliness and academic blustering.

<on_topic>

Are there any *practical* applications for in-memory balanced data
structures (e.g. skip list, AVL tree, RB tree, B tree) ?

</on_topic>

- Connelly Barnes
Sep 14 '05 #1
2 2984
IMO sorted dict implementation can be useful, eg. one can get an
interval:
L = D['A' : 'K']

other useful data types:
linkedlist
queue, stack (well deque can do it efficiently in py 2.4)
prioritydict (for graph algorithms)
multimap, multiset (i've never used it but it's in the c++ stl)
mutable string (kind of list/array of chars, but with string functions)

nsz

Sep 14 '05 #2
ba*****@engr.or st.edu wrote:
Here overhead is compared to a C array of > 1 million PyObject *s.

Thus, on average, a > 1 million element B-tree uses 25% less memory
than any other balanced data structure which I am aware of, and 50%
more memory than a raw C array.
That's overhead of indexing; it doesn't consider the space
already used to store the keys and values. The B-tree can get by
with modestly fewer pointers, because it has fewer internal
nodes that need to be referenced by other internal pointers.
That assumes that the B-tree nodes are kept as linear arrays,
which means that either inserting into them will time
proportional to the fan-out, or searching them will.

[...] I have no idea how B-trees compare to skip lists (the likely
contender) in terms of speed.
I'd expect Red-Black trees to be at least as good a contender.
Are there any *practical* applications for in-memory balanced data
structures (e.g. skip list, AVL tree, RB tree, B tree) ?


Yes, absolutely. Efficient ordered sub-ranges; efficient rank
queries; sustainable performance with adversarial inputs.
--
--Bryan
Sep 14 '05 #3

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

5
1960
by: Blair Hall | last post by:
Can anyone please tell me how to correctly use a built in function when there is a function of the same name in local scope? Here is an example. Suppose the following is in myApply.py: def apply(func,seq): # # Code can default to # built-in definition in some cases: return __builtins__.apply(func,seq)
10
4742
by: Stefan Seefeld | last post by:
hi there, I'm trying to convert a tuple to a list, and get a 'TypeError: list objects are unhashable'. Can anybody enlighten me as to the possible causes for this ? Where does hashing come into play during this conversion ? Could it be that my runtime is corrupted ?
49
2797
by: Christopher J. Bottaro | last post by:
I find myself doing the following very often: class Struct: pass .... blah = Struct() blah.some_field = x blah.other_field = y ....
4
2073
by: Tuure Laurinolli | last post by:
Someone pasted the original version of the following code snippet on #python today. I started investigating why the new-style class didn't work as expected, and found that at least some instances of new-style classes apparently don't return true for PyInstance_Check, which causes a problem in PySequence_Check, since it will only do an attribute lookup for instances. Things probably shouldn't be this way. Should I go to python-dev with...
7
2120
by: John Reese | last post by:
Why hello there ha ha. I have got in the habit of testing the types of variables with isinstance and the builtin type names instead of using the types module, as was the style back around Python 2.1. That is, rather than if type(x) == types.ListType: I now do:
6
1281
by: bertgoos | last post by:
Hey, I have the following code that has to send every command it receives to a list of backends. Instead of: class MultiBackend(object): """Renders to multiple backends""" def __init__(self, backends): self.backends = backends def flush(self):
3
1496
by: Daniel Nogradi | last post by:
I used to have the following code to collect all (old style) class names defined in the current module to a list called reg: def meta( reg ): def _meta( name, bases, dictionary ): reg.append( name ) return _meta reg =
23
2001
by: =?ISO-8859-1?Q?Sch=FCle_Daniel?= | last post by:
Hello, lst = list((1,2,3)) lst = t = tupel((1,2,3)) t = (1,2,3) s = set((1,2,3)) s = ...
20
2410
by: Seongsu Lee | last post by:
Hi, I have a dictionary with million keys. Each value in the dictionary has a list with up to thousand integers. Follow is a simple example with 5 keys. dict = {1: , 2: , 900000: , 900001: ,
2
5521
by: Dmitry | last post by:
Hi All, I've trying to develop one Python application, and neet to solve one problem. I need to list all classes defined in one package (not module!). Could anybody please show me more convinient (correct) way to implement this? Thanks,
0
8251
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However, people are often confused as to whether an ONU can Work As a Router. In this blog post, we’ll explore What is ONU, What Is Router, ONU & Router’s main usage, and What is the difference between ONU and Router. Let’s take a closer look ! Part I. Meaning of...
1
8352
by: Hystou | last post by:
Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows Update option using the Control Panel or Settings app; it automatically checks for updates and installs any it finds, whether you like it or not. For most users, this new feature is actually very convenient. If you want to control the update process,...
0
8494
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each protocol has its own unique characteristics and advantages, but as a user who is planning to build a smart home system, I am a bit confused by the choice of these technologies. I'm particularly interested in Zigbee because I've heard it does some...
0
7178
agi2029
by: agi2029 | last post by:
Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing, and deployment—without human intervention. Imagine an AI that can take a project description, break it down, write the code, debug it, and then launch it, all on its own.... Now, this would greatly impact the work of software developers. The idea...
1
6115
isladogs
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome a new presenter, Adolph Dupré who will be discussing some powerful techniques for using class modules. He will explain when you may want to use classes instead of User Defined Types (UDT). For example, to manage the data in unbound forms. Adolph will...
0
4085
by: TSSRALBI | last post by:
Hello I'm a network technician in training and I need your help. I am currently learning how to create and manage the different types of VPNs and I have a question about LAN-to-LAN VPNs. The last exercise I practiced was to create a LAN-to-LAN VPN between two Pfsense firewalls, by using IPSEC protocols. I succeeded, with both firewalls in the same network. But I'm wondering if it's possible to do the same thing, with 2 Pfsense firewalls...
1
2614
by: 6302768590 | last post by:
Hai team i want code for transfer the data from one system to another through IP address by using C# our system has to for every 5mins then we have to update the data what the data is updated we have to send another system
1
1800
muto222
by: muto222 | last post by:
How can i add a mobile payment intergratation into php mysql website.
2
1496
bsmnconsultancy
by: bsmnconsultancy | last post by:
In today's digital era, a well-designed website is crucial for businesses looking to succeed. Whether you're a small business owner or a large corporation in Toronto, having a strong online presence can significantly impact your brand's success. BSMN Consultancy, a leader in Website Development in Toronto offers valuable insights into creating effective websites that not only look great but also perform exceptionally well. In this comprehensive...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.