Builtin classes list, set, dict reimplemented via B-trees

barnesc

barnesc at engr.orst.edu wrote:
So my question is: are there any other *practical* applications of a
B-tree based list/set/dict ? In other words, is this module totally
worth coding, or is it just academic wankery and theoretical flim
flam ? :)
B-trees are specifically designed for disk storage. Seeking to a
page takes much longer than reading a page of several kilobytes.
B-trees read the keys for a many-way comparison in one seek.

For in-memory comparison-based dictionaries, Red-Black trees,
AVL trees, 2-3 trees, or skip-lists, are a better choice.
--
--Bryan

---------------------------------------------------------------------
Memory Usage
---------------------------------------------------------------------

Here overhead is compared to a C array of > 1 million PyObject *s.

The B-tree is assumed to have a branch factor of ~1024.

A B-tree has an average of 1.5x overhead (2x at worst, 1x at best,
using malloc). This assumes that B-tree nodes contain a fixed
size C array for storing values (which will be half-full on
average), and that the child array is only allocated for internal
nodes as needed.
Any balanced binary tree with left and right child pointers has
a 6x overhead (using malloc, measured via a gcc win32 test program),
or with a free list, a 3x overhead (though memory will not
be returned until the data structure's destructor is called).
I haven't tried using PyObject_Malloc(), but it should be in
between these two factors.
A skip list has n values and n next pointers on the bottom level.
In the ideal case (with respect to memory usage), we make the
probability sufficiently small so that the higher levels take up a
negligible amount of memory. Thus we have a 4x overhead (using
malloc, measured via a gcc win32 test program), or with a free list,
an overhead slightly greater than 2x (again, memory will only be
returned when the destructor is called). I didn't test
PyObject_Malloc(), but it should give an overhead between 2x and 4x.

Thus, on average, a > 1 million element B-tree uses 25% less memory
than any other balanced data structure which I am aware of, and 50%
more memory than a raw C array.
---------------------------------------------------------------------
Speed
---------------------------------------------------------------------

I have no idea how B-trees compare to skip lists (the likely
contender) in terms of speed.

To do this, one would need two high performance C implementations.
From this, the Python performance can presumably be deduced (although

caching causes unpredictable effects, the faster C algorithm should be
faster in Python).

You could start with optimized parallel C implementations, such as
the ones by John-Mark Gurney:

* http://resnet.uoregon.edu/~gurney_j/jmpc/btree.html
* http://resnet.uoregon.edu/~gurney_j/jmpc/skiplist.html

He indicates that the skip list is slower than the B-tree, but doesn't
give any solid numbers. Of course you'd have to check his code and
optimize it to death (Perhaps representing a sorted set of integers
using each respective data structure would be a good test).

Also note that my B-tree code currently has optimized special cases
for block insertions, deletions, and retrievals, since this is easy
(er, well, relatively speaking) to do in the B-tree case. In the skip
list case, block retrievals are easy (though not as fast as B-tree
block retrievals due to pointer indirection and cache problems). I
have no idea how to do a fast block insert or delete on a skip list.

----------------------------------------------------------------------

Well, enough ivory tower silliness and academic blustering.

<on_topic>

Are there any *practical* applications for in-memory balanced data
structures (e.g. skip list, AVL tree, RB tree, B tree) ?

</on_topic>

- Connelly Barnes

Sep 14 '05 #1

Subscribe Post Reply

2923

Szabolcs Nagy

IMO sorted dict implementation can be useful, eg. one can get an
interval:
L = D['A' : 'K']

other useful data types:
linkedlist
queue, stack (well deque can do it efficiently in py 2.4)
prioritydict (for graph algorithms)
multimap, multiset (i've never used it but it's in the c++ stl)
mutable string (kind of list/array of chars, but with string functions)

nsz

Sep 14 '05 #2

Bryan Olson

ba*****@engr.orst.edu wrote:

Here overhead is compared to a C array of > 1 million PyObject *s.

Thus, on average, a > 1 million element B-tree uses 25% less memory
than any other balanced data structure which I am aware of, and 50%
more memory than a raw C array.
That's overhead of indexing; it doesn't consider the space
already used to store the keys and values. The B-tree can get by
with modestly fewer pointers, because it has fewer internal
nodes that need to be referenced by other internal pointers.
That assumes that the B-tree nodes are kept as linear arrays,
which means that either inserting into them will time
proportional to the fan-out, or searching them will.

[...] I have no idea how B-trees compare to skip lists (the likely
contender) in terms of speed.
I'd expect Red-Black trees to be at least as good a contender.
Are there any *practical* applications for in-memory balanced data
structures (e.g. skip list, AVL tree, RB tree, B tree) ?

Yes, absolutely. Efficient ordered sub-ranges; efficient rank
queries; sustainable performance with adversarial inputs.
--
--Bryan

Sep 14 '05 #3

Similar topics

The namespace for builtin functions?

by: Blair Hall | last post by:

Can anyone please tell me how to correctly use a built in function when there is a function of the same name in local scope? Here is an example. Suppose the following is in myApply.py: def...

Python

error converting list to tuple

by: Stefan Seefeld | last post by:

hi there, I'm trying to convert a tuple to a list, and get a 'TypeError: list objects are unhashable'. Can anybody enlighten me as to the possible causes for this ? Where does hashing come...

Python

empty classes as c structs?

by: Christopher J. Bottaro | last post by:

I find myself doing the following very often: class Struct: pass .... blah = Struct() blah.some_field = x blah.other_field = y ....

Python

New-style classes, iter and PySequence_Check

by: Tuure Laurinolli | last post by:

Someone pasted the original version of the following code snippet on #python today. I started investigating why the new-style class didn't work as expected, and found that at least some instances...

Python

Why is there no instancemethod builtin?

by: John Reese | last post by:

Why hello there ha ha. I have got in the habit of testing the types of variables with isinstance and the builtin type names instead of using the types module, as was the style back around Python...

Python

Automatic methods in new-style classes

by: bertgoos | last post by:

Hey, I have the following code that has to send every command it receives to a list of backends. Instead of: class MultiBackend(object): """Renders to multiple backends""" def __init__(self,...

Python

module wide metaclass for new style classes

by: Daniel Nogradi | last post by:

I used to have the following code to collect all (old style) class names defined in the current module to a list called reg: def meta( reg ): def _meta( name, bases, dictionary ): reg.append(...

Python

builtin set literal

by: =?ISO-8859-1?Q?Sch=FCle_Daniel?= | last post by:

Hello, lst = list((1,2,3)) lst = t = tupel((1,2,3)) t = (1,2,3) s = set((1,2,3)) s = ...

Python

searching a value of a dict (each value is a list)

by: Seongsu Lee | last post by:

Hi, I have a dictionary with million keys. Each value in the dictionary has a list with up to thousand integers. Follow is a simple example with 5 keys. dict = {1: , 2: , 900000: , 900001:...

Python

list classes in package

by: Dmitry | last post by:

Hi All, I've trying to develop one Python application, and neet to solve one problem. I need to list all classes defined in one package (not module!). Could anybody please show me more...

Python

Easy Steps to Fix "Canon Printer Won't Connect to WiFi Network"

by: taylorcarr | last post by:

A Canon printer is a smart device known for being advanced, efficient, and reliable. It is designed for home, office, and hybrid workspace use and can also be used for a variety of purposes. However,...

General

Basic Javascript concepts

by: aa123db | last post by:

Variable and constants Use var or let for variables and const fror constants. Var foo ='bar'; Let foo ='bar';const baz ='bar'; Functions function $name$ ($parameters$) { } ...

Javascript

Batch import of multiple excel files into the database

by: ryjfgjl | last post by:

If we have dozens or hundreds of excel to import into the database, if we use the excel import function provided by database editors such as navicat, it will be extremely tedious and time-consuming...

Data Management

Navigating the Data Structures and Algorithms (DSA)

by: BarryA | last post by:

What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...

Algorithms / Advanced Math

Looking to do Android software development, any suggestions? Is flutter better?

by: nemocccc | last post by:

hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?

General

Is that possible of reading the .csv file in column wise and the column have different lengths ?

by: Sonnysonu | last post by:

This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...

C / C++

How to build RAID in BIOS?

by: Hystou | last post by:

There are some requirements for setting up RAID: 1. The motherboard and BIOS support RAID configuration. 2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...

Computer Hardware

What is ONU?

by: marktang | last post by:

ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However,...

General

Maximizing Business Potential: The Nexus of Website Design and Digital Marketing

by: jinu1996 | last post by:

In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven...

Online Marketing