OLAP and pivot tables

After a brief search, I didn't find any Python package related to OLAP
and pivot tables. Did I miss anything? To be more precise, I'm not so
interested in a full-blown OLAP server with an RDBMS backend, but
rather a Pythonic API for constructing datacubes in memory, slicing and
dicing them, drilling down or up dimensions and exposing them in some
suitable form to a presentation layer. I've hacked a first cut of a
pivot table implementation and an XHTML generator that produces
hierarchical HTML tables, but it's not particularly general or easily
extensible so far. Is there any interest at all in a Pythonic version
of something like JOLAP or XMLA?

George

May 26 '06 #1
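
A minimal sketch of the kind of in-memory pivot table the post above describes, using only the standard library. The function names `pivot` and `slice_`, the record layout, and the sample data are all hypothetical, chosen for illustration; this is not George's implementation.

```python
from collections import defaultdict

def pivot(records, row_dim, col_dim, measure, agg=sum):
    """Group records by (row_dim, col_dim) and aggregate the measure."""
    cells = defaultdict(list)
    for rec in records:
        cells[(rec[row_dim], rec[col_dim])].append(rec[measure])
    rows = sorted({r for r, _ in cells})
    cols = sorted({c for _, c in cells})
    # Cells with no matching records are reported as None.
    table = {
        r: {c: agg(cells[(r, c)]) if (r, c) in cells else None for c in cols}
        for r in rows
    }
    return rows, cols, table

def slice_(records, **fixed):
    """Keep only the records whose dimensions equal the fixed values."""
    return [rec for rec in records
            if all(rec[k] == v for k, v in fixed.items())]

# Hypothetical fact table: one dict per record.
sales = [
    {"region": "EU", "year": 2005, "product": "A", "amount": 10},
    {"region": "EU", "year": 2006, "product": "A", "amount": 15},
    {"region": "US", "year": 2005, "product": "B", "amount": 7},
    {"region": "US", "year": 2006, "product": "A", "amount": 12},
    {"region": "EU", "year": 2005, "product": "B", "amount": 3},
]

rows, cols, table = pivot(sales, "region", "year", "amount")
for r in rows:
    print(r, [table[r][c] for c in cols])
```

Slicing is then just filtering before pivoting, for example `pivot(slice_(sales, product="A"), "region", "year", "amount")`. Drilling up or down would amount to choosing a coarser or finer `row_dim` or `col_dim`.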

I'd be interested as well. I posted a similar question to the Ruby
mailing list a few months ago to no avail. Ideally, someone much more
talented than myself would create an open OLAP library in C that could be
interfaced with dynamic languages easily (I ordered some OLAP books and
started in on this, and decided I was in over my head for now). As far
as free software goes, all I've been able to find is Java-based Mondrian.
Maybe it could serve as a reference implementation for someone.

Cheers,
Ben
May 26 '06 #2

I have a few applications that require the generation of large numbers
of contingency tables from a higher-dimensional base table. The
approaches I've tried (Numeric arrays / dictionary-based sparse arrays /
various caching schemes / searches on subset lattices for previously
generated 'super'-tables to marginalise from, etc.) still
represent major bottlenecks. So, I guess I would be interested.

Duncan
May 26 '06 #3
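
To make the dictionary-based approach mentioned in the post above concrete, here is a small sketch of a sparse contingency table and of marginalising a lower-dimensional table from a previously built 'super'-table. The helper names `contingency` and `marginalise` and the sample rows are hypothetical; this is not Duncan's code, and it makes no attempt at the caching or lattice search he mentions.

```python
from collections import Counter

def contingency(rows, dims):
    """Sparse contingency table: counts keyed by a tuple of dim values."""
    return Counter(tuple(row[d] for d in dims) for row in rows)

def marginalise(table, dims, keep):
    """Sum a sparse table over every dimension not listed in keep."""
    idx = [dims.index(d) for d in keep]
    out = Counter()
    for key, count in table.items():
        out[tuple(key[i] for i in idx)] += count
    return out

# Hypothetical base table with three categorical dimensions.
base = [
    {"sex": "F", "smoker": "yes", "age": "old"},
    {"sex": "F", "smoker": "no", "age": "old"},
    {"sex": "M", "smoker": "yes", "age": "young"},
    {"sex": "F", "smoker": "yes", "age": "young"},
]
dims = ("sex", "smoker", "age")

full = contingency(base, dims)                      # built once from the base rows
by_sex_smoker = marginalise(full, dims, ("sex", "smoker"))
print(dict(by_sex_smoker))
```

Marginalising touches only the non-empty cells of the super-table, so it tends to be cheaper than recounting from the base rows when the super-table has far fewer cells than the base table has rows.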

The NetEpi Analysis project (http://sourceforge.net/projects/netepi),
although not strictly an OLAP or datacube engine, might offer some of
the things you are looking for. It is intended for exploratory
epidemiological analysis of (potentially large) health-related datasets,
but should work with most types of data for which an OLAP engine would
be useful.

Underneath there is a vertically-disaggregated, ordinally-mapped,
set-theoretic data selection and summarisation engine, which is a
pompous way of saying that it holds data column-wise in memory-mapped
Numpy (Numeric Python) arrays, and uses some fast (custom-written) set
functions on inverted indexes on the ordinal positions of column values
to select and summarise data. This happens entirely at run-time, unlike
most OLAP engines, which rely on a degree of pre-summarisation along
pre-chosen dimensions.

It is all Python and thus has a Python(ic) API, including an SQL-like
WHERE clause parser for data selection (OK, SQL is not Pythonic, but
that's just for data subsetting). It includes quite a few statistical
functions and nice graphics courtesy of R (http://www.r-project.org),
which is embedded via RPy (http://rpy.sourceforge.net/). Full support
for missing values and weighted datasets is provided (but not full
support for survey data with complex sample designs - that's
forthcoming).

Currently it works well with datasets in the 5-10 million row range, but
the basic design lends itself easily to parallelisation if you have
bigger datasets, and preliminary work indicates good speed improvements.
That is something we want to pursue given all the multi-core CPUs which
are now available at reasonable cost.

Be warned that NetEpi Analysis is currently only of beta quality, is a
bit of a pig to install, and runs on Linux/Unix/Mac OS X only at
present. We hope to have a production-ready Version 1.0 out by the end
of 2006, possibly with MS-Windows support as well. However, the core
data summarisation/subsetting engine is thought to be sound (and there
are some unit tests to attest to that).

Probably not quite what you were after but I thought it worth a mention.
Please post follow-ups, if any, to the NetEpi mailing list:
http://sourceforge.net/mail/?group_id=123700

Tim C




May 26 '06 #4
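
The column-wise, inverted-index design described in the post above can be illustrated with a toy sketch. It is emphatically not NetEpi's API: the class `ColumnStore`, its methods, and the sample data are hypothetical, and plain Python sets stand in for the memory-mapped arrays and custom set functions the post mentions.

```python
from collections import defaultdict

class ColumnStore:
    """Toy column store: one list per column plus an inverted index."""

    def __init__(self, columns):
        self.columns = columns            # dict: column name -> list of values
        self.nrows = len(next(iter(columns.values())))
        self.index = {}                   # name -> {value: set of row positions}
        for name, values in columns.items():
            inverted = defaultdict(set)
            for pos, value in enumerate(values):
                inverted[value].add(pos)
            self.index[name] = dict(inverted)

    def select(self, **conditions):
        """Row positions where every named column equals the given value."""
        result = set(range(self.nrows))
        for name, value in conditions.items():
            result &= self.index[name].get(value, set())
        return result

    def count_by(self, name, rows):
        """Summarise at run-time: count the selected rows per value of a column."""
        counts = {}
        for value, positions in self.index[name].items():
            hits = len(positions & rows)
            if hits:
                counts[value] = hits
        return counts

# Hypothetical dataset held column-wise.
store = ColumnStore({
    "sex":    ["F", "M", "F", "F", "M"],
    "region": ["N", "N", "S", "N", "S"],
})

females = store.select(sex="F")
print(store.count_by("region", females))
```

Both selection and summarisation reduce to set intersections on row positions, so no table has to be pre-aggregated along dimensions chosen in advance.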
