
OLAP Proposal for MySQL

Hi all,

Please tell me if any of this makes sense. Any pointers to relevant
projects/articles will be much appreciated.

Philip Stoev

===================================


The goal is to create an OLAP engine coupled with a presentation layer that
will be easy enough for normal people to use, with no MDX experience
required. While it is probably a fact that Wal-Mart has 70 GB of data, this
does not mean that everyone has such data sets, so the goal is reasonable
performance for reasonably sized datasets. Most people do not join 30 tables
together either. Also, while Wal-Mart presumably engages in extra-complex
calculations to determine business strategies, most people are often content
to know "How much did I sell yesterday?".


The OLAP "engine" takes a standard SQL query with a GROUP BY clause and
aggregate functions, executes it, and saves the entire resulting dataset in
the cache. A cache index entry is then created, noting which source
tables, GROUP BY columns, aggregate functions and WHERE conditions were
used.
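
As a rough sketch, one cache index entry could be modelled like this. The class name, field names and the cache table name are all made up for illustration; the proposal only says which four things the entry has to record.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CacheEntry:
    """One row of a hypothetical cache index (all names are assumptions)."""
    cache_table: str          # table that holds the cached result set
    source_tables: frozenset  # tables the cached query read from
    group_by: frozenset       # GROUP BY columns of the cached query
    aggregates: frozenset     # aggregate expressions, e.g. "SUM(sales)"
    where: frozenset          # WHERE conditions as normalized strings


# Entry for: SELECT salesman, state, SUM(sales) FROM company.sales
#            GROUP BY salesman, state
entry = CacheEntry(
    cache_table="cache_1234567",
    source_tables=frozenset({"company.sales"}),
    group_by=frozenset({"salesman", "state"}),
    aggregates=frozenset({"SUM(sales)"}),
    where=frozenset(),
)
```

Using sets makes the later "equal or subset" comparisons a one-liner, which is the only reason they are used here.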

Upon execution of further queries, the OLAP engine checks whether the cache
holds a dataset that can be used to answer the query immediately.
This is the case when all of the following hold:

1. The query's GROUP BY columns are equal to, or a subset of, the cached
query's. So, a query like:
SELECT salesman, state, SUM(sales) FROM company.sales GROUP BY
salesman, state
provides the answer for
SELECT salesman, SUM(sales) FROM company.sales GROUP BY salesman

2. The query's WHERE clause is equal to, or more restrictive than, the WHERE
clause of a cached query, and any additional conditions refer only to columns
that were GROUP BY-ed.
A query like:
SELECT date, salesman, SUM(sales) FROM company.sales WHERE date >
'2003-01-01' GROUP BY date, salesman
provides the answer for:
SELECT date, salesman, SUM(sales) FROM company.sales WHERE date >
'2003-01-01' AND date > '2003-06-01' GROUP BY date, salesman
Obviously, a human will not write a query with such a WHERE clause;
however, a graphical Pivot tool may be explicitly designed to create such a
query when drilling down so that a cache hit is scored.

3. The query's source tables are equal to, or a subset of, the cached query's
source tables.
So, the query:
SELECT salesman, gender, SUM(sales) FROM company.sales INNER JOIN salesman
USING (salesman_id) GROUP BY salesman, gender
or even something very complex with 10 joined tables, can be used to answer:
SELECT salesman, SUM(sales) FROM company.sales GROUP BY salesman
or even something more complex with 5 joined tables, provided the extra
joins neither drop nor duplicate rows of the tables that the query uses.

4. The query's aggregate functions are equal to, or a subset of, the cached
query's. Certain aggregate functions, like COUNT(DISTINCT), may not be cached,
and others require special care (AVERAGE(value) must be translated to
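
The four conditions above can be sketched as a single test. This is only a syntactic approximation under my own assumptions: WHERE clauses are treated as sets of AND-ed (column, operator, literal) conditions, "more restrictive" means "has all the cached conditions plus extra ones", and the names QueryShape and can_answer are invented for the example.

```python
from typing import FrozenSet, NamedTuple, Tuple

Condition = Tuple[str, str, str]  # (column, operator, literal)


class QueryShape(NamedTuple):
    """Hypothetical summary of a query, as stored in the cache index."""
    tables: FrozenSet[str]
    group_by: FrozenSet[str]
    aggregates: FrozenSet[str]
    where: FrozenSet[Condition]


def can_answer(cached: QueryShape, query: QueryShape) -> bool:
    """Return True if the cached result can serve the new query."""
    extra = query.where - cached.where
    return (
        query.group_by <= cached.group_by          # rule 1
        and cached.where <= query.where            # rule 2: same or stricter
        and all(col in cached.group_by for col, _, _ in extra)
        and query.tables <= cached.tables          # rule 3
        and query.aggregates <= cached.aggregates  # rule 4
    )


cached = QueryShape(
    tables=frozenset({"company.sales"}),
    group_by=frozenset({"date", "salesman"}),
    aggregates=frozenset({"SUM(sales)"}),
    where=frozenset({("date", ">", "2003-01-01")}),
)
drill_down = cached._replace(
    where=cached.where | {("date", ">", "2003-06-01")}
)
by_state = cached._replace(
    where=cached.where | {("state", "=", "NY")}
)
```

Here can_answer(cached, drill_down) is a hit, while by_state is a miss because state was not one of the cached GROUP BY columns. A real implementation would still have to exclude aggregates such as COUNT(DISTINCT) and check that extra joins do not change the row set.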

The benefit of such a cache implementation is that it is data-independent.
You do not have to describe your data prior to executing your queries. It
also does not rely on creating your own cache structure and your own cache
index - a few tables can be used to hold the cache index and can then be
queried with SQL themselves to determine a hit.
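
To illustrate keeping the index in ordinary tables, here is a minimal sketch using Python's sqlite3 in place of MySQL. The two index tables (cache_entry, cache_group_by) and the function find_candidates are assumptions of mine, and only the GROUP BY rule is checked; the other rules would need similar tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cache_entry (id INTEGER PRIMARY KEY, cache_table TEXT);
CREATE TABLE cache_group_by (entry_id INTEGER, column_name TEXT);
INSERT INTO cache_entry VALUES (1, 'cache_1'), (2, 'cache_2');
INSERT INTO cache_group_by VALUES
  (1, 'salesman'), (1, 'state'), (2, 'date');
""")


def find_candidates(conn, wanted_columns):
    """Return cache tables whose GROUP BY columns cover all wanted columns.

    Assumes at least one wanted column.
    """
    wanted = sorted(set(wanted_columns))
    placeholders = ", ".join("?" for _ in wanted)
    sql = f"""
        SELECT e.cache_table
        FROM cache_entry e
        JOIN cache_group_by g ON g.entry_id = e.id
        WHERE g.column_name IN ({placeholders})
        GROUP BY e.id, e.cache_table
        HAVING COUNT(DISTINCT g.column_name) = ?
        ORDER BY e.id
    """
    return [row[0] for row in conn.execute(sql, (*wanted, len(wanted)))]
```

The HAVING clause is what turns "some wanted column matches" into "every wanted column matches".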

If an interactive Pivoting tool is executing those queries, the cache should
(hopefully) soon fill with entries that allow most, if not all, of the
queries resulting from interactive browsing to be served from the cache.
Additionally, the tool can pre-fetch relevant data by
drilling down a bit more than the user has requested, resulting in a cache
hit when the user indeed drills deeper. Also, the tool does not have to
cache data to sort it on its own, since queries that differ only in their
ORDER BY can be served from the cache. An additional enhancement would be
the ability to serve a hit from the cache using more than one cached table.
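
The pre-fetching idea might look something like the sketch below: the tool asks for one level deeper than the user requested, so the next drill-down is already cached. The notion of a fixed "drill path" and the function name are assumptions, not part of the proposal.

```python
def prefetch_columns(requested, drill_path):
    """Return the GROUP BY list to actually run: the request plus one
    deeper level of the (hypothetical) drill path, if there is one."""
    depth = 0
    while depth < len(drill_path) and drill_path[depth] in requested:
        depth += 1
    if depth < len(drill_path):
        return list(requested) + [drill_path[depth]]
    return list(requested)


drill_path = ["state", "salesman"]
to_run = prefetch_columns(["state"], drill_path)
```

The user sees the report by state, but the cached table is grouped by state and salesman, so drilling into a state needs no new query against the source data.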


A. No cache hit, so we just populate the cache
Initial query:
SELECT salesman, state, COUNT(*) FROM sales GROUP BY salesman, state
The server does:
CREATE TABLE `1234567` SELECT salesman, state, COUNT(*) FROM sales GROUP
BY salesman, state
SELECT * FROM `1234567`

B. A cache hit
Later query:
SELECT state, COUNT(*) FROM sales GROUP BY state
The server does:
SELECT state, SUM(`COUNT(*)`) AS `COUNT(*)` FROM `1234567` GROUP
BY state
[`COUNT(*)` being a valid column name for table `1234567`; the table name
must be backquoted because it consists only of digits]
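
Steps A and B can be tried end to end with a small script. This is a sketch using sqlite3 rather than MySQL, so I have renamed the cache table to cache_1234567 and aliased the count column as row_count; the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (salesman TEXT, state TEXT);
INSERT INTO sales VALUES
  ('ann', 'NY'), ('ann', 'NY'), ('ann', 'CA'), ('bob', 'CA');

-- Step A: no cache hit, so materialize the result as a cache table.
CREATE TABLE cache_1234567 AS
  SELECT salesman, state, COUNT(*) AS row_count
  FROM sales GROUP BY salesman, state;
""")

# Step B: answer the coarser query from the cache by summing cached counts.
from_cache = conn.execute(
    "SELECT state, SUM(row_count) FROM cache_1234567 "
    "GROUP BY state ORDER BY state"
).fetchall()

# The same query run against the source table, for comparison.
direct = conn.execute(
    "SELECT state, COUNT(*) FROM sales GROUP BY state ORDER BY state"
).fetchall()
```

Both queries return the same rows. COUNT and SUM roll up this way by summing; an average cannot be re-averaged from cached averages, so it would need the cached sum and count instead.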


1. In my humble opinion, people do not think in MDX. Instead, they think in
terms of GROUP BY. So, for most uses, it should be sufficient to allow the
user to construct his own GROUP BY statement and specify the aggregate
functions that he is interested in, rather than asking him to create a cube,
an axis, a view, a measure, etc, etc.

2. People also think in terms of everyday phrases, like "last 7 days" or
"all Mondays". A pre-compiled dictionary of such phrases will be immensely
useful, as well as the ability to specify such phrases. People also like to
be able to do "call duration in 5-minute intervals", which is not available
in Microsoft Excel when working with columns of type "time".

3. Normal people do not expect all of their columns to be available for
analysis, and they do not want their report to have either 2 or 2000 rows.

For example, if you have a date column and you do a Microsoft Excel
PivotTable, you will first have to select that column from a list that
contains a bunch of other fields, then wait for the table to be generated with
a row for each date, and then you group or sort the dates somehow to arrive
at the numbers that interest you. Other tools (at least in their example
scenarios) facing a date column will start with the data grouped by year,
and you then have to expand to month (the months often being shown as
numbers), and from there on to weeks and days, and the table has to refresh
and recalculate a dozen times for your convenience.

Instead, a person should have a list of phrases that she can use as rows and
columns, like "last 7 days per day", "all months since January by week",
etc. She will then be able to arrive precisely at the data that she wants to
see. Only one SQL query will be required.
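
A phrase dictionary could start out as simply as the sketch below. The exact meaning of each phrase is my assumption ("last 7 days" is taken to include today), and PHRASES and bucket_5_minutes are invented names.

```python
from datetime import date, timedelta

# Each phrase maps to a predicate over (row_date, today).
PHRASES = {
    "last 7 days": lambda d, today: today - timedelta(days=6) <= d <= today,
    "all Mondays": lambda d, today: d.weekday() == 0,
}


def bucket_5_minutes(duration_seconds):
    """Label a call duration with its 5-minute interval."""
    start = duration_seconds // 300 * 5
    return f"{start}-{start + 5} min"


today = date(2003, 6, 10)
recent = PHRASES["last 7 days"](date(2003, 6, 4), today)
label = bucket_5_minutes(754)
```

In a real tool each phrase would be translated into a WHERE condition or a GROUP BY expression instead of a Python predicate, so that the single SQL query does the work.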

4. Data is not always perfect

If you store your data as 1 and 0, and your boss wants to see "yes" and "no",
this should be possible. If sales > $5000 means a pro salesman, then the
user should not have to display the raw sales number in a column, and then
group on figures below $5000 and figures above $5000, and then separately
calculate the salesmen that are too recently hired to be able to score.
Months and days of the week have names. Times of the day may be morning,
afternoon and evening, not (0..23:0..59:0..59). Times that are messed up due
to time zones can be adjusted on the fly without jeopardizing the work of
company software that relies on the data being messed up.
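
Relabeling of this kind can be pushed into the query itself with CASE expressions, roughly as sketched here with sqlite3. The calls table, its columns and the hour boundaries for morning/afternoon/evening are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE calls (hour INTEGER, answered INTEGER);
INSERT INTO calls VALUES (9, 1), (10, 0), (14, 1), (20, 1);
""")

# Group on the human-readable labels rather than on the stored values.
rows = conn.execute("""
    SELECT CASE WHEN hour < 12 THEN 'morning'
                WHEN hour < 18 THEN 'afternoon'
                ELSE 'evening' END AS part_of_day,
           CASE answered WHEN 1 THEN 'yes' ELSE 'no' END AS answered_label,
           COUNT(*)
    FROM calls
    GROUP BY part_of_day, answered_label
    ORDER BY part_of_day, answered_label
""").fetchall()
```

Because the labels are computed on the fly, the stored 1/0 values stay untouched for any other software that depends on them.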


A mod_perl GUI is envisioned that will allow you to view and rotate your data
as you see fit. In particular, the following goals have been set:
1. Fully bookmarkable URLs that people can mail around to others
so that they too can see the same report;
2. Usage of phrases described in Section II to make access to
the most relevant portions of the report easier;
3. Sorting, drilling up and down, expanding, contracting,
hiding, showing, axis-swapping, grouping and ungrouping, coloring, etc.;
4. Tabs instead of drop-down lists, e.g. a tab for January, a
tab for February, etc.;
5. Access control, full logging, etc.;
6. Speed, speed, speed. Anything that is slower than Microsoft
Excel for comparable datasets should be optimized. Data may be queried (and
retrieved) in portions to provide concurrency and instant feedback to the user.
For example, if we have a table keyed by date, we can always retrieve
January, show it to the user, and then proceed to retrieve the other months
and keep displaying them as they arrive (which, as a side effect, may allow
other queries to slip in between, providing faster performance for everyone,
at least perceptually). Any queries that are known to run long (based on
timing previous invocations) should have a progress bar.
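
Retrieval in portions, as in goal 6, could be sketched with a generator that runs one small query per month and hands each result to the caller as soon as it arrives. The table layout, the ISO date strings and the name monthly_portions are assumptions; sqlite3 again stands in for MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (sale_date TEXT, amount INTEGER);
INSERT INTO sales VALUES
  ('2003-01-05', 10), ('2003-01-20', 5),
  ('2003-02-01', 7), ('2003-03-15', 3);
""")


def monthly_portions(conn, year, months):
    """Yield (month, total) one month at a time so the GUI can render early."""
    for month in months:
        prefix = f"{year:04d}-{month:02d}-%"
        (total,) = conn.execute(
            "SELECT COALESCE(SUM(amount), 0) FROM sales "
            "WHERE sale_date LIKE ?",
            (prefix,),
        ).fetchone()
        yield month, total


portions = list(monthly_portions(conn, 2003, range(1, 4)))
```

A GUI would iterate over the generator and draw each month as it comes back, instead of collecting everything into a list first as done here.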
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql

Jul 19 '05 #1