By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
435,395 Members | 2,513 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 435,395 IT Pros & Developers. It's quick & easy.

On the subject of Data Warehouses, Data Cubes & OLAP….

P: n/a
On the subject of Data Warehouses, Data Cubes & OLAP….

I would like to speak frankly about Data Warehouses, Data Cubes and
OLAP (on-line analytical processing). Has it dawned on anyone else
that these buzz words were created by some geek who decided to take a
stab at marketing? Knowing that to the backwoods manager who knows
little of technology that new innovative names for old concepts would
help to sale their products.

I mean seriously, what is the story here? In a nut shell, and please
stop me if you disagree, but isn’t a data warehouse simply a
database? Can’t you do everything on a conventional database
like SQL Server, Oracle or DB2 that you can do on these new
proprietary Data Warehouse constructs? I mean who are they trying to
fool?

Take a look, for instance, at Data Cubes. Who hasn’t noticed
the striking similarity between data cubes and views used in all the
more robust databases? Also, what about OLAP? OLAP is nothing more
than a report generator. There’s nothing you can do with these
million dollar price tagged Data Warehouse total solution packages
that I can’t do with SQL Server, Oracle or DB2…for that
matter Microsoft Access.

As an example some sales people for Metadata Corporation has the Vice
President of I.T. in Nashville, for Healthspring, sold on their total
solution data respository which is such a scam. All they had to do
was throw a couple of buzzwords at him and they have him hypnotized.

Personally, I feel that these kinds of marketing practices undermine
our industry. It helps to unravel what little standards or
consistency we have. What do you guys think?

Stuart
Jul 20 '05 #1
Share this Question
Share on Google+
7 Replies


P: n/a
Hi,
it is true one could do most of the data warehousing/minning in a
regular relational database, indeed some of the products that do data
analysis are utilizing relational DB backends. However there is a lot
more to OLAP then is in relational DB.

First there is the volume of data this is why you need the cubes.
Consider you need to create reports on the performance of an
enterprise - the data amounts ~150 Mb per hour. So you periodically
sum it in different dimensions georgraphical, time etc. (this is done
in a persistent manner, unlike views in relational DB's, so you have
this data available immediately not after an hour of disk scratching).

Next the reports true these are simple things at first sight, but the
maths (statistics, complex polinomials) usually embedded in OLAP
products are impressive.

So true OLAP looks much like traditional DB, but doing OLAP with
regular DB is a lot of head banging. Without the tools - If you get
the maths right probably most of the things will be so slow it will
take a lifetime to see results or the other way around.
Jul 20 '05 #2

P: n/a
"Will" <st**********@synovusmortgage.com> wrote in message

Personally, I feel that these kinds of marketing practices undermine
our industry. It helps to unravel what little standards or
consistency we have. What do you guys think?


We can certainly jump out and say that "data warehousing" is some cool
marketing hype. However, one could also say the same thing about using a
relational database. Fact is, many of the concepts of relational database
and the concepts of data normalizing are extremely important.

The same goes for data warehousing. For example, you might want to do some
tracking on some products sold. However, you can't always put millions and
millions of records in to a huge database and then simply start running
quires on that huge dataset. It is going to be too large to work with, and
too many records need to be processed to get your information in a quick
manor. So, for example to track product sales, you might SUMMARIZE the data
into a data cube. Your 3rd dimension might only contain the last 2, or 3
months worth of transactions (sales over time). Thus, there is LOT of work
and decision making as to how much detail data you will put into the data
warehouse. Further, often much of the data you put into the data warehouse
is SUMMARY data (ie: totals of sales of product over given amount of time).
A typical complex system will often easily have 40, or 50 tables of related
data. When you move the results to a data warehouse, you are de-nominating
the data into summary tables. You may not even store which customer bought
which product, but only stuff like sales per day, and perhaps by region.
Often those resulting tables will not need a whole bunch of joined sql data
tables to get your results (they will HAVE BEEN ALREADY summarised and
totalled BEFORE you start asking questions about the data).

Only the analyst who has a good deal of experience on the kind of questions
(queries) to asked will really have a good idea as to how much data will go
into the warehouse. As mentioned, you often can't even begin to put every
single transaction into that data warehouse. A lot of planning will have to
go into how much drill down of details you want to keep.

So, while there might be buzz words like a "data warehouse", it is certainly
not just marketing hype but is result that we can't always query the live
production database fast enough to get the kinds of answers we need on a
daily bases (ie: just how much should you spend on radio and tv advertising
this week?).
--
Albert D. Kallal
Edmonton, Alberta Canada
ka****@msn.com
http://www.attcanada.net/~kallal.msn

Jul 20 '05 #3

P: n/a
> So true OLAP looks much like traditional DB, but doing OLAP with
regular DB is a lot of head banging. Without the tools - If you get
the maths right probably most of the things will be so slow it will
take a lifetime to see results or the other way around.


It is my experience OLAP is nothing more than a report generator using
views. Anyone wants to argue with me? I've done everything you have
mentioned with a great deal of success using SQL Server and Crystal
Reports. Having created all the input to the reports using views and
unions I've made some damn impressive reports. The views function
identical to cubes. You might argue that this consumes resources
because they are generated on the fly and could hamper traffic, but
I've never experienced this. The perfect DW model in my humble
opinion is snowflake/normalized on a SQL Server (robust) box.
Attention to detail and elegant views yield a most powerful EIS.
Arguments?

The statistical calculations are easy to put together. More
importantly...IT'S THE MOST FUN PART!!! CAN I GET AN AMEN?

While I'm on my soapbox has anyone noticed the conceptual similarities
of star/snowflake schema models and that of non-realational/relational
or unmormalized/normalized schema models respectively??? GIVE ME A
BREAK! Why does anyone CHOOSE to create new names for old concepts?
I think they should be drug out into the street and shot! I CAN'T
HEAR YOU CONGREGATION!

Stu
Jul 20 '05 #4

P: n/a
An arguement, in general, on Stuart's side. Here we keep talking
about millions and millions of records. I work with a small
manufacturing enterprise with annual revenue of about 50 million, they
actually use Access, data retrieval is slowish, imho, not because of
the volumne of data (about 65 MB), but because of unplanned db design.
After exporting the Access db to SQL Server, and normalized a bit,
man, data retrieval is almost light speed. They continue to use
existing Access programs like FORMS etc. while getting whatever data
they want from the SQL Server quickly and easily with a tool I
developed. Down the road they may dump the Access db, but that would
be later.
My point is, for large enterprises like Fortune 1000 something, the
conventional DW etc. may be of value to them if the implementation is
successful (I heard of a high rate of DW implementation failure)
however that may not be applicable to SMEs (small to medium sized
enterprises).
DL
http://www.hegelsoftware.com
"Albert D. Kallal" <NO************@msn.com> wrote in message news:<gW1jb.98779$9l5.63320@pd7tw2no>...
"Will" <st**********@synovusmortgage.com> wrote in message

Personally, I feel that these kinds of marketing practices undermine
our industry. It helps to unravel what little standards or
consistency we have. What do you guys think?


We can certainly jump out and say that "data warehousing" is some cool
marketing hype. However, one could also say the same thing about using a
relational database. Fact is, many of the concepts of relational database
and the concepts of data normalizing are extremely important.

The same goes for data warehousing. For example, you might want to do some
tracking on some products sold. However, you can't always put millions and
millions of records in to a huge database and then simply start running
quires on that huge dataset. It is going to be too large to work with, and
too many records need to be processed to get your information in a quick
manor. So, for example to track product sales, you might SUMMARIZE the data
into a data cube. Your 3rd dimension might only contain the last 2, or 3
months worth of transactions (sales over time). Thus, there is LOT of work
and decision making as to how much detail data you will put into the data
warehouse. Further, often much of the data you put into the data warehouse
is SUMMARY data (ie: totals of sales of product over given amount of time).
A typical complex system will often easily have 40, or 50 tables of related
data. When you move the results to a data warehouse, you are de-nominating
the data into summary tables. You may not even store which customer bought
which product, but only stuff like sales per day, and perhaps by region.
Often those resulting tables will not need a whole bunch of joined sql data
tables to get your results (they will HAVE BEEN ALREADY summarised and
totalled BEFORE you start asking questions about the data).

Only the analyst who has a good deal of experience on the kind of questions
(queries) to asked will really have a good idea as to how much data will go
into the warehouse. As mentioned, you often can't even begin to put every
single transaction into that data warehouse. A lot of planning will have to
go into how much drill down of details you want to keep.

So, while there might be buzz words like a "data warehouse", it is certainly
not just marketing hype but is result that we can't always query the live
production database fast enough to get the kinds of answers we need on a
daily bases (ie: just how much should you spend on radio and tv advertising
this week?).

Jul 20 '05 #5

P: n/a
I would argue that 65mb on Access shouldn't be overwhelming. Though
Access on a single processor PC is never going to perform. SQL Server
on a "built" box compared to Access is like comparing a Fierrari to a
Ford Focus.

SQL Server is truly ideal. Sad it's a Microsoft product. Let me tell
you that when you can tie in security to the DB through your local
network security software ("NT") then why would you go with anyother
product? If Oracle came out with their own proprietary network that
would meld with the security of their db or vise versa then Oracle
would be the ONLY solution! Oracle, can you hear me now?

Stu
qw********@boxfrog.com (Doug Baroter) wrote in message news:<fc**************************@posting.google. com>...
An arguement, in general, on Stuart's side. Here we keep talking
about millions and millions of records. I work with a small
manufacturing enterprise with annual revenue of about 50 million, they
actually use Access, data retrieval is slowish, imho, not because of
the volumne of data (about 65 MB), but because of unplanned db design.
After exporting the Access db to SQL Server, and normalized a bit,
man, data retrieval is almost light speed. They continue to use
existing Access programs like FORMS etc. while getting whatever data
they want from the SQL Server quickly and easily with a tool I
developed. Down the road they may dump the Access db, but that would
be later.
My point is, for large enterprises like Fortune 1000 something, the
conventional DW etc. may be of value to them if the implementation is
successful (I heard of a high rate of DW implementation failure)
however that may not be applicable to SMEs (small to medium sized
enterprises).
DL
http://www.hegelsoftware.com
"Albert D. Kallal" <NO************@msn.com> wrote in message news:<gW1jb.98779$9l5.63320@pd7tw2no>...
"Will" <st**********@synovusmortgage.com> wrote in message

Personally, I feel that these kinds of marketing practices undermine
our industry. It helps to unravel what little standards or
consistency we have. What do you guys think?


We can certainly jump out and say that "data warehousing" is some cool
marketing hype. However, one could also say the same thing about using a
relational database. Fact is, many of the concepts of relational database
and the concepts of data normalizing are extremely important.

The same goes for data warehousing. For example, you might want to do some
tracking on some products sold. However, you can't always put millions and
millions of records in to a huge database and then simply start running
quires on that huge dataset. It is going to be too large to work with, and
too many records need to be processed to get your information in a quick
manor. So, for example to track product sales, you might SUMMARIZE the data
into a data cube. Your 3rd dimension might only contain the last 2, or 3
months worth of transactions (sales over time). Thus, there is LOT of work
and decision making as to how much detail data you will put into the data
warehouse. Further, often much of the data you put into the data warehouse
is SUMMARY data (ie: totals of sales of product over given amount of time).
A typical complex system will often easily have 40, or 50 tables of related
data. When you move the results to a data warehouse, you are de-nominating
the data into summary tables. You may not even store which customer bought
which product, but only stuff like sales per day, and perhaps by region.
Often those resulting tables will not need a whole bunch of joined sql data
tables to get your results (they will HAVE BEEN ALREADY summarised and
totalled BEFORE you start asking questions about the data).

Only the analyst who has a good deal of experience on the kind of questions
(queries) to asked will really have a good idea as to how much data will go
into the warehouse. As mentioned, you often can't even begin to put every
single transaction into that data warehouse. A lot of planning will have to
go into how much drill down of details you want to keep.

So, while there might be buzz words like a "data warehouse", it is certainly
not just marketing hype but is result that we can't always query the live
production database fast enough to get the kinds of answers we need on a
daily bases (ie: just how much should you spend on radio and tv advertising
this week?).

Jul 20 '05 #6

P: n/a
I would argue that 65mb on Access shouldn't be overwhelming. Though
Access on a single processor PC is never going to perform. SQL Server
on a "built" box compared to Access is like comparing a Fierrari to a
Ford Focus.

SQL Server is truly ideal. Sad it's a Microsoft product. Let me tell
you that when you can tie in security to the DB through your local
network security software ("NT") then why would you go with anyother
product? If Oracle came out with their own proprietary network that
would meld with the security of their db or vise versa then Oracle
would be the ONLY solution! Oracle, can you hear me now?

Stu
qw********@boxfrog.com (Doug Baroter) wrote in message news:<fc**************************@posting.google. com>...
An arguement, in general, on Stuart's side. Here we keep talking
about millions and millions of records. I work with a small
manufacturing enterprise with annual revenue of about 50 million, they
actually use Access, data retrieval is slowish, imho, not because of
the volumne of data (about 65 MB), but because of unplanned db design.
After exporting the Access db to SQL Server, and normalized a bit,
man, data retrieval is almost light speed. They continue to use
existing Access programs like FORMS etc. while getting whatever data
they want from the SQL Server quickly and easily with a tool I
developed. Down the road they may dump the Access db, but that would
be later.
My point is, for large enterprises like Fortune 1000 something, the
conventional DW etc. may be of value to them if the implementation is
successful (I heard of a high rate of DW implementation failure)
however that may not be applicable to SMEs (small to medium sized
enterprises).
DL
http://www.hegelsoftware.com
"Albert D. Kallal" <NO************@msn.com> wrote in message news:<gW1jb.98779$9l5.63320@pd7tw2no>...
"Will" <st**********@synovusmortgage.com> wrote in message

Personally, I feel that these kinds of marketing practices undermine
our industry. It helps to unravel what little standards or
consistency we have. What do you guys think?


We can certainly jump out and say that "data warehousing" is some cool
marketing hype. However, one could also say the same thing about using a
relational database. Fact is, many of the concepts of relational database
and the concepts of data normalizing are extremely important.

The same goes for data warehousing. For example, you might want to do some
tracking on some products sold. However, you can't always put millions and
millions of records in to a huge database and then simply start running
quires on that huge dataset. It is going to be too large to work with, and
too many records need to be processed to get your information in a quick
manor. So, for example to track product sales, you might SUMMARIZE the data
into a data cube. Your 3rd dimension might only contain the last 2, or 3
months worth of transactions (sales over time). Thus, there is LOT of work
and decision making as to how much detail data you will put into the data
warehouse. Further, often much of the data you put into the data warehouse
is SUMMARY data (ie: totals of sales of product over given amount of time).
A typical complex system will often easily have 40, or 50 tables of related
data. When you move the results to a data warehouse, you are de-nominating
the data into summary tables. You may not even store which customer bought
which product, but only stuff like sales per day, and perhaps by region.
Often those resulting tables will not need a whole bunch of joined sql data
tables to get your results (they will HAVE BEEN ALREADY summarised and
totalled BEFORE you start asking questions about the data).

Only the analyst who has a good deal of experience on the kind of questions
(queries) to asked will really have a good idea as to how much data will go
into the warehouse. As mentioned, you often can't even begin to put every
single transaction into that data warehouse. A lot of planning will have to
go into how much drill down of details you want to keep.

So, while there might be buzz words like a "data warehouse", it is certainly
not just marketing hype but is result that we can't always query the live
production database fast enough to get the kinds of answers we need on a
daily bases (ie: just how much should you spend on radio and tv advertising
this week?).

Jul 20 '05 #7

P: n/a
st**********@synovusmortgage.com (Will) wrote in message news:<4e**************************@posting.google. com>...
I would argue that 65mb on Access shouldn't be overwhelming. The thing is, as I mentioned in last posting, the db was not well
designed, it has over 100 user tables, some not even being used at all
but they are not too clear as to which ones EXACTLY, and some are
linked table to external db, anyway, it's sort of snow-balling.
Though
Access on a single processor PC is never going to perform. SQL Server
on a "built" box compared to Access is like comparing a Fierrari to a
Ford Focus.

SQL Server is truly ideal. I like it too and I believe it is a good fit for lots of SMEs.
Sad it's a Microsoft product. I don't necessarily dislike MS.
Let me tell
you that when you can tie in security to the DB through your local
network security software ("NT") then why would you go with anyother
product? Yes, MS SQL Server has two security mechanisms. Though I'm not a
heavy Oracle user, my impression is Oralce is stronger than MS SQL
Server in this dept.
If Oracle came out with their own proprietary network that
would meld with the security of their db or vise versa then Oracle
would be the ONLY solution! Oracle, can you hear me now?

Stu
qw********@boxfrog.com (Doug Baroter) wrote in message news:<fc**************************@posting.google. com>...
An arguement, in general, on Stuart's side. Here we keep talking
about millions and millions of records. I work with a small
manufacturing enterprise with annual revenue of about 50 million, they
actually use Access, data retrieval is slowish, imho, not because of
the volumne of data (about 65 MB), but because of unplanned db design.
After exporting the Access db to SQL Server, and normalized a bit,
man, data retrieval is almost light speed. They continue to use
existing Access programs like FORMS etc. while getting whatever data
they want from the SQL Server quickly and easily with a tool I
developed. Down the road they may dump the Access db, but that would
be later.
My point is, for large enterprises like Fortune 1000 something, the
conventional DW etc. may be of value to them if the implementation is
successful (I heard of a high rate of DW implementation failure)
however that may not be applicable to SMEs (small to medium sized
enterprises).
DL
http://www.hegelsoftware.com
"Albert D. Kallal" <NO************@msn.com> wrote in message news:<gW1jb.98779$9l5.63320@pd7tw2no>...
"Will" <st**********@synovusmortgage.com> wrote in message

>
> Personally, I feel that these kinds of marketing practices undermine
> our industry. It helps to unravel what little standards or
> consistency we have. What do you guys think?
>

We can certainly jump out and say that "data warehousing" is some cool
marketing hype. However, one could also say the same thing about using a
relational database. Fact is, many of the concepts of relational database
and the concepts of data normalizing are extremely important.

The same goes for data warehousing. For example, you might want to do some
tracking on some products sold. However, you can't always put millions and
millions of records in to a huge database and then simply start running
quires on that huge dataset. It is going to be too large to work with, and
too many records need to be processed to get your information in a quick
manor. So, for example to track product sales, you might SUMMARIZE the data
into a data cube. Your 3rd dimension might only contain the last 2, or 3
months worth of transactions (sales over time). Thus, there is LOT of work
and decision making as to how much detail data you will put into the data
warehouse. Further, often much of the data you put into the data warehouse
is SUMMARY data (ie: totals of sales of product over given amount of time).
A typical complex system will often easily have 40, or 50 tables of related
data. When you move the results to a data warehouse, you are de-nominating
the data into summary tables. You may not even store which customer bought
which product, but only stuff like sales per day, and perhaps by region.
Often those resulting tables will not need a whole bunch of joined sql data
tables to get your results (they will HAVE BEEN ALREADY summarised and
totalled BEFORE you start asking questions about the data).

Only the analyst who has a good deal of experience on the kind of questions
(queries) to asked will really have a good idea as to how much data will go
into the warehouse. As mentioned, you often can't even begin to put every
single transaction into that data warehouse. A lot of planning will have to
go into how much drill down of details you want to keep.

So, while there might be buzz words like a "data warehouse", it is certainly
not just marketing hype but is result that we can't always query the live
production database fast enough to get the kinds of answers we need on a
daily bases (ie: just how much should you spend on radio and tv advertising
this week?).

Jul 20 '05 #8

This discussion thread is closed

Replies have been disabled for this discussion.