473,322 Members | 1,526 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,322 software developers and data experts.

Star Joins

I've got a database with a star schema and I'm kind of confused about
what the optimizer is doing with my queries. I've got a fact table
(about 24 M rows), which I'm joining to 3 dimensions. Each dimension
ID in the fact has it's own index.

No matter what I do, I've been unable to get the optimizer to choose a
star join (IXAND or the like). It always just picks one of the
dimensions and does a nested loop join and then HSJOINs in the rest.
To be clear the queries I've tried are like this:

select a.field, b.field, count(*)
from fact f, dim1 a, dim2 b, dim 3 c
where
f.dim1_key = a.dim1_key and
f.dim2_key = b.dim2_key and
f.dim3_key = c.dim3_key and
a.field like 'Data%' and
b.field like 'Data%' and
c.field like 'Data%'
group by a.field, b.field

The really odd-ball thing is that I've got another dimension, which has
absolutely no selectivity -- i.e. All the items in the fact are in a
single dimension. (I know this seems stupid, but it's a test
database). If I join in this dimension (in queries similar to the one
above), in many of my queries, the optimizer will actually choose this
dimension to drive the NLJOIN with the fact. How it could ever use
that dimension over anything else is beyond me as the result is
essentially a full table scan.

So, what I'm wondering is 2 things:

1) Can anyone point me to a good reference about how star joins work
or offer any advice regarding gettting the optimizer to use a star
join?
2) Does anyone have any idea why the optimizer would choose worst
possible index under any circumstances?

Oh, by the way, the stats are all up-to-date.

Thanks,
Greig Wise
DB2 DBA
Press Ganey Associates

Jun 5 '06 #1
3 2335
I'm following up my own message here. I've learned more. In regards
to the problem where the optimizer selects the dimension with only one
matching pair in the fact table, I think I understand what's going on
there. Although there's only one value for this particular dimension
in the fact table, there are actually 49 other records in the dimension
for which no facts exist. It seems that that the optimizer is assuming
2% selectivity since I'm selecting one of 50 records in the dimension
(even though the fact only has data for the one single value of that
dimension that I'm specifying in the where clause of the query).

To prove this is true to myself, I removed the other 49 records from
the dimension, reran stats and re-explained the query and indeed it no
longer used this index to drive the NLJOIN.

It seems strange that the optimizer would assume a totally even
distribution of a dimension within a fact like that. Is that what it's
really doing or am I missing something? It seems pretty odd,
especially since I've run stats with distrubution I'd think the
optimizer would be able to tell that the dimension wasn't evenly
distributed in the fact table.

Any information would be helpful.

Thanks,
Greig Wise
gw***@pressganey.com wrote:
I've got a database with a star schema and I'm kind of confused about
what the optimizer is doing with my queries. I've got a fact table
(about 24 M rows), which I'm joining to 3 dimensions. Each dimension
ID in the fact has it's own index.

No matter what I do, I've been unable to get the optimizer to choose a
star join (IXAND or the like). It always just picks one of the
dimensions and does a nested loop join and then HSJOINs in the rest.
To be clear the queries I've tried are like this:

select a.field, b.field, count(*)
from fact f, dim1 a, dim2 b, dim 3 c
where
f.dim1_key = a.dim1_key and
f.dim2_key = b.dim2_key and
f.dim3_key = c.dim3_key and
a.field like 'Data%' and
b.field like 'Data%' and
c.field like 'Data%'
group by a.field, b.field

The really odd-ball thing is that I've got another dimension, which has
absolutely no selectivity -- i.e. All the items in the fact are in a
single dimension. (I know this seems stupid, but it's a test
database). If I join in this dimension (in queries similar to the one
above), in many of my queries, the optimizer will actually choose this
dimension to drive the NLJOIN with the fact. How it could ever use
that dimension over anything else is beyond me as the result is
essentially a full table scan.

So, what I'm wondering is 2 things:

1) Can anyone point me to a good reference about how star joins work
or offer any advice regarding gettting the optimizer to use a star
join?
2) Does anyone have any idea why the optimizer would choose worst
possible index under any circumstances?

Oh, by the way, the stats are all up-to-date.

Thanks,
Greig Wise
DB2 DBA
Press Ganey Associates


Jun 6 '06 #2
gw***@pressganey.com wrote:
I'm following up my own message here. I've learned more. In regards
to the problem where the optimizer selects the dimension with only one
matching pair in the fact table, I think I understand what's going on
there. Although there's only one value for this particular dimension
in the fact table, there are actually 49 other records in the dimension
for which no facts exist. It seems that that the optimizer is assuming
2% selectivity since I'm selecting one of 50 records in the dimension
(even though the fact only has data for the one single value of that
dimension that I'm specifying in the where clause of the query).

To prove this is true to myself, I removed the other 49 records from
the dimension, reran stats and re-explained the query and indeed it no
longer used this index to drive the NLJOIN.

It seems strange that the optimizer would assume a totally even
distribution of a dimension within a fact like that. Is that what it's
really doing or am I missing something? It seems pretty odd,
especially since I've run stats with distrubution I'd think the
optimizer would be able to tell that the dimension wasn't evenly
distributed in the fact table.

Any information would be helpful.

Thanks,
Greig Wise
gw***@pressganey.com wrote:
I've got a database with a star schema and I'm kind of confused about
what the optimizer is doing with my queries. I've got a fact table
(about 24 M rows), which I'm joining to 3 dimensions. Each dimension
ID in the fact has it's own index.

No matter what I do, I've been unable to get the optimizer to choose a
star join (IXAND or the like). It always just picks one of the
dimensions and does a nested loop join and then HSJOINs in the rest.
To be clear the queries I've tried are like this:

select a.field, b.field, count(*)
from fact f, dim1 a, dim2 b, dim 3 c
where
f.dim1_key = a.dim1_key and
f.dim2_key = b.dim2_key and
f.dim3_key = c.dim3_key and
a.field like 'Data%' and
b.field like 'Data%' and
c.field like 'Data%'
group by a.field, b.field

The really odd-ball thing is that I've got another dimension, which has
absolutely no selectivity -- i.e. All the items in the fact are in a
single dimension. (I know this seems stupid, but it's a test
database). If I join in this dimension (in queries similar to the one
above), in many of my queries, the optimizer will actually choose this
dimension to drive the NLJOIN with the fact. How it could ever use
that dimension over anything else is beyond me as the result is
essentially a full table scan.

So, what I'm wondering is 2 things:

1) Can anyone point me to a good reference about how star joins work
or offer any advice regarding gettting the optimizer to use a star
join?
2) Does anyone have any idea why the optimizer would choose worst
possible index under any circumstances?

Oh, by the way, the stats are all up-to-date.

... but which stats did you run?

--
Serge Rielau
DB2 Solutions Development
IBM Toronto Lab

IOD Conference
http://www.ibm.com/software/data/ond...ness/conf2006/
Jun 6 '06 #3
I did runstats on table with distribution and detailed indexes all for
every table in the database.

Greig

Serge Rielau wrote:
Oh, by the way, the stats are all up-to-date.

.. but which stats did you run?

--
Serge Rielau
DB2 Solutions Development
IBM Toronto Lab

IOD Conference
http://www.ibm.com/software/data/ond...ness/conf2006/


Jun 7 '06 #4

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

0
by: Jeremy Posner | last post by:
Can someone please clear up some confusion I am having on Star Transformations: I have read about the conditions for Star Transformations, especially this onen (from 9i Data Warehousing Guide): ...
0
by: tekanet | last post by:
Hi folks, I'm new with MySQL and I'm really impressed by this database engine. While waiting for Stored Procedures :) (ver 5: anyone can say when will be, approximately, released?), due to the...
3
by: Prem | last post by:
Hi, I am having many problems with inner join. my first problem is : 1) I want to know the precedance while evaluating query with multiple joins. eg. select Employees.FirstName,...
8
by: Safalra | last post by:
I'm planning to put some star maps on my website. Each constellation (above about 40 degrees south) will have a page, starting with a star map and followed by details of the mjor objects. I'm...
1
by: Jon Davis | last post by:
I'm running the Community Starter Kit from www.asp.net and I'm having a problem with the Web Box Services. I've removed the Star Wars subscription and added my own (http://ifc.gospelcom.net/index.rdf...
0
by: MohF1 | last post by:
I was wondering if some power users of Python would be willing to pay money to benefit from doing a lot of their scientific/engineering/financial modeling and simulation from running their...
5
by: gubbachchi | last post by:
Hi, I need to store the star rated values in mysql. How will I do it. This is the code I have used to create the star rater, <ul class="star-rating small-star"> <li class="current-rating"...
36
by: TC | last post by:
I've used Access for many years. Several times, I've encountered a bug which I refer to as the "Vanishing Joins" bug. When it happens, joins vanish randomly from queries. More specifically, all...
0
by: DolphinDB | last post by:
Tired of spending countless mintues downsampling your data? Look no further! In this article, you’ll learn how to efficiently downsample 6.48 billion high-frequency records to 61 million...
0
isladogs
by: isladogs | last post by:
The next Access Europe meeting will be on Wednesday 6 Mar 2024 starting at 18:00 UK time (6PM UTC) and finishing at about 19:15 (7.15PM). In this month's session, we are pleased to welcome back...
1
isladogs
by: isladogs | last post by:
The next Access Europe meeting will be on Wednesday 6 Mar 2024 starting at 18:00 UK time (6PM UTC) and finishing at about 19:15 (7.15PM). In this month's session, we are pleased to welcome back...
0
by: Vimpel783 | last post by:
Hello! Guys, I found this code on the Internet, but I need to modify it a little. It works well, the problem is this: Data is sent from only one cell, in this case B5, but it is necessary that data...
0
by: jfyes | last post by:
As a hardware engineer, after seeing that CEIWEI recently released a new tool for Modbus RTU Over TCP/UDP filtering and monitoring, I actively went to its official website to take a look. It turned...
0
by: ArrayDB | last post by:
The error message I've encountered is; ERROR:root:Error generating model response: exception: access violation writing 0x0000000000005140, which seems to be indicative of an access violation...
1
by: Shællîpôpï 09 | last post by:
If u are using a keypad phone, how do u turn on JavaScript, to access features like WhatsApp, Facebook, Instagram....
0
by: af34tf | last post by:
Hi Guys, I have a domain whose name is BytesLimited.com, and I want to sell it. Does anyone know about platforms that allow me to list my domain in auction for free. Thank you
0
by: Faith0G | last post by:
I am starting a new it consulting business and it's been a while since I setup a new website. Is wordpress still the best web based software for hosting a 5 page website? The webpages will be...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.