I've got a database with a star schema and I'm kind of confused about
what the optimizer is doing with my queries. I've got a fact table
(about 24 M rows), which I'm joining to 3 dimensions. Each dimension
ID in the fact has it's own index.
No matter what I do, I've been unable to get the optimizer to choose a
star join (IXAND or the like). It always just picks one of the
dimensions and does a nested loop join and then HSJOINs in the rest.
To be clear the queries I've tried are like this:
select a.field, b.field, count(*)
from fact f, dim1 a, dim2 b, dim 3 c
where
f.dim1_key = a.dim1_key and
f.dim2_key = b.dim2_key and
f.dim3_key = c.dim3_key and
a.field like 'Data%' and
b.field like 'Data%' and
c.field like 'Data%'
group by a.field, b.field
The really odd-ball thing is that I've got another dimension, which has
absolutely no selectivity -- i.e. All the items in the fact are in a
single dimension. (I know this seems stupid, but it's a test
database). If I join in this dimension (in queries similar to the one
above), in many of my queries, the optimizer will actually choose this
dimension to drive the NLJOIN with the fact. How it could ever use
that dimension over anything else is beyond me as the result is
essentially a full table scan.
So, what I'm wondering is 2 things:
1) Can anyone point me to a good reference about how star joins work
or offer any advice regarding gettting the optimizer to use a star
join?
2) Does anyone have any idea why the optimizer would choose worst
possible index under any circumstances?
Oh, by the way, the stats are all up-to-date.
Thanks,
Greig Wise
DB2 DBA
Press Ganey Associates 3 2361
I'm following up my own message here. I've learned more. In regards
to the problem where the optimizer selects the dimension with only one
matching pair in the fact table, I think I understand what's going on
there. Although there's only one value for this particular dimension
in the fact table, there are actually 49 other records in the dimension
for which no facts exist. It seems that that the optimizer is assuming
2% selectivity since I'm selecting one of 50 records in the dimension
(even though the fact only has data for the one single value of that
dimension that I'm specifying in the where clause of the query).
To prove this is true to myself, I removed the other 49 records from
the dimension, reran stats and re-explained the query and indeed it no
longer used this index to drive the NLJOIN.
It seems strange that the optimizer would assume a totally even
distribution of a dimension within a fact like that. Is that what it's
really doing or am I missing something? It seems pretty odd,
especially since I've run stats with distrubution I'd think the
optimizer would be able to tell that the dimension wasn't evenly
distributed in the fact table.
Any information would be helpful.
Thanks,
Greig Wise gw***@pressgane y.com wrote: I've got a database with a star schema and I'm kind of confused about what the optimizer is doing with my queries. I've got a fact table (about 24 M rows), which I'm joining to 3 dimensions. Each dimension ID in the fact has it's own index.
No matter what I do, I've been unable to get the optimizer to choose a star join (IXAND or the like). It always just picks one of the dimensions and does a nested loop join and then HSJOINs in the rest. To be clear the queries I've tried are like this:
select a.field, b.field, count(*) from fact f, dim1 a, dim2 b, dim 3 c where f.dim1_key = a.dim1_key and f.dim2_key = b.dim2_key and f.dim3_key = c.dim3_key and a.field like 'Data%' and b.field like 'Data%' and c.field like 'Data%' group by a.field, b.field
The really odd-ball thing is that I've got another dimension, which has absolutely no selectivity -- i.e. All the items in the fact are in a single dimension. (I know this seems stupid, but it's a test database). If I join in this dimension (in queries similar to the one above), in many of my queries, the optimizer will actually choose this dimension to drive the NLJOIN with the fact. How it could ever use that dimension over anything else is beyond me as the result is essentially a full table scan.
So, what I'm wondering is 2 things:
1) Can anyone point me to a good reference about how star joins work or offer any advice regarding gettting the optimizer to use a star join? 2) Does anyone have any idea why the optimizer would choose worst possible index under any circumstances?
Oh, by the way, the stats are all up-to-date.
Thanks, Greig Wise DB2 DBA Press Ganey Associates gw***@pressgane y.com wrote: I'm following up my own message here. I've learned more. In regards to the problem where the optimizer selects the dimension with only one matching pair in the fact table, I think I understand what's going on there. Although there's only one value for this particular dimension in the fact table, there are actually 49 other records in the dimension for which no facts exist. It seems that that the optimizer is assuming 2% selectivity since I'm selecting one of 50 records in the dimension (even though the fact only has data for the one single value of that dimension that I'm specifying in the where clause of the query).
To prove this is true to myself, I removed the other 49 records from the dimension, reran stats and re-explained the query and indeed it no longer used this index to drive the NLJOIN.
It seems strange that the optimizer would assume a totally even distribution of a dimension within a fact like that. Is that what it's really doing or am I missing something? It seems pretty odd, especially since I've run stats with distrubution I'd think the optimizer would be able to tell that the dimension wasn't evenly distributed in the fact table.
Any information would be helpful.
Thanks, Greig Wise gw***@pressgane y.com wrote: I've got a database with a star schema and I'm kind of confused about what the optimizer is doing with my queries. I've got a fact table (about 24 M rows), which I'm joining to 3 dimensions. Each dimension ID in the fact has it's own index.
No matter what I do, I've been unable to get the optimizer to choose a star join (IXAND or the like). It always just picks one of the dimensions and does a nested loop join and then HSJOINs in the rest. To be clear the queries I've tried are like this:
select a.field, b.field, count(*) from fact f, dim1 a, dim2 b, dim 3 c where f.dim1_key = a.dim1_key and f.dim2_key = b.dim2_key and f.dim3_key = c.dim3_key and a.field like 'Data%' and b.field like 'Data%' and c.field like 'Data%' group by a.field, b.field
The really odd-ball thing is that I've got another dimension, which has absolutely no selectivity -- i.e. All the items in the fact are in a single dimension. (I know this seems stupid, but it's a test database). If I join in this dimension (in queries similar to the one above), in many of my queries, the optimizer will actually choose this dimension to drive the NLJOIN with the fact. How it could ever use that dimension over anything else is beyond me as the result is essentially a full table scan.
So, what I'm wondering is 2 things:
1) Can anyone point me to a good reference about how star joins work or offer any advice regarding gettting the optimizer to use a star join? 2) Does anyone have any idea why the optimizer would choose worst possible index under any circumstances?
Oh, by the way, the stats are all up-to-date.
... but which stats did you run?
--
Serge Rielau
DB2 Solutions Development
IBM Toronto Lab
IOD Conference http://www.ibm.com/software/data/ond...ness/conf2006/
I did runstats on table with distribution and detailed indexes all for
every table in the database.
Greig
Serge Rielau wrote: Oh, by the way, the stats are all up-to-date.
.. but which stats did you run?
-- Serge Rielau DB2 Solutions Development IBM Toronto Lab
IOD Conference http://www.ibm.com/software/data/ond...ness/conf2006/ This thread has been closed and replies have been disabled. Please start a new discussion. Similar topics |
by: Jeremy Posner |
last post by:
Can someone please clear up some confusion I am having on Star
Transformations: I have read about the conditions for Star
Transformations, especially this onen (from 9i Data Warehousing
Guide):
"A prerequisite of the star transformation is that there be a
single-column bitmap index on every join column of the fact table.
These join columns include all foreign key columns"
Does this mean every join column in a particular query, or...
|
by: tekanet |
last post by:
Hi folks,
I'm new with MySQL and I'm really impressed by this database engine.
While waiting for Stored Procedures :) (ver 5: anyone can say when
will be, approximately, released?), due to the fact that I'm coming
from MSSQL world, can anyone tell me if there's a way to perform join
without the classic SQL statement (inner join on, right join on, etc)
but using the WHERE clause, with equal =(inner join), star-equal *=
(left join) and...
|
by: Prem |
last post by:
Hi,
I am having many problems with inner join. my first problem is :
1) I want to know the precedance while evaluating query with multiple
joins.
eg.
select Employees.FirstName, Employees.LastName, TerritoryID,
Employees.EmployeeID,
RegionID, ProductID
from Employees
|
by: Safalra |
last post by:
I'm planning to put some star maps on my website. Each constellation
(above about 40 degrees south) will have a page, starting with a star
map and followed by details of the mjor objects. I'm thinking of making
the star map an image map so visitors can jump down the page to the
appropriate star/nebula.
WAI recommends having extra text links for each link in an image map,
but is there much point in doing that here? Users without images...
|
by: Jon Davis |
last post by:
I'm running the Community Starter Kit from www.asp.net and I'm having a problem with the Web Box Services. I've removed the Star Wars subscription and added my own (http://ifc.gospelcom.net/index.rdf + http://www.blogs4god.com/index.rdf). But when I click on "Refresh" button in the Admin screen for these subscriptions, nothing happens. I go back to the home page and nothing happens, it's empty (because I've cleared out all content from the Admin...
| |
by: MohF1 |
last post by:
I was wondering if some power users of Python would be willing to pay
money to benefit from doing a lot of their
scientific/engineering/financial modeling and simulation from running
their programs on high powered computing HPC. HPC would essentially
mean that you run on a cluster or group of computers. The problem is,
it takes forever to parallelize code and as such many people are
uninterested in running their simulations, models, etc....
|
by: gubbachchi |
last post by:
Hi,
I need to store the star rated values in mysql. How will I do it. This is the code I have used to create the star rater,
<ul class="star-rating small-star">
<li class="current-rating" style="width:50%">Currently 2.5/5 Stars.</li>
<li><a href="#" title="1 star out of 5" class="one-star">1</a></li>
<li><a href="#" title="2 stars out of 5" class="two-stars">2</a></li>
<li><a href="#" title="3 stars out of 5"...
|
by: TC |
last post by:
I've used Access for many years. Several times, I've encountered a bug
which I refer to as the "Vanishing Joins" bug. When it happens, joins
vanish randomly from queries. More specifically, all joins vanish from
at least one (seemingly random) query.
I've always regarded the Vanishing Joins bug as a symptom of
corruption. When it happens, I usually give my users advice on how to
recover from corruption, and how to avoid it in the future....
|
by: jinu1996 |
last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven tapestry of website design and digital marketing. It's not merely about having a website; it's about crafting an immersive digital experience that captivates audiences and drives business growth.
The Art of Business Website Design
Your website is...
|
by: Hystou |
last post by:
Overview:
Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows Update option using the Control Panel or Settings app; it automatically checks for updates and installs any it finds, whether you like it or not. For most users, this new feature is actually very convenient. If you want to control the update process,...
|
by: agi2029 |
last post by:
Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing, and deployment—without human intervention. Imagine an AI that can take a project description, break it down, write the code, debug it, and then launch it, all on its own....
Now, this would greatly impact the work of software developers. The idea...
| |
by: isladogs |
last post by:
The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM).
In this session, we are pleased to welcome a new presenter, Adolph Dupré who will be discussing some powerful techniques for using class modules.
He will explain when you may want to use classes instead of User Defined Types (UDT). For example, to manage the data in unbound forms.
Adolph will...
|
by: conductexam |
last post by:
I have .net C# application in which I am extracting data from word file and save it in database particularly. To store word all data as it is I am converting the whole word file firstly in HTML and then checking html paragraph one by one.
At the time of converting from word file to html my equations which are in the word document file was convert into image.
Globals.ThisAddIn.Application.ActiveDocument.Select();...
|
by: TSSRALBI |
last post by:
Hello
I'm a network technician in training and I need your help.
I am currently learning how to create and manage the different types of VPNs and I have a question about LAN-to-LAN VPNs.
The last exercise I practiced was to create a LAN-to-LAN VPN between two Pfsense firewalls, by using IPSEC protocols.
I succeeded, with both firewalls in the same network. But I'm wondering if it's possible to do the same thing, with 2 Pfsense firewalls...
|
by: adsilva |
last post by:
A Windows Forms form does not have the event Unload, like VB6. What one acts like?
|
by: muto222 |
last post by:
How can i add a mobile payment intergratation into php mysql website.
|
by: bsmnconsultancy |
last post by:
In today's digital era, a well-designed website is crucial for businesses looking to succeed. Whether you're a small business owner or a large corporation in Toronto, having a strong online presence can significantly impact your brand's success. BSMN Consultancy, a leader in Website Development in Toronto offers valuable insights into creating effective websites that not only look great but also perform exceptionally well. In this comprehensive...
| |