473,320 Members | 1,952 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,320 software developers and data experts.

Re: MDC vs. cluster index: which to use?

Ian
Henry J. wrote:
>
MDC on type_id? So cluster index is not a good idea?
MDC *guarantees* clustering, whereas a table with a clustering index
will eventually require maintenance (a.k.a. reorg) to maintain the
cluster ratio.

That's not to say that a clustering index isn't still valuable
(especially for high cardinality columns that aren't a reasonable
candidate as an MDC dimension).

Oct 21 '08 #1
10 3109
On Oct 21, 1:24*am, Ian <ianb...@mobileaudio.comwrote:
Henry J. wrote:
MDC on type_id? *So cluster index is not a good idea?

MDC *guarantees* clustering, whereas a table with a clustering index
will eventually require maintenance (a.k.a. reorg) to maintain the
cluster ratio.

That's not to say that a clustering index isn't still valuable
(especially for high cardinality columns that aren't a reasonable
candidate as an MDC dimension).
For my case, the table is purged and re-populated every day. And the
other column, product_id, has high cardinality.
Perhaps a cluster index makes more sense? I guess I'd choose type_id
if I have to pick one column for MDC. Then it won't be as helpful as
a cluster index on (type_id, product_id)?
Oct 21 '08 #2
On Oct 21, 7:18*am, "Henry J." <tank209...@yahoo.comwrote:
For my case, the table is purged and re-populated every day. *And the
other column, product_id, has high cardinality.
Perhaps a cluster index makes more sense? *I guess I'd choose type_id
if I have to pick one column for MDC. *Then it won't be as helpful as
a cluster index on (type_id, product_id)?
MDC still might be faster - probably faster at loading and equivalent
for searches based on type_id. For searches on type_id and
product_id, it could outperform the cluster index, if you also create
an index on product_id - and the combination (MDC + index on
product_id) will definitely outperform a cluster index on type_id,
product_id for product_id searches.

You'll want to verify these claims - but since you rebuild the table
every day, you are in a perfect position to try these various options
out. Pick a new one each day until you've found the right
combination.

-Chris
Oct 21 '08 #3
On Oct 21, 11:19*am, ChrisC <cunningham...@gmail.comwrote:
On Oct 21, 7:18*am, "Henry J." <tank209...@yahoo.comwrote:
For my case, the table is purged and re-populated every day. *And the
other column, product_id, has high cardinality.
Perhaps a cluster index makes more sense? *I guess I'd choose type_id
if I have to pick one column for MDC. *Then it won't be as helpful as
a cluster index on (type_id, product_id)?

MDC still might be faster - probably faster at loading and equivalent
for searches based on type_id. *For searches on type_id and
product_id, it could outperform the cluster index, if you also create
an index on product_id - and the combination (MDC + index on
product_id) will definitely outperform a cluster index on type_id,
product_id for product_id searches.

You'll want to verify these claims - but since you rebuild the table
every day, you are in a perfect position to try these various options
out. *Pick a new one each day until you've found the right
combination.

-Chris
Thanks Chris.

The primary key of the table is (type_id, product_id). Now if we add
a MDC on type_id, will the MDC be actually used if we query like the
following, or the primary key will be used only? I guess even if only
the primary key is used, the I/O will still be benefited.

select client.client_id, sum( prod.quantity * prod.price *
client.ordersize )
from prod, client
where prod.type_id = 4 and prod.product_id =
client.product_id
group by client.client_id

Also, would the following query benefit from both the MDC and the
primary key?

select *
from prod
where prod.type_id = 4 and prod.product_id = 101

Or, instead of the primary key, just use an index on (product_id)
would cover both of the above queries?

I'll certainly test to find out. But I'd like to gain better
understanding about indexing. Also, the turnaround of making such
changes are not as fast as I want and I hope to get to the best
solution a little faster. Thanks.

Oct 21 '08 #4
Ian
Henry J. wrote:
On Oct 21, 1:24 am, Ian <ianb...@mobileaudio.comwrote:
>Henry J. wrote:
>>MDC on type_id? So cluster index is not a good idea?
MDC *guarantees* clustering, whereas a table with a clustering index
will eventually require maintenance (a.k.a. reorg) to maintain the
cluster ratio.

That's not to say that a clustering index isn't still valuable
(especially for high cardinality columns that aren't a reasonable
candidate as an MDC dimension).

For my case, the table is purged and re-populated every day. And the
other column, product_id, has high cardinality.
Perhaps a cluster index makes more sense? I guess I'd choose type_id
if I have to pick one column for MDC. Then it won't be as helpful as
a cluster index on (type_id, product_id)?
By purging it every day, you're going to make it nearly impossible
for a clustering index to be useful. Unless your data comes in
already sorted by type_id,product_id the cluster ratio is probably
going to be very low, regardless of what you set pctfree to.

Furthermore, will a clustering index really help with queries that
aggregating on the clustering key? Maybe, but maybe not -- that
depends on whether the query is doing an index scan / fetch. If
it's just doing a tablescan, you'll still need to do a sort.
Oct 21 '08 #5
On Oct 21, 2:32*pm, Ian <ianb...@mobileaudio.comwrote:
Henry J. wrote:
On Oct 21, 1:24 am, Ian <ianb...@mobileaudio.comwrote:
Henry J. wrote:
>MDC on type_id? *So cluster index is not a good idea?
MDC *guarantees* clustering, whereas a table with a clustering index
will eventually require maintenance (a.k.a. reorg) to maintain the
cluster ratio.
That's not to say that a clustering index isn't still valuable
(especially for high cardinality columns that aren't a reasonable
candidate as an MDC dimension).
For my case, the table is purged and re-populated every day. *And the
other column, product_id, has high cardinality.
Perhaps a cluster index makes more sense? *I guess I'd choose type_id
if I have to pick one column for MDC. *Then it won't be as helpful as
a cluster index on (type_id, product_id)?

By purging it every day, you're going to make it nearly impossible
for a clustering index to be useful. *Unless your data comes in
already sorted by type_id,product_id the cluster ratio is probably
going to be very low, regardless of what you set pctfree to.

Furthermore, will a clustering index really help with queries that
aggregating on the clustering key? *Maybe, but maybe not -- that
depends on whether the query is doing an index scan / fetch. *If
it's just doing a tablescan, you'll still need to do a sort.- Hide quotedtext -

- Show quoted text -
I'm totally confused -- doesn't a clustering index make sure the rows
are inserted as sorted and hence with high cluster ratio? And only
updates/deletes/inserts over time would reduce the cluster ratio? And
what I understood is that a truly clustered index would avoid a sort.
Oct 21 '08 #6
On Oct 21, 1:05*pm, "Henry J." <tank209...@yahoo.comwrote:
On Oct 21, 11:19*am, ChrisC <cunningham...@gmail.comwrote:


On Oct 21, 7:18*am, "HenryJ." <tank209...@yahoo.comwrote:
For my case, the table is purged and re-populated every day. *And the
other column, product_id, has high cardinality.
Perhaps a cluster index makes more sense? *I guess I'd choose type_id
if I have to pick one column for MDC. *Then it won't be as helpful as
a cluster index on (type_id, product_id)?
MDC still might be faster - probably faster at loading and equivalent
for searches based on type_id. *For searches on type_id and
product_id, it could outperform the cluster index, if you also create
an index on product_id - and the combination (MDC + index on
product_id) will definitely outperform a cluster index on type_id,
product_id for product_id searches.
You'll want to verify these claims - but since you rebuild the table
every day, you are in a perfect position to try these various options
out. *Pick a new one each day until you've found the right
combination.
-Chris

Thanks Chris.

The primary key of the table is (type_id, product_id). *Now if we add
a MDC on type_id, will the MDC be actually used if we query like the
following, or the primary key will be used only? *I guess even if only
the primary key is used, the I/O will still be benefited.

* * * * * * select client.client_id, sum( prod.quantity * prod.price *
client.ordersize )
* * * * * * *from prod, client
* * * * * * *where prod.type_id = 4 and prod.product_id =
client.product_id
* * * * * * *group by client.client_id

Also, would the following query benefit from both the MDC and the
primary key?

* * *select *
* * * * * *from prod
* * * * * * *where prod.type_id = 4 and prod.product_id = *101

Or, instead of the primary key, just use an index on (product_id)
would cover both of the above queries?

I'll certainly test to find out. *But I'd like to gain better
understanding about indexing. *Also, the turnaround of making such
changes are not as fast as I want and I hope to get to the best
solution a little faster. *Thanks.- Hide quoted text -

- Show quoted text -
Now I've changed the primary key of the table to (type_id, product_id)
while making (type_id) as MDC.

When examing the plan, I found that the following query:

select client.client_id, sum( prod.quantity * prod.price
* client.ordersize )
from prod, client
where prod.type_id = 4 and prod.product_id =
client.product_id
group by client.client_id

does use the new primary key on (type_id, product_id). However, if I
use a bind variable for type_id, i.e.:

select client.client_id, sum( prod.quantity * prod.price
* client.ordersize )
from prod, client
where prod.type_id = ? and prod.product_id =
client.product_id
group by client.client_id

It only uses the MDC on (type_id).

Now, my questions are:

1) should I drop the bind variable on type_id so it can utilize the
primary key? The cardinality of type_id is about 5 to 8.

2) Would an addition on (product_id) help?

Thanks.

Oct 28 '08 #7
On Oct 21, 11:19*am, ChrisC <cunningham...@gmail.comwrote:
On Oct 21, 7:18*am, "HenryJ." <tank209...@yahoo.comwrote:
For my case, the table is purged and re-populated every day. *And the
other column, product_id, has high cardinality.
Perhaps a cluster index makes more sense? *I guess I'd choose type_id
if I have to pick one column forMDC. *Then it won't be as helpful as
a cluster index on (type_id, product_id)?

MDCstill might be faster - probably faster at loading and equivalent
for searches based on type_id. *For searches on type_id and
product_id, it could outperform the cluster index, if you also create
an index on product_id - and the combination (MDC+ index on
product_id) will definitely outperform a cluster index on type_id,
product_id for product_id searches.

You'll want to verify these claims - but since you rebuild the table
every day, you are in a perfect position to try these various options
out. *Pick a new one each day until you've found the right
combination.

-Chris
I have the primary key changed to (type_id, product_id), and added a
MDC on type_id. Now the following query:
select client.client_id, sum( prod.quantity * prod.price *
client.ordersize )
from prod, client
where prod.type_id = 4 and prod.product_id =
client.product_id
group by client.client_id

it will utilize the primary key. However, if I use a host variable
for type_id, it will only pick up the MDC on type_id, which I think is
not optimal.

My questions are:

1) should I add an index on (product_id) which is what's really needed
rather than the primary key on (type_id, product_id). Note that the
cardinality of type_id is pretty small, around 6.

2) should I drop the host variable on type_id? Note that each type_id
has the same number of product_id's. Perhaps this is not a good case
for host variable. BTW, this SQL is embedded in a Java (as static SQL
I think?).

Thanks!

Thanks!
Oct 28 '08 #8
On Oct 21, 11:19*am, ChrisC <cunningham...@gmail.comwrote:
On Oct 21, 7:18*am, "HenryJ." <tank209...@yahoo.comwrote:
For my case, the table is purged and re-populated every day. *And the
other column, product_id, has high cardinality.
Perhaps a cluster index makes more sense? *I guess I'd choose type_id
if I have to pick one column forMDC. *Then it won't be as helpful as
a cluster index on (type_id, product_id)?

MDCstill might be faster - probably faster at loading and equivalent
for searches based on type_id. *For searches on type_id and
product_id, it could outperform the cluster index, if you also create
an index on product_id - and the combination (MDC+ index on
product_id) will definitely outperform a cluster index on type_id,
product_id for product_id searches.

You'll want to verify these claims - but since you rebuild the table
every day, you are in a perfect position to try these various options
out. *Pick a new one each day until you've found the right
combination.

-Chris
I have the primary key changed to (type_id, product_id), and added a
MDC on type_id. Now the following query:
select client.client_id, sum( prod.quantity * prod.price *
client.ordersize )
from prod, client
where prod.type_id = 4 and prod.product_id =
client.product_id
group by client.client_id

will utilize the primary key. However, if I use a host variable for
type_id (i.e., type_id = ?), it will only pick up the MDC on type_id,
which I think is not optimal.

My questions are:

1) should I add an index on (product_id) which is what's really needed
rather than the primary key on (type_id, product_id). Note that the
cardinality of type_id is pretty small, around 6.

2) should I drop the host variable on type_id? Note that each type_id
has the same number of product_id's. Perhaps this is not a good case
for host variable. BTW, this SQL is embedded in a Java (as static SQL
I think?).

Thanks!
Oct 28 '08 #9
On Oct 29, 12:56*am, "Henry J." <tank209...@yahoo.comwrote:
On Oct 21, 11:19*am, ChrisC <cunningham...@gmail.comwrote:


On Oct 21, 7:18*am, "HenryJ." <tank209...@yahoo.comwrote:
For my case, the table is purged and re-populated every day. *And the
other column, product_id, has high cardinality.
Perhaps a cluster index makes more sense? *I guess I'd choose type_id
if I have to pick one column forMDC. *Then it won't be as helpful as
a cluster index on (type_id, product_id)?
MDCstill might be faster - probably faster at loading and equivalent
for searches based on type_id. *For searches on type_id and
product_id, it could outperform the cluster index, if you also create
an index on product_id - and the combination (MDC+ index on
product_id) will definitely outperform a cluster index on type_id,
product_id for product_id searches.
You'll want to verify these claims - but since you rebuild the table
every day, you are in a perfect position to try these various options
out. *Pick a new one each day until you've found the right
combination.
-Chris

I have the primary key changed to (type_id, product_id), and added a
MDC on type_id. *Now the following query:

* * * *select client.client_id, sum( prod.quantity * prod.price *
client.ordersize )
* * * * * * *from prod, client
* * * * * * *where prod.type_id = 4 and prod.product_id =
client.product_id
* * * * * * *group by client.client_id

will utilize the primary key. *However, if I use a host variable for
type_id (i.e., type_id = ?), it will only pick up the MDC on type_id,
which I think is not optimal.

My questions are:

1) should I add an index on (product_id) which is what's really needed
rather than the primary key on (type_id, product_id). *Note that the
cardinality of type_id is pretty small, around 6.

2) should I drop the host variable on type_id? *Note that each type_id
has the same number of product_id's. *Perhaps this is not a good case
for host variable. *BTW, this SQL is embedded in a Java (as static SQL
I think?).

Thanks!- Hide quoted text -

- Show quoted text -
Hi,
there is a shell script know extent utilisation.sh which can be
downloaded from IBM website.it will answers all your queries regarding
which will be a better column and how much space will you save and how
will it improve the performance.
Oct 29 '08 #10
On Oct 29, 9:55*am, Saurabh...@gmail.com wrote:
On Oct 29, 12:56*am, "HenryJ." <tank209...@yahoo.comwrote:


On Oct 21, 11:19*am, ChrisC <cunningham...@gmail.comwrote:
On Oct 21, 7:18*am, "HenryJ." <tank209...@yahoo.comwrote:
For my case, the table is purged and re-populated every day. *Andthe
other column, product_id, has high cardinality.
Perhaps a cluster index makes more sense? *I guess I'd choose type_id
if I have to pick one column forMDC. *Then it won't be as helpfulas
a cluster index on (type_id, product_id)?
MDCstill might be faster - probably faster at loading and equivalent
for searches based on type_id. *For searches on type_id and
product_id, it could outperform the cluster index, if you also create
an index on product_id - and the combination (MDC+ index on
product_id) will definitely outperform a cluster index on type_id,
product_id for product_id searches.
You'll want to verify these claims - but since you rebuild the table
every day, you are in a perfect position to try these various options
out. *Pick a new one each day until you've found the right
combination.
-Chris
I have the primary key changed to (type_id, product_id), and added a
MDCon type_id. *Now the following query:
* * * *select client.client_id, sum( prod.quantity * prod.price*
client.ordersize )
* * * * * * *from prod, client
* * * * * * *where prod.type_id = 4 and prod.product_id=
client.product_id
* * * * * * *group by client.client_id
will utilize the primary key. *However, if I use a host variable for
type_id (i.e., type_id = ?), it will only pick up theMDCon type_id,
which I think is not optimal.
My questions are:
1) should I add an index on (product_id) which is what's really needed
rather than the primary key on (type_id, product_id). *Note that the
cardinality of type_id is pretty small, around 6.
2) should I drop the host variable on type_id? *Note that each type_id
has the same number of product_id's. *Perhaps this is not a good case
for host variable. *BTW, this SQL is embedded in a Java (as static SQL
I think?).
Thanks!- Hide quoted text -
- Show quoted text -

Hi,
there is a shell script know extent utilisation.sh which can be
downloaded from IBM website.it will answers all your queries regarding
which will be a better column and how much space will you save and how
will it improve the performance.- Hide quoted text -

- Show quoted text -
Could you please give me the link? I couldn't find it. Thanks.
Oct 29 '08 #11

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

17
by: Philip Yale | last post by:
I'm probably going to get shot down with thousands of reasons for this, but I've never really heard or read a convincing explanation, so here goes ... Clustered indexes are more efficient at...
2
by: Adam Kavan | last post by:
I have a table that II am constantly inserting into (around 10 times a second right now but hope to increase latter). I hold these rows for a week then summarize and delete them. During that week...
2
by: Igor Shevchenko | last post by:
Hi, Are there any plans on adding CLUSTER-related information to "\d tablename" ? -- Best regards, Igor Shevchenko ---------------------------(end of broadcast)---------------------------...
7
by: Luiz Guilherme Freitas de Paula | last post by:
Hi everyone, Anyone have PostgreSQL running on a cluster? I'm trying to run on a OpenMosix 2-machine cluster but the processes are not migrating. All the load is concentrating in one machine....
6
by: Otto | last post by:
Hi, The application I work on has a table created with a clustered index. A stored procedure that updates this table is then created while the table is empty. As i understand it, the access plan...
3
by: Simon | last post by:
Hi All, I'm hoping someone will have some words of wisdom for me regarding MS Clustering on Windows 2003. I have a service that runs on a cluster. During invocation it's supposed to...
4
by: JohnnyDeep | last post by:
I am trying to create a store proc that contain a create index with the cluster option and I receive DB21034E The command was processed as an SQL statement because it was not a valid Command...
1
by: dunleav1 | last post by:
The table is a many row and many column table that is in a 16K page size. I am running row compression on the table. A have an index that over time have come to have a low index cluster ratio. ...
2
by: dunleav1 | last post by:
I have a many row and many column table that is in a 16K page size. I have four indexes on the table. I am running row compression on the table. The table does not have a primary key. The table...
0
by: DolphinDB | last post by:
The formulas of 101 quantitative trading alphas used by WorldQuant were presented in the paper 101 Formulaic Alphas. However, some formulas are complex, leading to challenges in calculation. Take...
0
by: DolphinDB | last post by:
Tired of spending countless mintues downsampling your data? Look no further! In this article, you’ll learn how to efficiently downsample 6.48 billion high-frequency records to 61 million...
0
by: ryjfgjl | last post by:
ExcelToDatabase: batch import excel into database automatically...
1
isladogs
by: isladogs | last post by:
The next Access Europe meeting will be on Wednesday 6 Mar 2024 starting at 18:00 UK time (6PM UTC) and finishing at about 19:15 (7.15PM). In this month's session, we are pleased to welcome back...
0
by: Vimpel783 | last post by:
Hello! Guys, I found this code on the Internet, but I need to modify it a little. It works well, the problem is this: Data is sent from only one cell, in this case B5, but it is necessary that data...
0
by: jfyes | last post by:
As a hardware engineer, after seeing that CEIWEI recently released a new tool for Modbus RTU Over TCP/UDP filtering and monitoring, I actively went to its official website to take a look. It turned...
0
by: ArrayDB | last post by:
The error message I've encountered is; ERROR:root:Error generating model response: exception: access violation writing 0x0000000000005140, which seems to be indicative of an access violation...
1
by: PapaRatzi | last post by:
Hello, I am teaching myself MS Access forms design and Visual Basic. I've created a table to capture a list of Top 30 singles and forms to capture new entries. The final step is a form (unbound)...
0
by: Defcon1945 | last post by:
I'm trying to learn Python using Pycharm but import shutil doesn't work

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.