473,586 Members | 2,682 Online
Bytes | Software Development & Data Engineering Community
+ Post

Home Posts Topics Members FAQ

Indexing and Queries

Hi everybody,

After days reading stuff about indexing, extents, pages, 8KB, 64 KB,
IGNORE_DUP_KEY, CREATE INDEX, bla bla, lalalala, lllllll, grrrrrrr and
testing with different kinds of queries, indexes, situations i'm
getting confused, irritated, etc.

SITUATION
I have the following situation. We have a fact table with 1,8 million
records (Testsituation) and i'm inserting 100000 records and i want to
make it faster. Records can only be inserted when it's different from
the one in the Fact table. Currently we don't have any index on the
table. So i thought that could be quicker when i build some indexes. So
i started experimenting, lalala, and some further more and more...

The facttable has Foreign keys and measures. The foreign keys are
referenced with the primary keys in the dimensions. Also i have a field
called Source_key which indicates an unique value (This could be an
integer, but also a varchar). In this case its a varchar. Also i have
an Checksum field. An example of the query is like this (based on the
northwind database):

INSERT INTO DYN.dbo.SAW_Fac t_Order_Detail
SELECT 20051130, 62667,
Customer_Dim_Ke y =
ISNULL(DYN.dbo. SAW_B_LU_Custom er.Customer_Dim _Key,0),
Product_Dim_Key = ISNULL(DYN.dbo. SAW_B_LU_Produc t.Product_Dim_K ey,0),
FK_Order_Date,
FK_Required_Dat e,
FK_Shipped_Date ,
Shipped_Flag,
Order_Duration_ Open_Days,
Order_Required_ Exceed_Days,
Order_Detail_Un it_Price,
Order_Detail_Qu antity,
Order_Detail_Di scount,
ExtendedPrice,
Order_Number,
Order_Detail_Nu mber,
Order_Detail_Co unt,
binary_checksum (
ISNULL(DYN.dbo. SAW_B_LU_Custom er.Customer_Dim _Key,0),
ISNULL(DYN.dbo. SAW_B_LU_Produc t.Product_Dim_K ey,0),
DYN.dbo.TR_Fact _Order_Detail_V .FK_Order_Date,
DYN.dbo.TR_Fact _Order_Detail_V .FK_Required_Da te,
DYN.dbo.TR_Fact _Order_Detail_V .FK_Shipped_Dat e,
DYN.dbo.TR_Fact _Order_Detail_V .Shipped_Flag,
DYN.dbo.TR_Fact _Order_Detail_V .Order_Duration _Open_Days,
DYN.dbo.TR_Fact _Order_Detail_V .Order_Required _Exceed_Days,
DYN.dbo.TR_Fact _Order_Detail_V .Order_Detail_U nit_Price,
DYN.dbo.TR_Fact _Order_Detail_V .Order_Detail_Q uantity,
DYN.dbo.TR_Fact _Order_Detail_V .Order_Detail_D iscount,
DYN.dbo.TR_Fact _Order_Detail_V .ExtendedPrice,
DYN.dbo.TR_Fact _Order_Detail_V .Order_Number,
DYN.dbo.TR_Fact _Order_Detail_V .Order_Detail_N umber,
DYN.dbo.TR_Fact _Order_Detail_V .Order_Detail_C ount),
14,
'N',
getdate(),
'1/1/2049'
FROM DYN.dbo.TR_Fact _Order_Detail_V
LEFT OUTER JOIN DYN.dbo.SAW_B_L U_Customer ON
DYN.dbo.TR_Fact _Order_Detail_V .Customer_Code =
DYN.dbo.SAW_B_L U_Customer.Cust omer_Code
LEFT OUTER JOIN DYN.dbo.SAW_B_L U_Product ON
DYN.dbo.TR_Fact _Order_Detail_V .Product_Code =
DYN.dbo.SAW_B_L U_Product.Produ ct_Code
WHERE NOT EXISTS
(Select *
From DYN.dbo.SAW_Fac t_Order_Detail
Where Checksum_Field = binary_checksum (
ISNULL(DYN.dbo. SAW_B_LU_Custom er.Customer_Dim _Key,0),
ISNULL(DYN.dbo. SAW_B_LU_Produc t.Product_Dim_K ey,0),
DYN.dbo.TR_Fact _Order_Detail_V .FK_Order_Date,
DYN.dbo.TR_Fact _Order_Detail_V .FK_Required_Da te,
DYN.dbo.TR_Fact _Order_Detail_V .FK_Shipped_Dat e,
DYN.dbo.TR_Fact _Order_Detail_V .Shipped_Flag,
DYN.dbo.TR_Fact _Order_Detail_V .Order_Duration _Open_Days,
DYN.dbo.TR_Fact _Order_Detail_V .Order_Required _Exceed_Days,
DYN.dbo.TR_Fact _Order_Detail_V .Order_Detail_U nit_Price,
DYN.dbo.TR_Fact _Order_Detail_V .Order_Detail_Q uantity,
DYN.dbo.TR_Fact _Order_Detail_V .Order_Detail_D iscount,
DYN.dbo.TR_Fact _Order_Detail_V .ExtendedPrice,
DYN.dbo.TR_Fact _Order_Detail_V .Order_Number,
DYN.dbo.TR_Fact _Order_Detail_V .Order_Detail_N umber,
AND DYN.dbo.TR_Fact _Order_Detail_V .Order_Number =
DYN.dbo.SAW_Fac t_Order_Detail. Order_Number
AND DYN.dbo.TR_Fact _Order_Detail_V .Order_Detail_N umber =
DYN.dbo.SAW_Fac t_Order_Detail. Order_Detail_Nu mber
AND Expired_Date = '1/1/2049')
EXPERIMENTS
So i focused on the NOT Exists clause. My thought was that when an
index is created on the fields in WHERE clause of the SELECT statement
in the NOT EXISTS part it would be quicker. Because SQL Server should
be quicker decisioning whether a record existed or not. So far theory.
So i started experimenting.

I No Index.
It took about 118 seconds.

II. I created an referencetable (took some time but not relevant here)
so the insert table and the comparetable were not the same anymore. The
fields in the reference table are Checksumfield, Dossier_Source_ Code
(in query example Order_Number and Order_Detail_Nu mber) and
expired_date. So the not exists clause rebuild to reference this newly
created table and now it took about 85 seconds.

III. The part i don't understand is this. So the prior step was a
perfomance gain so i decided to build indexes on the reference table. I
tried al types of different indexes:
* index on checksum (clustered), source_key (Non clustered, unique) and
on expired date (non clustered) --> 99 seconds
* Index on source_key (clustered, unique), checksum_field (non
clustered) and on expired date (non clustered) --> 91 seconds
* the Source key (in the example Order_Number and Order_Detail_Nu mber)
was unique so i decided to build a checksum on these fields and build
an index on the checksum and of course on the Source_key and
expired_date. This took about 101 seconds. What?

So as you can see it took only longer when i build a index. So why? And
has someone any clues to make the query faster?
Thanx ,
Hennie

Nov 30 '05 #1
6 1627
Hennie,

I can't give you the ONE answer 'how to improve the query, but you
should keep in mind that while indexes (should) improve SELECT queries,
they can also slow down INSERTS, because the new records can cause page
splits.
So you have to find a compromise between good SELECT performance and
Insert performance.
By the way, did you try to analyze this query with the Index Tuning
wizard ?

Markus

Dec 1 '05 #2
Hennie,

I definitely hear the irritation in your posting. First it sounds like you
are using straight TSQL for your ETL. While this works, it's hard to
maintain and lacks some key features for performance. I encourage you to
look at some of the industry tools from Informatica, Ascential, or the new
SSIS in 2005. If using only TSQL is your goal here is a tip.

Transactions in SQL is your performance enemy. This is a data warehouse so
transactions aren't very essential. Put your DB in Bulk Load or Simple
recovery mode. If you know each day the number of inserts and deletes will
be high, try writing a new table with all the records out, drop the old
table and rename the new table. A select into should do the trick. You may
even abandon the idea of identifying what changed since writing the entire
table without transactions takes less time. Of course this approach has a
long term issue. As the source grows the time to write larger tables will
always increase.

As for adding an index to the reference table slowing things down. Adding
indexes can slow queries down. Since the optimizer puts more weight on IO
than CPU and memory. The optimizer chooses to use the new index which
reduces IO but in turn changes the plan to do a large hash join. The lowest
IO is not always the best plan.

"Hennie7863 " <hd********@hot mail.com> wrote in message
news:11******** **************@ f14g2000cwb.goo glegroups.com.. .
Hi everybody,

After days reading stuff about indexing, extents, pages, 8KB, 64 KB,
IGNORE_DUP_KEY, CREATE INDEX, bla bla, lalalala, lllllll, grrrrrrr and
testing with different kinds of queries, indexes, situations i'm
getting confused, irritated, etc.

SITUATION
I have the following situation. We have a fact table with 1,8 million
records (Testsituation) and i'm inserting 100000 records and i want to
make it faster. Records can only be inserted when it's different from
the one in the Fact table. Currently we don't have any index on the
table. So i thought that could be quicker when i build some indexes. So
i started experimenting, lalala, and some further more and more...

The facttable has Foreign keys and measures. The foreign keys are
referenced with the primary keys in the dimensions. Also i have a field
called Source_key which indicates an unique value (This could be an
integer, but also a varchar). In this case its a varchar. Also i have
an Checksum field. An example of the query is like this (based on the
northwind database):

INSERT INTO DYN.dbo.SAW_Fac t_Order_Detail
SELECT 20051130, 62667,
Customer_Dim_Ke y =
ISNULL(DYN.dbo. SAW_B_LU_Custom er.Customer_Dim _Key,0),
Product_Dim_Key = ISNULL(DYN.dbo. SAW_B_LU_Produc t.Product_Dim_K ey,0),
FK_Order_Date,
FK_Required_Dat e,
FK_Shipped_Date ,
Shipped_Flag,
Order_Duration_ Open_Days,
Order_Required_ Exceed_Days,
Order_Detail_Un it_Price,
Order_Detail_Qu antity,
Order_Detail_Di scount,
ExtendedPrice,
Order_Number,
Order_Detail_Nu mber,
Order_Detail_Co unt,
binary_checksum (
ISNULL(DYN.dbo. SAW_B_LU_Custom er.Customer_Dim _Key,0),
ISNULL(DYN.dbo. SAW_B_LU_Produc t.Product_Dim_K ey,0),
DYN.dbo.TR_Fact _Order_Detail_V .FK_Order_Date,
DYN.dbo.TR_Fact _Order_Detail_V .FK_Required_Da te,
DYN.dbo.TR_Fact _Order_Detail_V .FK_Shipped_Dat e,
DYN.dbo.TR_Fact _Order_Detail_V .Shipped_Flag,
DYN.dbo.TR_Fact _Order_Detail_V .Order_Duration _Open_Days,
DYN.dbo.TR_Fact _Order_Detail_V .Order_Required _Exceed_Days,
DYN.dbo.TR_Fact _Order_Detail_V .Order_Detail_U nit_Price,
DYN.dbo.TR_Fact _Order_Detail_V .Order_Detail_Q uantity,
DYN.dbo.TR_Fact _Order_Detail_V .Order_Detail_D iscount,
DYN.dbo.TR_Fact _Order_Detail_V .ExtendedPrice,
DYN.dbo.TR_Fact _Order_Detail_V .Order_Number,
DYN.dbo.TR_Fact _Order_Detail_V .Order_Detail_N umber,
DYN.dbo.TR_Fact _Order_Detail_V .Order_Detail_C ount),
14,
'N',
getdate(),
'1/1/2049'
FROM DYN.dbo.TR_Fact _Order_Detail_V
LEFT OUTER JOIN DYN.dbo.SAW_B_L U_Customer ON
DYN.dbo.TR_Fact _Order_Detail_V .Customer_Code =
DYN.dbo.SAW_B_L U_Customer.Cust omer_Code
LEFT OUTER JOIN DYN.dbo.SAW_B_L U_Product ON
DYN.dbo.TR_Fact _Order_Detail_V .Product_Code =
DYN.dbo.SAW_B_L U_Product.Produ ct_Code
WHERE NOT EXISTS
(Select *
From DYN.dbo.SAW_Fac t_Order_Detail
Where Checksum_Field = binary_checksum (
ISNULL(DYN.dbo. SAW_B_LU_Custom er.Customer_Dim _Key,0),
ISNULL(DYN.dbo. SAW_B_LU_Produc t.Product_Dim_K ey,0),
DYN.dbo.TR_Fact _Order_Detail_V .FK_Order_Date,
DYN.dbo.TR_Fact _Order_Detail_V .FK_Required_Da te,
DYN.dbo.TR_Fact _Order_Detail_V .FK_Shipped_Dat e,
DYN.dbo.TR_Fact _Order_Detail_V .Shipped_Flag,
DYN.dbo.TR_Fact _Order_Detail_V .Order_Duration _Open_Days,
DYN.dbo.TR_Fact _Order_Detail_V .Order_Required _Exceed_Days,
DYN.dbo.TR_Fact _Order_Detail_V .Order_Detail_U nit_Price,
DYN.dbo.TR_Fact _Order_Detail_V .Order_Detail_Q uantity,
DYN.dbo.TR_Fact _Order_Detail_V .Order_Detail_D iscount,
DYN.dbo.TR_Fact _Order_Detail_V .ExtendedPrice,
DYN.dbo.TR_Fact _Order_Detail_V .Order_Number,
DYN.dbo.TR_Fact _Order_Detail_V .Order_Detail_N umber,
AND DYN.dbo.TR_Fact _Order_Detail_V .Order_Number =
DYN.dbo.SAW_Fac t_Order_Detail. Order_Number
AND DYN.dbo.TR_Fact _Order_Detail_V .Order_Detail_N umber =
DYN.dbo.SAW_Fac t_Order_Detail. Order_Detail_Nu mber
AND Expired_Date = '1/1/2049')
EXPERIMENTS
So i focused on the NOT Exists clause. My thought was that when an
index is created on the fields in WHERE clause of the SELECT statement
in the NOT EXISTS part it would be quicker. Because SQL Server should
be quicker decisioning whether a record existed or not. So far theory.
So i started experimenting.

I No Index.
It took about 118 seconds.

II. I created an referencetable (took some time but not relevant here)
so the insert table and the comparetable were not the same anymore. The
fields in the reference table are Checksumfield, Dossier_Source_ Code
(in query example Order_Number and Order_Detail_Nu mber) and
expired_date. So the not exists clause rebuild to reference this newly
created table and now it took about 85 seconds.

III. The part i don't understand is this. So the prior step was a
perfomance gain so i decided to build indexes on the reference table. I
tried al types of different indexes:
* index on checksum (clustered), source_key (Non clustered, unique) and
on expired date (non clustered) --> 99 seconds
* Index on source_key (clustered, unique), checksum_field (non
clustered) and on expired date (non clustered) --> 91 seconds
* the Source key (in the example Order_Number and Order_Detail_Nu mber)
was unique so i decided to build a checksum on these fields and build
an index on the checksum and of course on the Source_key and
expired_date. This took about 101 seconds. What?

So as you can see it took only longer when i build a index. So why? And
has someone any clues to make the query faster?
Thanx ,
Hennie

Dec 1 '05 #3
>tried al types of different indexes:
* index on checksum (clustered), source_key (Non clustered, unique) and
on expired date (non clustered) --> 99 seconds
* Index on source_key (clustered, unique), checksum_field (non
clustered) and on expired date (non clustered) --> 91 seconds
that is normal. accessing via a non-clustered index (thru a bookmark)
may be quite slow. Having all the columns in just one index might be
better.
* the Source key (in the example Order_Number and Order_Detail_Nu mber)
was unique so i decided to build a checksum on these fields and build
an index on the checksum and of course on the Source_key and
expired_date . This took about 101 seconds. What?


are you speaking about one index containing all those columns?

Also note that if all the columns are not nullable, not exists()
queries may be rewritten as not in() ones.
Also try rewriting you not exists() query as an outer join.

try all 3 ways, they are equivalent if the columns you join on are not
nullable:

1.
select c.left_PK
from left_table c
where left_PK not in(select left_PK from right_table t
where c.left_PK=t.lef t_PK)

2.
select c.left_PK
from left_table c
where not exists(select 1 from right_table t
where c.left_PK=t.lef t_PK)

3.
select c.left_PK
from left_table c left outer join right_table t
on c.left_PK=t.lef t_PK
where t.left_PK is null

Dec 1 '05 #4
oops I meant

3.
select c.left_PK
from left_table c left outer join right_table t
on c.left_PK=t.lef t_PK
where t.RIGHT_PK is null

Dec 1 '05 #5
Hi Markus,

Thanx for your answer. Your answer says exactly what i wanted to do. I
wanted an insert table and a reference table so that the query would
function at its best: one table for inserting and one for referencing
and this would make it quicker (but it takes some to build this
reference table). The next step is building indexes on this referencing

I did use the index tuning wizard and its said that i should build an
index on the checksum field en the fields (ordernumber and order
detail) which is the unique identification of a record.

Hennie

Dec 6 '05 #6
put a single index on the main fact table on "Order_num" assuming this
is a somewhat unique piece of data.
Then, DELETE from your temp table everything that already exists in the
main fact table. This will allow the engine to do a single scan on the
temp table, efficiently look it up in the fact table using a FAST
index, and get rid of dups.

Then, just add what is left of the temp table to the main fact table.
Using this approach, you will never have to do a scan of the main fact
table, ESPECIALLY not a scan for EVERY record in the temp table.

Throwing indexes onto the fact table for non-unique items is generally
a bad thing, and slows things down.

Dec 12 '05 #7

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

1
5174
by: Danny | last post by:
sorry for posting a question that has probably been posted hundreds of times, but i can't seem to find the answer i need...We're using a Sql Server 7.0 database and recently started getting timeout expired errors on our views, which i'm assuming is due to the volume of records in our tables. Unfortunately i'm not too experiences w/ indexing...
2
1661
by: HumanJHawkins | last post by:
Hi, I am using data from multiple databases and/or queries. It would greatly simplify and speed things up if I could use CONTAINS in processing the results. However, "CONTAINS" requires the data to be indexed. Due to the amount of processing, I think it would be faster even if I had to re-index every time. For example, I would like to do...
1
4813
by: Thomas Bartkus | last post by:
If we have a date/time field and are doing frequent queries WHERE {date/time field} BETWEEN TimeA AND TimeB Does it make sense, query speed wise, to create an index on the date/time field? The reason I ask is that TimeA and TimeB are significant down to seconds. My *assumption* is that with a large amount of data scatter very few records...
2
1248
by: Kent.Brooke | last post by:
Can someone set me straight I know indexing is a "try & see" art. However I am at a loss if it's better to use the INCLUDE switch on a unique index and tag on the columns be used to avoid a lookup on the disk. Or if it's better to use all of the columns as part of the index? One more question, is it a good idea to always have the PK of the...
5
1931
by: ibiza | last post by:
Hi all, I am quite experimented with SQL Server, but not that much with full text indexing. After some successful attempts with english fields, I've decided to try it with Japanese characters. I don't know why, but it seems to have a strange behaviour. As in this screenshot (http://img65.imageshack.us/img65/980/jap3xt.gif), the CONTAINS...
7
1815
by: Ryan | last post by:
I have a bit of a problem with regards an indexing strategy. Well, basically there is no indexing strategy on a set of data I have at work. Now, I didn't create the design as I would have allowed for this. OK, so there is primary key (clustered) indexes (mainly composite keys), but no other indexes on the tables. As you would expect, the...
0
1260
by: Owen Jenkins | last post by:
Hi, My application allows users to create a new back end for separate purposes. It does this by using Make Table Queries and Indexing via SQL. For example ... sqlString = "SELECT * INTO " & TableName & " IN '" & NewDBName & "' FROM " & TableName & " WHERE 1=2;" DoCmd.RunSQL sqlString 'The WHERE 1=2 ensures that no records are copied -...
4
3255
by: Owen Jenkins | last post by:
Hi, No-one replied to this when I sent it last week. Any bites out there today?? ----- My application allows users to create a new back end for separate purposes. It does this by using Make Table Queries and Indexing via SQL. For example ...
4
2056
by: Amar | last post by:
Hi All, I need to select data from a database table containing huge amount of data. Now I am storing data using one primary key and I am just using simple select statement, and this process gives me the output but it is taking long to execute the query. As much I had heared I want to use some indexing or cluster indexing which might help me...
0
7912
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However, people are often confused as to whether an ONU can Work As a Router. In this blog post, we’ll explore What is ONU, What Is Router, ONU & Router’s main...
0
7839
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can effortlessly switch the default language on Windows 10 without reinstalling. I'll walk you through it. First, let's disable language...
0
6614
agi2029
by: agi2029 | last post by:
Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing, and deployment—without human intervention. Imagine an AI that can take a project description, break it down, write the code, debug it, and then...
1
5710
isladogs
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome a new presenter, Adolph Dupré who will be discussing some powerful techniques for using class modules. He will explain when you may want to use classes...
0
3837
by: TSSRALBI | last post by:
Hello I'm a network technician in training and I need your help. I am currently learning how to create and manage the different types of VPNs and I have a question about LAN-to-LAN VPNs. The last exercise I practiced was to create a LAN-to-LAN VPN between two Pfsense firewalls, by using IPSEC protocols. I succeeded, with both firewalls in...
0
3865
by: adsilva | last post by:
A Windows Forms form does not have the event Unload, like VB6. What one acts like?
1
2345
by: 6302768590 | last post by:
Hai team i want code for transfer the data from one system to another through IP address by using C# our system has to for every 5mins then we have to update the data what the data is updated we have to send another system
1
1449
muto222
by: muto222 | last post by:
How can i add a mobile payment intergratation into php mysql website.
0
1180
bsmnconsultancy
by: bsmnconsultancy | last post by:
In today's digital era, a well-designed website is crucial for businesses looking to succeed. Whether you're a small business owner or a large corporation in Toronto, having a strong online presence can significantly impact your brand's success. BSMN Consultancy, a leader in Website Development in Toronto offers valuable insights into creating...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.