473,396 Members | 1,748 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,396 software developers and data experts.

Indexing and Queries

Hi everybody,

After days reading stuff about indexing, extents, pages, 8KB, 64 KB,
IGNORE_DUP_KEY, CREATE INDEX, bla bla, lalalala, lllllll, grrrrrrr and
testing with different kinds of queries, indexes, situations i'm
getting confused, irritated, etc.

SITUATION
I have the following situation. We have a fact table with 1,8 million
records (Testsituation) and i'm inserting 100000 records and i want to
make it faster. Records can only be inserted when it's different from
the one in the Fact table. Currently we don't have any index on the
table. So i thought that could be quicker when i build some indexes. So
i started experimenting, lalala, and some further more and more...

The facttable has Foreign keys and measures. The foreign keys are
referenced with the primary keys in the dimensions. Also i have a field
called Source_key which indicates an unique value (This could be an
integer, but also a varchar). In this case its a varchar. Also i have
an Checksum field. An example of the query is like this (based on the
northwind database):

INSERT INTO DYN.dbo.SAW_Fact_Order_Detail
SELECT 20051130, 62667,
Customer_Dim_Key =
ISNULL(DYN.dbo.SAW_B_LU_Customer.Customer_Dim_Key, 0),
Product_Dim_Key = ISNULL(DYN.dbo.SAW_B_LU_Product.Product_Dim_Key,0) ,
FK_Order_Date,
FK_Required_Date,
FK_Shipped_Date,
Shipped_Flag,
Order_Duration_Open_Days,
Order_Required_Exceed_Days,
Order_Detail_Unit_Price,
Order_Detail_Quantity,
Order_Detail_Discount,
ExtendedPrice,
Order_Number,
Order_Detail_Number,
Order_Detail_Count,
binary_checksum(
ISNULL(DYN.dbo.SAW_B_LU_Customer.Customer_Dim_Key, 0),
ISNULL(DYN.dbo.SAW_B_LU_Product.Product_Dim_Key,0) ,
DYN.dbo.TR_Fact_Order_Detail_V.FK_Order_Date,
DYN.dbo.TR_Fact_Order_Detail_V.FK_Required_Date,
DYN.dbo.TR_Fact_Order_Detail_V.FK_Shipped_Date,
DYN.dbo.TR_Fact_Order_Detail_V.Shipped_Flag,
DYN.dbo.TR_Fact_Order_Detail_V.Order_Duration_Open _Days,
DYN.dbo.TR_Fact_Order_Detail_V.Order_Required_Exce ed_Days,
DYN.dbo.TR_Fact_Order_Detail_V.Order_Detail_Unit_P rice,
DYN.dbo.TR_Fact_Order_Detail_V.Order_Detail_Quanti ty,
DYN.dbo.TR_Fact_Order_Detail_V.Order_Detail_Discou nt,
DYN.dbo.TR_Fact_Order_Detail_V.ExtendedPrice,
DYN.dbo.TR_Fact_Order_Detail_V.Order_Number,
DYN.dbo.TR_Fact_Order_Detail_V.Order_Detail_Number ,
DYN.dbo.TR_Fact_Order_Detail_V.Order_Detail_Count) ,
14,
'N',
getdate(),
'1/1/2049'
FROM DYN.dbo.TR_Fact_Order_Detail_V
LEFT OUTER JOIN DYN.dbo.SAW_B_LU_Customer ON
DYN.dbo.TR_Fact_Order_Detail_V.Customer_Code =
DYN.dbo.SAW_B_LU_Customer.Customer_Code
LEFT OUTER JOIN DYN.dbo.SAW_B_LU_Product ON
DYN.dbo.TR_Fact_Order_Detail_V.Product_Code =
DYN.dbo.SAW_B_LU_Product.Product_Code
WHERE NOT EXISTS
(Select *
From DYN.dbo.SAW_Fact_Order_Detail
Where Checksum_Field = binary_checksum(
ISNULL(DYN.dbo.SAW_B_LU_Customer.Customer_Dim_Key, 0),
ISNULL(DYN.dbo.SAW_B_LU_Product.Product_Dim_Key,0) ,
DYN.dbo.TR_Fact_Order_Detail_V.FK_Order_Date,
DYN.dbo.TR_Fact_Order_Detail_V.FK_Required_Date,
DYN.dbo.TR_Fact_Order_Detail_V.FK_Shipped_Date,
DYN.dbo.TR_Fact_Order_Detail_V.Shipped_Flag,
DYN.dbo.TR_Fact_Order_Detail_V.Order_Duration_Open _Days,
DYN.dbo.TR_Fact_Order_Detail_V.Order_Required_Exce ed_Days,
DYN.dbo.TR_Fact_Order_Detail_V.Order_Detail_Unit_P rice,
DYN.dbo.TR_Fact_Order_Detail_V.Order_Detail_Quanti ty,
DYN.dbo.TR_Fact_Order_Detail_V.Order_Detail_Discou nt,
DYN.dbo.TR_Fact_Order_Detail_V.ExtendedPrice,
DYN.dbo.TR_Fact_Order_Detail_V.Order_Number,
DYN.dbo.TR_Fact_Order_Detail_V.Order_Detail_Number ,
AND DYN.dbo.TR_Fact_Order_Detail_V.Order_Number =
DYN.dbo.SAW_Fact_Order_Detail.Order_Number
AND DYN.dbo.TR_Fact_Order_Detail_V.Order_Detail_Number =
DYN.dbo.SAW_Fact_Order_Detail.Order_Detail_Number
AND Expired_Date = '1/1/2049')
EXPERIMENTS
So i focused on the NOT Exists clause. My thought was that when an
index is created on the fields in WHERE clause of the SELECT statement
in the NOT EXISTS part it would be quicker. Because SQL Server should
be quicker decisioning whether a record existed or not. So far theory.
So i started experimenting.

I No Index.
It took about 118 seconds.

II. I created an referencetable (took some time but not relevant here)
so the insert table and the comparetable were not the same anymore. The
fields in the reference table are Checksumfield, Dossier_Source_Code
(in query example Order_Number and Order_Detail_Number) and
expired_date. So the not exists clause rebuild to reference this newly
created table and now it took about 85 seconds.

III. The part i don't understand is this. So the prior step was a
perfomance gain so i decided to build indexes on the reference table. I
tried al types of different indexes:
* index on checksum (clustered), source_key (Non clustered, unique) and
on expired date (non clustered) --> 99 seconds
* Index on source_key (clustered, unique), checksum_field (non
clustered) and on expired date (non clustered) --> 91 seconds
* the Source key (in the example Order_Number and Order_Detail_Number)
was unique so i decided to build a checksum on these fields and build
an index on the checksum and of course on the Source_key and
expired_date. This took about 101 seconds. What?

So as you can see it took only longer when i build a index. So why? And
has someone any clues to make the query faster?
Thanx ,
Hennie

Nov 30 '05 #1
6 1613
Hennie,

I can't give you the ONE answer 'how to improve the query, but you
should keep in mind that while indexes (should) improve SELECT queries,
they can also slow down INSERTS, because the new records can cause page
splits.
So you have to find a compromise between good SELECT performance and
Insert performance.
By the way, did you try to analyze this query with the Index Tuning
wizard ?

Markus

Dec 1 '05 #2
Hennie,

I definitely hear the irritation in your posting. First it sounds like you
are using straight TSQL for your ETL. While this works, it's hard to
maintain and lacks some key features for performance. I encourage you to
look at some of the industry tools from Informatica, Ascential, or the new
SSIS in 2005. If using only TSQL is your goal here is a tip.

Transactions in SQL is your performance enemy. This is a data warehouse so
transactions aren't very essential. Put your DB in Bulk Load or Simple
recovery mode. If you know each day the number of inserts and deletes will
be high, try writing a new table with all the records out, drop the old
table and rename the new table. A select into should do the trick. You may
even abandon the idea of identifying what changed since writing the entire
table without transactions takes less time. Of course this approach has a
long term issue. As the source grows the time to write larger tables will
always increase.

As for adding an index to the reference table slowing things down. Adding
indexes can slow queries down. Since the optimizer puts more weight on IO
than CPU and memory. The optimizer chooses to use the new index which
reduces IO but in turn changes the plan to do a large hash join. The lowest
IO is not always the best plan.

"Hennie7863" <hd********@hotmail.com> wrote in message
news:11**********************@f14g2000cwb.googlegr oups.com...
Hi everybody,

After days reading stuff about indexing, extents, pages, 8KB, 64 KB,
IGNORE_DUP_KEY, CREATE INDEX, bla bla, lalalala, lllllll, grrrrrrr and
testing with different kinds of queries, indexes, situations i'm
getting confused, irritated, etc.

SITUATION
I have the following situation. We have a fact table with 1,8 million
records (Testsituation) and i'm inserting 100000 records and i want to
make it faster. Records can only be inserted when it's different from
the one in the Fact table. Currently we don't have any index on the
table. So i thought that could be quicker when i build some indexes. So
i started experimenting, lalala, and some further more and more...

The facttable has Foreign keys and measures. The foreign keys are
referenced with the primary keys in the dimensions. Also i have a field
called Source_key which indicates an unique value (This could be an
integer, but also a varchar). In this case its a varchar. Also i have
an Checksum field. An example of the query is like this (based on the
northwind database):

INSERT INTO DYN.dbo.SAW_Fact_Order_Detail
SELECT 20051130, 62667,
Customer_Dim_Key =
ISNULL(DYN.dbo.SAW_B_LU_Customer.Customer_Dim_Key, 0),
Product_Dim_Key = ISNULL(DYN.dbo.SAW_B_LU_Product.Product_Dim_Key,0) ,
FK_Order_Date,
FK_Required_Date,
FK_Shipped_Date,
Shipped_Flag,
Order_Duration_Open_Days,
Order_Required_Exceed_Days,
Order_Detail_Unit_Price,
Order_Detail_Quantity,
Order_Detail_Discount,
ExtendedPrice,
Order_Number,
Order_Detail_Number,
Order_Detail_Count,
binary_checksum(
ISNULL(DYN.dbo.SAW_B_LU_Customer.Customer_Dim_Key, 0),
ISNULL(DYN.dbo.SAW_B_LU_Product.Product_Dim_Key,0) ,
DYN.dbo.TR_Fact_Order_Detail_V.FK_Order_Date,
DYN.dbo.TR_Fact_Order_Detail_V.FK_Required_Date,
DYN.dbo.TR_Fact_Order_Detail_V.FK_Shipped_Date,
DYN.dbo.TR_Fact_Order_Detail_V.Shipped_Flag,
DYN.dbo.TR_Fact_Order_Detail_V.Order_Duration_Open _Days,
DYN.dbo.TR_Fact_Order_Detail_V.Order_Required_Exce ed_Days,
DYN.dbo.TR_Fact_Order_Detail_V.Order_Detail_Unit_P rice,
DYN.dbo.TR_Fact_Order_Detail_V.Order_Detail_Quanti ty,
DYN.dbo.TR_Fact_Order_Detail_V.Order_Detail_Discou nt,
DYN.dbo.TR_Fact_Order_Detail_V.ExtendedPrice,
DYN.dbo.TR_Fact_Order_Detail_V.Order_Number,
DYN.dbo.TR_Fact_Order_Detail_V.Order_Detail_Number ,
DYN.dbo.TR_Fact_Order_Detail_V.Order_Detail_Count) ,
14,
'N',
getdate(),
'1/1/2049'
FROM DYN.dbo.TR_Fact_Order_Detail_V
LEFT OUTER JOIN DYN.dbo.SAW_B_LU_Customer ON
DYN.dbo.TR_Fact_Order_Detail_V.Customer_Code =
DYN.dbo.SAW_B_LU_Customer.Customer_Code
LEFT OUTER JOIN DYN.dbo.SAW_B_LU_Product ON
DYN.dbo.TR_Fact_Order_Detail_V.Product_Code =
DYN.dbo.SAW_B_LU_Product.Product_Code
WHERE NOT EXISTS
(Select *
From DYN.dbo.SAW_Fact_Order_Detail
Where Checksum_Field = binary_checksum(
ISNULL(DYN.dbo.SAW_B_LU_Customer.Customer_Dim_Key, 0),
ISNULL(DYN.dbo.SAW_B_LU_Product.Product_Dim_Key,0) ,
DYN.dbo.TR_Fact_Order_Detail_V.FK_Order_Date,
DYN.dbo.TR_Fact_Order_Detail_V.FK_Required_Date,
DYN.dbo.TR_Fact_Order_Detail_V.FK_Shipped_Date,
DYN.dbo.TR_Fact_Order_Detail_V.Shipped_Flag,
DYN.dbo.TR_Fact_Order_Detail_V.Order_Duration_Open _Days,
DYN.dbo.TR_Fact_Order_Detail_V.Order_Required_Exce ed_Days,
DYN.dbo.TR_Fact_Order_Detail_V.Order_Detail_Unit_P rice,
DYN.dbo.TR_Fact_Order_Detail_V.Order_Detail_Quanti ty,
DYN.dbo.TR_Fact_Order_Detail_V.Order_Detail_Discou nt,
DYN.dbo.TR_Fact_Order_Detail_V.ExtendedPrice,
DYN.dbo.TR_Fact_Order_Detail_V.Order_Number,
DYN.dbo.TR_Fact_Order_Detail_V.Order_Detail_Number ,
AND DYN.dbo.TR_Fact_Order_Detail_V.Order_Number =
DYN.dbo.SAW_Fact_Order_Detail.Order_Number
AND DYN.dbo.TR_Fact_Order_Detail_V.Order_Detail_Number =
DYN.dbo.SAW_Fact_Order_Detail.Order_Detail_Number
AND Expired_Date = '1/1/2049')
EXPERIMENTS
So i focused on the NOT Exists clause. My thought was that when an
index is created on the fields in WHERE clause of the SELECT statement
in the NOT EXISTS part it would be quicker. Because SQL Server should
be quicker decisioning whether a record existed or not. So far theory.
So i started experimenting.

I No Index.
It took about 118 seconds.

II. I created an referencetable (took some time but not relevant here)
so the insert table and the comparetable were not the same anymore. The
fields in the reference table are Checksumfield, Dossier_Source_Code
(in query example Order_Number and Order_Detail_Number) and
expired_date. So the not exists clause rebuild to reference this newly
created table and now it took about 85 seconds.

III. The part i don't understand is this. So the prior step was a
perfomance gain so i decided to build indexes on the reference table. I
tried al types of different indexes:
* index on checksum (clustered), source_key (Non clustered, unique) and
on expired date (non clustered) --> 99 seconds
* Index on source_key (clustered, unique), checksum_field (non
clustered) and on expired date (non clustered) --> 91 seconds
* the Source key (in the example Order_Number and Order_Detail_Number)
was unique so i decided to build a checksum on these fields and build
an index on the checksum and of course on the Source_key and
expired_date. This took about 101 seconds. What?

So as you can see it took only longer when i build a index. So why? And
has someone any clues to make the query faster?
Thanx ,
Hennie

Dec 1 '05 #3
>tried al types of different indexes:
* index on checksum (clustered), source_key (Non clustered, unique) and
on expired date (non clustered) --> 99 seconds
* Index on source_key (clustered, unique), checksum_field (non
clustered) and on expired date (non clustered) --> 91 seconds
that is normal. accessing via a non-clustered index (thru a bookmark)
may be quite slow. Having all the columns in just one index might be
better.
* the Source key (in the example Order_Number and Order_Detail_Number)
was unique so i decided to build a checksum on these fields and build
an index on the checksum and of course on the Source_key and
expired_date. This took about 101 seconds. What?


are you speaking about one index containing all those columns?

Also note that if all the columns are not nullable, not exists()
queries may be rewritten as not in() ones.
Also try rewriting you not exists() query as an outer join.

try all 3 ways, they are equivalent if the columns you join on are not
nullable:

1.
select c.left_PK
from left_table c
where left_PK not in(select left_PK from right_table t
where c.left_PK=t.left_PK)

2.
select c.left_PK
from left_table c
where not exists(select 1 from right_table t
where c.left_PK=t.left_PK)

3.
select c.left_PK
from left_table c left outer join right_table t
on c.left_PK=t.left_PK
where t.left_PK is null

Dec 1 '05 #4
oops I meant

3.
select c.left_PK
from left_table c left outer join right_table t
on c.left_PK=t.left_PK
where t.RIGHT_PK is null

Dec 1 '05 #5
Hi Markus,

Thanx for your answer. Your answer says exactly what i wanted to do. I
wanted an insert table and a reference table so that the query would
function at its best: one table for inserting and one for referencing
and this would make it quicker (but it takes some to build this
reference table). The next step is building indexes on this referencing

I did use the index tuning wizard and its said that i should build an
index on the checksum field en the fields (ordernumber and order
detail) which is the unique identification of a record.

Hennie

Dec 6 '05 #6
put a single index on the main fact table on "Order_num" assuming this
is a somewhat unique piece of data.
Then, DELETE from your temp table everything that already exists in the
main fact table. This will allow the engine to do a single scan on the
temp table, efficiently look it up in the fact table using a FAST
index, and get rid of dups.

Then, just add what is left of the temp table to the main fact table.
Using this approach, you will never have to do a scan of the main fact
table, ESPECIALLY not a scan for EVERY record in the temp table.

Throwing indexes onto the fact table for non-unique items is generally
a bad thing, and slows things down.

Dec 12 '05 #7

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

1
by: Danny | last post by:
sorry for posting a question that has probably been posted hundreds of times, but i can't seem to find the answer i need...We're using a Sql Server 7.0 database and recently started getting timeout...
2
by: HumanJHawkins | last post by:
Hi, I am using data from multiple databases and/or queries. It would greatly simplify and speed things up if I could use CONTAINS in processing the results. However, "CONTAINS" requires the data...
1
by: Thomas Bartkus | last post by:
If we have a date/time field and are doing frequent queries WHERE {date/time field} BETWEEN TimeA AND TimeB Does it make sense, query speed wise, to create an index on the date/time field? ...
2
by: Kent.Brooke | last post by:
Can someone set me straight I know indexing is a "try & see" art. However I am at a loss if it's better to use the INCLUDE switch on a unique index and tag on the columns be used to avoid a lookup...
5
by: ibiza | last post by:
Hi all, I am quite experimented with SQL Server, but not that much with full text indexing. After some successful attempts with english fields, I've decided to try it with Japanese characters. I...
7
by: Ryan | last post by:
I have a bit of a problem with regards an indexing strategy. Well, basically there is no indexing strategy on a set of data I have at work. Now, I didn't create the design as I would have allowed...
0
by: Owen Jenkins | last post by:
Hi, My application allows users to create a new back end for separate purposes. It does this by using Make Table Queries and Indexing via SQL. For example ... sqlString = "SELECT * INTO " &...
4
by: Owen Jenkins | last post by:
Hi, No-one replied to this when I sent it last week. Any bites out there today?? ----- My application allows users to create a new back end for separate purposes. It does this by using Make...
4
by: Amar | last post by:
Hi All, I need to select data from a database table containing huge amount of data. Now I am storing data using one primary key and I am just using simple select statement, and this process...
0
BarryA
by: BarryA | last post by:
What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...
1
by: nemocccc | last post by:
hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?
0
by: Hystou | last post by:
There are some requirements for setting up RAID: 1. The motherboard and BIOS support RAID configuration. 2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...
0
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However,...
0
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can...
0
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers,...
0
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven...
0
by: Hystou | last post by:
Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows...
0
agi2029
by: agi2029 | last post by:
Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing,...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.