473,406 Members | 2,620 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,406 software developers and data experts.

Optimizing a big query

To start with, I'll give a simplified overview of my data.

BaseRecord (4mil rows, 25k in each Region)
ID | Name | Region | etc

OtherData (7.5mil rows, 1 or 2 per ID)
ID | Type(1/2) | Data

ProblemTable (4mil rows)
ID | ConcatenatedHistory

The concatenated history field is a nvarchar with up to 20 different
pipe delimited date/code combinations, eg. '01/01/2007X|11/28/2006Q|
11/12/2004Q|'

Using left outer joins (all from base, the rest optional) I've got a
view something like:

View (4mil rows)
ID | Name | Region | etc | Data | Data2 | ConcatenatedHistory

Querying it, it takes about 15-20 seconds to do this:
Select ID, Name, Region, etc, Data, Data2, ConcatenatedHistory
>From View
Where Region = 58
and ConcatenatedHistory like '%11/28/2006%' or ConcatenatedHistory
like '%2007X%' ;

Or to do this:
Select ID, Name, Region, etc, Data, Data2, ConcatenatedHistory
>From View
Where Region = 58;

But this takes over a minute:
Select ID, Name, Region, etc, Data, Data2, ConcatenatedHistory
>From View
Where Region = 58
and ConcatenatedHistory like '%test%' ;

What puzzles me most is that it's taking longer to return nothing.

I've tried normalizing this concatenated field into it's own table, or
into 20 and 40 denormalized fields. The denormalized fields were
nightmarishly long queries for a web interface at 5-6 minutes.

The normalized table should have roughly 25mil records, but cutting it
down to just the most relevant years let me play with it at 9.5mil
records. This shifted the results to where it took 35-40 seconds to do
ANY query against that table.

Select View.ID, View.Name, View.Region, View.etc, View.Data,
View.Data2, History.Date, History.Code
>From View inner join History on History.ID = View.ID
Where View.Region = 58
and History.Date = '11/28/2006' or History.Code = '2007X';

I also tried reducing this table down to a linking table between ID
and Code, and pushing the date off to another table, but that only
made things worse.

~~~
Going back to what worked best (the intial View), the Execution Plan
shows 93% on a Clustered Index Scan on the ID field's index in the
ConcatenatedHistory table for the problem query, but spreads out the
load fairly evenly among indexes on the successful query. I'm trying
to figure out a way to improve performance, and more importantly, make
"0 records found" responses be a bit more forthcoming.

If it's relevant, I'm on SQL Server 2005 Standard, and I've already
taken care of the memory, CPU and drive optimization.

Feb 9 '07 #1
5 1712
On Feb 8, 4:05 pm, "Merennulli" <mar...@sdf.lonestar.orgwrote:
To start with, I'll give a simplified overview of my data.

BaseRecord (4mil rows, 25k in each Region)
ID | Name | Region | etc

OtherData (7.5mil rows, 1 or 2 per ID)
ID | Type(1/2) | Data

ProblemTable (4mil rows)
ID | ConcatenatedHistory

The concatenated history field is a nvarchar with up to 20 different
pipe delimited date/code combinations, eg. '01/01/2007X|11/28/2006Q|
11/12/2004Q|'

Using left outer joins (all from base, the rest optional) I've got a
view something like:

View (4mil rows)
ID | Name | Region | etc | Data | Data2 | ConcatenatedHistory

Querying it, it takes about 15-20 seconds to do this:
Select ID, Name, Region, etc, Data, Data2, ConcatenatedHistory>From View

Where Region = 58
and ConcatenatedHistory like '%11/28/2006%' or ConcatenatedHistory
like '%2007X%' ;

Or to do this:
Select ID, Name, Region, etc, Data, Data2, ConcatenatedHistory>From View

Where Region = 58;

But this takes over a minute:
Select ID, Name, Region, etc, Data, Data2, ConcatenatedHistory>From View

Where Region = 58
and ConcatenatedHistory like '%test%' ;

What puzzles me most is that it's taking longer to return nothing.

I've tried normalizing this concatenated field into it's own table, or
into 20 and 40 denormalized fields. The denormalized fields were
nightmarishly long queries for a web interface at 5-6 minutes.

The normalized table should have roughly 25mil records, but cutting it
down to just the most relevant years let me play with it at 9.5mil
records. This shifted the results to where it took 35-40 seconds to do
ANY query against that table.

Select View.ID, View.Name, View.Region, View.etc, View.Data,
View.Data2, History.Date, History.Code>From View inner join History on History.ID = View.ID

Where View.Region = 58
and History.Date = '11/28/2006' or History.Code = '2007X';

I also tried reducing this table down to a linking table between ID
and Code, and pushing the date off to another table, but that only
made things worse.

~~~
Going back to what worked best (the intial View), the Execution Plan
shows 93% on a Clustered Index Scan on the ID field's index in the
ConcatenatedHistory table for the problem query, but spreads out the
load fairly evenly among indexes on the successful query. I'm trying
to figure out a way to improve performance, and more importantly, make
"0 records found" responses be a bit more forthcoming.

If it's relevant, I'm on SQL Server 2005 Standard, and I've already
taken care of the memory, CPU and drive optimization.
do you query the concatenated data, "normalized", in a way that you
could use FULL TEXT ?

Feb 9 '07 #2
I expect that the normalized approach can perform much better than parsing
ConcatenatedHistory using LIKE as long as you have the proper indexes in
place and tune your queries. However it's difficult to make recommendations
without the actual DDL of your existing objects.
Select View.ID, View.Name, View.Region, View.etc, View.Data,
View.Data2, History.Date, History.Code
>>From View inner join History on History.ID = View.ID
Where View.Region = 58
and History.Date = '11/28/2006' or History.Code = '2007X';
You might be able to reformulate this query as something like the example
below:

SELECT View.ID, View.Name, View.Region, View.etc, View.Data, View.Data2,
History.Date, History.Code
FROM View
INNER JOIN History ON
History.ID = View.ID
WHERE
View.Region = 58 AND
History.Date = '11/28/2006'
UNION
SELECT View.ID, View.Name, View.Region, View.etc, View.Data, View.Data2,
History.Date, History.Code
FROM View
INNER JOIN History ON
History.ID = View.ID
WHERE
View.Region = 58 AND
History.Code = '2007X';
--
Hope this helps.

Dan Guzman
SQL Server MVP

"Merennulli" <ma****@sdf.lonestar.orgwrote in message
news:11**********************@s48g2000cws.googlegr oups.com...
To start with, I'll give a simplified overview of my data.

BaseRecord (4mil rows, 25k in each Region)
ID | Name | Region | etc

OtherData (7.5mil rows, 1 or 2 per ID)
ID | Type(1/2) | Data

ProblemTable (4mil rows)
ID | ConcatenatedHistory

The concatenated history field is a nvarchar with up to 20 different
pipe delimited date/code combinations, eg. '01/01/2007X|11/28/2006Q|
11/12/2004Q|'

Using left outer joins (all from base, the rest optional) I've got a
view something like:

View (4mil rows)
ID | Name | Region | etc | Data | Data2 | ConcatenatedHistory

Querying it, it takes about 15-20 seconds to do this:
Select ID, Name, Region, etc, Data, Data2, ConcatenatedHistory
>>From View
Where Region = 58
and ConcatenatedHistory like '%11/28/2006%' or ConcatenatedHistory
like '%2007X%' ;

Or to do this:
Select ID, Name, Region, etc, Data, Data2, ConcatenatedHistory
>>From View
Where Region = 58;

But this takes over a minute:
Select ID, Name, Region, etc, Data, Data2, ConcatenatedHistory
>>From View
Where Region = 58
and ConcatenatedHistory like '%test%' ;

What puzzles me most is that it's taking longer to return nothing.

I've tried normalizing this concatenated field into it's own table, or
into 20 and 40 denormalized fields. The denormalized fields were
nightmarishly long queries for a web interface at 5-6 minutes.

The normalized table should have roughly 25mil records, but cutting it
down to just the most relevant years let me play with it at 9.5mil
records. This shifted the results to where it took 35-40 seconds to do
ANY query against that table.

Select View.ID, View.Name, View.Region, View.etc, View.Data,
View.Data2, History.Date, History.Code
>>From View inner join History on History.ID = View.ID
Where View.Region = 58
and History.Date = '11/28/2006' or History.Code = '2007X';

I also tried reducing this table down to a linking table between ID
and Code, and pushing the date off to another table, but that only
made things worse.

~~~
Going back to what worked best (the intial View), the Execution Plan
shows 93% on a Clustered Index Scan on the ID field's index in the
ConcatenatedHistory table for the problem query, but spreads out the
load fairly evenly among indexes on the successful query. I'm trying
to figure out a way to improve performance, and more importantly, make
"0 records found" responses be a bit more forthcoming.

If it's relevant, I'm on SQL Server 2005 Standard, and I've already
taken care of the memory, CPU and drive optimization.
Feb 9 '07 #3
On Feb 8, 7:27 pm, "Steve" <morrisz...@hotmail.comwrote:
On Feb 8, 4:05 pm, "Merennulli" <mar...@sdf.lonestar.orgwrote:


To start with, I'll give a simplified overview of my data.
BaseRecord (4mil rows, 25k in each Region)
ID | Name | Region | etc
OtherData (7.5mil rows, 1 or 2 per ID)
ID | Type(1/2) | Data
ProblemTable (4mil rows)
ID | ConcatenatedHistory
The concatenated history field is a nvarchar with up to 20 different
pipe delimited date/code combinations, eg. '01/01/2007X|11/28/2006Q|
11/12/2004Q|'
Using left outer joins (all from base, the rest optional) I've got a
view something like:
View (4mil rows)
ID | Name | Region | etc | Data | Data2 | ConcatenatedHistory
Querying it, it takes about 15-20 seconds to do this:
Select ID, Name, Region, etc, Data, Data2, ConcatenatedHistory>From View
Where Region = 58
and ConcatenatedHistory like '%11/28/2006%' or ConcatenatedHistory
like '%2007X%' ;
Or to do this:
Select ID, Name, Region, etc, Data, Data2, ConcatenatedHistory>From View
Where Region = 58;
But this takes over a minute:
Select ID, Name, Region, etc, Data, Data2, ConcatenatedHistory>From View
Where Region = 58
and ConcatenatedHistory like '%test%' ;
What puzzles me most is that it's taking longer to return nothing.
I've tried normalizing this concatenated field into it's own table, or
into 20 and 40 denormalized fields. The denormalized fields were
nightmarishly long queries for a web interface at 5-6 minutes.
The normalized table should have roughly 25mil records, but cutting it
down to just the most relevant years let me play with it at 9.5mil
records. This shifted the results to where it took 35-40 seconds to do
ANY query against that table.
Select View.ID, View.Name, View.Region, View.etc, View.Data,
View.Data2, History.Date, History.Code>From View inner join History on History.ID = View.ID
Where View.Region = 58
and History.Date = '11/28/2006' or History.Code = '2007X';
I also tried reducing this table down to a linking table between ID
and Code, and pushing the date off to another table, but that only
made things worse.
~~~
Going back to what worked best (the intial View), the Execution Plan
shows 93% on a Clustered Index Scan on the ID field's index in the
ConcatenatedHistory table for the problem query, but spreads out the
load fairly evenly among indexes on the successful query. I'm trying
to figure out a way to improve performance, and more importantly, make
"0 records found" responses be a bit more forthcoming.
If it's relevant, I'm on SQL Server 2005 Standard, and I've already
taken care of the memory, CPU and drive optimization.

do you query the concatenated data, "normalized", in a way that you
could use FULL TEXT ?- Hide quoted text -

- Show quoted text -
Steve,

The "normalized" data broke the code and date into a datetime field
and a 5 character char (eg. '2006X'). With that change, I was doing
exact matching instead of "like" comparisons. As far as my knowledge
extends, that should have more than compensated for the table size
difference (4mil vs 25mil and later vs 9mil). My first assumption was
that the indexes were wrong, but a non-clustered index on the ID and
Code fields should have been correct for this. I can try a clustered
on the pair, but I don't see where that would improve performance.
Feb 9 '07 #4
Merennulli (ma****@sdf.lonestar.org) writes:
Querying it, it takes about 15-20 seconds to do this:
Select ID, Name, Region, etc, Data, Data2, ConcatenatedHistory
>>From View
Where Region = 58
and ConcatenatedHistory like '%11/28/2006%' or ConcatenatedHistory
like '%2007X%' ;
I completely agree with Dan that normalising ConcatenatedHistory into
its own table, would give you better performance. But not know the
tables or indexes its difficult to say why your attempt failed.

The one thing I can suggest to improve the speed of the current
query is that you add a COLLATE clause to force a binary collation:
ConcatenatedHistory LIKE '%2007%' COLLATE Latin1_General_BIN

this is particular important if you use a Windows collation or your
column is varchar.
Select View.ID, View.Name, View.Region, View.etc, View.Data,
View.Data2, History.Date, History.Code
From View inner join History on History.ID = View.ID
Where View.Region = 58
and History.Date = '11/28/2006' or History.Code = '2007X';
I don't really know what this code is. Isn't that just a date or
rather a period? And is that really the WHERE clause? Or should it
be:

Where View.Region = 58
(and History.Date = '11/28/2006' or History.Code = '2007X')

--
Erland Sommarskog, SQL Server MVP, es****@sommarskog.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pro...ads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodinf...ons/books.mspx
Feb 10 '07 #5
On Feb 10, 10:55 am, Erland Sommarskog <esq...@sommarskog.sewrote:
Merennulli (mar...@sdf.lonestar.org) writes:
Querying it, it takes about 15-20 seconds to do this:
Select ID, Name, Region, etc, Data, Data2, ConcatenatedHistory
>From View
Where Region = 58
and ConcatenatedHistory like '%11/28/2006%' or ConcatenatedHistory
like '%2007X%' ;

I completely agree with Dan that normalising ConcatenatedHistory into
its own table, would give you better performance. But not know the
tables or indexes its difficult to say why your attempt failed.
Sorry, it looks like my last message in response to Dan didn't go
through.
The problem was indeed my index. It failed because I had the ID first
and the date second.
Flipping the field order in the index brought my time down to about
5-8 seconds.

My thought had been that the query would use start with the other side
of the join - the view, narrow it down first and tie to the index
values, then find the date out of the remaining small section of the
index. Instead it seems it started with the opposite side of the join
from what I expected. Because of that it was within an average of 10
places of being enforced as completely random from the date field's
perspective.

Thanks for pointing me in the right direction.

Feb 12 '07 #6

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

6
by: A Future Computer Scientist | last post by:
A question: Is it really important to think about optimizing the native code or optimizing it for P Code? Or does the code you write make a difference?
6
by: Uros | last post by:
Hello! I have some trouble getting good results from my query. here is structure stat_views id | integer id_zone | integer created | timestamp
6
by: Ryan | last post by:
I came across a situation that I've been unable to explain and was hoping somebody had an answer: I had written an update query which was taking about 8 seconds to run and considered it too...
2
by: Brian | last post by:
In particular, this question goes out to the Microsoft C++ development team. Back in June, Ronald Laeremans posted the following message. Has the optimizing compiler been included with the...
4
by: Got2Go | last post by:
Hello Group, I have a table that has millions of records in it. About 100 records are added every 5 minutes (one per OIDID) (the sample provided below has data for 2 OIDIDs (99 and 100) And I...
2
by: Nidhi | last post by:
Hello, In my stored procedure i m using same view 5 times. Will this calculate and retrieve from views 5 times ??? Is there any way we can optimise my stored procedure Regards Nidhi
1
by: xpcer | last post by:
hi, friends, i have an problem, like this, i have tables, when i want to use "select" statement that include "join" sintaxt, my query will execute about 1 hour, so long. can u tell me how to...
0
by: rashmigaikwad | last post by:
Hi All, I need help in optimizing the query mentioned below: SELECT SUM(CASE WHEN PROD_TYP='HBRMC' AND INC_MULTIPLE < 2.50 AND LTV <= 0.75 OR LTV IS NULL THEN ADV_IN_QTR ELSE 0 END) ...
5
by: John Rivers | last post by:
Hello has anybody else noticed I have queries that SQL 2000 optimizes correctly and they run very fast Yet SQL 2005 keeps using a dumb query plan and queries run very slow The problem...
0
by: Charles Arthur | last post by:
How do i turn on java script on a villaon, callus and itel keypad mobile phone
0
by: emmanuelkatto | last post by:
Hi All, I am Emmanuel katto from Uganda. I want to ask what challenges you've faced while migrating a website to cloud. Please let me know. Thanks! Emmanuel
0
BarryA
by: BarryA | last post by:
What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...
1
by: nemocccc | last post by:
hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?
1
by: Sonnysonu | last post by:
This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...
0
by: Hystou | last post by:
There are some requirements for setting up RAID: 1. The motherboard and BIOS support RAID configuration. 2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...
0
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can...
0
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven...
0
by: Hystou | last post by:
Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.