Optimizing a big query - Microsoft SQL Server

Merennulli

>From View

Where Region = 58
and ConcatenatedHistory like '%11/28/2006%' or ConcatenatedHistory
like '%2007X%' ;

Or to do this:
Select ID, Name, Region, etc, Data, Data2, ConcatenatedHistory

>From View

Where Region = 58;

But this takes over a minute:
Select ID, Name, Region, etc, Data, Data2, ConcatenatedHistory

>From View

Where Region = 58
and ConcatenatedHistory like '%test%' ;

What puzzles me most is that it's taking longer to return nothing.

I've tried normalizing this concatenated field into it's own table, or
into 20 and 40 denormalized fields. The denormalized fields were
nightmarishly long queries for a web interface at 5-6 minutes.

The normalized table should have roughly 25mil records, but cutting it
down to just the most relevant years let me play with it at 9.5mil
records. This shifted the results to where it took 35-40 seconds to do
ANY query against that table.

Select View.ID, View.Name, View.Region, View.etc, View.Data,
View.Data2, History.Date, History.Code

>From View inner join History on History.ID = View.ID

Where View.Region = 58
and History.Date = '11/28/2006' or History.Code = '2007X';

I also tried reducing this table down to a linking table between ID
and Code, and pushing the date off to another table, but that only
made things worse.

~~~
Going back to what worked best (the intial View), the Execution Plan
shows 93% on a Clustered Index Scan on the ID field's index in the
ConcatenatedHistory table for the problem query, but spreads out the
load fairly evenly among indexes on the successful query. I'm trying
to figure out a way to improve performance, and more importantly, make
"0 records found" responses be a bit more forthcoming.

If it's relevant, I'm on SQL Server 2005 Standard, and I've already
taken care of the memory, CPU and drive optimization.

Feb 9 '07 #1

Subscribe Post Reply

1712

Steve

On Feb 8, 4:05 pm, "Merennulli" <mar...@sdf.lonestar.orgwrote:

To start with, I'll give a simplified overview of my data.

BaseRecord (4mil rows, 25k in each Region)
ID | Name | Region | etc

OtherData (7.5mil rows, 1 or 2 per ID)
ID | Type(1/2) | Data

ProblemTable (4mil rows)
ID | ConcatenatedHistory

The concatenated history field is a nvarchar with up to 20 different
pipe delimited date/code combinations, eg. '01/01/2007X|11/28/2006Q|
11/12/2004Q|'

Using left outer joins (all from base, the rest optional) I've got a
view something like:

View (4mil rows)
ID | Name | Region | etc | Data | Data2 | ConcatenatedHistory

Querying it, it takes about 15-20 seconds to do this:
Select ID, Name, Region, etc, Data, Data2, ConcatenatedHistory>From View

Where Region = 58
and ConcatenatedHistory like '%11/28/2006%' or ConcatenatedHistory
like '%2007X%' ;

Or to do this:
Select ID, Name, Region, etc, Data, Data2, ConcatenatedHistory>From View

Where Region = 58;

But this takes over a minute:
Select ID, Name, Region, etc, Data, Data2, ConcatenatedHistory>From View

Where Region = 58
and ConcatenatedHistory like '%test%' ;

What puzzles me most is that it's taking longer to return nothing.

I've tried normalizing this concatenated field into it's own table, or
into 20 and 40 denormalized fields. The denormalized fields were
nightmarishly long queries for a web interface at 5-6 minutes.

The normalized table should have roughly 25mil records, but cutting it
down to just the most relevant years let me play with it at 9.5mil
records. This shifted the results to where it took 35-40 seconds to do
ANY query against that table.

Select View.ID, View.Name, View.Region, View.etc, View.Data,
View.Data2, History.Date, History.Code>From View inner join History on History.ID = View.ID

Where View.Region = 58
and History.Date = '11/28/2006' or History.Code = '2007X';

I also tried reducing this table down to a linking table between ID
and Code, and pushing the date off to another table, but that only
made things worse.

~~~
Going back to what worked best (the intial View), the Execution Plan
shows 93% on a Clustered Index Scan on the ID field's index in the
ConcatenatedHistory table for the problem query, but spreads out the
load fairly evenly among indexes on the successful query. I'm trying
to figure out a way to improve performance, and more importantly, make
"0 records found" responses be a bit more forthcoming.

If it's relevant, I'm on SQL Server 2005 Standard, and I've already
taken care of the memory, CPU and drive optimization.

do you query the concatenated data, "normalized", in a way that you
could use FULL TEXT ?

Feb 9 '07 #2

Dan Guzman

I expect that the normalized approach can perform much better than parsing
ConcatenatedHistory using LIKE as long as you have the proper indexes in
place and tune your queries. However it's difficult to make recommendations
without the actual DDL of your existing objects.

Select View.ID, View.Name, View.Region, View.etc, View.Data,
View.Data2, History.Date, History.Code
>>From View inner join History on History.ID = View.ID
Where View.Region = 58
and History.Date = '11/28/2006' or History.Code = '2007X';

You might be able to reformulate this query as something like the example
below:

SELECT View.ID, View.Name, View.Region, View.etc, View.Data, View.Data2,
History.Date, History.Code
FROM View
INNER JOIN History ON
History.ID = View.ID
WHERE
View.Region = 58 AND
History.Date = '11/28/2006'
UNION
SELECT View.ID, View.Name, View.Region, View.etc, View.Data, View.Data2,
History.Date, History.Code
FROM View
INNER JOIN History ON
History.ID = View.ID
WHERE
View.Region = 58 AND
History.Code = '2007X';
--
Hope this helps.

Dan Guzman
SQL Server MVP

"Merennulli" <ma****@sdf.lonestar.orgwrote in message
news:11**********************@s48g2000cws.googlegr oups.com...

To start with, I'll give a simplified overview of my data.

BaseRecord (4mil rows, 25k in each Region)
ID | Name | Region | etc

OtherData (7.5mil rows, 1 or 2 per ID)
ID | Type(1/2) | Data

ProblemTable (4mil rows)
ID | ConcatenatedHistory

The concatenated history field is a nvarchar with up to 20 different
pipe delimited date/code combinations, eg. '01/01/2007X|11/28/2006Q|
11/12/2004Q|'

Using left outer joins (all from base, the rest optional) I've got a
view something like:

View (4mil rows)
ID | Name | Region | etc | Data | Data2 | ConcatenatedHistory

Querying it, it takes about 15-20 seconds to do this:
Select ID, Name, Region, etc, Data, Data2, ConcatenatedHistory
>>From View
Where Region = 58
and ConcatenatedHistory like '%11/28/2006%' or ConcatenatedHistory
like '%2007X%' ;

Or to do this:
Select ID, Name, Region, etc, Data, Data2, ConcatenatedHistory
>>From View
Where Region = 58;

But this takes over a minute:
Select ID, Name, Region, etc, Data, Data2, ConcatenatedHistory
>>From View
Where Region = 58
and ConcatenatedHistory like '%test%' ;

What puzzles me most is that it's taking longer to return nothing.

I've tried normalizing this concatenated field into it's own table, or
into 20 and 40 denormalized fields. The denormalized fields were
nightmarishly long queries for a web interface at 5-6 minutes.

The normalized table should have roughly 25mil records, but cutting it
down to just the most relevant years let me play with it at 9.5mil
records. This shifted the results to where it took 35-40 seconds to do
ANY query against that table.

Select View.ID, View.Name, View.Region, View.etc, View.Data,
View.Data2, History.Date, History.Code
>>From View inner join History on History.ID = View.ID
Where View.Region = 58
and History.Date = '11/28/2006' or History.Code = '2007X';

I also tried reducing this table down to a linking table between ID
and Code, and pushing the date off to another table, but that only
made things worse.

~~~
Going back to what worked best (the intial View), the Execution Plan
shows 93% on a Clustered Index Scan on the ID field's index in the
ConcatenatedHistory table for the problem query, but spreads out the
load fairly evenly among indexes on the successful query. I'm trying
to figure out a way to improve performance, and more importantly, make
"0 records found" responses be a bit more forthcoming.

If it's relevant, I'm on SQL Server 2005 Standard, and I've already
taken care of the memory, CPU and drive optimization.

Feb 9 '07 #3

Merennulli

On Feb 8, 7:27 pm, "Steve" <morrisz...@hotmail.comwrote:

On Feb 8, 4:05 pm, "Merennulli" <mar...@sdf.lonestar.orgwrote:

To start with, I'll give a simplified overview of my data.

BaseRecord (4mil rows, 25k in each Region)
ID | Name | Region | etc

OtherData (7.5mil rows, 1 or 2 per ID)
ID | Type(1/2) | Data

ProblemTable (4mil rows)
ID | ConcatenatedHistory

The concatenated history field is a nvarchar with up to 20 different
pipe delimited date/code combinations, eg. '01/01/2007X|11/28/2006Q|
11/12/2004Q|'

Using left outer joins (all from base, the rest optional) I've got a
view something like:

View (4mil rows)
ID | Name | Region | etc | Data | Data2 | ConcatenatedHistory

Querying it, it takes about 15-20 seconds to do this:
Select ID, Name, Region, etc, Data, Data2, ConcatenatedHistory>From View

Where Region = 58
and ConcatenatedHistory like '%11/28/2006%' or ConcatenatedHistory
like '%2007X%' ;

Or to do this:
Select ID, Name, Region, etc, Data, Data2, ConcatenatedHistory>From View

Where Region = 58;

But this takes over a minute:
Select ID, Name, Region, etc, Data, Data2, ConcatenatedHistory>From View

Where Region = 58
and ConcatenatedHistory like '%test%' ;

What puzzles me most is that it's taking longer to return nothing.

I've tried normalizing this concatenated field into it's own table, or
into 20 and 40 denormalized fields. The denormalized fields were
nightmarishly long queries for a web interface at 5-6 minutes.

The normalized table should have roughly 25mil records, but cutting it
down to just the most relevant years let me play with it at 9.5mil
records. This shifted the results to where it took 35-40 seconds to do
ANY query against that table.

Select View.ID, View.Name, View.Region, View.etc, View.Data,
View.Data2, History.Date, History.Code>From View inner join History on History.ID = View.ID

Where View.Region = 58
and History.Date = '11/28/2006' or History.Code = '2007X';

I also tried reducing this table down to a linking table between ID
and Code, and pushing the date off to another table, but that only
made things worse.

~~~
Going back to what worked best (the intial View), the Execution Plan
shows 93% on a Clustered Index Scan on the ID field's index in the
ConcatenatedHistory table for the problem query, but spreads out the
load fairly evenly among indexes on the successful query. I'm trying
to figure out a way to improve performance, and more importantly, make
"0 records found" responses be a bit more forthcoming.

If it's relevant, I'm on SQL Server 2005 Standard, and I've already
taken care of the memory, CPU and drive optimization.

do you query the concatenated data, "normalized", in a way that you
could use FULL TEXT ?- Hide quoted text -

- Show quoted text -

Steve,

The "normalized" data broke the code and date into a datetime field
and a 5 character char (eg. '2006X'). With that change, I was doing
exact matching instead of "like" comparisons. As far as my knowledge
extends, that should have more than compensated for the table size
difference (4mil vs 25mil and later vs 9mil). My first assumption was
that the indexes were wrong, but a non-clustered index on the ID and
Code fields should have been correct for this. I can try a clustered
on the pair, but I don't see where that would improve performance.

Feb 9 '07 #4

Erland Sommarskog

Merennulli (ma****@sdf.lonestar.org) writes:

Querying it, it takes about 15-20 seconds to do this:
Select ID, Name, Region, etc, Data, Data2, ConcatenatedHistory
>>From View
Where Region = 58
and ConcatenatedHistory like '%11/28/2006%' or ConcatenatedHistory
like '%2007X%' ;

I completely agree with Dan that normalising ConcatenatedHistory into
its own table, would give you better performance. But not know the
tables or indexes its difficult to say why your attempt failed.

The one thing I can suggest to improve the speed of the current
query is that you add a COLLATE clause to force a binary collation:
ConcatenatedHistory LIKE '%2007%' COLLATE Latin1_General_BIN

this is particular important if you use a Windows collation or your
column is varchar.

Select View.ID, View.Name, View.Region, View.etc, View.Data,
View.Data2, History.Date, History.Code
From View inner join History on History.ID = View.ID
Where View.Region = 58
and History.Date = '11/28/2006' or History.Code = '2007X';

I don't really know what this code is. Isn't that just a date or
rather a period? And is that really the WHERE clause? Or should it
be:

Where View.Region = 58
(and History.Date = '11/28/2006' or History.Code = '2007X')

--
Erland Sommarskog, SQL Server MVP, es****@sommarskog.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pro...ads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodinf...ons/books.mspx

Feb 10 '07 #5

Merennulli

On Feb 10, 10:55 am, Erland Sommarskog <esq...@sommarskog.sewrote:

Merennulli (mar...@sdf.lonestar.org) writes:
Querying it, it takes about 15-20 seconds to do this:
Select ID, Name, Region, etc, Data, Data2, ConcatenatedHistory
>From View
Where Region = 58
and ConcatenatedHistory like '%11/28/2006%' or ConcatenatedHistory
like '%2007X%' ;

I completely agree with Dan that normalising ConcatenatedHistory into
its own table, would give you better performance. But not know the
tables or indexes its difficult to say why your attempt failed.

Sorry, it looks like my last message in response to Dan didn't go
through.
The problem was indeed my index. It failed because I had the ID first
and the date second.
Flipping the field order in the index brought my time down to about
5-8 seconds.

My thought had been that the query would use start with the other side
of the join - the view, narrow it down first and tie to the index
values, then find the date out of the remaining small section of the
index. Instead it seems it started with the opposite side of the join
from what I expected. Because of that it was within an average of 10
places of being enforced as completely random from the date field's
perspective.

Thanks for pointing me in the right direction.

Feb 12 '07 #6

Similar topics

Optimizing code

by: A Future Computer Scientist | last post by:

A question: Is it really important to think about optimizing the native code or optimizing it for P Code? Or does the code you write make a difference?

Visual Basic 4 / 5 / 6

Optimizing query

by: Uros | last post by:

Hello! I have some trouble getting good results from my query. here is structure stat_views id | integer id_zone | integer created | timestamp

PostgreSQL Database

Strange way of optimizing a query

by: Ryan | last post by:

I came across a situation that I've been unable to explain and was hoping somebody had an answer: I had written an update query which was taking about 8 seconds to run and considered it too...

Microsoft Access / VBA

VC++ Standard ed. (optimizing compiler revisited)

by: Brian | last post by:

In particular, this question goes out to the Microsoft C++ development team. Back in June, Ronald Laeremans posted the following message. Has the optimizing compiler been included with the...

.NET Framework

Optimizing Slow Queries on millions of records

by: Got2Go | last post by:

Hello Group, I have a table that has millions of records in it. About 100 records are added every 5 minutes (one per OIDID) (the sample provided below has data for 2 OIDIDs (99 and 100) And I...

MySQL Database

Optimizing Views

by: Nidhi | last post by:

Hello, In my stored procedure i m using same view 5 times. Will this calculate and retrieve from views 5 times ??? Is there any way we can optimise my stored procedure Regards Nidhi

C# / C Sharp

Optimizing Query

by: xpcer | last post by:

hi, friends, i have an problem, like this, i have tables, when i want to use "select" statement that include "join" sintaxt, my query will execute about 1 hour, so long. can u tell me how to...

MySQL Database

Optimizing the Query

by: rashmigaikwad | last post by:

Hi All, I need help in optimizing the query mentioned below: SELECT SUM(CASE WHEN PROD_TYP='HBRMC' AND INC_MULTIPLE < 2.50 AND LTV <= 0.75 OR LTV IS NULL THEN ADV_IN_QTR ELSE 0 END) ...

Oracle Database

SQL 2005 Optimizer not optimizing

by: John Rivers | last post by:

Hello has anybody else noticed I have queries that SQL 2000 optimizes correctly and they run very fast Yet SQL 2005 keeps using a dumb query plan and queries run very slow The problem...

Microsoft SQL Server

How to turn on java script in a villaon keypad mobile phone

by: Charles Arthur | last post by:

How do i turn on java script on a villaon, callus and itel keypad mobile phone

Java

Migrating Website to Cloud - Emmanuel Katto

by: emmanuelkatto | last post by:

Hi All, I am Emmanuel katto from Uganda. I want to ask what challenges you've faced while migrating a website to cloud. Please let me know. Thanks! Emmanuel

General

Navigating the Data Structures and Algorithms (DSA)

by: BarryA | last post by:

What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...

Algorithms / Advanced Math

Looking to do Android software development, any suggestions? Is flutter better?

by: nemocccc | last post by:

hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?

General

Is that possible of reading the .csv file in column wise and the column have different lengths ?

by: Sonnysonu | last post by:

This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...

C / C++

How to build RAID in BIOS?

by: Hystou | last post by:

There are some requirements for setting up RAID: 1. The motherboard and BIOS support RAID configuration. 2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...

Computer Hardware

Changing the language in Windows 10

by: Hystou | last post by:

Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can...

Windows Server

Maximizing Business Potential: The Nexus of Website Design and Digital Marketing

by: jinu1996 | last post by:

In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven...

Online Marketing

The easy way to turn off automatic updates for Windows 10/11

by: Hystou | last post by:

Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows...

Windows Server