INNER JOIN/Index Threshold? - Microsoft SQL Server

Dave

Guys I am really stuck on this one. Any help or suggestions would be
appreciated.
We have a large table which seemed to just hit some kind of threshold.
They query is somewhat responsive when there are NO indexes on the
table. However, when we index email the query takes forever.

FACTS
- The problem is very "data specific". I can not recreate the
problem using different data.
- There is only a problem when I index email on the base table.
- The problem goes away when I add "AND b.email IS NOT NULL" to the
inner join condition. It does not help when I add the logic to the
"WHERE" clause.
DDL
CREATE TABLE base (bk char(25), email varchar(100))
create clustered index icx on base(bk)
create index ix_email on base(email)
CREATE TABLE filter (bk char(25), email varchar(100))
create clustered index icx on filter (bk)
create index ix_email on filter (email)

Query
SELECT b.bk, b.email
FROM base b WITH(NOLOCK)
INNER JOIN filter f ON f.email = b.email
--and f.email is not null

Data Profile
--35120500, 35120491, 14221553
SELECT COUNT(*) ,COUNT(DISTINCT bk), COUNT(DISTINCT email)
FROM base
--16796199, 16796192, 14221553
SELECT COUNT(*) ,COUNT(DISTINCT bk), COUNT(DISTINCT email)
FROM base
WHERE email IS NOT NULL

--250552, 250552, 250205
SELECT COUNT(*) ,COUNT(DISTINCT bk), COUNT(DISTINCT email)
FROM filter
--250208, 250208, 250205
SELECT COUNT(*) ,COUNT(DISTINCT bk), COUNT(DISTINCT email)
FROM filter
WHERE email IS NOT NULL

Jan 4 '07 #1

Subscribe Reply

5036

Tracy McKibben

Dave wrote:

Guys I am really stuck on this one. Any help or suggestions would be
appreciated.
We have a large table which seemed to just hit some kind of threshold.
They query is somewhat responsive when there are NO indexes on the
table. However, when we index email the query takes forever.

FACTS
- The problem is very "data specific". I can not recreate the
problem using different data.
- There is only a problem when I index email on the base table.
- The problem goes away when I add "AND b.email IS NOT NULL" to the
inner join condition. It does not help when I add the logic to the
"WHERE" clause.
DDL
CREATE TABLE base (bk char(25), email varchar(100))
create clustered index icx on base(bk)
create index ix_email on base(email)
CREATE TABLE filter (bk char(25), email varchar(100))
create clustered index icx on filter (bk)
create index ix_email on filter (email)

Query
SELECT b.bk, b.email
FROM base b WITH(NOLOCK)
INNER JOIN filter f ON f.email = b.email
--and f.email is not null

Data Profile
--35120500, 35120491, 14221553
SELECT COUNT(*) ,COUNT(DISTINCT bk), COUNT(DISTINCT email)
FROM base
--16796199, 16796192, 14221553
SELECT COUNT(*) ,COUNT(DISTINCT bk), COUNT(DISTINCT email)
FROM base
WHERE email IS NOT NULL

--250552, 250552, 250205
SELECT COUNT(*) ,COUNT(DISTINCT bk), COUNT(DISTINCT email)
FROM filter
--250208, 250208, 250205
SELECT COUNT(*) ,COUNT(DISTINCT bk), COUNT(DISTINCT email)
FROM filter
WHERE email IS NOT NULL

What does the estimated execution plan look like? Try adding the column
BK as a second key in your email index.
--
Tracy McKibben
MCDBA
http://www.realsqlguy.com

Jan 4 '07 #2

Dave

bk is clustered so there is no reason to add it to the other index.

Estimated plan does an index scan on each table, then a merge join.

When I remove the index on base.email the plan does a table scan on
base then performs a Hash Match
Tracy McKibben wrote:

Dave wrote:
Guys I am really stuck on this one. Any help or suggestions would be
appreciated.
We have a large table which seemed to just hit some kind of threshold.
They query is somewhat responsive when there are NO indexes on the
table. However, when we index email the query takes forever.

FACTS
- The problem is very "data specific". I can not recreate the
problem using different data.
- There is only a problem when I index email on the base table.
- The problem goes away when I add "AND b.email IS NOT NULL" to the
inner join condition. It does not help when I add the logic to the
"WHERE" clause.
DDL
CREATE TABLE base (bk char(25), email varchar(100))
create clustered index icx on base(bk)
create index ix_email on base(email)
CREATE TABLE filter (bk char(25), email varchar(100))
create clustered index icx on filter (bk)
create index ix_email on filter (email)

Query
SELECT b.bk, b.email
FROM base b WITH(NOLOCK)
INNER JOIN filter f ON f.email = b.email
--and f.email is not null

Data Profile
--35120500, 35120491, 14221553
SELECT COUNT(*) ,COUNT(DISTINCT bk), COUNT(DISTINCT email)
FROM base
--16796199, 16796192, 14221553
SELECT COUNT(*) ,COUNT(DISTINCT bk), COUNT(DISTINCT email)
FROM base
WHERE email IS NOT NULL

--250552, 250552, 250205
SELECT COUNT(*) ,COUNT(DISTINCT bk), COUNT(DISTINCT email)
FROM filter
--250208, 250208, 250205
SELECT COUNT(*) ,COUNT(DISTINCT bk), COUNT(DISTINCT email)
FROM filter
WHERE email IS NOT NULL

What does the estimated execution plan look like? Try adding the column
BK as a second key in your email index.
--
Tracy McKibben
MCDBA
http://www.realsqlguy.com

Jan 4 '07 #3

AlterEgo

Dave,

Do you have a lot of null emails on both tables? If so, that might be an
influencing factor.

-- Bill

"Dave" <da******@gmail .comwrote in message
news:11******** **************@ s80g2000cwa.goo glegroups.com.. .

Guys I am really stuck on this one. Any help or suggestions would be
appreciated.
We have a large table which seemed to just hit some kind of threshold.
They query is somewhat responsive when there are NO indexes on the
table. However, when we index email the query takes forever.

FACTS
- The problem is very "data specific". I can not recreate the
problem using different data.
- There is only a problem when I index email on the base table.
- The problem goes away when I add "AND b.email IS NOT NULL" to the
inner join condition. It does not help when I add the logic to the
"WHERE" clause.
DDL
CREATE TABLE base (bk char(25), email varchar(100))
create clustered index icx on base(bk)
create index ix_email on base(email)
CREATE TABLE filter (bk char(25), email varchar(100))
create clustered index icx on filter (bk)
create index ix_email on filter (email)

Query
SELECT b.bk, b.email
FROM base b WITH(NOLOCK)
INNER JOIN filter f ON f.email = b.email
--and f.email is not null

Data Profile
--35120500, 35120491, 14221553
SELECT COUNT(*) ,COUNT(DISTINCT bk), COUNT(DISTINCT email)
FROM base
--16796199, 16796192, 14221553
SELECT COUNT(*) ,COUNT(DISTINCT bk), COUNT(DISTINCT email)
FROM base
WHERE email IS NOT NULL

--250552, 250552, 250205
SELECT COUNT(*) ,COUNT(DISTINCT bk), COUNT(DISTINCT email)
FROM filter
--250208, 250208, 250205
SELECT COUNT(*) ,COUNT(DISTINCT bk), COUNT(DISTINCT email)
FROM filter
WHERE email IS NOT NULL

Jan 4 '07 #4

Erland Sommarskog

Dave (da******@gmail .com) writes:

bk is clustered so there is no reason to add it to the other index.

Estimated plan does an index scan on each table, then a merge join.

When I remove the index on base.email the plan does a table scan on
base then performs a Hash Match

And the merge-join plan takes forever? That's indeed very funny, because
that's sounds like a quite decent plan. Your tables looks funny, because
they have no keys, but that would matter.

What happens if you try:

SELECT b.hk, b.email
FROM base b
WHERE EXISTS (SELECT *
FROM filter
WHERE f.email = b.email)
Which version of SQL Server?

--
Erland Sommarskog, SQL Server MVP, es****@sommarsk og.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pro...ads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodinf...ons/books.mspx

Jan 4 '07 #5

Dave

Your correlated subquery works.
It added a stream aggregate step to the execution plan right before the
merge join.

This is really weird. Could it be an SQL Server Bug? Like I said
earlier, it is very data specific. I can add 3 million random records
to the filter table and it normally works fine. It is only this
specific dataset that is causing problems.
Erland Sommarskog wrote:

Dave (da******@gmail .com) writes:
bk is clustered so there is no reason to add it to the other index.

Estimated plan does an index scan on each table, then a merge join.

When I remove the index on base.email the plan does a table scan on
base then performs a Hash Match

And the merge-join plan takes forever? That's indeed very funny, because
that's sounds like a quite decent plan. Your tables looks funny, because
they have no keys, but that would matter.

What happens if you try:

SELECT b.hk, b.email
FROM base b
WHERE EXISTS (SELECT *
FROM filter
WHERE f.email = b.email)
Which version of SQL Server?

--
Erland Sommarskog, SQL Server MVP, es****@sommarsk og.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pro...ads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodinf...ons/books.mspx

Jan 4 '07 #6

Gert-Jan Strik

Hi Dave,

I tried to create a situation similar to yours on SQL Server 2000. What
I noticed is:
- Without the extra predicate, the optimizer does not seem smart enough
to chose SEEK:([b].[email] IsNotNull) instead of a full Index Scan
- You are correct that the change in access method is not achieved if
the predicate is added to the WHERE clause. It seems it must be part of
the join ON clause
- If either "b.email IS NOT NULL" or "f.email IS NOT NULL" is added, the
join type remains MERGE JOIN. If both are added to the ON clause I get a
HASH JOIN
- I am not sure if the MERGE JOIN is faster or if the HASH JOIN is, but
in either case you may want to add the two "unnecessar y" predicates

There is a thresshold somewhere (depending on your specific situation)
where merge-joining or hashing the two partially scanned indexes is
faster. Up to that point, the fastest plan will be to partially scan the
index of filter and loop join and seek each matching row in base. In all
cases it will be faster if NULL rows are excluded from the index seeks.

HTH,
Gert-Jan
Dave wrote:

>
Guys I am really stuck on this one. Any help or suggestions would be
appreciated.

We have a large table which seemed to just hit some kind of threshold.
They query is somewhat responsive when there are NO indexes on the
table. However, when we index email the query takes forever.

FACTS
- The problem is very "data specific". I can not recreate the
problem using different data.
- There is only a problem when I index email on the base table.
- The problem goes away when I add "AND b.email IS NOT NULL" to the
inner join condition. It does not help when I add the logic to the
"WHERE" clause.

DDL
CREATE TABLE base (bk char(25), email varchar(100))
create clustered index icx on base(bk)
create index ix_email on base(email)

CREATE TABLE filter (bk char(25), email varchar(100))
create clustered index icx on filter (bk)
create index ix_email on filter (email)

Query
SELECT b.bk, b.email
FROM base b WITH(NOLOCK)
INNER JOIN filter f ON f.email = b.email
--and f.email is not null

Data Profile
--35120500, 35120491, 14221553
SELECT COUNT(*) ,COUNT(DISTINCT bk), COUNT(DISTINCT email)
FROM base

--16796199, 16796192, 14221553
SELECT COUNT(*) ,COUNT(DISTINCT bk), COUNT(DISTINCT email)
FROM base
WHERE email IS NOT NULL

--250552, 250552, 250205
SELECT COUNT(*) ,COUNT(DISTINCT bk), COUNT(DISTINCT email)
FROM filter

--250208, 250208, 250205
SELECT COUNT(*) ,COUNT(DISTINCT bk), COUNT(DISTINCT email)
FROM filter
WHERE email IS NOT NULL

Jan 5 '07 #7

Erland Sommarskog

Dave (da******@gmail .com) writes:

Your correlated subquery works.
It added a stream aggregate step to the execution plan right before the
merge join.

This is really weird. Could it be an SQL Server Bug? Like I said
earlier, it is very data specific. I can add 3 million random records
to the filter table and it normally works fine. It is only this
specific dataset that is causing problems.

Note that the two queries are not semantically equivalent. If a certain
email address appears four times in the filter and four times in base, your
original query will return 16 rows, while mine will return four.

With the numbers you posted there are only four duplicates in the filter.
Provided, that is, you are running with ANSI_NULLS ON. If this is in a
stored procedure saved from Enterprise Manager, you are likely to be running
with ANSI_NULLS OFF, in which case NULL = NULL, and you will get an
enormous number of duplicates. Which are filtered away if you add the
IS NOT NULL filter.
--
Erland Sommarskog, SQL Server MVP, es****@sommarsk og.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pro...ads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodinf...ons/books.mspx

Jan 5 '07 #8

Dave

We are using SQL 2K Enterprise SP4.

Erland that was an excellent post! However it was not my problem. The
code is run from .NET application and I am testing using SQL Management
Studio.

I added this to the test script and it did not help.
SET ANSI_NULLS ON
I think there are two things going on here.
1. SQL Server is not picking the best plan
2. There is an issue with my specific data set that I can not reproduce
with any other dataset except for "some" subsets or supersets.
Thanks for the help guys!!!

Erland Sommarskog wrote:

Dave (da******@gmail .com) writes:
Your correlated subquery works.
It added a stream aggregate step to the execution plan right before the
merge join.

This is really weird. Could it be an SQL Server Bug? Like I said
earlier, it is very data specific. I can add 3 million random records
to the filter table and it normally works fine. It is only this
specific dataset that is causing problems.

Note that the two queries are not semantically equivalent. If a certain
email address appears four times in the filter and four times in base, your
original query will return 16 rows, while mine will return four.

With the numbers you posted there are only four duplicates in the filter.
Provided, that is, you are running with ANSI_NULLS ON. If this is in a
stored procedure saved from Enterprise Manager, you are likely to be running
with ANSI_NULLS OFF, in which case NULL = NULL, and you will get an
enormous number of duplicates. Which are filtered away if you add the
IS NOT NULL filter.
--
Erland Sommarskog, SQL Server MVP, es****@sommarsk og.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pro...ads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodinf...ons/books.mspx

Jan 5 '07 #9

Erland Sommarskog

Dave (da******@gmail .com) writes:

Erland that was an excellent post! However it was not my problem. The
code is run from .NET application and I am testing using SQL Management
Studio.

I added this to the test script and it did not help.
SET ANSI_NULLS ON

As long it's a loose script, ANSI_BULLS would be on by default. I will
have to admit that I was clutching a straws. Without access to the
database seeing it, it is very difficult to analyse the problem accurately.
It's not alwyas that easy even if you have full acecss.

--
Erland Sommarskog, SQL Server MVP, es****@sommarsk og.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pro...ads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodinf...ons/books.mspx

Jan 5 '07 #10

Similar topics

6416

Need help on inner joins.

by: Prem | last post by:

Hi, I am having many problems with inner join. my first problem is : 1) I want to know the precedance while evaluating query with multiple joins. eg. select Employees.FirstName, Employees.LastName, TerritoryID, Employees.EmployeeID, RegionID, ProductID from Employees

Microsoft SQL Server

8106

INNER JOIN using surrogate ID, or [Date] BETWEEN?

by: DCM Fan | last post by:

{CREATE TABLEs and INSERTs follow...} Gents, I have a main table that is in ONE-MANY with many other tables. For example, if the main table is named A, there are these realtionships: A-->B A-->C A-->D

Microsoft SQL Server

2911

Inner Join experts out there??

by: news-east.earthlink.net | last post by:

The scenario: two tables CustomerTable --------------- CustomerID OrderID CustomerName CustomerEmail

Microsoft SQL Server

9033

left outer join terrible slow compared to inner join

by: Thomas Beutin | last post by:

Hi, i've a speed problem withe the following statement: SELECT DISTINCT pz.l1_id, pz.l2_id, pz.l3_id, pz.l4_id FROM ot_adresse AS a, ot_produkt AS p LEFT OUTER JOIN ot_kat_prod AS pz ON ( p.p_id = pz.p_id ) WHERE p.a_id = a.id AND a.id = '105391105424941' AND a.m_id = '37'; This is terrible slow compared to the inner join: SELECT DISTINCT pz.l1_id, pz.l2_id, pz.l3_id, pz.l4_id

PostgreSQL Database

21437

exception join vs. inner join with <> (is it the same?)

by: PASQUALE | last post by:

Hi I have a question: do the both statements below give the same result? If yes then does somebody know something about preformance differencies using these joins? SELECT A.* FROM Table1 A INNER JOIN Table2 B on A.Field1 <> B.Field1

DB2 Database

1623

help with converting query to use inner joins

by: gkellymail | last post by:

the following query works fine: select link.idx, link.x_table, link.x_id_a, link.x_id_z, a.strandid, b.strandid from link_detail, link, strand A, strand B where link_detail.x_table = 'enclosure' and link_detail.x_id = 3 and link.idx = link_detail.linkidx and A.strandid = link.x_id_a and B.strandid = link.x_id_z would someone please convert this to a more efficient query using inner

Microsoft Access / VBA

18886

Removing HASH Match / Inner Join

by: moldster | last post by:

Hi, I'm at my wits end! I have two large tables one with 1.2mill one with 2.3 mill and they are very wide tables. I have a select with an inner join. All columns used in the join are contained in indexes. But it does an index scan and a massive hashmatch. Why is this? both tables have columns in the index ordered the same, all datatypes of the indexes are ints. The code looks like this. SELECT TblActivities.* FROM ...

Microsoft SQL Server

9315

SQL statement on inner join that hits another table twice?

by: dmonroe | last post by:

hi group -- Im having a nested inner join problem with an Access SQl statement/Query design. Im running the query from ASP and not usng the access interface at all. Here's the tables: tblEmployees empId -- EmpName -- EmpRole -- EmpManager -------....------------.... ---------....--------------- 1........ dan yella..........1..........2

Microsoft Access / VBA

6345

Can inner join replace multiple loop selects?

by: MP | last post by:

Hi trying to begin to learn database using vb6, ado/adox, mdb format, sql (not using access...just mdb format via ado) i need to group the values of multiple fields - get their possible variations(combination of fields), - then act on each group in some way ...eg ProcessRs (oRs as RecordSet)... the following query will get me the distinct groups

Microsoft Access / VBA

9287

Changing the language in Windows 10

by: Hystou | last post by:

Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can effortlessly switch the default language on Windows 10 without reinstalling. I'll walk you through it. First, let's disable language synchronization. With a Microsoft account, language settings sync across devices. To prevent any complications,...

Windows Server

10046

Problem With Comparison Operator <=> in G++

by: Oralloy | last post by:

Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers, it seems that the internal comparison operator "<=>" tries to promote arguments from unsigned to signed. This is as boiled down as I can make it. Here is my compilation command: g++-12 -std=c++20 -Wnarrowing bit_field.cpp Here is the code in...

C / C++

9886

Maximizing Business Potential: The Nexus of Website Design and Digital Marketing

by: jinu1996 | last post by:

In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven tapestry of website design and digital marketing. It's not merely about having a website; it's about crafting an immersive digital experience that captivates audiences and drives business growth. The Art of Business Website Design Your website is...

Online Marketing

9857

The easy way to turn off automatic updates for Windows 10/11

by: Hystou | last post by:

Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows Update option using the Control Panel or Settings app; it automatically checks for updates and installs any it finds, whether you like it or not. For most users, this new feature is actually very convenient. If you want to control the update process,...

Windows Server

6542

Couldn’t get equations in html when convert word .docx file to html file in C#.

by: conductexam | last post by:

I have .net C# application in which I am extracting data from word file and save it in database particularly. To store word all data as it is I am converting the whole word file firstly in HTML and then checking html paragraph one by one. At the time of converting from word file to html my equations which are in the word document file was convert into image. Globals.ThisAddIn.Application.ActiveDocument.Select();...

C# / C Sharp

5155

Trying to create a lan-to-lan vpn between two differents networks

by: TSSRALBI | last post by:

Hello I'm a network technician in training and I need your help. I am currently learning how to create and manage the different types of VPNs and I have a question about LAN-to-LAN VPNs. The last exercise I practiced was to create a LAN-to-LAN VPN between two Pfsense firewalls, by using IPSEC protocols. I succeeded, with both firewalls in the same network. But I'm wondering if it's possible to do the same thing, with 2 Pfsense firewalls...

Networking - Hardware / Configuration

5318

Windows Forms - .Net 8.0

by: adsilva | last post by:

A Windows Forms form does not have the event Unload, like VB6. What one acts like?

Visual Basic .NET

3817

transfer the data from one system to another through ip address

by: 6302768590 | last post by:

Hai team i want code for transfer the data from one system to another through IP address by using C# our system has to for every 5mins then we have to update the data what the data is updated we have to send another system

C# / C Sharp

2677

Comprehensive Guide to Website Development in Toronto: Expert Insights from BSMN Consultancy

by: bsmnconsultancy | last post by:

In today's digital era, a well-designed website is crucial for businesses looking to succeed. Whether you're a small business owner or a large corporation in Toronto, having a strong online presence can significantly impact your brand's success. BSMN Consultancy, a leader in Website Development in Toronto offers valuable insights into creating effective websites that not only look great but also perform exceptionally well. In this comprehensive...

General