473,756 Members | 8,443 Online
Bytes | Software Development & Data Engineering Community
+ Post

Home Posts Topics Members FAQ

INNER JOIN/Index Threshold?

Guys I am really stuck on this one. Any help or suggestions would be
appreciated.
We have a large table which seemed to just hit some kind of threshold.
They query is somewhat responsive when there are NO indexes on the
table. However, when we index email the query takes forever.

FACTS
- The problem is very "data specific". I can not recreate the
problem using different data.
- There is only a problem when I index email on the base table.
- The problem goes away when I add "AND b.email IS NOT NULL" to the
inner join condition. It does not help when I add the logic to the
"WHERE" clause.
DDL
CREATE TABLE base (bk char(25), email varchar(100))
create clustered index icx on base(bk)
create index ix_email on base(email)
CREATE TABLE filter (bk char(25), email varchar(100))
create clustered index icx on filter (bk)
create index ix_email on filter (email)

Query
SELECT b.bk, b.email
FROM base b WITH(NOLOCK)
INNER JOIN filter f ON f.email = b.email
--and f.email is not null

Data Profile
--35120500, 35120491, 14221553
SELECT COUNT(*) ,COUNT(DISTINCT bk), COUNT(DISTINCT email)
FROM base
--16796199, 16796192, 14221553
SELECT COUNT(*) ,COUNT(DISTINCT bk), COUNT(DISTINCT email)
FROM base
WHERE email IS NOT NULL

--250552, 250552, 250205
SELECT COUNT(*) ,COUNT(DISTINCT bk), COUNT(DISTINCT email)
FROM filter
--250208, 250208, 250205
SELECT COUNT(*) ,COUNT(DISTINCT bk), COUNT(DISTINCT email)
FROM filter
WHERE email IS NOT NULL

Jan 4 '07 #1
18 5036
Dave wrote:
Guys I am really stuck on this one. Any help or suggestions would be
appreciated.
We have a large table which seemed to just hit some kind of threshold.
They query is somewhat responsive when there are NO indexes on the
table. However, when we index email the query takes forever.

FACTS
- The problem is very "data specific". I can not recreate the
problem using different data.
- There is only a problem when I index email on the base table.
- The problem goes away when I add "AND b.email IS NOT NULL" to the
inner join condition. It does not help when I add the logic to the
"WHERE" clause.
DDL
CREATE TABLE base (bk char(25), email varchar(100))
create clustered index icx on base(bk)
create index ix_email on base(email)
CREATE TABLE filter (bk char(25), email varchar(100))
create clustered index icx on filter (bk)
create index ix_email on filter (email)

Query
SELECT b.bk, b.email
FROM base b WITH(NOLOCK)
INNER JOIN filter f ON f.email = b.email
--and f.email is not null

Data Profile
--35120500, 35120491, 14221553
SELECT COUNT(*) ,COUNT(DISTINCT bk), COUNT(DISTINCT email)
FROM base
--16796199, 16796192, 14221553
SELECT COUNT(*) ,COUNT(DISTINCT bk), COUNT(DISTINCT email)
FROM base
WHERE email IS NOT NULL

--250552, 250552, 250205
SELECT COUNT(*) ,COUNT(DISTINCT bk), COUNT(DISTINCT email)
FROM filter
--250208, 250208, 250205
SELECT COUNT(*) ,COUNT(DISTINCT bk), COUNT(DISTINCT email)
FROM filter
WHERE email IS NOT NULL
What does the estimated execution plan look like? Try adding the column
BK as a second key in your email index.
--
Tracy McKibben
MCDBA
http://www.realsqlguy.com
Jan 4 '07 #2
bk is clustered so there is no reason to add it to the other index.

Estimated plan does an index scan on each table, then a merge join.

When I remove the index on base.email the plan does a table scan on
base then performs a Hash Match
Tracy McKibben wrote:
Dave wrote:
Guys I am really stuck on this one. Any help or suggestions would be
appreciated.
We have a large table which seemed to just hit some kind of threshold.
They query is somewhat responsive when there are NO indexes on the
table. However, when we index email the query takes forever.

FACTS
- The problem is very "data specific". I can not recreate the
problem using different data.
- There is only a problem when I index email on the base table.
- The problem goes away when I add "AND b.email IS NOT NULL" to the
inner join condition. It does not help when I add the logic to the
"WHERE" clause.
DDL
CREATE TABLE base (bk char(25), email varchar(100))
create clustered index icx on base(bk)
create index ix_email on base(email)
CREATE TABLE filter (bk char(25), email varchar(100))
create clustered index icx on filter (bk)
create index ix_email on filter (email)

Query
SELECT b.bk, b.email
FROM base b WITH(NOLOCK)
INNER JOIN filter f ON f.email = b.email
--and f.email is not null

Data Profile
--35120500, 35120491, 14221553
SELECT COUNT(*) ,COUNT(DISTINCT bk), COUNT(DISTINCT email)
FROM base
--16796199, 16796192, 14221553
SELECT COUNT(*) ,COUNT(DISTINCT bk), COUNT(DISTINCT email)
FROM base
WHERE email IS NOT NULL

--250552, 250552, 250205
SELECT COUNT(*) ,COUNT(DISTINCT bk), COUNT(DISTINCT email)
FROM filter
--250208, 250208, 250205
SELECT COUNT(*) ,COUNT(DISTINCT bk), COUNT(DISTINCT email)
FROM filter
WHERE email IS NOT NULL

What does the estimated execution plan look like? Try adding the column
BK as a second key in your email index.
--
Tracy McKibben
MCDBA
http://www.realsqlguy.com
Jan 4 '07 #3
Dave,

Do you have a lot of null emails on both tables? If so, that might be an
influencing factor.

-- Bill

"Dave" <da******@gmail .comwrote in message
news:11******** **************@ s80g2000cwa.goo glegroups.com.. .
Guys I am really stuck on this one. Any help or suggestions would be
appreciated.
We have a large table which seemed to just hit some kind of threshold.
They query is somewhat responsive when there are NO indexes on the
table. However, when we index email the query takes forever.

FACTS
- The problem is very "data specific". I can not recreate the
problem using different data.
- There is only a problem when I index email on the base table.
- The problem goes away when I add "AND b.email IS NOT NULL" to the
inner join condition. It does not help when I add the logic to the
"WHERE" clause.
DDL
CREATE TABLE base (bk char(25), email varchar(100))
create clustered index icx on base(bk)
create index ix_email on base(email)
CREATE TABLE filter (bk char(25), email varchar(100))
create clustered index icx on filter (bk)
create index ix_email on filter (email)

Query
SELECT b.bk, b.email
FROM base b WITH(NOLOCK)
INNER JOIN filter f ON f.email = b.email
--and f.email is not null

Data Profile
--35120500, 35120491, 14221553
SELECT COUNT(*) ,COUNT(DISTINCT bk), COUNT(DISTINCT email)
FROM base
--16796199, 16796192, 14221553
SELECT COUNT(*) ,COUNT(DISTINCT bk), COUNT(DISTINCT email)
FROM base
WHERE email IS NOT NULL

--250552, 250552, 250205
SELECT COUNT(*) ,COUNT(DISTINCT bk), COUNT(DISTINCT email)
FROM filter
--250208, 250208, 250205
SELECT COUNT(*) ,COUNT(DISTINCT bk), COUNT(DISTINCT email)
FROM filter
WHERE email IS NOT NULL

Jan 4 '07 #4
Dave (da******@gmail .com) writes:
bk is clustered so there is no reason to add it to the other index.

Estimated plan does an index scan on each table, then a merge join.

When I remove the index on base.email the plan does a table scan on
base then performs a Hash Match
And the merge-join plan takes forever? That's indeed very funny, because
that's sounds like a quite decent plan. Your tables looks funny, because
they have no keys, but that would matter.

What happens if you try:

SELECT b.hk, b.email
FROM base b
WHERE EXISTS (SELECT *
FROM filter
WHERE f.email = b.email)
Which version of SQL Server?

--
Erland Sommarskog, SQL Server MVP, es****@sommarsk og.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pro...ads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodinf...ons/books.mspx
Jan 4 '07 #5
Your correlated subquery works.
It added a stream aggregate step to the execution plan right before the
merge join.

This is really weird. Could it be an SQL Server Bug? Like I said
earlier, it is very data specific. I can add 3 million random records
to the filter table and it normally works fine. It is only this
specific dataset that is causing problems.
Erland Sommarskog wrote:
Dave (da******@gmail .com) writes:
bk is clustered so there is no reason to add it to the other index.

Estimated plan does an index scan on each table, then a merge join.

When I remove the index on base.email the plan does a table scan on
base then performs a Hash Match

And the merge-join plan takes forever? That's indeed very funny, because
that's sounds like a quite decent plan. Your tables looks funny, because
they have no keys, but that would matter.

What happens if you try:

SELECT b.hk, b.email
FROM base b
WHERE EXISTS (SELECT *
FROM filter
WHERE f.email = b.email)
Which version of SQL Server?

--
Erland Sommarskog, SQL Server MVP, es****@sommarsk og.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pro...ads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodinf...ons/books.mspx
Jan 4 '07 #6
Hi Dave,

I tried to create a situation similar to yours on SQL Server 2000. What
I noticed is:
- Without the extra predicate, the optimizer does not seem smart enough
to chose SEEK:([b].[email] IsNotNull) instead of a full Index Scan
- You are correct that the change in access method is not achieved if
the predicate is added to the WHERE clause. It seems it must be part of
the join ON clause
- If either "b.email IS NOT NULL" or "f.email IS NOT NULL" is added, the
join type remains MERGE JOIN. If both are added to the ON clause I get a
HASH JOIN
- I am not sure if the MERGE JOIN is faster or if the HASH JOIN is, but
in either case you may want to add the two "unnecessar y" predicates

There is a thresshold somewhere (depending on your specific situation)
where merge-joining or hashing the two partially scanned indexes is
faster. Up to that point, the fastest plan will be to partially scan the
index of filter and loop join and seek each matching row in base. In all
cases it will be faster if NULL rows are excluded from the index seeks.

HTH,
Gert-Jan
Dave wrote:
>
Guys I am really stuck on this one. Any help or suggestions would be
appreciated.

We have a large table which seemed to just hit some kind of threshold.
They query is somewhat responsive when there are NO indexes on the
table. However, when we index email the query takes forever.

FACTS
- The problem is very "data specific". I can not recreate the
problem using different data.
- There is only a problem when I index email on the base table.
- The problem goes away when I add "AND b.email IS NOT NULL" to the
inner join condition. It does not help when I add the logic to the
"WHERE" clause.

DDL
CREATE TABLE base (bk char(25), email varchar(100))
create clustered index icx on base(bk)
create index ix_email on base(email)

CREATE TABLE filter (bk char(25), email varchar(100))
create clustered index icx on filter (bk)
create index ix_email on filter (email)

Query
SELECT b.bk, b.email
FROM base b WITH(NOLOCK)
INNER JOIN filter f ON f.email = b.email
--and f.email is not null

Data Profile
--35120500, 35120491, 14221553
SELECT COUNT(*) ,COUNT(DISTINCT bk), COUNT(DISTINCT email)
FROM base

--16796199, 16796192, 14221553
SELECT COUNT(*) ,COUNT(DISTINCT bk), COUNT(DISTINCT email)
FROM base
WHERE email IS NOT NULL

--250552, 250552, 250205
SELECT COUNT(*) ,COUNT(DISTINCT bk), COUNT(DISTINCT email)
FROM filter

--250208, 250208, 250205
SELECT COUNT(*) ,COUNT(DISTINCT bk), COUNT(DISTINCT email)
FROM filter
WHERE email IS NOT NULL
Jan 5 '07 #7
Dave (da******@gmail .com) writes:
Your correlated subquery works.
It added a stream aggregate step to the execution plan right before the
merge join.

This is really weird. Could it be an SQL Server Bug? Like I said
earlier, it is very data specific. I can add 3 million random records
to the filter table and it normally works fine. It is only this
specific dataset that is causing problems.

Note that the two queries are not semantically equivalent. If a certain
email address appears four times in the filter and four times in base, your
original query will return 16 rows, while mine will return four.

With the numbers you posted there are only four duplicates in the filter.
Provided, that is, you are running with ANSI_NULLS ON. If this is in a
stored procedure saved from Enterprise Manager, you are likely to be running
with ANSI_NULLS OFF, in which case NULL = NULL, and you will get an
enormous number of duplicates. Which are filtered away if you add the
IS NOT NULL filter.
--
Erland Sommarskog, SQL Server MVP, es****@sommarsk og.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pro...ads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodinf...ons/books.mspx
Jan 5 '07 #8
We are using SQL 2K Enterprise SP4.

Erland that was an excellent post! However it was not my problem. The
code is run from .NET application and I am testing using SQL Management
Studio.

I added this to the test script and it did not help.
SET ANSI_NULLS ON
I think there are two things going on here.
1. SQL Server is not picking the best plan
2. There is an issue with my specific data set that I can not reproduce
with any other dataset except for "some" subsets or supersets.
Thanks for the help guys!!!

Erland Sommarskog wrote:
Dave (da******@gmail .com) writes:
Your correlated subquery works.
It added a stream aggregate step to the execution plan right before the
merge join.

This is really weird. Could it be an SQL Server Bug? Like I said
earlier, it is very data specific. I can add 3 million random records
to the filter table and it normally works fine. It is only this
specific dataset that is causing problems.


Note that the two queries are not semantically equivalent. If a certain
email address appears four times in the filter and four times in base, your
original query will return 16 rows, while mine will return four.

With the numbers you posted there are only four duplicates in the filter.
Provided, that is, you are running with ANSI_NULLS ON. If this is in a
stored procedure saved from Enterprise Manager, you are likely to be running
with ANSI_NULLS OFF, in which case NULL = NULL, and you will get an
enormous number of duplicates. Which are filtered away if you add the
IS NOT NULL filter.
--
Erland Sommarskog, SQL Server MVP, es****@sommarsk og.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pro...ads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodinf...ons/books.mspx
Jan 5 '07 #9
Dave (da******@gmail .com) writes:
Erland that was an excellent post! However it was not my problem. The
code is run from .NET application and I am testing using SQL Management
Studio.

I added this to the test script and it did not help.
SET ANSI_NULLS ON
As long it's a loose script, ANSI_BULLS would be on by default. I will
have to admit that I was clutching a straws. Without access to the
database seeing it, it is very difficult to analyse the problem accurately.
It's not alwyas that easy even if you have full acecss.

--
Erland Sommarskog, SQL Server MVP, es****@sommarsk og.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pro...ads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodinf...ons/books.mspx
Jan 5 '07 #10

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

3
6416
by: Prem | last post by:
Hi, I am having many problems with inner join. my first problem is : 1) I want to know the precedance while evaluating query with multiple joins. eg. select Employees.FirstName, Employees.LastName, TerritoryID, Employees.EmployeeID, RegionID, ProductID from Employees
4
8106
by: DCM Fan | last post by:
{CREATE TABLEs and INSERTs follow...} Gents, I have a main table that is in ONE-MANY with many other tables. For example, if the main table is named A, there are these realtionships: A-->B A-->C A-->D
11
2911
by: news-east.earthlink.net | last post by:
The scenario: two tables CustomerTable --------------- CustomerID OrderID CustomerName CustomerEmail
6
9033
by: Thomas Beutin | last post by:
Hi, i've a speed problem withe the following statement: SELECT DISTINCT pz.l1_id, pz.l2_id, pz.l3_id, pz.l4_id FROM ot_adresse AS a, ot_produkt AS p LEFT OUTER JOIN ot_kat_prod AS pz ON ( p.p_id = pz.p_id ) WHERE p.a_id = a.id AND a.id = '105391105424941' AND a.m_id = '37'; This is terrible slow compared to the inner join: SELECT DISTINCT pz.l1_id, pz.l2_id, pz.l3_id, pz.l4_id
4
21437
by: PASQUALE | last post by:
Hi I have a question: do the both statements below give the same result? If yes then does somebody know something about preformance differencies using these joins? SELECT A.* FROM Table1 A INNER JOIN Table2 B on A.Field1 <> B.Field1
2
1623
by: gkellymail | last post by:
the following query works fine: select link.idx, link.x_table, link.x_id_a, link.x_id_z, a.strandid, b.strandid from link_detail, link, strand A, strand B where link_detail.x_table = 'enclosure' and link_detail.x_id = 3 and link.idx = link_detail.linkidx and A.strandid = link.x_id_a and B.strandid = link.x_id_z would someone please convert this to a more efficient query using inner
3
18886
by: moldster | last post by:
Hi, I'm at my wits end! I have two large tables one with 1.2mill one with 2.3 mill and they are very wide tables. I have a select with an inner join. All columns used in the join are contained in indexes. But it does an index scan and a massive hashmatch. Why is this? both tables have columns in the index ordered the same, all datatypes of the indexes are ints. The code looks like this. SELECT TblActivities.* FROM ...
6
9315
by: dmonroe | last post by:
hi group -- Im having a nested inner join problem with an Access SQl statement/Query design. Im running the query from ASP and not usng the access interface at all. Here's the tables: tblEmployees empId -- EmpName -- EmpRole -- EmpManager -------....------------.... ---------....--------------- 1........ dan yella..........1..........2
52
6345
by: MP | last post by:
Hi trying to begin to learn database using vb6, ado/adox, mdb format, sql (not using access...just mdb format via ado) i need to group the values of multiple fields - get their possible variations(combination of fields), - then act on each group in some way ...eg ProcessRs (oRs as RecordSet)... the following query will get me the distinct groups
0
9287
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can effortlessly switch the default language on Windows 10 without reinstalling. I'll walk you through it. First, let's disable language synchronization. With a Microsoft account, language settings sync across devices. To prevent any complications,...
0
10046
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers, it seems that the internal comparison operator "<=>" tries to promote arguments from unsigned to signed. This is as boiled down as I can make it. Here is my compilation command: g++-12 -std=c++20 -Wnarrowing bit_field.cpp Here is the code in...
0
9886
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven tapestry of website design and digital marketing. It's not merely about having a website; it's about crafting an immersive digital experience that captivates audiences and drives business growth. The Art of Business Website Design Your website is...
1
9857
by: Hystou | last post by:
Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows Update option using the Control Panel or Settings app; it automatically checks for updates and installs any it finds, whether you like it or not. For most users, this new feature is actually very convenient. If you want to control the update process,...
0
6542
by: conductexam | last post by:
I have .net C# application in which I am extracting data from word file and save it in database particularly. To store word all data as it is I am converting the whole word file firstly in HTML and then checking html paragraph one by one. At the time of converting from word file to html my equations which are in the word document file was convert into image. Globals.ThisAddIn.Application.ActiveDocument.Select();...
0
5155
by: TSSRALBI | last post by:
Hello I'm a network technician in training and I need your help. I am currently learning how to create and manage the different types of VPNs and I have a question about LAN-to-LAN VPNs. The last exercise I practiced was to create a LAN-to-LAN VPN between two Pfsense firewalls, by using IPSEC protocols. I succeeded, with both firewalls in the same network. But I'm wondering if it's possible to do the same thing, with 2 Pfsense firewalls...
0
5318
by: adsilva | last post by:
A Windows Forms form does not have the event Unload, like VB6. What one acts like?
1
3817
by: 6302768590 | last post by:
Hai team i want code for transfer the data from one system to another through IP address by using C# our system has to for every 5mins then we have to update the data what the data is updated we have to send another system
3
2677
bsmnconsultancy
by: bsmnconsultancy | last post by:
In today's digital era, a well-designed website is crucial for businesses looking to succeed. Whether you're a small business owner or a large corporation in Toronto, having a strong online presence can significantly impact your brand's success. BSMN Consultancy, a leader in Website Development in Toronto offers valuable insights into creating effective websites that not only look great but also perform exceptionally well. In this comprehensive...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.