Need help speeding up query

The following query needs about 2 minutes to complete (finding dupes)
on a table of about 10000 addresses. Does anyone have an idea on how
to speed this up ?

Thanks in advance !!!

Sebastian

Select
    Top 1000 *
From
    addresses ab1
Where
    (
        Select Count(*) From addresses base ab2 Where
        (
            (
                (ab2.LastName = ab1.LastName And Ltrim(RTrim(ab1.LastName)) != '' )
                Or
                (ab2.Company = ab1.Company And (Ltrim(RTrim(ab1.Company)) != '') )
            )
            And
            (
                ab2.ZipCode = ab1.ZipCode
                Or
                ab1.ZipCode = ''
            )
        )
        And ab2.Ad_Id != ab1.Ad_Id
    ) >= 1
Order By
    LastName, FirstName
Jul 23 '05 #1
On 14 Feb 2005 04:06:13 -0800, Sebastian wrote:
The following query needs about 2 minutes to complete (finding dupes)
on a table of about 10000 addresses. Does anyone have an idea on how
to speed this up ?
Hi Sebastian,

I hope you made a mistake while copying the query. As posted, it should
return an error message in mere milliseconds:

Select Count(*) From addresses base ab2 Where
                               ^^^^^^^^

A table can have a maximum of one alias, never two.

A quick win in this case is to replace the test for COUNT(*) >= 1 with a
test for EXISTS. With COUNT(*), SQL Server will go on to find a second,
third, etc., match after finding the first; with EXISTS it won't.
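
As a rough sketch of that rewrite: the query below keeps the predicates from the original post unchanged, drops the stray "base" alias, and swaps the COUNT(*) >= 1 test for EXISTS. It has not been run against the real table, so treat it as a starting point rather than a tested replacement.

```sql
-- Sketch only: same matching rules as the original query.
-- EXISTS lets SQL Server stop at the first matching row per ab1 row.
SELECT TOP 1000 ab1.*
FROM addresses AS ab1
WHERE EXISTS (
    SELECT *
    FROM addresses AS ab2
    WHERE (
              (ab2.LastName = ab1.LastName AND LTRIM(RTRIM(ab1.LastName)) <> '')
           OR (ab2.Company  = ab1.Company  AND LTRIM(RTRIM(ab1.Company))  <> '')
          )
      AND (ab2.ZipCode = ab1.ZipCode OR ab1.ZipCode = '')
      AND ab2.Ad_Id <> ab1.Ad_Id
)
ORDER BY ab1.LastName, ab1.FirstName;
```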

Another quick win is to not use SELECT *, but specify a column list. You
may be lucky and have a covering index that can be used to speed up the
query if you don't show all columns.
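
To illustrate the idea, here is a minimal sketch that assumes the report only needs the five columns named in this thread. The index name IX_addresses_dupes is made up, and whether such a key fits within SQL Server's index key size limit depends on the column lengths, which have not been posted. The duplicate test is deliberately simplified to LastName plus ZipCode.

```sql
-- Hypothetical index: contains every column the query below touches,
-- so the query could be answered from the index alone.
CREATE INDEX IX_addresses_dupes
    ON addresses (LastName, ZipCode, Company, FirstName, Ad_Id);

-- Explicit column list instead of SELECT *.
SELECT TOP 1000
       ab1.Ad_Id, ab1.LastName, ab1.FirstName, ab1.Company, ab1.ZipCode
FROM addresses AS ab1
WHERE EXISTS (
    SELECT *
    FROM addresses AS ab2
    WHERE ab2.LastName = ab1.LastName
      AND ab2.ZipCode  = ab1.ZipCode
      AND ab2.Ad_Id   <> ab1.Ad_Id
)
ORDER BY ab1.LastName, ab1.FirstName;
```

Whether the optimizer actually picks the index is something only the execution plan can confirm.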

Why are you using things like "Ltrim(RTrim(ab1.LastName)) != ''"? Do you
mean to say that your LastName column might contain empty strings, but
also a series of spaces? Why don't you use NULL to represent missing data,
that's exactly what the NULL symbol is invented for.

From your query, I get the impression that each row in your table has
exactly one of LastName and Company filled; the other column is always an
empty string or some spaces. If you had used NULLS, you could now simply
have written "ab2.LastName = ab1.LastName OR ab2.Company = ab1.Company".
Not necessarily faster (though certainly not slower), but a lot more
readable!
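
A sketch of what that could look like, assuming LastName and Company are (or can be made) nullable and that nothing else depends on the empty strings. The zip code test is shown as plain equality for brevity.

```sql
-- One-off cleanup: turn empty or all-space values into NULL.
UPDATE addresses SET LastName = NULL WHERE LTRIM(RTRIM(LastName)) = '';
UPDATE addresses SET Company  = NULL WHERE LTRIM(RTRIM(Company))  = '';

-- With the default ANSI_NULLS setting, NULL = NULL is not true,
-- so rows with a missing name or company never match each other
-- and the explicit "is not blank" tests disappear.
SELECT ab1.Ad_Id, ab1.LastName, ab1.FirstName, ab1.Company, ab1.ZipCode
FROM addresses AS ab1
WHERE EXISTS (
    SELECT *
    FROM addresses AS ab2
    WHERE (ab2.LastName = ab1.LastName OR ab2.Company = ab1.Company)
      AND ab2.ZipCode = ab1.ZipCode
      AND ab2.Ad_Id  <> ab1.Ad_Id
)
ORDER BY ab1.LastName, ab1.FirstName;
```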

This code:

    And
    (
        ab2.ZipCode = ab1.ZipCode
        Or
        ab1.ZipCode = ''
    )

will result in ANY zip code from ab2 being considered a match if the zip
code in ab1 is blank. Are you sure that is what you want? If you want a
blank zip code in ab1 to match only blank zip codes in ab2, reduce this to
AND ab2.ZipCode = ab1.ZipCode
Not only shorter and easier, but probably quicker as well.
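
The difference is easy to see on three made-up rows. The table variable and its data below are purely illustrative and not taken from the real table.

```sql
-- Illustrative data only: one blank zip code and two different real ones.
DECLARE @t TABLE (Ad_Id int, LastName varchar(50), ZipCode varchar(10));
INSERT INTO @t (Ad_Id, LastName, ZipCode) VALUES (1, 'Miller', '');
INSERT INTO @t (Ad_Id, LastName, ZipCode) VALUES (2, 'Miller', '12345');
INSERT INTO @t (Ad_Id, LastName, ZipCode) VALUES (3, 'Miller', '99999');

-- Original condition: row 1 (blank zip) pairs with rows 2 and 3,
-- but rows 2 and 3 themselves pair with nothing.
SELECT ab1.Ad_Id, ab2.Ad_Id AS MatchedAd_Id
FROM @t AS ab1
INNER JOIN @t AS ab2
    ON ab2.LastName = ab1.LastName
   AND ab2.Ad_Id   <> ab1.Ad_Id
WHERE ab2.ZipCode = ab1.ZipCode
   OR ab1.ZipCode = '';

-- Plain equality: no pairs at all for this data.
SELECT ab1.Ad_Id, ab2.Ad_Id AS MatchedAd_Id
FROM @t AS ab1
INNER JOIN @t AS ab2
    ON ab2.LastName = ab1.LastName
   AND ab2.Ad_Id   <> ab1.Ad_Id
WHERE ab2.ZipCode = ab1.ZipCode;
```

Note the asymmetry in the first result: row 1 is reported as a duplicate, while the rows it supposedly duplicates are not.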

For more help, you'll have to post more information: the structure of your
table (as CREATE TABLE statement, with irrelevant columns omitted, but all
constraints and properties included - and don't forget to include indexes
as well), some sample data (as INSERT statements) to illustrate how your
data looks and the output you expect to get from that sample data. Plus a
description of what you consider to be a duplicate, as your query
indicates that your definition is not trivial.
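
Something along these lines would do. Only the column names come from the query in this thread; the data types, lengths, constraints, and sample rows are placeholders to be replaced with the real ones.

```sql
-- Placeholder schema: types, lengths and the primary key are guesses.
CREATE TABLE addresses (
    Ad_Id     int         NOT NULL PRIMARY KEY,
    FirstName varchar(50) NOT NULL,
    LastName  varchar(50) NOT NULL,
    Company   varchar(50) NOT NULL,
    ZipCode   varchar(10) NOT NULL
);

-- Made-up rows: same name and zip, same name with a blank zip,
-- and a company-only entry.
INSERT INTO addresses (Ad_Id, FirstName, LastName, Company, ZipCode)
VALUES (1, 'Anna', 'Miller', '', '12345');
INSERT INTO addresses (Ad_Id, FirstName, LastName, Company, ZipCode)
VALUES (2, 'A.', 'Miller', '', '12345');
INSERT INTO addresses (Ad_Id, FirstName, LastName, Company, ZipCode)
VALUES (3, 'Anna', 'Miller', '', '');
INSERT INTO addresses (Ad_Id, FirstName, LastName, Company, ZipCode)
VALUES (4, '', '', 'Acme Ltd', '99999');
```

Along with a script like this, list which Ad_Id values you expect the duplicate query to return.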

Best, Hugo
Jul 23 '05 #2
I'll have a go at it. Try this:

Select
    Top 1000 *
From (
    -- ab1.* keeps the column list identical in every UNION ALL branch
    SELECT ab1.*
    FROM addresses ab1
    INNER JOIN (
        SELECT LastName, ZipCode
        FROM Addresses
        WHERE LastName > Space(100)
        AND ZipCode <> ''
        GROUP BY LastName, ZipCode
        HAVING COUNT(*) > 1
    ) ab2
    ON ab1.LastName = ab2.LastName
    AND ab1.ZipCode = ab2.ZipCode

    UNION ALL

    SELECT ab1.*
    FROM addresses ab1
    INNER JOIN (
        SELECT LastName
        FROM Addresses
        WHERE LastName > Space(100)
        GROUP BY LastName
        HAVING COUNT(*) > 1
        AND MIN(ZipCode) = ''
    ) ab2
    ON ab1.LastName = ab2.LastName
    AND ab1.ZipCode = ''

    UNION ALL

    SELECT ab1.*
    FROM addresses ab1
    INNER JOIN (
        SELECT Company, ZipCode
        FROM Addresses
        WHERE Company > Space(100)
        AND ZipCode <> ''
        GROUP BY Company, ZipCode
        HAVING COUNT(*) > 1
        AND MIN(LastName) < MAX(LastName)
    ) ab2
    ON ab1.Company = ab2.Company
    AND ab1.ZipCode = ab2.ZipCode

    UNION ALL

    SELECT ab1.*
    FROM addresses ab1
    INNER JOIN (
        SELECT Company
        FROM Addresses
        WHERE Company > Space(100)
        GROUP BY Company
        HAVING COUNT(*) > 1
        AND MIN(LastName) < MAX(LastName)
        AND MIN(ZipCode) = ''
    ) ab2
    ON ab1.Company = ab2.Company
    AND ab1.ZipCode = ''
) X
Order By
    LastName, FirstName

Note that the predicate "AND MIN(LastName) < MAX(LastName)" tries to
eliminate duplicate duplicates. However, this may result in a missed
Company duplicate, because of existing LastName duplicates for the same
ZipCode.

Of course, if you are using TOP 1000 to just get the first 1000
duplicates (and not all duplicates), then you can also do something like
this:

Declare @count int
Declare @remaining int
Set @count = 0

SELECT TOP 1000 ab1.*
FROM addresses ab1
INNER JOIN (
    SELECT LastName, ZipCode
    FROM Addresses
    WHERE LastName > Space(100)
    AND ZipCode <> ''
    GROUP BY LastName, ZipCode
    HAVING COUNT(*) > 1
) ab2
ON ab1.LastName = ab2.LastName
AND ab1.ZipCode = ab2.ZipCode
ORDER BY ab1.LastName, ab1.FirstName

Set @count = @count + @@rowcount
If @count < 1000
Begin
    -- SET ROWCOUNT accepts a constant or a variable, not an expression
    Set @remaining = 1000 - @count
    SET ROWCOUNT @remaining

    SELECT TOP 1000 ab1.*
    FROM addresses ab1
    INNER JOIN (
        SELECT LastName
        FROM Addresses
        WHERE LastName > Space(100)
        GROUP BY LastName
        HAVING COUNT(*) > 1
        AND MIN(ZipCode) = ''
    ) ab2
    ON ab1.LastName = ab2.LastName
    AND ab1.ZipCode = ''
    ORDER BY ab1.LastName, ab1.FirstName

    Set @count = @count + @@rowcount
End

If @count < 1000
Begin
    Set @remaining = 1000 - @count
    SET ROWCOUNT @remaining

    SELECT TOP 1000 ab1.*
    FROM addresses ab1
    INNER JOIN (
        SELECT Company, ZipCode
        FROM Addresses
        WHERE Company > Space(100)
        AND ZipCode <> ''
        GROUP BY Company, ZipCode
        HAVING COUNT(*) > 1
        AND MIN(LastName) < MAX(LastName)
    ) ab2
    ON ab1.Company = ab2.Company
    AND ab1.ZipCode = ab2.ZipCode
    ORDER BY ab1.LastName, ab1.FirstName

    Set @count = @count + @@rowcount
End

If @count < 1000
Begin
    Set @remaining = 1000 - @count
    SET ROWCOUNT @remaining

    SELECT TOP 1000 ab1.*
    FROM addresses ab1
    INNER JOIN (
        SELECT Company
        FROM Addresses
        WHERE Company > Space(100)
        GROUP BY Company
        HAVING COUNT(*) > 1
        AND MIN(LastName) < MAX(LastName)
        AND MIN(ZipCode) = ''
    ) ab2
    ON ab1.Company = ab2.Company
    AND ab1.ZipCode = ''
    ORDER BY ab1.LastName, ab1.FirstName
End
-- Reset so later statements are not limited
SET ROWCOUNT 0

Other notes:
- The predicate "ab2.Ad_Id != ab1.Ad_Id" uses proprietary syntax. The
ANSI-SQL syntax is "ab2.Ad_Id <> ab1.Ad_Id"
- If you are comparing with an empty string, then it is useless to
perform two Trim functions. So you can simplify
"Ltrim(RTrim(ab1.Company)) != ''" to "RTrim(ab1.Company) <> ''". In the
queries above, it is translated to "ab1.Company > Space(100)", because
this makes it a usable search argument for the optimizer.
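
For comparison, here are the two forms side by side as a small sketch against the addresses table from this thread. They should select the same rows for ordinary text data, though values containing characters that sort below a space (control characters, for instance) could behave differently, so check against your own data and collation.

```sql
-- Function wrapped around the column: the optimizer cannot seek
-- an index on Company for this predicate.
SELECT Ad_Id, Company
FROM addresses
WHERE LTRIM(RTRIM(Company)) <> '';

-- Bare column compared with a constant: usable as a search argument.
-- Empty and all-space values compare equal to SPACE(100), so they are
-- excluded; NULLs are excluded as well.
SELECT Ad_Id, Company
FROM addresses
WHERE Company > SPACE(100);
```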

Hope this helps,
Gert-Jan
Jul 23 '05 #3