Need help speeding up query

Sebastian

The following query needs about 2 minutes to complete (finding dupes)
on a table of about 10000 addresses. Does anyone have an idea on how
to speed this up ?

Thanks in advance !!!

Sebastian

Select
    Top 1000 *
From
    addresses ab1
Where
    (
        Select Count(*) From addresses base ab2 Where
        (
            (
                (ab2.LastName = ab1.LastName And Ltrim(RTrim(ab1.LastName)) != '' )
                Or
                (ab2.Company = ab1.Company And (Ltrim(RTrim(ab1.Company)) != '') )
            )
            And
            (
                ab2.ZipCode = ab1.ZipCode
                Or
                ab1.ZipCode = ''
            )
        )
        And ab2.Ad_Id != ab1.Ad_Id
    ) >= 1
Order By
    LastName, FirstName

Jul 23 '05 #1

Hugo Kornelis

On 14 Feb 2005 04:06:13 -0800, Sebastian wrote:

> The following query needs about 2 minutes to complete (finding dupes)
> on a table of about 10000 addresses. Does anyone have an idea on how
> to speed this up ?

Hi Sebastian,

I hope you made a mistake while copying the query. As posted, it should
return an error message in mere milliseconds:

    Select Count(*) From addresses base ab2 Where
                                   ^^^^^^^^

A table can have a maximum of one alias, never two.

A quick win in this case is to replace the test for COUNT(*) >= 1 with a
test for EXISTS. With COUNT(*), SQL Server will go on to find a second,
third, etc., match after finding the first; with EXISTS it won't.
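
A minimal sketch of that rewrite, assuming the stray "base" is simply
dropped and every other predicate stays exactly as posted. The column
names are the ones from the original query; it has not been run against
the real table.

```sql
-- Sketch only: COUNT(*) >= 1 replaced by EXISTS, so the search for a
-- given ab1 row can stop at the first matching ab2 row.
SELECT TOP 1000 ab1.*
FROM   addresses AS ab1
WHERE  EXISTS
       (SELECT *
        FROM   addresses AS ab2
        WHERE  (   (ab2.LastName = ab1.LastName AND LTRIM(RTRIM(ab1.LastName)) <> '')
                OR (ab2.Company  = ab1.Company  AND LTRIM(RTRIM(ab1.Company))  <> ''))
          AND  (ab2.ZipCode = ab1.ZipCode OR ab1.ZipCode = '')
          AND  ab2.Ad_Id <> ab1.Ad_Id)
ORDER BY ab1.LastName, ab1.FirstName;
```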

Another quick win is to not use SELECT *, but specify a column list. You
may be lucky and have a covering index that can be used to speed up the
query if you don't show all columns.
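
As a rough illustration only: the five columns below are the ones that
appear in the posted query, while the index name and its key order are
hypothetical. Whether the optimizer actually uses such an index depends
on the real table, its data and the indexes that already exist.

```sql
-- Hypothetical index containing every column the query touches.
-- (The combined key has to fit within SQL Server's index key size limit.)
CREATE INDEX ix_addresses_dupes
    ON addresses (LastName, ZipCode, Company, FirstName, Ad_Id);

-- Explicit column list instead of SELECT *. The duplicate test in the
-- WHERE clause is left out here for brevity; it would stay unchanged.
SELECT TOP 1000
       Ad_Id, LastName, FirstName, Company, ZipCode
FROM   addresses
ORDER BY LastName, FirstName;
```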

Why are you using things like "Ltrim(RTrim(ab1.LastName)) != ''"? Do you
mean to say that your LastName column might contain not only empty strings
but also a series of spaces? Why don't you use NULL to represent missing
data? That is exactly what NULL was invented for.

From your query, I get the impression that each row in your table has
exactly one of LastName and Company filled; the other column is always an
empty string or some spaces. If you had used NULLs, you could now simply
have written "ab2.LastName = ab1.LastName OR ab2.Company = ab1.Company".
Not necessarily faster (though certainly not slower), but a lot more
readable!
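
A sketch of what that could look like, assuming LastName and Company are
nullable character columns (the thread does not show the table
definition). The zip code test is left out to keep the example short.

```sql
-- One-off cleanup: turn empty or all-space values into NULL.
UPDATE addresses SET LastName = NULL WHERE LTRIM(RTRIM(LastName)) = '';
UPDATE addresses SET Company  = NULL WHERE LTRIM(RTRIM(Company))  = '';

-- The name test then needs no trimming: a comparison with NULL is never
-- true, so rows with a missing name or company cannot match on it.
SELECT ab1.Ad_Id, ab2.Ad_Id AS MatchingAd_Id
FROM   addresses AS ab1
INNER JOIN addresses AS ab2
        ON (ab2.LastName = ab1.LastName OR ab2.Company = ab1.Company)
       AND ab2.Ad_Id <> ab1.Ad_Id;
```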

This code:

    And
    (
        ab2.ZipCode = ab1.ZipCode
        Or
        ab1.ZipCode = ''
    )

will result in ANY zip code from ab2 being considered a match if the zip
code in ab1 is blank. Are you sure that is what you want? If you want a
blank zip code in ab1 to match only blank zip codes in ab2, reduce this to

    AND ab2.ZipCode = ab1.ZipCode

Not only shorter and easier, but probably quicker as well.

For more help, you'll have to post more information: the structure of your
table (as CREATE TABLE statement, with irrelevant columns omitted, but all
constraints and properties included - and don't forget to include indexes
as well), some sample data (as INSERT statements) to illustrate how your
data looks and the output you expect to get from that sample data. Plus a
description of what you consider to be a duplicate, as your query
indicates that your definition is not trivial.
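
For illustration, a repro script of the kind requested might look like
the sketch below. Only the column names come from the posted query; the
data types, lengths, nullability, primary key and all sample rows are
invented placeholders, not the poster's actual table or data.

```sql
-- Hypothetical structure: types, lengths and the key are guesses.
CREATE TABLE addresses
(
    Ad_Id     int         NOT NULL PRIMARY KEY,
    LastName  varchar(50) NOT NULL,
    FirstName varchar(50) NOT NULL,
    Company   varchar(50) NOT NULL,
    ZipCode   varchar(10) NOT NULL
);

-- Placeholder rows, made up purely to show the shape of the script.
INSERT INTO addresses (Ad_Id, LastName, FirstName, Company, ZipCode)
VALUES (1, 'Smith', 'John', '', '12345');
INSERT INTO addresses (Ad_Id, LastName, FirstName, Company, ZipCode)
VALUES (2, 'Smith', 'J.', '', '12345');    -- same name and zip as row 1
INSERT INTO addresses (Ad_Id, LastName, FirstName, Company, ZipCode)
VALUES (3, '', '', 'Acme', '');            -- company only, blank zip
INSERT INTO addresses (Ad_Id, LastName, FirstName, Company, ZipCode)
VALUES (4, '', '', 'Acme', '54321');       -- same company, zip filled in
```

Alongside such a script, the poster would list which Ad_Id values the
query is expected to return and why.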

Best, Hugo
--

(Remove _NO_ and _SPAM_ to get my e-mail address)

Jul 23 '05 #2

Gert-Jan Strik

I'll have a go at it. Try this:

Select
    Top 1000 *
From (
    -- Each branch selects ab1.* only, so that all four branches return
    -- the same columns and X has no duplicate column names.

    -- Same LastName, same non-blank ZipCode
    SELECT ab1.*
    FROM addresses ab1
    INNER JOIN (
        SELECT LastName, ZipCode
        FROM Addresses
        WHERE LastName > Space(100)
          AND ZipCode <> ''
        GROUP BY LastName, ZipCode
        HAVING COUNT(*) > 1
    ) ab2
      ON ab1.LastName = ab2.LastName
     AND ab1.ZipCode  = ab2.ZipCode

    UNION ALL

    -- Same LastName, blank ZipCode
    SELECT ab1.*
    FROM addresses ab1
    INNER JOIN (
        SELECT LastName
        FROM Addresses
        WHERE LastName > Space(100)
        GROUP BY LastName
        HAVING COUNT(*) > 1
           AND MIN(ZipCode) = ''
    ) ab2
      ON ab1.LastName = ab2.LastName
     AND ab1.ZipCode  = ''

    UNION ALL

    -- Same Company, same non-blank ZipCode
    SELECT ab1.*
    FROM addresses ab1
    INNER JOIN (
        SELECT Company, ZipCode
        FROM Addresses
        WHERE Company > Space(100)
          AND ZipCode <> ''
        GROUP BY Company, ZipCode
        HAVING COUNT(*) > 1
           AND MIN(LastName) < MAX(LastName)
    ) ab2
      ON ab1.Company = ab2.Company
     AND ab1.ZipCode = ab2.ZipCode

    UNION ALL

    -- Same Company, blank ZipCode
    SELECT ab1.*
    FROM addresses ab1
    INNER JOIN (
        SELECT Company
        FROM Addresses
        WHERE Company > Space(100)
        GROUP BY Company
        HAVING COUNT(*) > 1
           AND MIN(LastName) < MAX(LastName)
           AND MIN(ZipCode) = ''
    ) ab2
      ON ab1.Company = ab2.Company
     AND ab1.ZipCode = ''
) X
Order By
    LastName, FirstName

Note that the predicate "AND MIN(LastName) < MAX(LastName)" tries to
eliminate duplicate duplicates. However, this may result in a missed
Company duplicate, because of existing LastName duplicates for the same
ZipCode.

Of course, if you are using TOP 1000 to just get the first 1000
duplicates (and not all duplicates), then you can also do something like
this:

Declare @Count int, @Remaining int
Set @Count = 0

-- Same LastName, same non-blank ZipCode
SELECT TOP 1000 ab1.*
FROM addresses ab1
INNER JOIN (
    SELECT LastName, ZipCode
    FROM Addresses
    WHERE LastName > Space(100)
      AND ZipCode <> ''
    GROUP BY LastName, ZipCode
    HAVING COUNT(*) > 1
) ab2
  ON ab1.LastName = ab2.LastName
 AND ab1.ZipCode  = ab2.ZipCode
ORDER BY ab1.LastName, ab1.FirstName

Set @Count = @Count + @@rowcount

If @Count < 1000
Begin
    -- SET ROWCOUNT takes a constant or a variable, not an expression
    Set @Remaining = 1000 - @Count
    SET ROWCOUNT @Remaining

    -- Same LastName, blank ZipCode
    SELECT TOP 1000 ab1.*
    FROM addresses ab1
    INNER JOIN (
        SELECT LastName
        FROM Addresses
        WHERE LastName > Space(100)
        GROUP BY LastName
        HAVING COUNT(*) > 1
           AND MIN(ZipCode) = ''
    ) ab2
      ON ab1.LastName = ab2.LastName
     AND ab1.ZipCode  = ''
    ORDER BY ab1.LastName, ab1.FirstName

    Set @Count = @Count + @@rowcount
End

If @Count < 1000
Begin
    Set @Remaining = 1000 - @Count
    SET ROWCOUNT @Remaining

    -- Same Company, same non-blank ZipCode
    SELECT TOP 1000 ab1.*
    FROM addresses ab1
    INNER JOIN (
        SELECT Company, ZipCode
        FROM Addresses
        WHERE Company > Space(100)
          AND ZipCode <> ''
        GROUP BY Company, ZipCode
        HAVING COUNT(*) > 1
           AND MIN(LastName) < MAX(LastName)
    ) ab2
      ON ab1.Company = ab2.Company
     AND ab1.ZipCode = ab2.ZipCode
    ORDER BY ab1.LastName, ab1.FirstName

    Set @Count = @Count + @@rowcount
End

If @Count < 1000
Begin
    Set @Remaining = 1000 - @Count
    SET ROWCOUNT @Remaining

    -- Same Company, blank ZipCode
    SELECT TOP 1000 ab1.*
    FROM addresses ab1
    INNER JOIN (
        SELECT Company
        FROM Addresses
        WHERE Company > Space(100)
        GROUP BY Company
        HAVING COUNT(*) > 1
           AND MIN(LastName) < MAX(LastName)
           AND MIN(ZipCode) = ''
    ) ab2
      ON ab1.Company = ab2.Company
     AND ab1.ZipCode = ''
    ORDER BY ab1.LastName, ab1.FirstName
End

-- Restore the default (no row limit)
SET ROWCOUNT 0

Other notes:
- The predicate "ab2.Ad_Id != ab1.Ad_Id" uses proprietary syntax. The
  ANSI-SQL syntax is "ab2.Ad_Id <> ab1.Ad_Id".
- If you are comparing with an empty string, then it is useless to
  perform two Trim functions. So you can simplify
  "Ltrim(RTrim(ab1.Company)) != ''" to "RTrim(ab1.Company) <> ''". In the
  query above, it is translated to "ab1.Company > Space(100)", because
  this makes it a usable search argument for the optimizer.
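
A small sketch of why these forms behave alike, relying on SQL Server
padding the shorter string with spaces before comparing two character
values. The literals are made-up test values, and the last two
statements only use the addresses table and Company column from the
thread.

```sql
-- An all-space value compares equal to '' without any trimming, and it
-- is not greater than SPACE(100), so the range test skips blank values.
SELECT CASE WHEN '   '  = ''         THEN 'equal' ELSE 'different' END AS PlainCompare,
       CASE WHEN 'Acme' > SPACE(100) THEN 'kept'  ELSE 'skipped'   END AS NonBlankValue,
       CASE WHEN '   '  > SPACE(100) THEN 'kept'  ELSE 'skipped'   END AS BlankValue;
-- Returns: equal, kept, skipped

-- Function wrapped around the column: not usable as a search argument.
SELECT Ad_Id FROM addresses WHERE RTRIM(Company) <> '';

-- Bare column compared with a constant: an index on Company can be used
-- for a range seek.
SELECT Ad_Id FROM addresses WHERE Company > SPACE(100);
```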

Hope this helps,
Gert-Jan

Jul 23 '05 #3
