473,834 Members | 2,009 Online
Bytes | Software Development & Data Engineering Community
+ Post

Home Posts Topics Members FAQ

Need help speedig up query

The following query needs about 2 minutes to complete (finding dupes)
on a table of about 10000 addresses. Does anyone have an idea on how
to speed this up ?

Thanks in advance !!!

Sebastian

Select
Top 1000 *
From
addresses ab1
Where
(
Select Count(*) From addresses base ab2 Where
(
(
(ab2.LastName = ab1.LastName And Ltrim(RTrim(ab1 .LastName)) != '' )
Or
(ab2.Company = ab1.Company And (Ltrim(RTrim(ab 1.Company)) != '') )
)
And
(
ab2.ZipCode = ab1.ZipCode
Or
ab1.ZipCode = ''
)
)
And ab2.Ad_Id != ab1.Ad_Id
) >= 1
Order By
LastName, FirstName
Jul 23 '05 #1
2 1726
On 14 Feb 2005 04:06:13 -0800, Sebastian wrote:
The following query needs about 2 minutes to complete (finding dupes)
on a table of about 10000 addresses. Does anyone have an idea on how
to speed this up ?
Hi Sebastian,

I'm hope you made a mistake while copying the query. It should return an
error message in mere milliseconds:
Select Count(*) From addresses base ab2 Where ^^^^^^^^

A table can have a maximum of one alias, never two.

A quick win in this case is to replace the test for COUNT(*) >= 1 with a
test for EXISTS. With COUNT(*), SQL Server will go on to find a second,
third, etc., match after finding the first; with EXISTS it won't.

Another quick win is to not use SELECT *, but specify a column list. You
may be lucky and have a covering index that can be used to speed up the
query if you don't show all columns.

Why are you using things like "Ltrim(RTrim(ab 1.LastName)) != ''"? Do you
mean to say that your LastName column might contain empty strings, but
also a series of spaces? Why don't you use NULL to represent missing data,
that's exactly what the NULL symbol is invented for.

From your query, I get the impression that each row in your table has
exactly one of LastName and Company filled; the other column is always an
empty string or some spaces. If you had used NULLS, you could now simply
have written "ab2.LastNa me = ab1.LastName OR ab2.Company = ab1.Company".
Not necessarily faster (though certainly not slower), but a lot more
readable!

This code: And
(
ab2.ZipCode = ab1.ZipCode
Or
ab1.ZipCode = ''
)

will result in ANY zip code from ab2 being considered a match if the zip
code in ab1 is blank. Are you sure that is what you want? If you want a
blank zip code in ab1 to match only blank zip codes in ab2, reduce this to
AND ab2.ZipCode = ab1.ZipCode
Not only shorter and easier, but probably quicker as well.

For more help, you'll have to post more information: the structure of your
table (as CREATE TABLE statement, with irrelevant columns omitted, but all
constraints and properties included - and don't forget to include indexes
as well), some sample data (as INSERT statements) to illustrate how your
data looks and the output you expect to get from that sample data. Plus a
description of what you consider to be a duplicate, as your query
indicates that your definition is not trivial.

Best, Hugo
--

(Remove _NO_ and _SPAM_ to get my e-mail address)
Jul 23 '05 #2
I'll have a go at it. Try this:

Select
Top 1000 *
From (
SELECT *
FROM addresses ab1
INNER JOIN (
SELECT LastName,ZipCod e
FROM Addresses
WHERE LastName > Space(100)
AND Zipcode <> ''
GROUP BY LastName,ZipCod e
HAVING COUNT(*)>1
) ab2
ON ab1.LastName=ab 2.LastName
AND ab1.ZipCode =ab2.ZipCode

UNION ALL

SELECT *
FROM addresses ab1
INNER JOIN (
SELECT LastName
FROM Addresses
WHERE LastName > Space(100)
HAVING COUNT(*)>1
GROUP BY LastName
AND MIN(ZipCode)=''
) ab2
ON ab1.LastName=ab 2.LastName
AND ab1.ZipCode =''

UNION ALL

SELECT *
FROM addresses ab1
INNER JOIN (
SELECT Company,ZipCode
FROM Addresses
WHERE Company > Space(100)
AND Zipcode <> ''
GROUP BY Company,ZipCode
HAVING COUNT(*)>1
AND MIN(LastName) < MAX(LastName)
) ab2
ON ab1.Company=ab2 .Company
AND ab1.ZipCode=ab2 .ZipCode

UNION ALL

SELECT *
FROM addresses ab1
INNER JOIN (
SELECT Company
FROM Addresses
WHERE Company > Space(100)
GROUP BY Company
HAVING COUNT(*)>1
AND MIN(LastName) < MAX(LastName)
AND MIN(ZipCode)=''
) ab2
ON ab1.Company=ab2 .Company
AND ab1.ZipCode=''
) X
Order By
LastName, FirstName

Note that the predicate "AND MIN(LastName) < MAX(LastName)" tries to
eliminate duplicate duplicates. However, this may result in a missed
Company duplicate, because of existing LastName duplicates for the same
ZipCode.

Of course, if you are using TOP 1000 to just get the first 1000
duplicates (and not all duplicates), then you can also do something like
this:

Declare @count int
Set @count=0

SELECT TOP 1000 *
FROM addresses ab1
INNER JOIN (
SELECT LastName,ZipCod e
FROM Addresses
WHERE LastName > Space(100)
AND Zipcode <> ''
GROUP BY LastName,ZipCod e
HAVING COUNT(*)>1
) ab2
ON ab1.LastName=ab 2.LastName
AND ab1.ZipCode =ab2.ZipCode
ORDER BY LastName, FirstName

Set @Count=@Count+@ @rowcount
If @Count < 1000
Begin
SET ROWCOUNT 1000-@Count

SELECT TOP 1000 *
FROM addresses ab1
INNER JOIN (
SELECT LastName
FROM Addresses
WHERE LastName > Space(100)
HAVING COUNT(*)>1
GROUP BY LastName
AND MIN(ZipCode)=''
) ab2
ON ab1.LastName=ab 2.LastName
AND ab1.ZipCode =''
ORDER BY LastName, FirstName

Set @Count=@Count+@ @rowcount
End

If @Count < 1000
Begin
SET ROWCOUNT 1000-@Count

SELECT TOP 1000 *
FROM addresses ab1
INNER JOIN (
SELECT Company,ZipCode
FROM Addresses
WHERE Company > Space(100)
AND Zipcode <> ''
GROUP BY Company,ZipCode
HAVING COUNT(*)>1
AND MIN(LastName) < MAX(LastName)
) ab2
ON ab1.Company=ab2 .Company
AND ab1.ZipCode=ab2 .ZipCode
ORDER BY LastName, FirstName

Set @Count=@Count+@ @rowcount
End

If @Count < 1000
Begin
SET ROWCOUNT 1000-@Count

SELECT TOP 1000 *
FROM addresses ab1
INNER JOIN (
SELECT Company
FROM Addresses
WHERE Company > Space(100)
GROUP BY Company
HAVING COUNT(*)>1
AND MIN(LastName) < MAX(LastName)
AND MIN(ZipCode)=''
) ab2
ON ab1.Company=ab2 .Company
AND ab1.ZipCode=''
ORDER BY LastName, FirstName
End
SET ROWCOUNT 0
Other notes:
- The predicate "ab2.Ad_Id != ab1.Ad_Id" uses proprietary syntax. The
ANSI-SQL syntax is "ab2.Ad_Id <> ab1.Ad_Id"
- If you are comparing with an empty string, then it is useless to
perform two Trim functions. So you can simplify
"Ltrim(RTrim(ab 1.Company)) != ''" to "RTrim(ab1.Comp any) <> ''". In the
query above, it is translated to "ab1.Compan y > Space(100)", because
this makes it a usuable search argument for the optimizer

Hope this helps,
Gert-Jan
Sebastian wrote:

The following query needs about 2 minutes to complete (finding dupes)
on a table of about 10000 addresses. Does anyone have an idea on how
to speed this up ?

Thanks in advance !!!

Sebastian

Select
Top 1000 *
From
addresses ab1
Where
(
Select Count(*) From addresses base ab2 Where
(
(
(ab2.LastName = ab1.LastName And Ltrim(RTrim(ab1 .LastName)) != '' )
Or
(ab2.Company = ab1.Company And (Ltrim(RTrim(ab 1.Company)) != '') )
)
And
(
ab2.ZipCode = ab1.ZipCode
Or
ab1.ZipCode = ''
)
)
And ab2.Ad_Id != ab1.Ad_Id
) >= 1
Order By
LastName, FirstName

Jul 23 '05 #3

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

2
3058
by: lawrence | last post by:
I've been bad about documentation so far but I'm going to try to be better. I've mostly worked alone so I'm the only one, so far, who's suffered from my bad habits. But I'd like other programmers to have an easier time understanding what I do. Therefore this weekend I'm going to spend 3 days just writing comments. Before I do it, I thought I'd ask other programmers what information they find useful. Below is a typical class I've...
9
3142
by: netpurpose | last post by:
I need to extract data from this table to find the lowest prices of each product as of today. The product will be listed/grouped by the name only, discarding the product code - I use SUBSTRING(ProductName, 1, CHARINDEX('(', ProductName)-2). I can get this result, but I had to use several views (totally inefficient). I think this can be done in one efficient/fast query, but I can't think of one. In the case that one query is not...
3
1868
by: pw | last post by:
Hi, I am having a mental block trying to figure out how to code this. Two tables: "tblQuestions" (fields = quesnum, questype, question) "tblAnswers" (fields = clientnum, quesnum, questype, answer) They are related by quesnum and questype. There are records in
7
2375
by: K. Crothers | last post by:
I administer a mechanical engineering database. I need to build a query which uses the results from a subquery as its input or criterion. I am attempting to find all of the component parts of which a part may be composed. I have a table of parts and their subparts. The problem is that each of those subparts may be composed of smaller component parts. The subpart would then be listed in the Part field linked to each of its subparts in...
3
10679
by: google | last post by:
I have a database with four table. In one of the tables, I use about five lookup fields to get populate their dropdown list. I have read that lookup fields are really bad and may cause problems that are hard to find. The main problem I am having right now is that I have a report that is sorted by one of these lookup fields and it only displays the record's ID number. When I add the source table to the query it makes several records...
0
2265
by: ward | last post by:
Greetings. Ok, I admit it, I bit off a bit more than I can chew. I need to complete this "Generate Report" page for my employer and I'm a little over my head. I could use some additional assistance. I say additional because I've already had help which is greatly appreciated. I do try to take the time and understand the provided script in hopes on not having to trouble others on those. But here it goes...
10
2600
by: L. R. Du Broff | last post by:
I own a small business. Need to track a few hundred pieces of rental equipment that can be in any of a few dozen locations. I'm an old-time C language programmer (UNIX environment). If the only tool you know how to use is a hammer, every problem tends to look like a nail. That said, I could solve my problem in C, but it's not the right tool. I need to come into the Windows world, and I need to get this done in Access or something...
7
2042
by: Rnykster | last post by:
I know a little about Access and have made several single table databases. Been struggling for about a month to do a multiple table database with no success. Help! There are two tables. First has about 30 fields. Every entry in this table will be unique. Second table has about 7 fields and is for reference - strictly a look up type table. I want to use one field, say FAMILY in the first table to look up any one of the 400 items in the...
3
2564
by: pbd22 | last post by:
Hi. I need some help with structuring my query strings. I have a form with a search bar and some links. Each link is a search type (such as "community"). The HREF for the link's anchor looks like the following: <a href="?searchtype=2">Community</a>
0
9799
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However, people are often confused as to whether an ONU can Work As a Router. In this blog post, weíll explore What is ONU, What Is Router, ONU & Routerís main usage, and What is the difference between ONU and Router. Letís take a closer look ! Part I. Meaning of...
0
9646
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can effortlessly switch the default language on Windows 10 without reinstalling. I'll walk you through it. First, let's disable language synchronization. With a Microsoft account, language settings sync across devices. To prevent any complications,...
0
10795
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers, it seems that the internal comparison operator "<=>" tries to promote arguments from unsigned to signed. This is as boiled down as I can make it. Here is my compilation command: g++-12 -std=c++20 -Wnarrowing bit_field.cpp Here is the code in...
0
10512
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven tapestry of website design and digital marketing. It's not merely about having a website; it's about crafting an immersive digital experience that captivates audiences and drives business growth. The Art of Business Website Design Your website is...
1
10550
by: Hystou | last post by:
Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows Update option using the Control Panel or Settings app; it automatically checks for updates and installs any it finds, whether you like it or not. For most users, this new feature is actually very convenient. If you want to control the update process,...
0
6957
by: conductexam | last post by:
I have .net C# application in which I am extracting data from word file and save it in database particularly. To store word all data as it is I am converting the whole word file firstly in HTML and then checking html paragraph one by one. At the time of converting from word file to html my equations which are in the word document file was convert into image. Globals.ThisAddIn.Application.ActiveDocument.Select();...
0
5627
by: TSSRALBI | last post by:
Hello I'm a network technician in training and I need your help. I am currently learning how to create and manage the different types of VPNs and I have a question about LAN-to-LAN VPNs. The last exercise I practiced was to create a LAN-to-LAN VPN between two Pfsense firewalls, by using IPSEC protocols. I succeeded, with both firewalls in the same network. But I'm wondering if it's possible to do the same thing, with 2 Pfsense firewalls...
0
5796
by: adsilva | last post by:
A Windows Forms form does not have the event Unload, like VB6. What one acts like?
1
4427
by: 6302768590 | last post by:
Hai team i want code for transfer the data from one system to another through IP address by using C# our system has to for every 5mins then we have to update the data what the data is updated we have to send another system

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.