473,396 Members | 1,935 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,396 software developers and data experts.

finding duplicate records

I'm trying to setup a way to find possible duplicate or near-duplicate
records of contact people. I have a table of 50k records doing an
inner join with itself, something like this:

SELECT p1.fields, p2.fields
FROM table AS p1 INNER JOIN table AS p2 ON p1.lastname = p2.lastname
WHERE p1.id <p2.id AND
(other criteria)

The id and name and other criteria fields are indexed as well as they
can be. It works fine as long as I add a LIMIT statement to keep it
down to just 10 or 20 records, but if I take this off, it runs on
forever -- so I can't even tell how many potentail records there will
be. If the table grows to 100k or 200k, it'll only get worse.

So my question is -- are there any better ways to go about this?

Mar 19 '07 #1
1 3984
On Mar 19, 3:49 am, rgurga...@citidc.com wrote:
I'm trying to setup a way to find possible duplicate or near-duplicate
records of contact people. I have a table of 50k records doing an
inner join with itself, something like this:

SELECT p1.fields, p2.fields
FROM table AS p1 INNER JOIN table AS p2 ON p1.lastname = p2.lastname
WHERE p1.id <p2.id AND
(other criteria)

The id and name and other criteria fields are indexed as well as they
can be. It works fine as long as I add a LIMIT statement to keep it
down to just 10 or 20 records, but if I take this off, it runs on
forever -- so I can't even tell how many potentail records there will
be. If the table grows to 100k or 200k, it'll only get worse.

So my question is -- are there any better ways to go about this?

I think '<' would be better than '<>'.

Mar 19 '07 #2

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

3
by: andreas.maurer1971 | last post by:
Hi all, since a few years I use the following statement to find duplicate entries in a table: SELECT t1.id, t2.id,... FROM table AS t1 INNER JOIN table AS t2 ON t1.field = t2.field WHERE...
2
by: ms | last post by:
Access 2000: I am trying to delete duplicate records imported to a staging table leaving one of the duplicates to be imported into the live table. A unique record is based on a composite key of 3...
2
by: RICHARD BROMBERG | last post by:
I have two tables of Names and addresses, CustFile1 and CustFile2. There may be duplicate records in either or both of the files and some of the records in CustFile1 may match records in...
1
by: Ike | last post by:
This may be a particularly dumb question, but I'm wrestling with it anyways. Is there a way, to discern a duplicate field among records in a table. For example, suppose I have a table such that: ...
2
by: Carroll | last post by:
I'm looking for a way in SQL to find duplicate records in a single table, that are the same based on 3 columns, regardless of what is in the other columns in the duplicate records. I would like to...
2
by: nethravathy | last post by:
Hi, The following table namely elcbtripselect contains 5147 records.I want to know wether this table contains duplicate records or not. I tried with following query 1)SELECT...
4
by: Thomas Arthur Seidel | last post by:
Hello to all, I have a small or big problem with a customer data base, where during a change of system we might have created duplicate records. This should be easy to find, you might think, but,...
2
by: nomvula | last post by:
hi guys i need some help to duplicate records on my form datasheet: here's the example of my form results: ClientLookup DateCaptured ForecastDate Description ForecastQuantity Forecast Actual UJ...
1
by: xraive | last post by:
I have a problem with this. Currently I am trying Allen's code and i am not successful. Current Design Table1 (Main Form) TravelID (PK) ApprovedBY EntreredBy BudgetCode ExpenseCode
0
by: Charles Arthur | last post by:
How do i turn on java script on a villaon, callus and itel keypad mobile phone
0
by: ryjfgjl | last post by:
In our work, we often receive Excel tables with data in the same format. If we want to analyze these data, it can be difficult to analyze them because the data is spread across multiple Excel files...
0
by: emmanuelkatto | last post by:
Hi All, I am Emmanuel katto from Uganda. I want to ask what challenges you've faced while migrating a website to cloud. Please let me know. Thanks! Emmanuel
1
by: Sonnysonu | last post by:
This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...
0
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However,...
0
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can...
0
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven...
0
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each...
0
agi2029
by: agi2029 | last post by:
Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing,...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.