Bytes | Software Development & Data Engineering Community

Best way to find duplicate entries in a table?

Hi all,

for a few years now I have used the following statement to find duplicate
entries in a table:

SELECT t1.id, t2.id,...
FROM table AS t1 INNER JOIN table AS t2
ON t1.field = t2.field
WHERE t1.id < t2.id

Of course, indexes exist on id (primary key, auto-increment) and on field
(a regular index) and are, according to EXPLAIN, used (index "field").

This statement works quite well but takes ages (more than 10 minutes)
when used on a table with tens of thousands of rows.
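The self-join above can be sketched in runnable form with Python's sqlite3 module; the table name `t`, the columns, and the sample rows are all made up for illustration, not taken from the poster's real schema:

```python
# Minimal sqlite3 sketch of the self-join duplicate finder.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, field TEXT)")
con.executemany("INSERT INTO t (id, field) VALUES (?, ?)",
                [(1, "a"), (2, "b"), (3, "a"), (4, "c"), (5, "a")])

# Every pair of rows sharing a value in `field`; the t1.id < t2.id
# condition keeps each pair from being reported twice.
pairs = con.execute("""
    SELECT t1.id, t2.id
    FROM t AS t1 INNER JOIN t AS t2
      ON t1.field = t2.field
    WHERE t1.id < t2.id
    ORDER BY t1.id, t2.id
""").fetchall()
print(pairs)  # [(1, 3), (1, 5), (3, 5)]
```

Note that the pair list grows quadratically with group size (three copies of "a" already yield three pairs), which is part of why this self-join can get so slow on larger tables.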

Now my question:
Are there better and, above all, faster ways to find duplicate entries
within a table without having to create a separate table filled by a
SELECT DISTINCT... statement from the "old" table?

Thanks in advance for your help and time,

Andy Maurer

Jul 23 '05 #1
3 replies, 6894 views
an****************@web.de wrote:
Now my question:
Are there better and, above all, faster ways to find duplicate entries
within a table without having to create a separate table filled by a
SELECT DISTINCT... statement from the "old" table?


select id,count(field) as amount
from table group by field having amount > 1;
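A runnable sqlite3 sketch of this GROUP BY / HAVING approach (table name, columns, and data are illustrative). One caveat: the `id` selected alongside the aggregate in the query above is not well defined, since it is not grouped by; MySQL returns an arbitrary row's id per group, so the sketch selects only the duplicated value and its count:

```python
# Sketch of the GROUP BY / HAVING duplicate finder.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, field TEXT)")
con.executemany("INSERT INTO t (id, field) VALUES (?, ?)",
                [(1, "a"), (2, "b"), (3, "a"), (4, "c"), (5, "a")])

# One output row per duplicated value, with its occurrence count.
dupes = con.execute("""
    SELECT field, COUNT(*) AS amount
    FROM t
    GROUP BY field
    HAVING amount > 1
    ORDER BY field
""").fetchall()
print(dupes)  # [('a', 3)]
```

Because this scans the table once and groups, it is typically far faster than the quadratic self-join.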
Jul 23 '05 #2


Aggro wrote:
an****************@web.de wrote:
Now my question:
Are there better and, above all, faster ways to find duplicate entries
within a table without having to create a separate table filled by a
SELECT DISTINCT... statement from the "old" table?


select id,count(field) as amount
from table group by field having amount > 1;


Hi Aggro,

thank you very much for your help. Your statement does show me duplicate
entries, but, partly because my question was not specific enough, some
problems remain.
First of all, when I issue your statement in phpMyAdmin, it tells me that
it found all rows of the table in total. Do you have any idea why?
The other problem I have: it shows me the duplicated values in "field",
but only one of the many keys related to each value. So if I wanted to
delete all but one of these entries, I would have to issue a lot of
SELECT and DELETE statements in order to clean my table.

Of course I could always create another (temporary) table and fill it
with the DISTINCT entries of table1, but sometimes that is not an option
for me.

Do you have another idea how to fix this?

Thanks again,

Andy

Jul 23 '05 #3
an****************@web.de wrote:
select id,count(field) as amount
from table group by field having amount > 1;
The other problem I have: it shows me the duplicated values in "field",
but only one of the many keys related to each value. So if I wanted to
delete all but one of these entries, I would have to issue a lot of
SELECT and DELETE statements in order to clean my table.


This should give the list of `field` values that occur more than once in
the table:

SELECT t2.`field`
FROM `table` AS t2
GROUP BY t2.`field`
HAVING COUNT(t2.`field`) > 1;

This should give the ids of the records in which those values of `field`
occur:

SELECT t.id
FROM `table` AS t
WHERE t.`field` IN (
    SELECT t2.`field`
    FROM `table` AS t2
    GROUP BY t2.`field`
    HAVING COUNT(t2.`field`) > 1);

This should give the list of the lowest id for each given value in field:

SELECT MIN(t3.id)
FROM `table` AS t3
GROUP BY t3.`field`;

Putting it all together, this should delete the records whose `field`
values appear on more than one row, while keeping the record with the
lowest id for each value of `field`:

DELETE FROM `table`
WHERE id IN (
    SELECT t.id
    FROM `table` AS t
    WHERE t.`field` IN (
        SELECT t2.`field`
        FROM `table` AS t2
        GROUP BY t2.`field`
        HAVING COUNT(t2.`field`) > 1))
AND id NOT IN (
    SELECT MIN(t3.id)
    FROM `table` AS t3
    GROUP BY t3.`field`);

I'm assuming that you want to keep the record with the lowest id value in
each group. To do this in a single operation, you must have some kind of
uniform policy like this that applies to all cases. You could use
MAX(t3.id) instead to delete all but the most recent record per value,
but other than those two options, the alternatives get more complex to
implement.
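A runnable sqlite3 sketch of this keep-the-lowest-id cleanup (table name, columns, and data are illustrative). Two caveats: the `id NOT IN (SELECT MIN(id) ... GROUP BY field)` condition is sufficient on its own, since a row with a unique `field` value is always the MIN of its own one-row group; and MySQL, unlike sqlite, refuses a subquery that reads from the table being deleted from (error 1093), so there you would typically materialize the id list into a temporary table first.

```python
# Dedup sketch: keep the lowest id per `field` value, delete the rest.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, field TEXT)")
con.executemany("INSERT INTO t (id, field) VALUES (?, ?)",
                [(1, "a"), (2, "b"), (3, "a"), (4, "c"), (5, "a")])

# A row survives iff it holds the lowest id for its `field` value.
con.execute("""
    DELETE FROM t
    WHERE id NOT IN (
        SELECT MIN(id) FROM t GROUP BY field)
""")
survivors = con.execute("SELECT id, field FROM t ORDER BY id").fetchall()
print(survivors)  # [(1, 'a'), (2, 'b'), (4, 'c')]
```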

Regards,
Bill K.
Jul 23 '05 #4

This thread has been closed and replies have been disabled. Please start a new discussion.

