473,241 Members | 1,389 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,241 software developers and data experts.

Best way for finding duplicate entries in table?

Hi all,

since a few years I use the following statement to find duplicate
entries in a table:

SELECT t1.id, t2.id,...
FROM table AS t1 INNER JOIN table AS t2
ON t1.field = t2.field
WHERE t1.id < t2.id

Of course Indexes are set on id (Primary Key, Autoincrement) and field
(normal Index) and are, according to EXPLAIN, used (index "field").

This statement works quite well but takes ages (> 10 minutes) of time
when using it on a table with tenthousands of rows.

Now my question:
Are there better and especially faster ways to find duplicate entries
within a table without having to create a seperate table which will be
filled with a SELECT DISTINCT... statement from the "old" table?

Thank's in advance for your help and time,

Andy Maurer

Jul 23 '05 #1
3 6889
an****************@web.de wrote:
Now my question:
Are there better and especially faster ways to find duplicate entries
within a table without having to create a seperate table which will be
filled with a SELECT DISTINCT... statement from the "old" table?


select id,count(field) as amount
from table group by field having amount > 1;
Jul 23 '05 #2


Aggro schrieb:
an****************@web.de wrote:
Now my question:
Are there better and especially faster ways to find duplicate entries
within a table without having to create a seperate table which will be
filled with a SELECT DISTINCT... statement from the "old" table?


select id,count(field) as amount
from table group by field having amount > 1;


Hi Aggro,

thank you very much for your help, it shows me duplicate entries but,
as well due to a not better specified question of mine, some problems
are left open.
First of all your statement, when issued in phpmyadmin, tells me that
it found in total all rows of the table. Do you have any idea, why?
Another problem I have is: It shows me the duplicate entries in "field"
but only one of the many keys related to this entry. Therefore, if I
wanted to delete all but one of these entries, I have to issue a lot of
SELECT and DELETE statements in order to clean my table.

Of course I always could create another (temporary) table and fill it
with DISTINCT entries of table1 but this is sometimes not an option for
me.

Do you have another idea how to fix this?

Thank's again,

Andy

Jul 23 '05 #3
an****************@web.de wrote:
select id,count(field) as amount
from table group by field having amount > 1;
Another problem I have is: It shows me the duplicate entries in "field"
but only one of the many keys related to this entry. Therefore, if I
wanted to delete all but one of these entries, I have to issue a lot of
SELECT and DELETE statements in order to clean my table.


This should give the list of fields that occur more than once in the
database:

SELECT t2.`field`
FROM `table` AS t2
GROUP BY t2.`field`
HAVING COUNT(t2.`field`) > 1;

This should give the list of id's in whose records those values for
field occur:

SELECT t.id
FROM `table` AS t
WHERE t.`field` IN (
SELECT t2.`field`
FROM `table` AS t2
GROUP BY t2.`field`
HAVING COUNT(t2.`field`) > 1);

This should give the list of the lowest id for each given value in field:

SELECT MIN(t3.id)
FROM `table` AS t3
GROUP BY t3.`field`;

Putting it all together, this should delete records who contain values
for field that appear on more than one row, but do not delete such
records with the lowest id for a given value of `field`:

DELETE FROM `table`
WHERE id IN (
SELECT t.id
FROM `table` AS t
WHERE t.`field` IN (
SELECT t2.`field`
FROM `table` AS t2
GROUP BY t2.`field`
HAVING COUNT(t2.`field`) > 1))
AND id NOT IN (
SELECT MIN(t3.id)
FROM `table` AS t3
GROUP BY t3.`field`);

I'm assuming that you want to keep each record with the lowest id value.
In order to do this in a single operation, you must have some kind of
uniform policy like this, that applies to all cases. You could use
MAX(t3.id) instead to delete all but the most recent records, but other
than those two options, the alternatives get more complex to implement.

Regards,
Bill K.
Jul 23 '05 #4

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

0
by: Gary Lundquest | last post by:
I have an application with MS Access 2000 as the front end and MySQL as the back end. All was well until I upgraded the MySQL (Linux) server. The Problem: I insert data into a cumulative table....
18
by: Elroyskimms | last post by:
I have a table using an identity column as its Primary Key and two columns (table reduced for simplicity) EmployeeNumber and ArrivalTime. CREATE TABLE ( IDENTITY (1, 1) NOT NULL , (10)...
0
by: Trey | last post by:
I am trying to make a database for tracking calls, the caller gets one free call (per 12 months), then is charged a flat fee for every call after the first call in a 12 month time span. The data is...
0
by: Cyberwolf | last post by:
OK, how to explain this. I have a table that will feed into another table using an append query. What I want to do is look at the table I am appending the record to to check for a duplicate...
2
by: Carroll | last post by:
I'm looking for a way in SQL to find duplicate records in a single table, that are the same based on 3 columns, regardless of what is in the other columns in the duplicate records. I would like to...
2
by: mivey4 | last post by:
Okay I have 2 tables: Table A - holds a list of new hardware serial numbers and their corresponding model (no constraints or indexes) Table B - holds a distinct list of current serial numbers...
8
by: desirocks | last post by:
Hi all, I am somewhat new to sql server. So help me in whatever way you can. I have two tables one of which doesnt have a primary key(table A) and other has a composite key formed of two...
4
by: ramdil | last post by:
Hi All I have table and it have around 90000 records.Its primary key is autonumber field and it has also have date column and name, then some other columns Now i have problem with the table,as my...
7
by: php_mysql_beginer911 | last post by:
Hi .. i am trying to update a table where if field contents any duplictaed entries than one of the field should be updated with random number which is unique so i can make all entries unique i...
0
by: abbasky | last post by:
### Vandf component communication method one: data sharing ​ Vandf components can achieve data exchange through data sharing, state sharing, events, and other methods. Vandf's data exchange method...
2
isladogs
by: isladogs | last post by:
The next Access Europe meeting will be on Wednesday 7 Feb 2024 starting at 18:00 UK time (6PM UTC) and finishing at about 19:30 (7.30PM). In this month's session, the creator of the excellent VBE...
0
by: stefan129 | last post by:
Hey forum members, I'm exploring options for SSL certificates for multiple domains. Has anyone had experience with multi-domain SSL certificates? Any recommendations on reliable providers or specific...
0
Git
by: egorbl4 | last post by:
Скачал я git, хотел начать настройку, а там вылезло вот это Что это? Что мне с этим делать? ...
1
by: davi5007 | last post by:
Hi, Basically, I am trying to automate a field named TraceabilityNo into a web page from an access form. I've got the serial held in the variable strSearchString. How can I get this into the...
0
by: MeoLessi9 | last post by:
I have VirtualBox installed on Windows 11 and now I would like to install Kali on a virtual machine. However, on the official website, I see two options: "Installer images" and "Virtual machines"....
0
by: DolphinDB | last post by:
The formulas of 101 quantitative trading alphas used by WorldQuant were presented in the paper 101 Formulaic Alphas. However, some formulas are complex, leading to challenges in calculation. Take...
0
by: Aftab Ahmad | last post by:
Hello Experts! I have written a code in MS Access for a cmd called "WhatsApp Message" to open WhatsApp using that very code but the problem is that it gives a popup message everytime I clicked on...
0
by: ryjfgjl | last post by:
ExcelToDatabase: batch import excel into database automatically...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.