473,887 Members | 2,309 Online
Bytes | Software Development & Data Engineering Community
+ Post

Home Posts Topics Members FAQ

optimizing a query to delete duplicates

I have a DELETE statement that deletes duplicate data from a table. It
takes a long time to execute, so I thought I'd seek advice here. The
structure of the table is little funny. The following is NOT the table,
but the representation of the data in the table:

+-----------+
| a | b |
+-----+-----+
| 123 | 234 |
| 345 | 456 |
| 123 | 123 |
+-----+-----+

As you can see, the data is tabular. This is how it is stored in the table:

+-----+-----------+------------+
| Row | FieldName | FieldValue |
+-----+-----------+------------+
| 1 | a | 123 |
| 1 | b | 234 |
| 2 | a | 345 |
| 2 | b | 456 |
| 3 | a | 123 |
| 3 | b | 234 |
+-----+-----------+------------+

What I need is to delete all records having the same "Row" when there exists
the same set of records with a different (smaller, to be precise) "Row".
Using the example above, what I need to get is:

+-----+-----------+------------+
| Row | FieldName | FieldValue |
+-----+-----------+------------+
| 1 | a | 123 |
| 1 | b | 234 |
| 2 | a | 345 |
| 2 | b | 456 |
+-----+-----------+------------+

A slow way of doing this seem to be:

DELETE FROM X
WHERE Row IN
(SELECT DISTINCT Row FROM X x1
WHERE EXISTS
(SELECT * FROM X x2
WHERE x2.Row < x1.Row
AND NOT EXISTS
(SELECT * FROM X x3
WHERE x3.Row = x2.Row
AND x3.FieldName = x2.FieldName
AND x3.FieldValue <> x1.FieldValue)) )

Can this be done faster, better, and cheaper?
Jul 20 '05 #1
3 2423
my knee-jerk reaction is:

Why is it important to optimize it? I think you should delete the
duplicates, then create a constraint that prevents them from recurring.

If, for some reason, you are unable to fix the application that creates
these duplicates, and creating a constraint causes errors in the application
that you can't tolerate, then I suppose an alternative would be to create a
trigger that deletes them upon entry. Having a composite index on the
columns that are being duplicated would enable such a trigger to run
quickly.

But looking at your query, I find it strangely complex.

Why not just:

DELETE FROM X
WHERE EXISTS (SELECT * FROM X x2
WHERE x2.Row < x.Row
AND X.FieldName = x2.FieldName
AND X.FieldValue = x2.FieldValue)

Am I missing something? Your NOT EXISTS has me a bit confused... I think it
might delete data in situations other than described.

Also, NOT EXISTS is generally slow.
Jul 20 '05 #2
On 2004-07-15, Aaron W. West <ta******@hotma il.NO.SPAM> wrote:
Why is it important to optimize it? I think you should delete the
duplicates, then create a constraint that prevents them from recurring.
Such constraint may not be created. This table is a temporary table, where
data from an input file is loaded. Duplicate sets of records must be
deleted because the data then goes into permanent tables. Those table have
constraints against duplicates.
But looking at your query, I find it strangely complex.
Me too. I'm trying to improve it. Its complexity seems to hinder its
performance.
Why not just:

DELETE FROM X
WHERE EXISTS (SELECT * FROM X x2
WHERE x2.Row < x.Row
AND X.FieldName = x2.FieldName
AND X.FieldValue = x2.FieldValue)


This would delete records that should not be deleted. Here's an example:

+-----+-----------+------------+
| Row | FieldName | FieldValue |
+-----+-----------+------------+
| 1 | a | 123 |
| 1 | b | 234 |
| 2 | a | 345 |
| 2 | b | 456 |
| 3 | a | 123 |
| 3 | b | 666 |
+-----+-----------+------------+

Here the combination of values for "a" and "b" on every "Row" is
different. There are no duplicates here. The query that you proposed would
delete the second to last row

+-----+-----------+------------+
| 3 | a | 123 |
+-----+-----------+------------+

because it has the same FieldName and FieldValue as the first row.

Think of it the data this way:

+-----+-----+
| a | b |
+-----+-----+
| 123 | 234 |
| 345 | 456 |
| 123 | 666 |
+-----+-----+

No duplicate rows here.
Jul 20 '05 #3
Hi

You could try only selecting the correct data when you move it into the
permanent tables. But the following may work better:

DELETE FROM X1
FROM X X1 JOIN X X2
ON x2.Row < x1.Row
AND x1.Fieldvalue = x2.Fieldvalue
AND x1.FieldName = x2.FieldName

John
"Alexander Anderson" <no@spam.com> wrote in message
news:sl******** *********@Toron to-HSE-ppp3682122.symp atico.ca...
I have a DELETE statement that deletes duplicate data from a table. It
takes a long time to execute, so I thought I'd seek advice here. The
structure of the table is little funny. The following is NOT the table,
but the representation of the data in the table:

+-----------+
| a | b |
+-----+-----+
| 123 | 234 |
| 345 | 456 |
| 123 | 123 |
+-----+-----+

As you can see, the data is tabular. This is how it is stored in the table:
+-----+-----------+------------+
| Row | FieldName | FieldValue |
+-----+-----------+------------+
| 1 | a | 123 |
| 1 | b | 234 |
| 2 | a | 345 |
| 2 | b | 456 |
| 3 | a | 123 |
| 3 | b | 234 |
+-----+-----------+------------+

What I need is to delete all records having the same "Row" when there exists the same set of records with a different (smaller, to be precise) "Row".
Using the example above, what I need to get is:

+-----+-----------+------------+
| Row | FieldName | FieldValue |
+-----+-----------+------------+
| 1 | a | 123 |
| 1 | b | 234 |
| 2 | a | 345 |
| 2 | b | 456 |
+-----+-----------+------------+

A slow way of doing this seem to be:

DELETE FROM X
WHERE Row IN
(SELECT DISTINCT Row FROM X x1
WHERE EXISTS
(SELECT * FROM X x2
WHERE x2.Row < x1.Row
AND NOT EXISTS
(SELECT * FROM X x3
WHERE x3.Row = x2.Row
AND x3.FieldName = x2.FieldName
AND x3.FieldValue <> x1.FieldValue)) )

Can this be done faster, better, and cheaper?

Jul 20 '05 #4

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

1
10810
by: Patrizio | last post by:
Hi All, I've the following table with a PK defined on an IDENTITY column (INSERT_SEQ): CREATE TABLE MYDATA ( MID NUMERIC(19,0) NOT NULL, MYVALUE FLOAT NOT NULL, TIMEKEY INTEGER NOT NULL, TIMEKEY_DTTM DATETIME NULL,
2
5003
by: ms | last post by:
Access 2000: I am trying to delete duplicate records imported to a staging table leaving one of the duplicates to be imported into the live table. A unique record is based on a composite key of 3 fields (vehicleID, BattID, and ChgHrs). VehicleID and BattID are a TEXT datatype and ChrHrs are a number(long int.) datatype. Since records to be imported can have duplicate records of the composite key I need to clean all but one of the...
16
4198
by: tyrfboard | last post by:
I've been searching for awhile now on how to remove duplicates from a table within an Access db and have found plenty of articles on finding or deleting duplicates. All I want to do is remove them from within an SQL query - leaving one of the records behind of course. I have a mailing list comprised of a union query that gets records from two separate tables. I want to be able to run a query that removes one (or more) of the duplicated...
6
4601
by: das | last post by:
Hello all, I have a table with thousands of rows and is in this format: id col1 col2 col3 col4 --- ------ ----- ------ ------ 1 nm 78 xyz pir 2 bn 45 abc dir I now want to get the data from this table in this format:
1
2033
by: JC | last post by:
Hello, I am trying to change a select, find duplicates, query into a delete query. I want to get rid of the records in the main table "tblHurnsHistory." I changed the Find Duplicates query to a Delete query. I then received an input box looking for criteria that states," Enter parameter value qryHurnsHistory.Grads_creditsEarned
16
3533
by: ARC | last post by:
Hello all, So I'm knee deep in this import utility program, and am coming up with all sorts of "gotcha's!". 1st off. On a "Find Duplicates Query", does anyone have a good solution for renaming the duplicate records? My thinking was to take the results of the duplicate query, and somehow have it number each line where there is a duplicate (tried a groups query, but "count" won't work), then do an update query to change the duplicate to...
1
1905
by: bcquadrant | last post by:
Hi there I am trying to make an update query that will update information in all occurances of duplicate records based on the First record for that duplicate set. There are about 30,000 records in the table, and of them without the duplicates there is only about 4000 records.The duplicates are essential to work we do, so we cannot delete them. I have a make table query that selects only the first record for a duplicate set based on the...
2
15958
by: fperri | last post by:
Hello, I have a query that creates a table of duplicates in my database. For one set of duplicates (three records), all the fields & values are the same except for the values in one field. This field is a numeric field. I need to figure out a way to remove all the duplicates except for the one with the highest value in that field. I don't want to have to create a module for this. I want to run it all from a stored query. I was thinking the...
0
1310
by: csin | last post by:
I have an Access DB backend for the application I am running, I want to use the built in ability in Access to remove duplicate entries... Say I have table1 with fields field1 field2 and field3, none of the fields are indexed without duplicates for a reason... The way I remove duplicates is to do an append query into an identical table that has one of the fields set to indexed no duplicates. If I run this query in Access directly I get a...
0
9799
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can effortlessly switch the default language on Windows 10 without reinstalling. I'll walk you through it. First, let's disable language synchronization. With a Microsoft account, language settings sync across devices. To prevent any complications,...
0
11173
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers, it seems that the internal comparison operator "<=>" tries to promote arguments from unsigned to signed. This is as boiled down as I can make it. Here is my compilation command: g++-12 -std=c++20 -Wnarrowing bit_field.cpp Here is the code in...
0
10771
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven tapestry of website design and digital marketing. It's not merely about having a website; it's about crafting an immersive digital experience that captivates audiences and drives business growth. The Art of Business Website Design Your website is...
0
9593
agi2029
by: agi2029 | last post by:
Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing, and deployment—without human intervention. Imagine an AI that can take a project description, break it down, write the code, debug it, and then launch it, all on its own.... Now, this would greatly impact the work of software developers. The idea...
1
7988
isladogs
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome a new presenter, Adolph Dupré who will be discussing some powerful techniques for using class modules. He will explain when you may want to use classes instead of User Defined Types (UDT). For example, to manage the data in unbound forms. Adolph will...
0
7143
by: conductexam | last post by:
I have .net C# application in which I am extracting data from word file and save it in database particularly. To store word all data as it is I am converting the whole word file firstly in HTML and then checking html paragraph one by one. At the time of converting from word file to html my equations which are in the word document file was convert into image. Globals.ThisAddIn.Application.ActiveDocument.Select();...
0
5810
by: TSSRALBI | last post by:
Hello I'm a network technician in training and I need your help. I am currently learning how to create and manage the different types of VPNs and I have a question about LAN-to-LAN VPNs. The last exercise I practiced was to create a LAN-to-LAN VPN between two Pfsense firewalls, by using IPSEC protocols. I succeeded, with both firewalls in the same network. But I'm wondering if it's possible to do the same thing, with 2 Pfsense firewalls...
0
6011
by: adsilva | last post by:
A Windows Forms form does not have the event Unload, like VB6. What one acts like?
2
4239
muto222
by: muto222 | last post by:
How can i add a mobile payment intergratation into php mysql website.

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.