optimizing a query to delete duplicates

Alexander Anderson

I have a DELETE statement that deletes duplicate data from a table. It
takes a long time to execute, so I thought I'd seek advice here. The
structure of the table is little funny. The following is NOT the table,
but the representation of the data in the table:

+-----------+
| a | b |
+-----+-----+
| 123 | 234 |
| 345 | 456 |
| 123 | 123 |
+-----+-----+

As you can see, the data is tabular. This is how it is stored in the table:

+-----+-----------+------------+
| Row | FieldName | FieldValue |
+-----+-----------+------------+
| 1 | a | 123 |
| 1 | b | 234 |
| 2 | a | 345 |
| 2 | b | 456 |
| 3 | a | 123 |
| 3 | b | 234 |
+-----+-----------+------------+

What I need is to delete all records having the same "Row" when there exists
the same set of records with a different (smaller, to be precise) "Row".
Using the example above, what I need to get is:

+-----+-----------+------------+
| Row | FieldName | FieldValue |
+-----+-----------+------------+
| 1 | a | 123 |
| 1 | b | 234 |
| 2 | a | 345 |
| 2 | b | 456 |
+-----+-----------+------------+

A slow way of doing this seem to be:

DELETE FROM X
WHERE Row IN
(SELECT DISTINCT Row FROM X x1
WHERE EXISTS
(SELECT * FROM X x2
WHERE x2.Row < x1.Row
AND NOT EXISTS
(SELECT * FROM X x3
WHERE x3.Row = x2.Row
AND x3.FieldName = x2.FieldName
AND x3.FieldValue <> x1.FieldValue)) )

Can this be done faster, better, and cheaper?

Jul 20 '05 #1

Subscribe Reply

2420

Aaron W. West

my knee-jerk reaction is:

Why is it important to optimize it? I think you should delete the
duplicates, then create a constraint that prevents them from recurring.

If, for some reason, you are unable to fix the application that creates
these duplicates, and creating a constraint causes errors in the application
that you can't tolerate, then I suppose an alternative would be to create a
trigger that deletes them upon entry. Having a composite index on the
columns that are being duplicated would enable such a trigger to run
quickly.

But looking at your query, I find it strangely complex.

Why not just:

DELETE FROM X
WHERE EXISTS (SELECT * FROM X x2
WHERE x2.Row < x.Row
AND X.FieldName = x2.FieldName
AND X.FieldValue = x2.FieldValue)

Am I missing something? Your NOT EXISTS has me a bit confused... I think it
might delete data in situations other than described.

Also, NOT EXISTS is generally slow.

Jul 20 '05 #2

Alexander Anderson

On 2004-07-15, Aaron W. West <ta******@hotma il.NO.SPAM> wrote:

Why is it important to optimize it? I think you should delete the
duplicates, then create a constraint that prevents them from recurring.
Such constraint may not be created. This table is a temporary table, where
data from an input file is loaded. Duplicate sets of records must be
deleted because the data then goes into permanent tables. Those table have
constraints against duplicates.
But looking at your query, I find it strangely complex.
Me too. I'm trying to improve it. Its complexity seems to hinder its
performance.
Why not just:

DELETE FROM X
WHERE EXISTS (SELECT * FROM X x2
WHERE x2.Row < x.Row
AND X.FieldName = x2.FieldName
AND X.FieldValue = x2.FieldValue)

This would delete records that should not be deleted. Here's an example:

+-----+-----------+------------+
| Row | FieldName | FieldValue |
+-----+-----------+------------+
| 1 | a | 123 |
| 1 | b | 234 |
| 2 | a | 345 |
| 2 | b | 456 |
| 3 | a | 123 |
| 3 | b | 666 |
+-----+-----------+------------+

Here the combination of values for "a" and "b" on every "Row" is
different. There are no duplicates here. The query that you proposed would
delete the second to last row

+-----+-----------+------------+
| 3 | a | 123 |
+-----+-----------+------------+

because it has the same FieldName and FieldValue as the first row.

Think of it the data this way:

+-----+-----+
| a | b |
+-----+-----+
| 123 | 234 |
| 345 | 456 |
| 123 | 666 |
+-----+-----+

No duplicate rows here.

Jul 20 '05 #3

John Bell

Hi

You could try only selecting the correct data when you move it into the
permanent tables. But the following may work better:

DELETE FROM X1
FROM X X1 JOIN X X2
ON x2.Row < x1.Row
AND x1.Fieldvalue = x2.Fieldvalue
AND x1.FieldName = x2.FieldName

John
"Alexander Anderson" <no@spam.com> wrote in message
news:sl******** *********@Toron to-HSE-ppp3682122.symp atico.ca...

I have a DELETE statement that deletes duplicate data from a table. It
takes a long time to execute, so I thought I'd seek advice here. The
structure of the table is little funny. The following is NOT the table,
but the representation of the data in the table:

+-----------+
| a | b |
+-----+-----+
| 123 | 234 |
| 345 | 456 |
| 123 | 123 |
+-----+-----+

As you can see, the data is tabular. This is how it is stored in the table:
+-----+-----------+------------+
| Row | FieldName | FieldValue |
+-----+-----------+------------+
| 1 | a | 123 |
| 1 | b | 234 |
| 2 | a | 345 |
| 2 | b | 456 |
| 3 | a | 123 |
| 3 | b | 234 |
+-----+-----------+------------+

What I need is to delete all records having the same "Row" when there exists the same set of records with a different (smaller, to be precise) "Row".
Using the example above, what I need to get is:

+-----+-----------+------------+
| Row | FieldName | FieldValue |
+-----+-----------+------------+
| 1 | a | 123 |
| 1 | b | 234 |
| 2 | a | 345 |
| 2 | b | 456 |
+-----+-----------+------------+

A slow way of doing this seem to be:

DELETE FROM X
WHERE Row IN
(SELECT DISTINCT Row FROM X x1
WHERE EXISTS
(SELECT * FROM X x2
WHERE x2.Row < x1.Row
AND NOT EXISTS
(SELECT * FROM X x3
WHERE x3.Row = x2.Row
AND x3.FieldName = x2.FieldName
AND x3.FieldValue <> x1.FieldValue)) )

Can this be done faster, better, and cheaper?

Jul 20 '05 #4

Similar topics

10809

Delete duplicate records from huge tables

by: Patrizio | last post by:

Hi All, I've the following table with a PK defined on an IDENTITY column (INSERT_SEQ): CREATE TABLE MYDATA ( MID NUMERIC(19,0) NOT NULL, MYVALUE FLOAT NOT NULL, TIMEKEY INTEGER NOT NULL, TIMEKEY_DTTM DATETIME NULL,

Microsoft SQL Server

4992

Query to delete duplicate records in staging table

by: ms | last post by:

Access 2000: I am trying to delete duplicate records imported to a staging table leaving one of the duplicates to be imported into the live table. A unique record is based on a composite key of 3 fields (vehicleID, BattID, and ChgHrs). VehicleID and BattID are a TEXT datatype and ChrHrs are a number(long int.) datatype. Since records to be imported can have duplicate records of the composite key I need to clean all but one of the...

Microsoft Access / VBA

4185

Removing duplicates from query, but not from table

by: tyrfboard | last post by:

I've been searching for awhile now on how to remove duplicates from a table within an Access db and have found plenty of articles on finding or deleting duplicates. All I want to do is remove them from within an SQL query - leaving one of the records behind of course. I have a mailing list comprised of a union query that gets records from two separate tables. I want to be able to run a query that removes one (or more) of the duplicated...

Microsoft Access / VBA

4597

Optimizing SQL - Union

by: das | last post by:

Hello all, I have a table with thousands of rows and is in this format: id col1 col2 col3 col4 --- ------ ----- ------ ------ 1 nm 78 xyz pir 2 bn 45 abc dir I now want to get the data from this table in this format:

Microsoft SQL Server

2026

Changing a duplicate select query into a delete query, help!!

by: JC | last post by:

Hello, I am trying to change a select, find duplicates, query into a delete query. I want to get rid of the records in the main table "tblHurnsHistory." I changed the Find Duplicates query to a Delete query. I then received an input box looking for criteria that states," Enter parameter value qryHurnsHistory.Grads_creditsEarned

Microsoft Access / VBA

3519

duplicates query help & strategy for update queries with SetWarnings = False

by: ARC | last post by:

Hello all, So I'm knee deep in this import utility program, and am coming up with all sorts of "gotcha's!". 1st off. On a "Find Duplicates Query", does anyone have a good solution for renaming the duplicate records? My thinking was to take the results of the duplicate query, and somehow have it number each line where there is a duplicate (tried a groups query, but "count" won't work), then do an update query to change the duplicate to...

Microsoft Access / VBA

1902

Duplicates Updates Query

by: bcquadrant | last post by:

Hi there I am trying to make an update query that will update information in all occurances of duplicate records based on the First record for that duplicate set. There are about 30,000 records in the table, and of them without the duplicates there is only about 4000 records.The duplicates are essential to work we do, so we cannot delete them. I have a make table query that selects only the first record for a duplicate set based on the...

Microsoft Access / VBA

15951

SQL Query to Select Largest Value

by: fperri | last post by:

Hello, I have a query that creates a table of duplicates in my database. For one set of duplicates (three records), all the fields & values are the same except for the values in one field. This field is a numeric field. I need to figure out a way to remove all the duplicates except for the one with the highest value in that field. I don't want to have to create a module for this. I want to run it all from a stored query. I was thinking the...

Microsoft Access / VBA

1304

vb.net Appen Query to Access DB

by: csin | last post by:

I have an Access DB backend for the application I am running, I want to use the built in ability in Access to remove duplicate entries... Say I have table1 with fields field1 field2 and field3, none of the fields are indexed without duplicates for a reason... The way I remove duplicates is to do an append query into an identical table that has one of the fields set to indexed no duplicates. If I run this query in Access directly I get a...

.NET Framework

9595

What is ONU?

by: marktang | last post by:

ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However, people are often confused as to whether an ONU can Work As a Router. In this blog post, we’ll explore What is ONU, What Is Router, ONU & Router’s main usage, and What is the difference between ONU and Router. Let’s take a closer look ! Part I. Meaning of...

General

9432

Changing the language in Windows 10

by: Hystou | last post by:

Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can effortlessly switch the default language on Windows 10 without reinstalling. I'll walk you through it. First, let's disable language synchronization. With a Microsoft account, language settings sync across devices. To prevent any complications,...

Windows Server

10008

The easy way to turn off automatic updates for Windows 10/11

by: Hystou | last post by:

Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows Update option using the Control Panel or Settings app; it automatically checks for updates and installs any it finds, whether you like it or not. For most users, this new feature is actually very convenient. If you want to control the update process,...

Windows Server

9873

Discussion: How does Zigbee compare with other wireless protocols in smart home applications?

by: tracyyun | last post by:

Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each protocol has its own unique characteristics and advantages, but as a user who is planning to build a smart home system, I am a bit confused by the choice of these technologies. I'm particularly interested in Zigbee because I've heard it does some...

General

8891

AI Job Threat for Devs

by: agi2029 | last post by:

Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing, and deployment—without human intervention. Imagine an AI that can take a project description, break it down, write the code, debug it, and then launch it, all on its own.... Now, this would greatly impact the work of software developers. The idea...

Career Advice

7420

Access Europe - Using VBA to create a class based on a table - Wed 1 May

by: isladogs | last post by:

The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome a new presenter, Adolph Dupré who will be discussing some powerful techniques for using class modules. He will explain when you may want to use classes instead of User Defined Types (UDT). For example, to manage the data in unbound forms. Adolph will...

Microsoft Access / VBA

6682

Couldn’t get equations in html when convert word .docx file to html file in C#.

by: conductexam | last post by:

I have .net C# application in which I am extracting data from word file and save it in database particularly. To store word all data as it is I am converting the whole word file firstly in HTML and then checking html paragraph one by one. At the time of converting from word file to html my equations which are in the word document file was convert into image. Globals.ThisAddIn.Application.ActiveDocument.Select();...

C# / C Sharp

5313

Trying to create a lan-to-lan vpn between two differents networks

by: TSSRALBI | last post by:

Hello I'm a network technician in training and I need your help. I am currently learning how to create and manage the different types of VPNs and I have a question about LAN-to-LAN VPNs. The last exercise I practiced was to create a LAN-to-LAN VPN between two Pfsense firewalls, by using IPSEC protocols. I succeeded, with both firewalls in the same network. But I'm wondering if it's possible to do the same thing, with 2 Pfsense firewalls...

Networking - Hardware / Configuration

2822

Comprehensive Guide to Website Development in Toronto: Expert Insights from BSMN Consultancy

by: bsmnconsultancy | last post by:

In today's digital era, a well-designed website is crucial for businesses looking to succeed. Whether you're a small business owner or a large corporation in Toronto, having a strong online presence can significantly impact your brand's success. BSMN Consultancy, a leader in Website Development in Toronto offers valuable insights into creating effective websites that not only look great but also perform exceptionally well. In this comprehensive...

General