By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
457,911 Members | 1,171 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 457,911 IT Pros & Developers. It's quick & easy.

Question about deleting table duplicates

P: n/a
I was looking for thw SQL to delete dupes from a table, and came across
this. All who saw it agreed in principle, but I can't quite figure out the
logic. If we are deleting all rows whose rowid is greater than the least of
the rowids returned from creating the subset of dupes, couldn't we
inadvertently delete some non-dupes rows that were created after the last
dupe ? I mean, any row created after the last dupe would have a greater
rowid, wouldn't it ?

Here's the SQL:

delete from table_1 a
where a.rowid >
(select min(b.rowid)
from table_1 b
where b.col_dup_values = a.col_dup_values)

By the way, should the delete ALL dupes, including the originals ? That is
all rows participating in duplicity, as it is, will be gone.

Thanks,
Scott
Jul 19 '05 #1
Share this Question
Share on Google+
3 Replies


P: n/a
"ScottH" <fa*********@newsgroupsonly.com> wrote in message news:<pO********************@giganews.com>...
I was looking for thw SQL to delete dupes from a table, and came across
this. All who saw it agreed in principle, but I can't quite figure out the
logic. If we are deleting all rows whose rowid is greater than the least of
the rowids returned from creating the subset of dupes, couldn't we
inadvertently delete some non-dupes rows that were created after the last
dupe ? I mean, any row created after the last dupe would have a greater
rowid, wouldn't it ?

Here's the SQL:

delete from table_1 a
where a.rowid >
(select min(b.rowid)
from table_1 b
where b.col_dup_values = a.col_dup_values)

By the way, should the delete ALL dupes, including the originals ? That is
all rows participating in duplicity, as it is, will be gone.

Thanks,
Scott


No, you will delete rows which are identical, save for their rowids.
Rowids aren't stored, they are an internal attribute of a record.
So if you have 2 identical rows, only 1 will be deleted. A *random*
one of course, because rows are inserted randomly.
If you would limit the delete with an extra where clause to one
particular case, you should see easily that only one record is
deleted.
If more rows are deleted, the where clause in the subquery is
incorrect.

Sybrand Bakker
Senior Oracle DBA
Jul 19 '05 #2

P: n/a
"ScottH" <fa*********@newsgroupsonly.com> wrote in message news:<pO********************@giganews.com>...
I was looking for thw SQL to delete dupes from a table, and came across
this. All who saw it agreed in principle, but I can't quite figure out the
logic. If we are deleting all rows whose rowid is greater than the least of
the rowids returned from creating the subset of dupes, couldn't we
inadvertently delete some non-dupes rows that were created after the last
dupe ? I mean, any row created after the last dupe would have a greater
rowid, wouldn't it ?

Here's the SQL:

delete from table_1 a
where a.rowid >
(select min(b.rowid)
from table_1 b
where b.col_dup_values = a.col_dup_values)


The condition in the subselect (b.col_dup_values = a.col_dup_values)
links the two instances (a and b) of table_1 in this SQL. The delete
statement, therefore, only affects table_1 with the condition:
(b.col_dup_values = a.col_dup_values).

This won't delete all duplicated rows, as such, but any row that is a
duplicate of a row that already exists - leaving one row where there
were several duplicates. If you wanted to do that the SQL is much
simpler (at least, simpler to follow).

DELETE FROM table_1
WHERE col_dup_values = (SELECT col_dup_values
,COUNT(*)
FROM table_1
WHERE COUNT(*)>1
GROUP BY col_dup_values)
Jul 19 '05 #3

P: n/a
Russ Bagley wrote:
"ScottH" <fa*********@newsgroupsonly.com> wrote in message news:<pO********************@giganews.com>...
I was looking for thw SQL to delete dupes from a table, and came across
this. All who saw it agreed in principle, but I can't quite figure out the
logic. If we are deleting all rows whose rowid is greater than the least of
the rowids returned from creating the subset of dupes, couldn't we
inadvertently delete some non-dupes rows that were created after the last
dupe ? I mean, any row created after the last dupe would have a greater
rowid, wouldn't it ?

Here's the SQL:

delete from table_1 a
where a.rowid >
(select min(b.rowid)
from table_1 b
where b.col_dup_values = a.col_dup_values)

The condition in the subselect (b.col_dup_values = a.col_dup_values)
links the two instances (a and b) of table_1 in this SQL. The delete
statement, therefore, only affects table_1 with the condition:
(b.col_dup_values = a.col_dup_values).

This won't delete all duplicated rows, as such, but any row that is a
duplicate of a row that already exists - leaving one row where there
were several duplicates. If you wanted to do that the SQL is much
simpler (at least, simpler to follow).

DELETE FROM table_1
WHERE col_dup_values = (SELECT col_dup_values
,COUNT(*)
FROM table_1
WHERE COUNT(*)>1
GROUP BY col_dup_values)


Hmm. This isn't how I saw it.

Pretend data set

rowid col
1 1
2 2
3 1
4 4
5 1
6 6

Ok so the sub query said:
select min(b.rowid)
from table_1 b
where b.col_dup_values = a.col_dup_values

That would return 1, right? The min row where a.col = b.col.

So plug that into the original query:

delete from table_1 a
where a.rowid >
(select min(b.rowid)
from table_1 b
where b.col_dup_values = a.col_dup_values)

You get:

delete from table_1 a
where a.rowid > (1)

Or am I missing something?

--
Mike Nugent
Programmer/Author/DBA/Admin
In search of employment, email for credentials
ne**@remove-this.illuminatus.org

Jul 19 '05 #4

This discussion thread is closed

Replies have been disabled for this discussion.