Finding duplicate records 
August 1st, 2008, 07:15 PM
| | | |
Hello to all,
I have a small or big problem with a customer data base, where during a change of
system we might have created duplicate records. This should be easy to find, you
might think, but, we are talking about roughly 10000 records or less in a total
volume of 1 MIO records or more.
I have considered a strategy: The station ID and a field with something like a
sequence number are supposed to be unique during that period. The sequence number
will repeat roughly every two - four years, so in the period of three months,
which is my problematic time, the sequence number (SeqNo) must be unique for each
of the 40 stations (StationID) that I record in this data base table.
I did run a simple 'create index' on these two fields, and, as to be expected, the
routine fails, blahblah, meaning, there are duplicates. Fine. So, how can I write
a query which lists to me all pairs (StationID, SeqNo) of data, where two or more
rows are identically?
If it makes things easier, I have a copy of that data base on a separate server,
not on the live system I mean. I could strip off all data which is not related to
that period of time.
Can someone give to me a hint? Or a select statement?
TS | 
August 1st, 2008, 07:25 PM
| | | | re: Finding duplicate records
On Aug 1, 2:07*pm, Thomas Arthur Seidel <thomas.arthur.sei...@gmx.de>
wrote: Quote:
Hello to all,
>
I have a small or big problem with a customer data base, where during a change of
system we might have created duplicate records. This should be easy to find, you
might think, but, we are talking about roughly 10000 records or less in atotal
volume of 1 MIO records or more.
>
I have considered a strategy: The station ID and a field with something like a
sequence number are supposed to be unique during that period. The sequence number
will repeat roughly every two - four years, so in the period of three months,
which is my problematic time, the sequence number (SeqNo) must be unique for each
of the 40 stations (StationID) that I record in this data base table.
>
I did run a simple 'create index' on these two fields, and, as to be expected, the
routine fails, blahblah, meaning, there are duplicates. Fine. So, how canI write
a query which lists to me all pairs (StationID, SeqNo) of data, where twoor more
rows are identically?
>
If it makes things easier, I have a copy of that data base on a separate server,
not on the live system I mean. I could strip off all data which is not related to
that period of time.
>
Can someone give to me a hint? Or a select statement?
>
TS
| You need to first figure out what you classify as a duplicate record?
If only one field in a record is different from the next record is it
a therefor not duplicate? If the only problem is a particular field
over a given period, then you simply run a procedure to count the
number of occurances for every ID over a given time. Or better yet
run a procedure that only returns fields that have a count greater
than 1. | 
August 1st, 2008, 07:35 PM
| | | | re: Finding duplicate records
Try this:
SELECT SeqNo, StationID, COUNT(*)
FROM Table
GROUP BY SeqNo, StationID
HAVING COUNT(*) 1;
Plamen Ratchev http://www.SQLStudio.com | 
August 1st, 2008, 07:35 PM
| | | | re: Finding duplicate records
To classify what is a duplicate record: If (StationID,SeqNo) is repeating. Mainly,
if there is more then one record of the same SeqNo for one StationId. Each of the
two values individually may repeat: Many records do come from the same station,
and every station starts with the same sequence Number (SeqNo), it starts with
"1", increments with every record, continues for years, until it is reset to "1"
again some nice and warm day, when the accountant does decide this. For the period
of three months, where I want to find my problem, for sure there is no regular way
to have two or more transactions from the same station with a repeated SeqNo. The
SeqNo must be unique for each station during this time.
TS
rhaazy wrote: Quote:
On Aug 1, 2:07 pm, Thomas Arthur Seidel <thomas.arthur.sei...@gmx.de>
wrote: Quote:
>Hello to all,
>>
>I have a small or big problem with a customer data base, where during a change of
>system we might have created duplicate records. This should be easy to find, you
>might think, but, we are talking about roughly 10000 records or less in a total
>volume of 1 MIO records or more.
>>
>I have considered a strategy: The station ID and a field with something like a
>sequence number are supposed to be unique during that period. The sequence number
>will repeat roughly every two - four years, so in the period of three months,
>which is my problematic time, the sequence number (SeqNo) must be unique for each
>of the 40 stations (StationID) that I record in this data base table.
>>
>I did run a simple 'create index' on these two fields, and, as to be expected, the
>routine fails, blahblah, meaning, there are duplicates. Fine. So, how can I write
>a query which lists to me all pairs (StationID, SeqNo) of data, where two or more
>rows are identically?
>>
>If it makes things easier, I have a copy of that data base on a separate server,
>not on the live system I mean. I could strip off all data which is not related to
>that period of time.
>>
>Can someone give to me a hint? Or a select statement?
>>
>TS
| >
You need to first figure out what you classify as a duplicate record?
If only one field in a record is different from the next record is it
a therefor not duplicate? If the only problem is a particular field
over a given period, then you simply run a procedure to count the
number of occurances for every ID over a given time. Or better yet
run a procedure that only returns fields that have a count greater
than 1.
| | 
August 1st, 2008, 08:15 PM
| | | | re: Finding duplicate records
Works !
Unfortunately it proved also, that my idea of duplicate records was false.
TS
Plamen Ratchev wrote: Quote:
Try this:
>
SELECT SeqNo, StationID, COUNT(*)
FROM Table
GROUP BY SeqNo, StationID
HAVING COUNT(*) 1;
>
>
Plamen Ratchev http://www.SQLStudio.com | |  | | | | /bytes/about
We are a network of experts and professionals in IT and software development that help one another with answers to tough questions and share insights.
Get the best answers to your questions from over 225,662 network members.
|