Connecting Tech Pros Worldwide Help | Site Map

Finding duplicate records

  #1  
Old August 1st, 2008, 07:15 PM
Thomas Arthur Seidel
Guest
 
Posts: n/a
Hello to all,

I have a small or big problem with a customer data base, where during a change of
system we might have created duplicate records. This should be easy to find, you
might think, but, we are talking about roughly 10000 records or less in a total
volume of 1 MIO records or more.

I have considered a strategy: The station ID and a field with something like a
sequence number are supposed to be unique during that period. The sequence number
will repeat roughly every two - four years, so in the period of three months,
which is my problematic time, the sequence number (SeqNo) must be unique for each
of the 40 stations (StationID) that I record in this data base table.

I did run a simple 'create index' on these two fields, and, as to be expected, the
routine fails, blahblah, meaning, there are duplicates. Fine. So, how can I write
a query which lists to me all pairs (StationID, SeqNo) of data, where two or more
rows are identically?

If it makes things easier, I have a copy of that data base on a separate server,
not on the live system I mean. I could strip off all data which is not related to
that period of time.

Can someone give to me a hint? Or a select statement?

TS
  #2  
Old August 1st, 2008, 07:25 PM
rhaazy
Guest
 
Posts: n/a

re: Finding duplicate records


On Aug 1, 2:07*pm, Thomas Arthur Seidel <thomas.arthur.sei...@gmx.de>
wrote:
Quote:
Hello to all,
>
I have a small or big problem with a customer data base, where during a change of
system we might have created duplicate records. This should be easy to find, you
might think, but, we are talking about roughly 10000 records or less in atotal
volume of 1 MIO records or more.
>
I have considered a strategy: The station ID and a field with something like a
sequence number are supposed to be unique during that period. The sequence number
will repeat roughly every two - four years, so in the period of three months,
which is my problematic time, the sequence number (SeqNo) must be unique for each
of the 40 stations (StationID) that I record in this data base table.
>
I did run a simple 'create index' on these two fields, and, as to be expected, the
routine fails, blahblah, meaning, there are duplicates. Fine. So, how canI write
a query which lists to me all pairs (StationID, SeqNo) of data, where twoor more
rows are identically?
>
If it makes things easier, I have a copy of that data base on a separate server,
not on the live system I mean. I could strip off all data which is not related to
that period of time.
>
Can someone give to me a hint? Or a select statement?
>
TS
You need to first figure out what you classify as a duplicate record?
If only one field in a record is different from the next record is it
a therefor not duplicate? If the only problem is a particular field
over a given period, then you simply run a procedure to count the
number of occurances for every ID over a given time. Or better yet
run a procedure that only returns fields that have a count greater
than 1.
  #3  
Old August 1st, 2008, 07:35 PM
Plamen Ratchev
Guest
 
Posts: n/a

re: Finding duplicate records


Try this:

SELECT SeqNo, StationID, COUNT(*)
FROM Table
GROUP BY SeqNo, StationID
HAVING COUNT(*) 1;


Plamen Ratchev
http://www.SQLStudio.com
  #4  
Old August 1st, 2008, 07:35 PM
Thomas Arthur Seidel
Guest
 
Posts: n/a

re: Finding duplicate records


To classify what is a duplicate record: If (StationID,SeqNo) is repeating. Mainly,
if there is more then one record of the same SeqNo for one StationId. Each of the
two values individually may repeat: Many records do come from the same station,
and every station starts with the same sequence Number (SeqNo), it starts with
"1", increments with every record, continues for years, until it is reset to "1"
again some nice and warm day, when the accountant does decide this. For the period
of three months, where I want to find my problem, for sure there is no regular way
to have two or more transactions from the same station with a repeated SeqNo. The
SeqNo must be unique for each station during this time.
TS



rhaazy wrote:
Quote:
On Aug 1, 2:07 pm, Thomas Arthur Seidel <thomas.arthur.sei...@gmx.de>
wrote:
Quote:
>Hello to all,
>>
>I have a small or big problem with a customer data base, where during a change of
>system we might have created duplicate records. This should be easy to find, you
>might think, but, we are talking about roughly 10000 records or less in a total
>volume of 1 MIO records or more.
>>
>I have considered a strategy: The station ID and a field with something like a
>sequence number are supposed to be unique during that period. The sequence number
>will repeat roughly every two - four years, so in the period of three months,
>which is my problematic time, the sequence number (SeqNo) must be unique for each
>of the 40 stations (StationID) that I record in this data base table.
>>
>I did run a simple 'create index' on these two fields, and, as to be expected, the
>routine fails, blahblah, meaning, there are duplicates. Fine. So, how can I write
>a query which lists to me all pairs (StationID, SeqNo) of data, where two or more
>rows are identically?
>>
>If it makes things easier, I have a copy of that data base on a separate server,
>not on the live system I mean. I could strip off all data which is not related to
>that period of time.
>>
>Can someone give to me a hint? Or a select statement?
>>
>TS
>
You need to first figure out what you classify as a duplicate record?
If only one field in a record is different from the next record is it
a therefor not duplicate? If the only problem is a particular field
over a given period, then you simply run a procedure to count the
number of occurances for every ID over a given time. Or better yet
run a procedure that only returns fields that have a count greater
than 1.
  #5  
Old August 1st, 2008, 08:15 PM
Thomas Arthur Seidel
Guest
 
Posts: n/a

re: Finding duplicate records


Works !
Unfortunately it proved also, that my idea of duplicate records was false.

TS

Plamen Ratchev wrote:
Quote:
Try this:
>
SELECT SeqNo, StationID, COUNT(*)
FROM Table
GROUP BY SeqNo, StationID
HAVING COUNT(*) 1;
>
>
Plamen Ratchev
http://www.SQLStudio.com
Closed Thread


Similar Threads
Thread Thread Starter Forum Replies Last Post
Cause of Duplicate Records? sierra7 answers 3 November 23rd, 2007 11:51 PM
finding duplicate records rgurganus@citidc.com answers 1 March 19th, 2007 09:35 AM
Finding Duplicate Records Carroll answers 2 August 30th, 2006 05:35 PM
select all the duplicate records atse answers 9 July 19th, 2005 09:05 AM