
Finding duplicate records

Hello to all,

I have a small (or big) problem with a customer database: during a change of
systems we may have created duplicate records. This should be easy to find, you
might think, but we are talking about roughly 10,000 suspect records or fewer in a
total volume of a million records or more.

I have a strategy in mind: the station ID together with a field holding something
like a sequence number is supposed to be unique during that period. The sequence
number repeats roughly every two to four years, so within the three-month period
that is my problem window, the sequence number (SeqNo) must be unique for each
of the 40 stations (StationID) recorded in this database table.

I ran a simple 'create index' (unique) on these two fields and, as expected, the
routine fails with an error, meaning there are duplicates. Fine. So how can I write
a query that lists all pairs (StationID, SeqNo) for which two or more rows are
identical?
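
For reference, the failing statement was roughly the following (the table name
here is made up):

CREATE UNIQUE INDEX IX_Station_SeqNo
ON CustomerRecords (StationID, SeqNo);
-- aborts with a duplicate key error as soon as one (StationID, SeqNo) pair repeats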

If it makes things easier: I have a copy of that database on a separate server,
i.e. not the live system. There I could strip off all data that is not related to
that period of time.

Can someone give me a hint? Or a SELECT statement?

TS
Aug 1 '08 #1
4 replies, 4,173 views
On Aug 1, 2:07 pm, Thomas Arthur Seidel <thomas.arthur.sei...@gmx.de> wrote:
> [the original question, quoted above]
You need to first figure out what you classify as a duplicate record. If only one
field differs between one record and the next, is it therefore not a duplicate?
If the only problem is a particular field combination over a given period, then
you simply run a query that counts the number of occurrences of every ID over
that time, or, better yet, one that only returns the values with a count greater
than 1.
Aug 1 '08 #2
Try this:

SELECT SeqNo, StationID, COUNT(*)
FROM Table
GROUP BY SeqNo, StationID
HAVING COUNT(*) > 1;
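
If you also want to see the complete rows behind each duplicate pair, joining
back against that grouping could look like this (keeping the placeholder name
Table):

SELECT t.*
FROM Table AS t
JOIN (
    SELECT SeqNo, StationID
    FROM Table
    GROUP BY SeqNo, StationID
    HAVING COUNT(*) > 1          -- only pairs that occur more than once
) AS d
  ON d.SeqNo = t.SeqNo AND d.StationID = t.StationID
ORDER BY t.StationID, t.SeqNo;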
Plamen Ratchev
http://www.SQLStudio.com
Aug 1 '08 #3
To classify what a duplicate record is: the pair (StationID, SeqNo) repeats, i.e.
there is more than one record with the same SeqNo for one StationID. Each of the
two values individually may repeat: many records come from the same station, and
every station starts with the same sequence number (SeqNo). It starts at 1,
increments with every record, and continues for years, until one nice and warm day
the accountant decides to reset it to 1 again. For the three-month period in which
I want to find my problem, there is certainly no regular way to get two or more
transactions from the same station with a repeated SeqNo; the SeqNo must be unique
for each station during this time.
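
In other words, restricted to that window, the check I need is something like
this (the table name, date column, and dates are only illustrative):

SELECT StationID, SeqNo, COUNT(*) AS Cnt
FROM Transactions                     -- hypothetical table name
WHERE RecordDate >= '2008-03-01'      -- hypothetical date column; three-month window
  AND RecordDate <  '2008-06-01'
GROUP BY StationID, SeqNo
HAVING COUNT(*) > 1;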
TS

rhaazy wrote:
> [reply #2, quoted above]
Aug 1 '08 #4
It works!
Unfortunately, it also proved that my idea about the duplicate records was wrong.

TS

Plamen Ratchev wrote:
> [the GROUP BY query from #3, quoted above]
Aug 1 '08 #5

This thread has been closed and replies have been disabled. Please start a new discussion.
