Finding duplicate records

Hello to all,

I have a problem, small or big, with a customer database: during a system migration we may have created duplicate records. This should be easy to find, you might think, but we are talking about roughly 10,000 records or fewer in a total
volume of 1 million records or more.

I have a strategy in mind: the station ID and a field holding something like a sequence number are supposed to be unique in combination during that period. The sequence number repeats roughly every two to four years, so within the three-month window that is my problem period, the sequence number (SeqNo) must be unique for each
of the 40 stations (StationID) that I record in this database table.

I ran a simple 'create index' (a unique one) on these two fields and, as expected, it failed, which means there are duplicates. Fine. So how can I write a query that lists all (StationID, SeqNo) pairs for which two or more rows are identical?

If it makes things easier, I have a copy of that database on a separate server, i.e. not on the live system. I could strip off all data that is not related to that period of time.

Can someone give me a hint? Or a SELECT statement?

TS
Aug 1 '08 #1
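
An index creation only fails on duplicates when the index is declared unique, so the attempt described above was presumably something along these lines. This is only a sketch; the table name Transactions and the index name are illustrative assumptions, not from the post.

-- Attempting to enforce uniqueness of the (StationID, SeqNo) combination.
-- Creation fails if any pair already occurs more than once.
-- Table and index names are assumed for illustration.
CREATE UNIQUE INDEX IX_Transactions_StationID_SeqNo
    ON Transactions (StationID, SeqNo);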
On Aug 1, 2:07 pm, Thomas Arthur Seidel <thomas.arthur.sei...@gmx.de>
wrote:
> [...]
You first need to figure out what you classify as a duplicate record. If only one field in a record differs from the next record, is it therefore not a duplicate? If the only problem is a particular field over a given period, then you simply run a procedure that counts the number of occurrences of every ID over that time. Or better yet, run a procedure that only returns the values with a count greater than 1.
Aug 1 '08 #2
Try this:

SELECT SeqNo, StationID, COUNT(*)
FROM Table
GROUP BY SeqNo, StationID
HAVING COUNT(*) > 1;
Plamen Ratchev
http://www.SQLStudio.com
Aug 1 '08 #3
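
The query above returns only the duplicated (StationID, SeqNo) pairs and how often each occurs. To inspect the full offending rows, one option is to join those pairs back to the source table. A minimal sketch, assuming a table named Transactions (the real table name is not given in the thread):

-- List every row that belongs to a duplicated (StationID, SeqNo) pair.
-- The derived table d holds the pairs that occur more than once.
-- Table name Transactions is an assumption for illustration.
SELECT t.*
FROM Transactions AS t
JOIN (
    SELECT StationID, SeqNo
    FROM Transactions
    GROUP BY StationID, SeqNo
    HAVING COUNT(*) > 1
) AS d
    ON d.StationID = t.StationID
   AND d.SeqNo = t.SeqNo
ORDER BY t.StationID, t.SeqNo;

Sorting by the pair keeps the duplicates next to each other, which makes manual comparison of the candidate rows easier.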
To clarify what counts as a duplicate record: a repeating (StationID, SeqNo) pair, i.e. more than one record with the same SeqNo for one StationID. Each of the two values individually may repeat: many records come from the same station, and every station starts with the same sequence number. SeqNo starts at "1", increments with every record, and continues for years, until it is reset to "1" again some nice and warm day when the accountant decides so. For the three-month period in which I want to find my problem, there is for sure no regular way to have two or more transactions from the same station with a repeated SeqNo; the SeqNo must be unique for each station during this time.
TS

rhaazy wrote:
> [...]
Aug 1 '08 #4
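
Since only the three-month migration window matters, the duplicate check can be narrowed with a date filter, which also sidesteps the legitimate SeqNo repeats from earlier years. A minimal sketch, assuming a date column named TransactionDate and a table named Transactions; the column name and the date range are placeholders, not from the thread:

-- Duplicated (StationID, SeqNo) pairs inside the problematic period only.
-- Replace the date range with the actual migration window.
SELECT StationID, SeqNo, COUNT(*) AS DupCount
FROM Transactions
WHERE TransactionDate >= '20080101'
  AND TransactionDate <  '20080401'
GROUP BY StationID, SeqNo
HAVING COUNT(*) > 1;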
Works!
Unfortunately, it also proved that my idea of the duplicate records was wrong.

TS

Plamen Ratchev wrote:
> [...]
Aug 1 '08 #5
