473,748 Members | 9,913 Online
Bytes | Software Development & Data Engineering Community
+ Post

Home Posts Topics Members FAQ

Finding duplicate records

Hello to all,

I have a small or big problem with a customer data base, where during a change of
system we might have created duplicate records. This should be easy to find, you
might think, but, we are talking about roughly 10000 records or less in a total
volume of 1 MIO records or more.

I have considered a strategy: The station ID and a field with something like a
sequence number are supposed to be unique during that period. The sequence number
will repeat roughly every two - four years, so in the period of three months,
which is my problematic time, the sequence number (SeqNo) must be unique for each
of the 40 stations (StationID) that I record in this data base table.

I did run a simple 'create index' on these two fields, and, as to be expected, the
routine fails, blahblah, meaning, there are duplicates. Fine. So, how can I write
a query which lists to me all pairs (StationID, SeqNo) of data, where two or more
rows are identically?

If it makes things easier, I have a copy of that data base on a separate server,
not on the live system I mean. I could strip off all data which is not related to
that period of time.

Can someone give to me a hint? Or a select statement?

TS
Aug 1 '08 #1
4 4199
On Aug 1, 2:07*pm, Thomas Arthur Seidel <thomas.arthur. sei...@gmx.de>
wrote:
Hello to all,

I have a small or big problem with a customer data base, where during a change of
system we might have created duplicate records. This should be easy to find, you
might think, but, we are talking about roughly 10000 records or less in atotal
volume of 1 MIO records or more.

I have considered a strategy: The station ID and a field with something like a
sequence number are supposed to be unique during that period. The sequence number
will repeat roughly every two - four years, so in the period of three months,
which is my problematic time, the sequence number (SeqNo) must be unique for each
of the 40 stations (StationID) that I record in this data base table.

I did run a simple 'create index' on these two fields, and, as to be expected, the
routine fails, blahblah, meaning, there are duplicates. Fine. So, how canI write
a query which lists to me all pairs (StationID, SeqNo) of data, where twoor more
rows are identically?

If it makes things easier, I have a copy of that data base on a separate server,
not on the live system I mean. I could strip off all data which is not related to
that period of time.

Can someone give to me a hint? Or a select statement?

TS
You need to first figure out what you classify as a duplicate record?
If only one field in a record is different from the next record is it
a therefor not duplicate? If the only problem is a particular field
over a given period, then you simply run a procedure to count the
number of occurances for every ID over a given time. Or better yet
run a procedure that only returns fields that have a count greater
than 1.
Aug 1 '08 #2
Try this:

SELECT SeqNo, StationID, COUNT(*)
FROM Table
GROUP BY SeqNo, StationID
HAVING COUNT(*) 1;
Plamen Ratchev
http://www.SQLStudio.com
Aug 1 '08 #3
To classify what is a duplicate record: If (StationID,SeqN o) is repeating. Mainly,
if there is more then one record of the same SeqNo for one StationId. Each of the
two values individually may repeat: Many records do come from the same station,
and every station starts with the same sequence Number (SeqNo), it starts with
"1", increments with every record, continues for years, until it is reset to "1"
again some nice and warm day, when the accountant does decide this. For the period
of three months, where I want to find my problem, for sure there is no regular way
to have two or more transactions from the same station with a repeated SeqNo. The
SeqNo must be unique for each station during this time.
TS

rhaazy wrote:
On Aug 1, 2:07 pm, Thomas Arthur Seidel <thomas.arthur. sei...@gmx.de>
wrote:
>Hello to all,

I have a small or big problem with a customer data base, where during a change of
system we might have created duplicate records. This should be easy to find, you
might think, but, we are talking about roughly 10000 records or less in a total
volume of 1 MIO records or more.

I have considered a strategy: The station ID and a field with something like a
sequence number are supposed to be unique during that period. The sequence number
will repeat roughly every two - four years, so in the period of three months,
which is my problematic time, the sequence number (SeqNo) must be unique for each
of the 40 stations (StationID) that I record in this data base table.

I did run a simple 'create index' on these two fields, and, as to be expected, the
routine fails, blahblah, meaning, there are duplicates. Fine. So, how can I write
a query which lists to me all pairs (StationID, SeqNo) of data, where two or more
rows are identically?

If it makes things easier, I have a copy of that data base on a separate server,
not on the live system I mean. I could strip off all data which is not related to
that period of time.

Can someone give to me a hint? Or a select statement?

TS

You need to first figure out what you classify as a duplicate record?
If only one field in a record is different from the next record is it
a therefor not duplicate? If the only problem is a particular field
over a given period, then you simply run a procedure to count the
number of occurances for every ID over a given time. Or better yet
run a procedure that only returns fields that have a count greater
than 1.
Aug 1 '08 #4
Works !
Unfortunately it proved also, that my idea of duplicate records was false.

TS

Plamen Ratchev wrote:
Try this:

SELECT SeqNo, StationID, COUNT(*)
FROM Table
GROUP BY SeqNo, StationID
HAVING COUNT(*) 1;
Plamen Ratchev
http://www.SQLStudio.com
Aug 1 '08 #5

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

9
5983
by: atse | last post by:
Hi, My table in the database may contain duplicate records, which means except the primary key (auto_increment) field is unique, all or almost of all the fields are with the same content. How can I select them to display and delete them? Thanks for any idea. Atse
2
4991
by: ms | last post by:
Access 2000: I am trying to delete duplicate records imported to a staging table leaving one of the duplicates to be imported into the live table. A unique record is based on a composite key of 3 fields (vehicleID, BattID, and ChgHrs). VehicleID and BattID are a TEXT datatype and ChrHrs are a number(long int.) datatype. Since records to be imported can have duplicate records of the composite key I need to clean all but one of the...
4
6162
by: KT | last post by:
Is there any one click solution that would do the trick? I would like to create a button, so the person who maintains the database can perform clean up work to delete duplicate records which contain same information in the ID field and the account number field once a week. Thanks in advance! KT
2
28902
by: Carroll | last post by:
I'm looking for a way in SQL to find duplicate records in a single table, that are the same based on 3 columns, regardless of what is in the other columns in the duplicate records. I would like to keep both records (or it could be more than 2 as well) where duplicate records are found. Also, I am interested in selecting all columns from the duplicate records. Thanks, Carroll Rinehart
1
4010
by: rgurganus | last post by:
I'm trying to setup a way to find possible duplicate or near-duplicate records of contact people. I have a table of 50k records doing an inner join with itself, something like this: SELECT p1.fields, p2.fields FROM table AS p1 INNER JOIN table AS p2 ON p1.lastname = p2.lastname WHERE p1.id <p2.id AND (other criteria) The id and name and other criteria fields are indexed as well as they
2
2072
by: nethravathy | last post by:
Hi, The following table namely elcbtripselect contains 5147 records.I want to know wether this table contains duplicate records or not. I tried with following query 1)SELECT elcbtripselect.ELCBTRIP_voltsMIN, elcbtripselect.ELCBTRIP_voltsMAX, elcbtripselect.ELCBTrip_is_partwinding, elcbtripselect.ELCBTrip_is_ydelta, elcbtripselect.ELCBTRIP_starter_size, elcbtripselect.ELCBTRIP_UnitFunction, elcbtripselect.ELCBTRIP_strcb_speedi_frame_ty,...
3
2298
by: sierra7 | last post by:
I have found that I have duplicates in my Products table. I am using the DLookup() function to find a ProductID by a product 'criteria' and only create a new one if can not find a match e.g. varX=dlookup("","ProductsTableName", strCriteria) if isnull(varX) then 'Proceed to add new record and create new ProductID (autonumber) else 'Use ProductID currently held in varX endif There are about 14,000 rows in the Products...
2
4015
by: nomvula | last post by:
hi guys i need some help to duplicate records on my form datasheet: here's the example of my form results: ClientLookup DateCaptured ForecastDate Description ForecastQuantity Forecast Actual UJ 18-Apr-08 01-Mar-08 Fees: Asset 1 R 31,200.00 R 31,200.00 NMBM 22-Apr-08 23-Mar-08 P-MI (E) 07/2006 3 R 47,485 R 38,849 i have 200 records deplayed in the form i'm using access2007 and i have a command button which is the built in command button to...
6
5937
by: Dilip1983 | last post by:
Hi All, I want to delete duplicate records from a large table. There is one index(INDEX_U1) on 4 columns(col1,col2,col3,col4) which is in unusable state. First of all when i tried to rebuild index it showed error as unique key violation. So i want to delete duplicate records for col1,col2,col3,col4 combination. How can i delete the duplicate records from this large table?
0
8987
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However, people are often confused as to whether an ONU can Work As a Router. In this blog post, we’ll explore What is ONU, What Is Router, ONU & Router’s main usage, and What is the difference between ONU and Router. Let’s take a closer look ! Part I. Meaning of...
0
8826
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can effortlessly switch the default language on Windows 10 without reinstalling. I'll walk you through it. First, let's disable language synchronization. With a Microsoft account, language settings sync across devices. To prevent any complications,...
0
9534
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers, it seems that the internal comparison operator "<=>" tries to promote arguments from unsigned to signed. This is as boiled down as I can make it. Here is my compilation command: g++-12 -std=c++20 -Wnarrowing bit_field.cpp Here is the code in...
0
6073
by: conductexam | last post by:
I have .net C# application in which I am extracting data from word file and save it in database particularly. To store word all data as it is I am converting the whole word file firstly in HTML and then checking html paragraph one by one. At the time of converting from word file to html my equations which are in the word document file was convert into image. Globals.ThisAddIn.Application.ActiveDocument.Select();...
0
4597
by: TSSRALBI | last post by:
Hello I'm a network technician in training and I need your help. I am currently learning how to create and manage the different types of VPNs and I have a question about LAN-to-LAN VPNs. The last exercise I practiced was to create a LAN-to-LAN VPN between two Pfsense firewalls, by using IPSEC protocols. I succeeded, with both firewalls in the same network. But I'm wondering if it's possible to do the same thing, with 2 Pfsense firewalls...
0
4867
by: adsilva | last post by:
A Windows Forms form does not have the event Unload, like VB6. What one acts like?
1
3303
by: 6302768590 | last post by:
Hai team i want code for transfer the data from one system to another through IP address by using C# our system has to for every 5mins then we have to update the data what the data is updated we have to send another system
2
2777
muto222
by: muto222 | last post by:
How can i add a mobile payment intergratation into php mysql website.
3
2211
bsmnconsultancy
by: bsmnconsultancy | last post by:
In today's digital era, a well-designed website is crucial for businesses looking to succeed. Whether you're a small business owner or a large corporation in Toronto, having a strong online presence can significantly impact your brand's success. BSMN Consultancy, a leader in Website Development in Toronto offers valuable insights into creating effective websites that not only look great but also perform exceptionally well. In this comprehensive...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.