Hi All,
I've the following table with a PK defined on an IDENTITY column
(INSERT_SEQ):
CREATE TABLE MYDATA (
MID NUMERIC(19,0) NOT NULL,
MYVALUE FLOAT NOT NULL,
TIMEKEY INTEGER NOT NULL,
TIMEKEY_DTTM DATETIME NULL,
IID NUMERIC(19,0) NOT NULL,
EID NUMERIC(19,0) NOT NULL,
INSERT_SEQ NUMERIC(19,0) IDENTITY(1,1) NOT NULL
)
GO
ALTER TABLE MYDATA
ADD CONSTRAINT PK_MYDATA
PRIMARY KEY (INSERT_SEQ)
GO
The TIMEKEY_DTTM field is generated, from the value actually inserted
into the
TIMEKEY field, by the following trigger:
CREATE TRIGGER TIMEKEY1
ON MYDATA
FOR INSERT AS
BEGIN
DECLARE @M_TIMEKEY_DTTM DATETIME
SELECT @M_TIMEKEY_DTTM = DATEADD(SECOND, INS.TIMEKEY +
EP.GMT_OFFSET * 0 ,'1970-01-01 00:00:00.000')
FROM INSERTED INS, LOCATIONINFO EP
WHERE INS.EID = EP.EID
UPDATE MYDATA
SET TIMEKEY_DTTM = @M_TIMEKEY_DTTM
FROM INSERTED INS, MYDATA MD
WHERE MD.INSERT_SEQ = INS.INSERT_SEQ
END
GO
There is also a composite, non unique, index defined on the
tuple:(MID,IID, TIMEKEY,EID)
CREATE INDEX IX_METDATA ON MYDATA (MID,IID,TIMEKE Y,EID)
GO
As a consequence of an application design change, I would also change
this index to be UNIQUE, but when I try to drop and create it I get an
error, because the tables stores some duplicated rows...
In order to succesfully upgrade the index definition, I wrote some DML
staements
to lookup and remove the duplicated rows, keeping only the first
record inserted, i.e. the one with the lowest INSERT_SEQ:
--
-- This table stores then umber of duplicated records eventually
discovered
-- into the MYDATA table; the initial value for the NUM_DUPLICATES
field is
-- 0 (no duplicated record)
--
DROP TABLE DUPLICATES
GO
CREATE TABLE DUPLICATES (
TABLENAME VARCHAR(17),
NUM_DUPLICATES NUMERIC(19,0) )
GO
INSERT INTO DUPLICATES VALUES ('MYDATA',0)
GO
INSERT INTO DUPLICATES VALUES ('CATEGORIESDAT A',0)
GO
--
-- ///////// CLEAN UP OF MYDATA TABLE
--
DROP TABLE TMP_MYDATA
GO
CREATE TABLE TMP_MYDATA (
MID NUMERIC(19,0) NOT NULL,
TIMEKEY INTEGER NOT NULL,
IID NUMERIC(19,0) NOT NULL,
EID NUMERIC(19,0) NOT NULL,
INSERT_SEQ NUMERIC(19,0) )
GO
--
-- Insert into the TMP_MYDATA table all the duplicated records for
-- the tuple (MID,IID,TIMEKE Y,EID) and NULL for the INSERT_SEQ field
--
INSERT INTO TMP_MYDATA (MID,IID,TIMEKE Y,EID)
SELECT MID,IID,TIMEKEY ,EID
FROM MYDATA
GROUP BY MID,IID,TIMEKEY ,EID
HAVING COUNT(*)>1
GO
--
-- Updates the INSERT_SEQ field to the lowest value in the group
-- of duplicated records
--
UPDATE TMP_MYDATA
SET TMP_MYDATA.INSE RT_SEQ = (
SELECT MIN(INSERT_SEQ)
FROM MYDATA
WHERE TMP_MYDATA.MID = MYDATA.MID AND
TMP_MYDATA.IID = MYDATA.IID AND
TMP_MYDATA.TIME KEY = MYDATA.TIMEKEY AND
TMP_MYDATA.EID = MYDATA.EID )
GO
--
-- Updates the value of NUM_DUPLICATES for the MYDATA table.
--
UPDATE DUPLICATES
SET NUM_DUPLICATES = (SELECT COUNT(*) FROM TMP_MYDATA)
WHERE TABLENAME = 'MYDATA'
GO
--
-- Delete from the MYDATA table all the duplicated records,
-- keeping only the row with the lowest INSERT_SEQ
-- The delete is performed only if there are duplicated recors;
-- this is achieved using a "short circuit" AND on the number of
records
-- stored into the NUM_DUPLICATES field of the DUPLICATES table for
-- the MYDATA table...
--
DELETE FROM MYDATA
WHERE ( SELECT NUM_DUPLICATES FROM DUPLICATES WHERE TABLENAME =
'MYDATA') > 0 AND
EXISTS ( SELECT 1
FROM TMP_MYDATA
WHERE MYDATA.MID = TMP_MYDATA.MID AND
MYDATA.IID = TMP_MYDATA.IID AND
MYDATA.TIMEKEY = TMP_MYDATA.TIME KEY AND
MYDATA.EID = TMP_MYDATA.EID AND
MYDATA.INSERT_S EQ > TMP_MYDATA.INSE RT_SEQ )
GO
This tecnique works fine on a normal table (1M recs) but is not very
performant
on huge tables (>10M records)!
Do you know a better way to achieve the task of removing all the
duplicates records, preserving the lowest INSERT_SEQ betwee the
duplicates and also preserving the sequence seed, so that a new record
inserted at time t1>t0 is enumerated with an INSERT_SEQ|t1 >
max(INSERT_SEQ) |t0 ?
Thanks a lot for your help!
Patrizio
PS. sorry for such a large post! 1 10807
[posted and mailed, per favore rispondere in news]
This reply is coming late, so may be you have resolved the issue yourself,
but here is a suggestion.
Patrizio (pt********@qui po.it) writes: The TIMEKEY_DTTM field is generated, from the value actually inserted into the TIMEKEY field, by the following trigger:
CREATE TRIGGER TIMEKEY1 ON MYDATA FOR INSERT AS BEGIN DECLARE @M_TIMEKEY_DTTM DATETIME SELECT @M_TIMEKEY_DTTM = DATEADD(SECOND, INS.TIMEKEY + EP.GMT_OFFSET * 0 ,'1970-01-01 00:00:00.000') FROM INSERTED INS, LOCATIONINFO EP WHERE INS.EID = EP.EID UPDATE MYDATA SET TIMEKEY_DTTM = @M_TIMEKEY_DTTM FROM INSERTED INS, MYDATA MD WHERE MD.INSERT_SEQ = INS.INSERT_SEQ END GO
This looks a little funny. If you insert multiple rows with different
vales in TIMEKEY, then all rows will still get the same value in
TIMEKEY_DTTM.
I have to admit that I didn't see where TIMEKEY_DTTM came into your
problem of getting rid of the duplicates?
-- Delete from the MYDATA table all the duplicated records, -- keeping only the row with the lowest INSERT_SEQ -- The delete is performed only if there are duplicated recors; -- this is achieved using a "short circuit" AND on the number of -- records stored into the NUM_DUPLICATES field of the DUPLICATES table for -- the MYDATA table... -- DELETE FROM MYDATA WHERE ( SELECT NUM_DUPLICATES FROM DUPLICATES WHERE TABLENAME = 'MYDATA') > 0 AND EXISTS ( SELECT 1 FROM TMP_MYDATA WHERE MYDATA.MID = TMP_MYDATA.MID AND MYDATA.IID = TMP_MYDATA.IID AND MYDATA.TIMEKEY = TMP_MYDATA.TIME KEY AND MYDATA.EID = TMP_MYDATA.EID AND MYDATA.INSERT_S EQ > TMP_MYDATA.INSE RT_SEQ ) GO
This tecnique works fine on a normal table (1M recs) but is not very performant on huge tables (>10M records)!
I don't why this performs bad, but you should probably move your
shortcut to an IF statement:
IF (SELECT num_duplicates FROM duplicates WHERE...) > 0
BEGIN
DELETE ...
The way the query is now, the extra condition may lead the optimizer
astray.
But an alteraitve appraoch could be:
SELECT mid, timekey, iid, eid, min_seq = MIN(insert_seq)
INTO #duplkeys
FROM mydata
GROUP BY mid, timekey, iid, eid
HAVING COUNT(*) > 1
IF EXISTS (SELECT * FROM #duplicates)
BEGIN
SELECT m.insert_seq
INTO #seqs_to_delete
FROM #duplkeys d
JOIN mydata m ON d.mid = m.mid
AND d.timekey = m.timekey
AND d.iid = m.iid
AND d.eid = m.eid
WHERE m.insert_seq > d.insert_seq
DELETE mydata
FROM mydata m
JOIN #seqs_to_delete s ON m.insert_seq = s.insert_seq
END
--
Erland Sommarskog, SQL Server MVP, so****@algonet. se
Books Online for SQL Server SP3 at http://www.microsoft.com/sql/techinf...2000/books.asp This thread has been closed and replies have been disabled. Please start a new discussion. Similar topics |
by: Alexander Anderson |
last post by:
I have a DELETE statement that deletes duplicate data from a table. It
takes a long time to execute, so I thought I'd seek advice here. The
structure of the table is little funny. The following is NOT the table,
but the representation of the data in the table:
+-----------+
| a | b |
+-----+-----+
| 123 | 234 |
| 345 | 456 |
|
by: ms |
last post by:
Access 2000:
I am trying to delete duplicate records imported to a staging table leaving one
of the duplicates to be imported into the live table. A unique record is based
on a composite key of 3 fields (vehicleID, BattID, and ChgHrs). VehicleID and
BattID are a TEXT datatype and ChrHrs are a number(long int.) datatype. Since
records to be imported can have duplicate records of the composite key I need to
clean all but one of the...
|
by: KT |
last post by:
Is there any one click solution that would do the trick? I would like
to create a button, so the person who maintains the database can
perform clean up work to delete duplicate records which contain same
information in the ID field and the account number field once a week.
Thanks in advance!
KT
|
by: (PeteCresswell) |
last post by:
I've got apps where you *really* wouldn't want to delete certain items by
accident, but the users just have to have a "Delete" button.
My current strategies:
Plan A:
------------------------------------------------------------------------
1) Make the cmd button black and do not give it an accelerator key.
2) Issue two levels of confirmation (Do you want to delete... Do you REALLY want
|
by: Radu |
last post by:
Hi.
I have a "union" table which results of a union of two tables.
Occasionally I could have duplicates, when the same PIN has been added
to both tables, albeit at different Datees/Times, such as:
PIN Name Added Date
100411 A 7/11/2007 10:12:58 AM
100411 A 7/17/2007 10:54:23 AM
100413 B 7/11/2007 10:13:28 AM
| |
by: rajeshkrsingh |
last post by:
Hi friends,
Step1-
create table duplicate
(
intId int, varName varchar(50)
)
insert into duplicate(intId,varName) values(1,'rajesh')
insert into duplicate(intId,varName) values(2,'raj12')
insert into duplicate(intId,varName) values(1,'rajesh')
|
by: farouqdin |
last post by:
Hi all i have code which loops through table and deletes the duplicate
records. This code does it for one table. How do i change it so it
goes through several tables?
On Error Resume Next
Dim db As DAO.Database, rst As DAO.Recordset
Dim strDupName As String, strSaveName As String
Set db = CurrentDb()
|
by: Dilip1983 |
last post by:
Hi All,
I want to delete duplicate records from a large table.
There is one index(INDEX_U1) on 4 columns(col1,col2,col3,col4) which is in unusable state.
First of all when i tried to rebuild index it showed error as unique key violation.
So i want to delete duplicate records for col1,col2,col3,col4 combination.
How can i delete the duplicate records from this large table?
|
by: watertraveller |
last post by:
Hi all. My ultimate goal is to return two columns, where no single value appears anywhere twice. This means that not only do I want to check that nothing from column A appears in column B and vice-versa, but I also don't want the same value appearing twice in A and twice in B.
So far I have:
--Diff the columns
INSERT INTO @Table
SELECT One, Two
FROM @Column1 a
FULL OUTER JOIN @Column2 b
|
by: marktang |
last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However, people are often confused as to whether an ONU can Work As a Router. In this blog post, we’ll explore What is ONU, What Is Router, ONU & Router’s main usage, and What is the difference between ONU and Router. Let’s take a closer look !
Part I. Meaning of...
|
by: Oralloy |
last post by:
Hello folks,
I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>".
The problem is that using the GNU compilers, it seems that the internal comparison operator "<=>" tries to promote arguments from unsigned to signed.
This is as boiled down as I can make it.
Here is my compilation command:
g++-12 -std=c++20 -Wnarrowing bit_field.cpp
Here is the code in...
| |
by: jinu1996 |
last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven tapestry of website design and digital marketing. It's not merely about having a website; it's about crafting an immersive digital experience that captivates audiences and drives business growth.
The Art of Business Website Design
Your website is...
|
by: Hystou |
last post by:
Overview:
Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows Update option using the Control Panel or Settings app; it automatically checks for updates and installs any it finds, whether you like it or not. For most users, this new feature is actually very convenient. If you want to control the update process,...
|
by: tracyyun |
last post by:
Dear forum friends,
With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each protocol has its own unique characteristics and advantages, but as a user who is planning to build a smart home system, I am a bit confused by the choice of these technologies. I'm particularly interested in Zigbee because I've heard it does some...
|
by: agi2029 |
last post by:
Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing, and deployment—without human intervention. Imagine an AI that can take a project description, break it down, write the code, debug it, and then launch it, all on its own....
Now, this would greatly impact the work of software developers. The idea...
|
by: isladogs |
last post by:
The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM).
In this session, we are pleased to welcome a new presenter, Adolph Dupré who will be discussing some powerful techniques for using class modules.
He will explain when you may want to use classes instead of User Defined Types (UDT). For example, to manage the data in unbound forms.
Adolph will...
|
by: 6302768590 |
last post by:
Hai team
i want code for transfer the data from one system to another through IP address by using C# our system has to for every 5mins then we have to update the data what the data is updated we have to send another system
| |
by: bsmnconsultancy |
last post by:
In today's digital era, a well-designed website is crucial for businesses looking to succeed. Whether you're a small business owner or a large corporation in Toronto, having a strong online presence can significantly impact your brand's success. BSMN Consultancy, a leader in Website Development in Toronto offers valuable insights into creating effective websites that not only look great but also perform exceptionally well. In this comprehensive...
| |