Very chalanging question - Explanation

groupy

Ok, here is a asample table representing the problem more clearly

A | B | C | D
-----------------
a1 b1 c1 d1
a1 b2 c2 d2
a3 b3 c1 d3
a4 b4 c4 d3
a5 b5 c5 d5
a6 b6 c6 d3

Tha duplications are:
row 1+2 in param A
row 1+3 in param C
row 3+4+6 in param D
only row 5 is unique in all parameters.
conclusion: row 1+2+3+4+6 are the same user
goal: to find all duplicated rows & to delete them all accept one
instance to leave.

Note:
Finding that row 1similar to 2 in A & deleting it will loose data
because we won't know that row 1 is ALSO similar to 3 on C & later on
finding that 3 is similar to 4 & 6 on D & so on

The simple time consuming (about 2 weaks) query to acomplish the task
is:
SELECT count(*),A.B,C,D
FROM tbl
GROUP BY A,B,C,D
HAVING count(*)>1

I THANK YOU ALL

May 31 '06 #1

Subscribe Post Reply

1364

Tony Rogerson

> SELECT count(*),A.B,C,D

FROM tbl
GROUP BY A,B,C,D
HAVING count(*)>1
If that query takes 2 weeks to run then you've got serious and I mean
serious problems with your hardware, that type of query should take minutes
if not seconds most desktops let alone servers.

To aid performance for that specific query create your clustered index on A,
B, C and D.

--
Tony Rogerson
SQL Server MVP
http://sqlblogcasts.com/blogs/tonyrogerson - technical commentary from a SQL
Server Consultant
http://sqlserverfaq.com - free video tutorials
"groupy" <li*******@gmail.com> wrote in message
news:11**********************@i39g2000cwa.googlegr oups.com... Ok, here is a asample table representing the problem more clearly

A | B | C | D
-----------------
a1 b1 c1 d1
a1 b2 c2 d2
a3 b3 c1 d3
a4 b4 c4 d3
a5 b5 c5 d5
a6 b6 c6 d3

Tha duplications are:
row 1+2 in param A
row 1+3 in param C
row 3+4+6 in param D
only row 5 is unique in all parameters.
conclusion: row 1+2+3+4+6 are the same user
goal: to find all duplicated rows & to delete them all accept one
instance to leave.

Note:
Finding that row 1similar to 2 in A & deleting it will loose data
because we won't know that row 1 is ALSO similar to 3 on C & later on
finding that 3 is similar to 4 & 6 on D & so on

The simple time consuming (about 2 weaks) query to acomplish the task
is:
SELECT count(*),A.B,C,D
FROM tbl
GROUP BY A,B,C,D
HAVING count(*)>1

I THANK YOU ALL

May 31 '06 #2

groupy

Thanks Tony i've just managed something..
Tony Rogerson ×›×ª×‘:

SELECT count(*),A.B,C,D
FROM tbl
GROUP BY A,B,C,D
HAVING count(*)>1

If that query takes 2 weeks to run then you've got serious and I mean
serious problems with your hardware, that type of query should take minutes
if not seconds most desktops let alone servers.

To aid performance for that specific query create your clustered index onA,
B, C and D.

--
Tony Rogerson
SQL Server MVP
http://sqlblogcasts.com/blogs/tonyrogerson - technical commentary from a SQL
Server Consultant
http://sqlserverfaq.com - free video tutorials
"groupy" <li*******@gmail.com> wrote in message
news:11**********************@i39g2000cwa.googlegr oups.com...
Ok, here is a asample table representing the problem more clearly

A | B | C | D
-----------------
a1 b1 c1 d1
a1 b2 c2 d2
a3 b3 c1 d3
a4 b4 c4 d3
a5 b5 c5 d5
a6 b6 c6 d3

Tha duplications are:
row 1+2 in param A
row 1+3 in param C
row 3+4+6 in param D
only row 5 is unique in all parameters.
conclusion: row 1+2+3+4+6 are the same user
goal: to find all duplicated rows & to delete them all accept one
instance to leave.

Note:
Finding that row 1similar to 2 in A & deleting it will loose data
because we won't know that row 1 is ALSO similar to 3 on C & later on
finding that 3 is similar to 4 & 6 on D & so on

The simple time consuming (about 2 weaks) query to acomplish the task
is:
SELECT count(*),A.B,C,D
FROM tbl
GROUP BY A,B,C,D
HAVING count(*)>1

I THANK YOU ALL

May 31 '06 #3

--CELKO--

A | B | C | D
-----------------
a1 b1 c1 d1
a1 b2 c2 d2
a3 b3 c1 d3
a4 b4 c4 d3
a5 b5 c5 d5
a6 b6 c6 d3

I am going to guess at the DDL and add another column

CREATE TABLE Foobar
(a CHAR(2) NOT NULL,
b CHAR(2) NOT NULL,
c CHAR(2) NOT NULL,
d CHAR(2) NOT NULL,
dups INTEGER DEFAULT 0 NOT NULL
CHECK(dups >= 0),
PRIMARY KEY(a, b, c, d));

INSERT INTO Foobar(a, b, c, d) VALUES ('a1', 'b1', 'c1', 'd1');
INSERT INTO Foobar(a, b, c, d) VALUES ('a1', 'b2', 'c2', 'd2');
INSERT INTO Foobar(a, b, c, d) VALUES ('a3', 'b3', 'c1', 'd3');
INSERT INTO Foobar(a, b, c, d) VALUES ('a4', 'b4', 'c4', 'd3');
INSERT INTO Foobar(a, b, c, d) VALUES ('a5', 'b5', 'c5', 'd5');
INSERT INTO Foobar(a, b, c, d) VALUES ('a6', 'b6', 'c6', 'd3');

Since you seem to want to preserve some of the information about
duplications, you can keep a tally

UPDATE Foobar
SET dups
= dups
+ (SELECT 1
FROM Foobar AS F1
WHERE F1.a = Foobar.a
HAVING COUNT(*) > 1)
+ (SELECT 1
FROM Foobar AS F1
WHERE F1.b = Foobar.b
HAVING COUNT(*) > 1)
+ (SELECT 1
FROM Foobar AS F1
WHERE F1.c = Foobar.c
HAVING COUNT(*) > 1)
+ (SELECT 1
FROM Foobar AS F1
WHERE F1.d = Foobar.d
HAVING COUNT(*) > 1);

The duplications are:
row 1+2 in param A
row 1+3 in param C
row 3+4+6 in param D
only row 5 is unique in all parameters.
conclusion: row 1+2+3+4+6 are the same user

May 31 '06 #4

Similar topics

Very strange behaviour WaveInProc.

by: Diego | last post by:

Hello. I need help. I'm writing an application that requires audio recording and I have decided to use waveform audio interface. I'm able to process WIM_DATA message but when i call waveInStop the...

.NET Framework

A VERY strange problem when Updating a Dataset

by: Lars Netzel | last post by:

A little background: I use three Datagrids that are in a child parent relation. I Use Negative Autoincrement on the the DataTables and that's workning nice. My problem is when I Update these...

Visual Basic .NET

VS2005 C# is very BUGGY, please hlp

by: Rob Dob | last post by:

Hi, I'm amazed!!! I am using VS2005, I create a new web project, c# and then within the default.aspx form right mouse click and select "View Compnent Designer" , I then select and drag a...

C# / C Sharp

Very simple question from a novice

by: Knut Olsen-Solberg | last post by:

I try to change the text in a <p> using getElementById(). I wonder what properties exists, and which one to use here. (The following does not work.) Regards Knut ______________________ ...

Javascript

VERY chalanging question

by: groupy | last post by:

input: 1.5 million records table consisting users with 4 nvchar fields:A,B,C,D the problem: there are many records with dublicates A's or duplicates B's or duplicates A+B's or duplicates B+C+D's &...

Microsoft SQL Server

C++ and "flush" -- So very confused

by: Evan | last post by:

So I have two versions of a program (these are complete programs, not excerpts): Version 1: template <class T > void foo() { return bar( T() ); }

C / C++

how to insert very large text into CLOB datafield?urgent.....

by: shivapadma | last post by:

I want explanation for CLOB datafield 1.I created clobtable with the query create table clobexample(id number,text CLOB); 2.I tried to insert very large text by the following query insert...

Oracle Database

112

My Very Strange Webhost, SBI! -- Opinions, Please

by: Prisoner at War | last post by:

Friends, your opinions and advice, please: I have a very simple JavaScript image-swap which works on my end but when uploaded to my host at http://buildit.sitesell.com/sunnyside.html does not...

Javascript

How to process a very large (4Gb) tarfile from python?

by: Terry Carroll | last post by:

I am trying to do something with a very large tarfile from within Python, and am running into memory constraints. The tarfile in question is a 4-gigabyte datafile from freedb.org,...

Python

126

A very interesting book

by: jacob navia | last post by:

Buffer overflows are a fact of life, and, more specifically, a fact of C. All is not lost however. In the book "Value Range Analysis of C programs" Axel Simon tries to establish a...

C / C++

Migrating Website to Cloud - Emmanuel Katto

by: emmanuelkatto | last post by:

Hi All, I am Emmanuel katto from Uganda. I want to ask what challenges you've faced while migrating a website to cloud. Please let me know. Thanks! Emmanuel

General

Looking to do Android software development, any suggestions? Is flutter better?

by: nemocccc | last post by:

hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?

General

Is that possible of reading the .csv file in column wise and the column have different lengths ?

by: Sonnysonu | last post by:

This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...

C / C++

What is ONU?

by: marktang | last post by:

ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However,...

General

Changing the language in Windows 10

by: Hystou | last post by:

Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can...

Windows Server

Problem With Comparison Operator <=> in G++

by: Oralloy | last post by:

Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers,...

C / C++

Maximizing Business Potential: The Nexus of Website Design and Digital Marketing

by: jinu1996 | last post by:

In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven...

Online Marketing

The easy way to turn off automatic updates for Windows 10/11

by: Hystou | last post by:

Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows...

Windows Server

AI Job Threat for Devs

by: agi2029 | last post by:

Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing,...

Career Advice