473,408 Members | 1,683 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,408 software developers and data experts.

Very chalanging question - Explanation

Ok, here is a asample table representing the problem more clearly

A | B | C | D
-----------------
a1 b1 c1 d1
a1 b2 c2 d2
a3 b3 c1 d3
a4 b4 c4 d3
a5 b5 c5 d5
a6 b6 c6 d3

Tha duplications are:
row 1+2 in param A
row 1+3 in param C
row 3+4+6 in param D
only row 5 is unique in all parameters.
conclusion: row 1+2+3+4+6 are the same user
goal: to find all duplicated rows & to delete them all accept one
instance to leave.

Note:
Finding that row 1similar to 2 in A & deleting it will loose data
because we won't know that row 1 is ALSO similar to 3 on C & later on
finding that 3 is similar to 4 & 6 on D & so on

The simple time consuming (about 2 weaks) query to acomplish the task
is:
SELECT count(*),A.B,C,D
FROM tbl
GROUP BY A,B,C,D
HAVING count(*)>1

I THANK YOU ALL

May 31 '06 #1
3 1364
> SELECT count(*),A.B,C,D
FROM tbl
GROUP BY A,B,C,D
HAVING count(*)>1
If that query takes 2 weeks to run then you've got serious and I mean
serious problems with your hardware, that type of query should take minutes
if not seconds most desktops let alone servers.

To aid performance for that specific query create your clustered index on A,
B, C and D.

--
Tony Rogerson
SQL Server MVP
http://sqlblogcasts.com/blogs/tonyrogerson - technical commentary from a SQL
Server Consultant
http://sqlserverfaq.com - free video tutorials
"groupy" <li*******@gmail.com> wrote in message
news:11**********************@i39g2000cwa.googlegr oups.com... Ok, here is a asample table representing the problem more clearly

A | B | C | D
-----------------
a1 b1 c1 d1
a1 b2 c2 d2
a3 b3 c1 d3
a4 b4 c4 d3
a5 b5 c5 d5
a6 b6 c6 d3

Tha duplications are:
row 1+2 in param A
row 1+3 in param C
row 3+4+6 in param D
only row 5 is unique in all parameters.
conclusion: row 1+2+3+4+6 are the same user
goal: to find all duplicated rows & to delete them all accept one
instance to leave.

Note:
Finding that row 1similar to 2 in A & deleting it will loose data
because we won't know that row 1 is ALSO similar to 3 on C & later on
finding that 3 is similar to 4 & 6 on D & so on

The simple time consuming (about 2 weaks) query to acomplish the task
is:
SELECT count(*),A.B,C,D
FROM tbl
GROUP BY A,B,C,D
HAVING count(*)>1

I THANK YOU ALL

May 31 '06 #2
Thanks Tony i've just managed something..
Tony Rogerson כתב:
SELECT count(*),A.B,C,D
FROM tbl
GROUP BY A,B,C,D
HAVING count(*)>1


If that query takes 2 weeks to run then you've got serious and I mean
serious problems with your hardware, that type of query should take minutes
if not seconds most desktops let alone servers.

To aid performance for that specific query create your clustered index onA,
B, C and D.

--
Tony Rogerson
SQL Server MVP
http://sqlblogcasts.com/blogs/tonyrogerson - technical commentary from a SQL
Server Consultant
http://sqlserverfaq.com - free video tutorials
"groupy" <li*******@gmail.com> wrote in message
news:11**********************@i39g2000cwa.googlegr oups.com...
Ok, here is a asample table representing the problem more clearly

A | B | C | D
-----------------
a1 b1 c1 d1
a1 b2 c2 d2
a3 b3 c1 d3
a4 b4 c4 d3
a5 b5 c5 d5
a6 b6 c6 d3

Tha duplications are:
row 1+2 in param A
row 1+3 in param C
row 3+4+6 in param D
only row 5 is unique in all parameters.
conclusion: row 1+2+3+4+6 are the same user
goal: to find all duplicated rows & to delete them all accept one
instance to leave.

Note:
Finding that row 1similar to 2 in A & deleting it will loose data
because we won't know that row 1 is ALSO similar to 3 on C & later on
finding that 3 is similar to 4 & 6 on D & so on

The simple time consuming (about 2 weaks) query to acomplish the task
is:
SELECT count(*),A.B,C,D
FROM tbl
GROUP BY A,B,C,D
HAVING count(*)>1

I THANK YOU ALL


May 31 '06 #3
A | B | C | D
-----------------
a1 b1 c1 d1
a1 b2 c2 d2
a3 b3 c1 d3
a4 b4 c4 d3
a5 b5 c5 d5
a6 b6 c6 d3

I am going to guess at the DDL and add another column

CREATE TABLE Foobar
(a CHAR(2) NOT NULL,
b CHAR(2) NOT NULL,
c CHAR(2) NOT NULL,
d CHAR(2) NOT NULL,
dups INTEGER DEFAULT 0 NOT NULL
CHECK(dups >= 0),
PRIMARY KEY(a, b, c, d));

INSERT INTO Foobar(a, b, c, d) VALUES ('a1', 'b1', 'c1', 'd1');
INSERT INTO Foobar(a, b, c, d) VALUES ('a1', 'b2', 'c2', 'd2');
INSERT INTO Foobar(a, b, c, d) VALUES ('a3', 'b3', 'c1', 'd3');
INSERT INTO Foobar(a, b, c, d) VALUES ('a4', 'b4', 'c4', 'd3');
INSERT INTO Foobar(a, b, c, d) VALUES ('a5', 'b5', 'c5', 'd5');
INSERT INTO Foobar(a, b, c, d) VALUES ('a6', 'b6', 'c6', 'd3');

Since you seem to want to preserve some of the information about
duplications, you can keep a tally

UPDATE Foobar
SET dups
= dups
+ (SELECT 1
FROM Foobar AS F1
WHERE F1.a = Foobar.a
HAVING COUNT(*) > 1)
+ (SELECT 1
FROM Foobar AS F1
WHERE F1.b = Foobar.b
HAVING COUNT(*) > 1)
+ (SELECT 1
FROM Foobar AS F1
WHERE F1.c = Foobar.c
HAVING COUNT(*) > 1)
+ (SELECT 1
FROM Foobar AS F1
WHERE F1.d = Foobar.d
HAVING COUNT(*) > 1);

The duplications are:
row 1+2 in param A
row 1+3 in param C
row 3+4+6 in param D
only row 5 is unique in all parameters.
conclusion: row 1+2+3+4+6 are the same user

May 31 '06 #4

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

0
by: Diego | last post by:
Hello. I need help. I'm writing an application that requires audio recording and I have decided to use waveform audio interface. I'm able to process WIM_DATA message but when i call waveInStop the...
14
by: Lars Netzel | last post by:
A little background: I use three Datagrids that are in a child parent relation. I Use Negative Autoincrement on the the DataTables and that's workning nice. My problem is when I Update these...
10
by: Rob Dob | last post by:
Hi, I'm amazed!!! I am using VS2005, I create a new web project, c# and then within the default.aspx form right mouse click and select "View Compnent Designer" , I then select and drag a...
29
by: Knut Olsen-Solberg | last post by:
I try to change the text in a <p> using getElementById(). I wonder what properties exists, and which one to use here. (The following does not work.) Regards Knut ______________________ ...
13
by: groupy | last post by:
input: 1.5 million records table consisting users with 4 nvchar fields:A,B,C,D the problem: there are many records with dublicates A's or duplicates B's or duplicates A+B's or duplicates B+C+D's &...
3
by: Evan | last post by:
So I have two versions of a program (these are complete programs, not excerpts): Version 1: template <class T > void foo() { return bar( T() ); }
7
by: shivapadma | last post by:
I want explanation for CLOB datafield 1.I created clobtable with the query create table clobexample(id number,text CLOB); 2.I tried to insert very large text by the following query insert...
112
by: Prisoner at War | last post by:
Friends, your opinions and advice, please: I have a very simple JavaScript image-swap which works on my end but when uploaded to my host at http://buildit.sitesell.com/sunnyside.html does not...
6
by: Terry Carroll | last post by:
I am trying to do something with a very large tarfile from within Python, and am running into memory constraints. The tarfile in question is a 4-gigabyte datafile from freedb.org,...
126
by: jacob navia | last post by:
Buffer overflows are a fact of life, and, more specifically, a fact of C. All is not lost however. In the book "Value Range Analysis of C programs" Axel Simon tries to establish a...
0
by: emmanuelkatto | last post by:
Hi All, I am Emmanuel katto from Uganda. I want to ask what challenges you've faced while migrating a website to cloud. Please let me know. Thanks! Emmanuel
1
by: nemocccc | last post by:
hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?
1
by: Sonnysonu | last post by:
This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...
0
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However,...
0
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can...
0
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers,...
0
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven...
0
by: Hystou | last post by:
Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows...
0
agi2029
by: agi2029 | last post by:
Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing,...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.