473,414 Members | 1,947 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,414 software developers and data experts.

Comparing two sets of data

I have the following situation. One set of data has 274 rows (set2)
and anther has 264 (set1). Both data sets are similar in structure as
well as values for both of them were extracts from the same parent
table. Hope the info would substitute DDL. I need to find the "gap"
rows between these two sets.
Attempted to run a query like
select count(*)
from set2
where not exists
(select *
from set1)
did not yield what I desired. What else to try?

TIA.

Jul 23 '05 #1
12 5107
nib
NickName wrote:
I have the following situation. One set of data has 274 rows (set2)
and anther has 264 (set1). Both data sets are similar in structure as
well as values for both of them were extracts from the same parent
table.
Hope the info would substitute DDL.
It doesn't.
I need to find the "gap"
rows between these two sets.
Attempted to run a query like
select count(*)
from set2
where not exists
(select *
from set1)
did not yield what I desired. What else to try?


Try posting your DDL. We at least need column names and keys to help you
here.

Zach
Jul 23 '05 #2
OK,

I've proven that the "EXISTS" keyword/function can't solve this
problem. But then what?

-- test equality between two data sets
-- desired resultset: return "gap" rows

-- DDL and DML
create table #tmp1 (col1 int, col2 char(1));

insert into #tmp1
values(1,'a')
insert into #tmp1
values(2,'b')
insert into #tmp1
values(3,'c')
insert into #tmp1
values(4,'d')
insert into #tmp1
values(5,'e');

select * into #tmp2
from #tmp1
where col1 < 5;
select *
from #tmp1
where not exists
(select *
from #tmp2)

Jul 23 '05 #3
NickName (da****@rock.com) writes:
I've proven that the "EXISTS" keyword/function can't solve this
problem. But then what?


It certainly can, but you must specify how the NOT EXISTS is to work.
SQL is not about telepathy.

For your repro, you could do

select *
from #tmp1 t1
where not exists (Select *
from #tmp2 t2
WHERE t1.col1 = t2.col1)

or

select *
from #tmp1 t1
where not exists (Select *
from #tmp2 t
WHERE t1.col1 = t2.col1
AND t1.col2 = t2.col2)

depending on what you are looking for. The first just lists missing key
values, the second list all mismatches.
--
Erland Sommarskog, SQL Server MVP, es****@sommarskog.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techinf...2000/books.asp
Jul 23 '05 #4
Thanks, Erland.
*) laziness is an enemy, I thought about that but the table has a large
number of columns, so, was wondering if there's another approach. Will
try it tomorrow when the db is available.
*) btw, I like the word, "telepathy".

Erland Sommarskog wrote:
NickName (da****@rock.com) writes:
I've proven that the "EXISTS" keyword/function can't solve this
problem. But then what?
It certainly can, but you must specify how the NOT EXISTS is to work.
SQL is not about telepathy.

For your repro, you could do

select *
from #tmp1 t1
where not exists (Select *
from #tmp2 t2
WHERE t1.col1 = t2.col1)

or

select *
from #tmp1 t1
where not exists (Select *
from #tmp2 t
WHERE t1.col1 = t2.col1
AND t1.col2 = t2.col2)

depending on what you are looking for. The first just lists missing

key values, the second list all mismatches.
--
Erland Sommarskog, SQL Server MVP, es****@sommarskog.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techinf...2000/books.asp


Jul 23 '05 #5
NickName (da****@rock.com) writes:
Thanks, Erland.
*) laziness is an enemy, I thought about that but the table has a large
number of columns, so, was wondering if there's another approach. Will
try it tomorrow when the db is available.


If your aim is to find differences in any column, you will indeed have
to write code that has all column. There is no shortcut. What you can
do, if you have many tables and columns, is to generate code by reading
metadata. But this requires that you have a clear understanding for which
columns you want to compare, and which you do not.
--
Erland Sommarskog, SQL Server MVP, es****@sommarskog.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techinf...2000/books.asp
Jul 23 '05 #6
Erland,

Something seems odd. Here's two test cases. Both are successful.
-- test equality between two data sets (with exact meta data)
-- desired resultset: return "gap" rows

-- DDL and DML
create table #tmp1 (col1 int, col2 char(1));

insert into #tmp1
values(1,'a')
insert into #tmp1
values(2,'b')
insert into #tmp1
values(3,'c')
insert into #tmp1
values(4,'d')
insert into #tmp1
values(5,'e');
select * into #tmp2
from #tmp1
where col1 < 5;

select *
from #tmp1 t1
where not exists
(select *
from #tmp2 t2
where t1.col1 = t2.col1
and t1.col2 = t2.col2)

-- outcome = success

-- test equality between two data sets
-- with minor meta data difference (one table has 3 attributes while
the other has 2)
-- note: comparison seems still successful
-- desired resultset: return "gap" rows

drop table #tmp1
drop table #tmp2;

-- DDL and DML
create table #tmp1 (col1 int, col2 char(1),col3 int);

insert into #tmp1
values(1,'a',11)
insert into #tmp1
values(2,'b',22)
insert into #tmp1
values(3,'c',33)
insert into #tmp1
values(4,'d',44)
insert into #tmp1
values(5,'e',55);

select col1,col2 into #tmp2
from #tmp1
where col1 < 5;

select *
from #tmp1 t1
where not exists
(select *
from #tmp2 t2
where t1.col1 = t2.col1
and t1.col2 = t2.col2)

-- outcome = success

HOWEVER, when I applied the above to my tables (sorry I can't post
exact structure nor data here, PARENT table has 13 columns and the
derived table has 12 of them, comparison is between them), sql still
failed to find the "gap". What could possibly stands in the way?
Many thanks.

Jul 23 '05 #7
Ahe, I think I've found the problem. In "my" tables, both has
duplicate rows, however, the number of duplicate rows are not the same,
(distinct rows are sure the same).

Jul 23 '05 #8
NickName (da****@rock.com) writes:
Ahe, I think I've found the problem. In "my" tables, both has
duplicate rows, however, the number of duplicate rows are not the same,
(distinct rows are sure the same).


I'm not sure how this should be addressed. Are you saying that in
one table you have two rows with the same value, but in another you
only have one?

If there are no disctinct keys in the data, all relational operations will
be problematic, that's for sure.
--
Erland Sommarskog, SQL Server MVP, es****@sommarskog.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techinf...2000/books.asp
Jul 23 '05 #9
You might find it easier to export the table contents as ordered data and
then use a text file compare utility. Command-prompt example:

BCP "SELECT * FROM MyDatabase..Table1 ORDER BY MyPK" queryout
"C:\temp\Table1.txt" /T /c /SMyServer

BCP "SELECT * FROM MyDatabase..Table2 ORDER BY MyPK" queryout
"C:\temp\Table2.txt" /T /c /SMyServer

WINDIFF "C:\temp\Table1.txt" "C:\temp\Table2.txt"

--
Happy Holidays

Dan Guzman
SQL Server MVP

"NickName" <da****@rock.com> wrote in message
news:11**********************@f14g2000cwb.googlegr oups.com...
Ahe, I think I've found the problem. In "my" tables, both has
duplicate rows, however, the number of duplicate rows are not the same,
(distinct rows are sure the same).

Jul 23 '05 #10
Dan Guzman (gu******@nospam-online.sbcglobal.net) writes:
You might find it easier to export the table contents as ordered data and
then use a text file compare utility. Command-prompt example:

BCP "SELECT * FROM MyDatabase..Table1 ORDER BY MyPK" queryout
"C:\temp\Table1.txt" /T /c /SMyServer

BCP "SELECT * FROM MyDatabase..Table2 ORDER BY MyPK" queryout
"C:\temp\Table2.txt" /T /c /SMyServer

WINDIFF "C:\temp\Table1.txt" "C:\temp\Table2.txt"


However, the use of windiff will break down if there is a column which is
permitted to be different in the tables. But there is a better alternative:
Beyond Compare, from http://www.scootersoftware.com. Beyond Compare
offers comparison on character level.
--
Erland Sommarskog, SQL Server MVP, es****@sommarskog.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techinf...2000/books.asp
Jul 23 '05 #11
Erland,

You're exactly right. Maybe they did that to test my analytic skill,
who knows :)

Jul 23 '05 #12
Dan,

Thanks. It seems a good way to try. Will do tomorrow.

Erland, no columns are supposed to contain different data sets though
one table may have an extra column, that is supposedly the only
difference. OK, I'll try the recommended software as well when get a
chance. I appreciate it.

Don

Jul 23 '05 #13

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

11
by: Dan Stromberg | last post by:
We will soon have 3 copies, for testing purposes, of what should be about 4.5 terrabytes of data. Rather than cmp'ing twice, to verify data integrity, I was thinking we could speed up the...
5
by: Skip Montanaro | last post by:
I'd like to compare two xml.dom.minidom objects, but the naive attempt fails: >>> import xml.dom.minidom >>> d1 = xml.dom.minidom.parse("ES.xml") >>> d2 = xml.dom.minidom.parse("ES.xml") >>> d1...
26
by: William Park | last post by:
How do you compare 2 strings, and determine how much they are "close" to each other? Eg. aqwerty qwertyb are similar to each other, except for first/last char. But, how do I quantify that? ...
41
by: Odd-R. | last post by:
I have to lists, A and B, that may, or may not be equal. If they are not identical, I want the output to be three new lists, X,Y and Z where X has all the elements that are in A, but not in B, and...
19
by: Dennis | last post by:
I have a public variable in a class of type color declared as follows: public mycolor as color = color.Empty I want to check to see if the user has specified a color like; if mycolor =...
11
by: John Salerno | last post by:
I'd like to compare the values in two different sets to test if any of the positions in either set share the same value (e.g., if the third element of each set is an 'a', then the test fails). I...
2
by: Yash | last post by:
Hi, We are in the process of tuning the performance of our stored procs in SQL 2000 and are looking for a tool that would help us in comparing the result sets of an old SP and a modified SP. The...
0
by: DomiNeug | last post by:
Hello, Since a while i have to find a way of comparing "Sets" (multiple int Values) and so to find equal sets. There simply 3 tables ValueList with: ID int ValueListHasValue: ID int, ValueListID...
3
by: Sean Dalton | last post by:
Hello, I have a two sets OLDLIST and REMOVE. I would like to remove every element in OLDLIST if it is also occuring in REMOVE and store the remaining elements from OLDLIST into NEWLIST. So...
0
by: emmanuelkatto | last post by:
Hi All, I am Emmanuel katto from Uganda. I want to ask what challenges you've faced while migrating a website to cloud. Please let me know. Thanks! Emmanuel
0
BarryA
by: BarryA | last post by:
What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...
1
by: nemocccc | last post by:
hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?
0
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However,...
0
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can...
0
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers,...
0
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven...
0
by: Hystou | last post by:
Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows...
0
by: conductexam | last post by:
I have .net C# application in which I am extracting data from word file and save it in database particularly. To store word all data as it is I am converting the whole word file firstly in HTML and...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.