By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
439,977 Members | 1,369 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 439,977 IT Pros & Developers. It's quick & easy.

Resolving duplicates entries in table among 10 databases

P: n/a
AK
Hi

Our product uses MS-SQL Server 2000. One of our customer has 10
installations with each installation stroring data in its own database.
Now the customer wants to consolidate these databases into one and we
already have plan for that by consolidating one DB at a time. But first
they want to find how many unique or duplicate entries they have across
all the 10 databases

Assumptions:
1. All the databases reside on the same server. (This is just an
assumption, not the real environment at customer site)
2. Databases can not be merged before it is found how many unique or
duplicate rows exist.

Table under consideration:
Message
(
HashID PK,
....
)

# of rows in Message table in each of databases: 1 Million

Here is my question: How can I find how many unique or duplicate
entries they have across all the 10 databases. I easily find unique
rows for two databases with a query like this:

SELECT COUNT(A.HasID) FROM db1.dbo.Message A LEFT OUTER JOIN ON
db2.dbo.Message B ON A.HashID = B.HashID WHERE B.HashID IS NULL

How can I do this for 10 databases. This will require factorial of 10
queries to solve this problem.

I will appreciate if someone can provide hint on this.

Regards
AK

Feb 11 '06 #1
Share this Question
Share on Google+
3 Replies


P: n/a
> Here is my question: How can I find how many unique or duplicate
entries they have across all the 10 databases.
The following will list the count of unique values (Duplicates = 0) as well
has the non-unique values grouped by the number of duplicates (1-9).

SELECT
Duplicates,
(Duplicates + 1) * COUNT(*) AS TotalHashIDCount
FROM (
SELECT HashID, COUNT(*) - 1 AS Duplicates
FROM (
SELECT HashID FROM db1.dbo.Message
UNION ALL SELECT HashID FROM db2.dbo.Message
UNION ALL SELECT HashID FROM db3.dbo.Message
UNION ALL SELECT HashID FROM db4.dbo.Message
UNION ALL SELECT HashID FROM db5.dbo.Message
UNION ALL SELECT HashID FROM db6.dbo.Message
UNION ALL SELECT HashID FROM db7.dbo.Message
UNION ALL SELECT HashID FROM db8.dbo.Message
UNION ALL SELECT HashID FROM db9.dbo.Message
UNION ALL SELECT HashID FROM db10.dbo.Message
) AS Messages
GROUP BY HashID) AS HashIDCounts
GROUP BY Duplicates
ORDER BY Duplicates

--
Hope this helps.

Dan Guzman
SQL Server MVP

"AK" <am***@yahoo.com> wrote in message
news:11*********************@g44g2000cwa.googlegro ups.com... Hi

Our product uses MS-SQL Server 2000. One of our customer has 10
installations with each installation stroring data in its own database.
Now the customer wants to consolidate these databases into one and we
already have plan for that by consolidating one DB at a time. But first
they want to find how many unique or duplicate entries they have across
all the 10 databases

Assumptions:
1. All the databases reside on the same server. (This is just an
assumption, not the real environment at customer site)
2. Databases can not be merged before it is found how many unique or
duplicate rows exist.

Table under consideration:
Message
(
HashID PK,
...
)

# of rows in Message table in each of databases: 1 Million

Here is my question: How can I find how many unique or duplicate
entries they have across all the 10 databases. I easily find unique
rows for two databases with a query like this:

SELECT COUNT(A.HasID) FROM db1.dbo.Message A LEFT OUTER JOIN ON
db2.dbo.Message B ON A.HashID = B.HashID WHERE B.HashID IS NULL

How can I do this for 10 databases. This will require factorial of 10
queries to solve this problem.

I will appreciate if someone can provide hint on this.

Regards
AK

Feb 11 '06 #2

P: n/a
AK
Thank you Dan. This is exactly what I needed (in fact more than what I
needed :)

Regards
AK

Dan Guzman wrote:
Here is my question: How can I find how many unique or duplicate
entries they have across all the 10 databases.


The following will list the count of unique values (Duplicates = 0) as well
has the non-unique values grouped by the number of duplicates (1-9).

SELECT
Duplicates,
(Duplicates + 1) * COUNT(*) AS TotalHashIDCount
FROM (
SELECT HashID, COUNT(*) - 1 AS Duplicates
FROM (
SELECT HashID FROM db1.dbo.Message
UNION ALL SELECT HashID FROM db2.dbo.Message
UNION ALL SELECT HashID FROM db3.dbo.Message
UNION ALL SELECT HashID FROM db4.dbo.Message
UNION ALL SELECT HashID FROM db5.dbo.Message
UNION ALL SELECT HashID FROM db6.dbo.Message
UNION ALL SELECT HashID FROM db7.dbo.Message
UNION ALL SELECT HashID FROM db8.dbo.Message
UNION ALL SELECT HashID FROM db9.dbo.Message
UNION ALL SELECT HashID FROM db10.dbo.Message
) AS Messages
GROUP BY HashID) AS HashIDCounts
GROUP BY Duplicates
ORDER BY Duplicates

--
Hope this helps.

Dan Guzman
SQL Server MVP

"AK" <am***@yahoo.com> wrote in message
news:11*********************@g44g2000cwa.googlegro ups.com...
Hi

Our product uses MS-SQL Server 2000. One of our customer has 10
installations with each installation stroring data in its own database.
Now the customer wants to consolidate these databases into one and we
already have plan for that by consolidating one DB at a time. But first
they want to find how many unique or duplicate entries they have across
all the 10 databases

Assumptions:
1. All the databases reside on the same server. (This is just an
assumption, not the real environment at customer site)
2. Databases can not be merged before it is found how many unique or
duplicate rows exist.

Table under consideration:
Message
(
HashID PK,
...
)

# of rows in Message table in each of databases: 1 Million

Here is my question: How can I find how many unique or duplicate
entries they have across all the 10 databases. I easily find unique
rows for two databases with a query like this:

SELECT COUNT(A.HasID) FROM db1.dbo.Message A LEFT OUTER JOIN ON
db2.dbo.Message B ON A.HashID = B.HashID WHERE B.HashID IS NULL

How can I do this for 10 databases. This will require factorial of 10
queries to solve this problem.

I will appreciate if someone can provide hint on this.

Regards
AK


Feb 14 '06 #3

P: n/a
Better too much than too little :-)

--
Hope this helps.

Dan Guzman
SQL Server MVP

"AK" <am***@yahoo.com> wrote in message
news:11**********************@g43g2000cwa.googlegr oups.com...
Thank you Dan. This is exactly what I needed (in fact more than what I
needed :)

Regards
AK

Dan Guzman wrote:
> Here is my question: How can I find how many unique or duplicate
> entries they have across all the 10 databases.


The following will list the count of unique values (Duplicates = 0) as
well
has the non-unique values grouped by the number of duplicates (1-9).

SELECT
Duplicates,
(Duplicates + 1) * COUNT(*) AS TotalHashIDCount
FROM (
SELECT HashID, COUNT(*) - 1 AS Duplicates
FROM (
SELECT HashID FROM db1.dbo.Message
UNION ALL SELECT HashID FROM db2.dbo.Message
UNION ALL SELECT HashID FROM db3.dbo.Message
UNION ALL SELECT HashID FROM db4.dbo.Message
UNION ALL SELECT HashID FROM db5.dbo.Message
UNION ALL SELECT HashID FROM db6.dbo.Message
UNION ALL SELECT HashID FROM db7.dbo.Message
UNION ALL SELECT HashID FROM db8.dbo.Message
UNION ALL SELECT HashID FROM db9.dbo.Message
UNION ALL SELECT HashID FROM db10.dbo.Message
) AS Messages
GROUP BY HashID) AS HashIDCounts
GROUP BY Duplicates
ORDER BY Duplicates

--
Hope this helps.

Dan Guzman
SQL Server MVP

"AK" <am***@yahoo.com> wrote in message
news:11*********************@g44g2000cwa.googlegro ups.com...
> Hi
>
> Our product uses MS-SQL Server 2000. One of our customer has 10
> installations with each installation stroring data in its own database.
> Now the customer wants to consolidate these databases into one and we
> already have plan for that by consolidating one DB at a time. But first
> they want to find how many unique or duplicate entries they have across
> all the 10 databases
>
> Assumptions:
> 1. All the databases reside on the same server. (This is just an
> assumption, not the real environment at customer site)
> 2. Databases can not be merged before it is found how many unique or
> duplicate rows exist.
>
> Table under consideration:
> Message
> (
> HashID PK,
> ...
> )
>
> # of rows in Message table in each of databases: 1 Million
>
> Here is my question: How can I find how many unique or duplicate
> entries they have across all the 10 databases. I easily find unique
> rows for two databases with a query like this:
>
> SELECT COUNT(A.HasID) FROM db1.dbo.Message A LEFT OUTER JOIN ON
> db2.dbo.Message B ON A.HashID = B.HashID WHERE B.HashID IS NULL
>
> How can I do this for 10 databases. This will require factorial of 10
> queries to solve this problem.
>
> I will appreciate if someone can provide hint on this.
>
> Regards
> AK
>

Feb 15 '06 #4

This discussion thread is closed

Replies have been disabled for this discussion.