Merging rows within same table - Microsoft SQL Server

Jason

I need to populate a table from several sources of raw data. For a
given security (stock) it is possible to only receive PARTS of
information from each of the different sources. It is also possible
to have conflicting data.

I am looking to make a composite picture of a given security using the
following rules:

1) The goal is to replace all NULL and Blank values with data

2) Order of precedence (from highest to lowest) is Non-NULL Non-Blank
--> Blank --> NULL

3) In the case of Non-NULL Non-Blank values that conflict (are
different) leave existing value (even if NULL or Blank)

For example:

Given the following rows:

Symbol Identity IdSource Exchange Type SubType Name
-------- ------------ --------- --------- ------- ---------
------------------
TZA 901145102 CUSIP XNYS Stock NULL TV AZTECA
TZA 901145102 NULL NULL NULL NULL

WSM 969904101 CUSIP XNYS Stock NULL WILLIAMS
SONOMA
WSM 969904101 NULL XNYS Stock NULL
WILLIAMS-SONOMA
WSM CUSIP XNYS Stock Common NULL
WSM NULL CUSIP XASE Stock NULL WILLIAMS
SONOMA

TYC 902124106 CUSIP XNYS Stock NULL TYCO
TYC 902124106 CUSIP XNYS Stock NULL TYCO
INTERNATIONAL
I am looking for the following results ('*' indicates changed value)

Symbol Identity IdSource Exchange Type SubType Name
-------- ------------ --------- --------- ------- ---------
------------------
TZA 901145102 CUSIP XNYS Stock NULL TV AZTECA
TZA 901145102 *CUSIP *XNYS *Stock NULL *TV AZTECA

WSM 969904101 CUSIP XNYS Stock *Common WILLIAMS
SONOMA
WSM 969904101 *CUSIP XNYS Stock *Common
WILLIAMS-SONOMA
WSM *969904101 CUSIP NULL Stock Common NULL
WSM *969904101 CUSIP XASE Stock *Common WILLIAMS
SONOMA

TYC 902124106 CUSIP XNYS Stock NULL TYCO
TYC 902124106 CUSIP XNYS Stock NULL TYCO
INTERNATIONAL

Jul 20 '05 #1

Subscribe Post Reply

8308

David Portas

SELECT S.symbol,
COALESCE(T.xidentity,S.xidentity), COALESCE(T.idsource,S.idsource),
COALESCE(T.exchange,S.exchange), COALESCE(T.type,S.type),
COALESCE(T.subtype,S.subtype), COALESCE(T.xname,S.xname)
FROM Stocks AS S
JOIN
(SELECT symbol,
CASE COUNT(DISTINCT NULLIF(xidentity,''))
WHEN 1 THEN MAX(xidentity) END,
CASE COUNT(DISTINCT NULLIF(idsource,''))
WHEN 1 THEN MAX(idsource) END,
CASE COUNT(DISTINCT NULLIF(exchange,''))
WHEN 1 THEN MAX(exchange) END,
CASE COUNT(DISTINCT NULLIF(type,''))
WHEN 1 THEN MAX(type) END,
CASE COUNT(DISTINCT NULLIF(subtype,''))
WHEN 1 THEN MAX(subtype) END,
CASE COUNT(DISTINCT NULLIF(xname,''))
WHEN 1 THEN MAX(xname) END
FROM Stocks
GROUP BY symbol) AS T
(symbol, xidentity, idsource, exchange, type, subtype, xname)
ON S.symbol = T.symbol

Help others to help you by posting DDL for your table(s) and including
sample data as INSERT statements. That way people can test results and don't
have to guess at datatypes, constraints and keys:

CREATE TABLE Stocks (symbol CHAR(4) NOT NULL, xidentity CHAR(9) NULL,
idsource CHAR(5) NULL, exchange CHAR(4) NULL, type CHAR(5) NULL, subtype
CHAR(6) NULL, xname CHAR(20) NULL /* PRIMARY KEY ??? */)

INSERT INTO Stocks VALUES
('TZA', '901145102', 'CUSIP', 'XNYS', 'Stock', NULL, 'TV AZTECA')
INSERT INTO Stocks VALUES
('TZA', '901145102', NULL, NULL, '', NULL, NULL)
INSERT INTO Stocks VALUES
('WSM', '969904101', 'CUSIP', 'XNYS', 'Stock', NULL, 'WILLIAMS SONOMA')
INSERT INTO Stocks VALUES
('WSM', '969904101', NULL, 'XNYS', 'Stock', NULL, 'WILLIAMS-SONOMA')
INSERT INTO Stocks VALUES
('WSM', '', 'CUSIP', 'XNYS', 'Stock', 'Common', NULL)
INSERT INTO Stocks VALUES
('WSM', NULL, 'CUSIP', 'XASE', 'Stock', NULL, 'WILLIAMS SONOMA')
INSERT INTO Stocks VALUES
('TYC', '902124106', 'CUSIP', 'XNYS', 'Stock', NULL, 'TYCO')
INSERT INTO Stocks VALUES
('TYC', '902124106', 'CUSIP', 'XNYS', 'Stock', NULL, 'TYCO INTERNATIONAL')

--
David Portas
------------
Please reply only to the newsgroup
--

Jul 20 '05 #2

Jason

"David Portas" <RE****************************@acm.org> wrote in message news:<4u********************@giganews.com>...

SELECT S.symbol,
COALESCE(T.xidentity,S.xidentity), COALESCE(T.idsource,S.idsource),
COALESCE(T.exchange,S.exchange), COALESCE(T.type,S.type),
COALESCE(T.subtype,S.subtype), COALESCE(T.xname,S.xname)
FROM Stocks AS S
JOIN
(SELECT symbol,
CASE COUNT(DISTINCT NULLIF(xidentity,''))
WHEN 1 THEN MAX(xidentity) END,
CASE COUNT(DISTINCT NULLIF(idsource,''))
WHEN 1 THEN MAX(idsource) END,
CASE COUNT(DISTINCT NULLIF(exchange,''))
WHEN 1 THEN MAX(exchange) END,
CASE COUNT(DISTINCT NULLIF(type,''))
WHEN 1 THEN MAX(type) END,
CASE COUNT(DISTINCT NULLIF(subtype,''))
WHEN 1 THEN MAX(subtype) END,
CASE COUNT(DISTINCT NULLIF(xname,''))
WHEN 1 THEN MAX(xname) END
FROM Stocks
GROUP BY symbol) AS T
(symbol, xidentity, idsource, exchange, type, subtype, xname)
ON S.symbol = T.symbol

Help others to help you by posting DDL for your table(s) and including
sample data as INSERT statements. That way people can test results and don't
have to guess at datatypes, constraints and keys:

CREATE TABLE Stocks (symbol CHAR(4) NOT NULL, xidentity CHAR(9) NULL,
idsource CHAR(5) NULL, exchange CHAR(4) NULL, type CHAR(5) NULL, subtype
CHAR(6) NULL, xname CHAR(20) NULL /* PRIMARY KEY ??? */)

INSERT INTO Stocks VALUES
('TZA', '901145102', 'CUSIP', 'XNYS', 'Stock', NULL, 'TV AZTECA')
INSERT INTO Stocks VALUES
('TZA', '901145102', NULL, NULL, '', NULL, NULL)
INSERT INTO Stocks VALUES
('WSM', '969904101', 'CUSIP', 'XNYS', 'Stock', NULL, 'WILLIAMS SONOMA')
INSERT INTO Stocks VALUES
('WSM', '969904101', NULL, 'XNYS', 'Stock', NULL, 'WILLIAMS-SONOMA')
INSERT INTO Stocks VALUES
('WSM', '', 'CUSIP', 'XNYS', 'Stock', 'Common', NULL)
INSERT INTO Stocks VALUES
('WSM', NULL, 'CUSIP', 'XASE', 'Stock', NULL, 'WILLIAMS SONOMA')
INSERT INTO Stocks VALUES
('TYC', '902124106', 'CUSIP', 'XNYS', 'Stock', NULL, 'TYCO')
INSERT INTO Stocks VALUES
('TYC', '902124106', 'CUSIP', 'XNYS', 'Stock', NULL, 'TYCO INTERNATIONAL')

Thank you very much David. Your solution was clean and efficient. I
thought of using a join but did not even think about that mixture of
case, nullif, max, etc.

Jul 20 '05 #3

Jason

"David Portas" <RE****************************@acm.org> wrote in message news:<4u********************@giganews.com>...

SELECT S.symbol,
COALESCE(T.xidentity,S.xidentity), COALESCE(T.idsource,S.idsource),
COALESCE(T.exchange,S.exchange), COALESCE(T.type,S.type),
COALESCE(T.subtype,S.subtype), COALESCE(T.xname,S.xname)
FROM Stocks AS S
JOIN
(SELECT symbol,
CASE COUNT(DISTINCT NULLIF(xidentity,''))
WHEN 1 THEN MAX(xidentity) END,
CASE COUNT(DISTINCT NULLIF(idsource,''))
WHEN 1 THEN MAX(idsource) END,
CASE COUNT(DISTINCT NULLIF(exchange,''))
WHEN 1 THEN MAX(exchange) END,
CASE COUNT(DISTINCT NULLIF(type,''))
WHEN 1 THEN MAX(type) END,
CASE COUNT(DISTINCT NULLIF(subtype,''))
WHEN 1 THEN MAX(subtype) END,
CASE COUNT(DISTINCT NULLIF(xname,''))
WHEN 1 THEN MAX(xname) END
FROM Stocks
GROUP BY symbol) AS T
(symbol, xidentity, idsource, exchange, type, subtype, xname)
ON S.symbol = T.symbol

Help others to help you by posting DDL for your table(s) and including
sample data as INSERT statements. That way people can test results and don't
have to guess at datatypes, constraints and keys:

CREATE TABLE Stocks (symbol CHAR(4) NOT NULL, xidentity CHAR(9) NULL,
idsource CHAR(5) NULL, exchange CHAR(4) NULL, type CHAR(5) NULL, subtype
CHAR(6) NULL, xname CHAR(20) NULL /* PRIMARY KEY ??? */)

INSERT INTO Stocks VALUES
('TZA', '901145102', 'CUSIP', 'XNYS', 'Stock', NULL, 'TV AZTECA')
INSERT INTO Stocks VALUES
('TZA', '901145102', NULL, NULL, '', NULL, NULL)
INSERT INTO Stocks VALUES
('WSM', '969904101', 'CUSIP', 'XNYS', 'Stock', NULL, 'WILLIAMS SONOMA')
INSERT INTO Stocks VALUES
('WSM', '969904101', NULL, 'XNYS', 'Stock', NULL, 'WILLIAMS-SONOMA')
INSERT INTO Stocks VALUES
('WSM', '', 'CUSIP', 'XNYS', 'Stock', 'Common', NULL)
INSERT INTO Stocks VALUES
('WSM', NULL, 'CUSIP', 'XASE', 'Stock', NULL, 'WILLIAMS SONOMA')
INSERT INTO Stocks VALUES
('TYC', '902124106', 'CUSIP', 'XNYS', 'Stock', NULL, 'TYCO')
INSERT INTO Stocks VALUES
('TYC', '902124106', 'CUSIP', 'XNYS', 'Stock', NULL, 'TYCO INTERNATIONAL')

I did forget to ask a question. When I run your code I receive the
following message: 'Warning: Null value is eliminated by an aggregate
or other SET operation.'

Should I care about this?

Jul 20 '05 #4

Erland Sommarskog

Jason (Ja*******@hotmail.com) writes:

I did forget to ask a question. When I run your code I receive the
following message: 'Warning: Null value is eliminated by an aggregate
or other SET operation.'

Should I care about this?

No. The cause are these expressions:

COUNT(DISTINCT NULLIF(subtype,''))

You might be able to rewrite this, but I leave that to David. :-) If the
messages bother you, you can embed the query with SET ANSI_WARNINGS OFF and
SET ANSI_WARNINGS ON.

--
Erland Sommarskog, SQL Server MVP, so****@algonet.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techinf...2000/books.asp

Jul 20 '05 #5

Jason

Ja*******@hotmail.com (Jason) wrote in message news:<f0*************************@posting.google.c om>...

"David Portas" <RE****************************@acm.org> wrote in message news:<4u********************@giganews.com>...
SELECT S.symbol,
COALESCE(T.xidentity,S.xidentity), COALESCE(T.idsource,S.idsource),
COALESCE(T.exchange,S.exchange), COALESCE(T.type,S.type),
COALESCE(T.subtype,S.subtype), COALESCE(T.xname,S.xname)
FROM Stocks AS S
JOIN
(SELECT symbol,
CASE COUNT(DISTINCT NULLIF(xidentity,''))
WHEN 1 THEN MAX(xidentity) END,
CASE COUNT(DISTINCT NULLIF(idsource,''))
WHEN 1 THEN MAX(idsource) END,
CASE COUNT(DISTINCT NULLIF(exchange,''))
WHEN 1 THEN MAX(exchange) END,
CASE COUNT(DISTINCT NULLIF(type,''))
WHEN 1 THEN MAX(type) END,
CASE COUNT(DISTINCT NULLIF(subtype,''))
WHEN 1 THEN MAX(subtype) END,
CASE COUNT(DISTINCT NULLIF(xname,''))
WHEN 1 THEN MAX(xname) END
FROM Stocks
GROUP BY symbol) AS T
(symbol, xidentity, idsource, exchange, type, subtype, xname)
ON S.symbol = T.symbol

Hit another small issue. For SOME (in this case [type]) columns I need
to set a priority. If two rows have conflicting data (where COUNT > 1)
on a particular column, I want to use the value from the first row in
the set. (I would make sure that rows get inserted in the order I of
priority.) I thought of using TOP 1 somehow but cannot figure out how
to replace the MAX function with it (I know MAX is a function while
TOP is a statement).

Jul 20 '05 #6

David Portas

> on a particular column, I want to use the value from the first row in

the set. (I would make sure that rows get inserted in the order I of
priority.) I thought of using TOP 1 somehow but cannot figure out how

A table has no inherent ordering so you will have to add a column to
identify the sequence. Here's an example using Seq_No as a sequence number:

CREATE TABLE Stocks (symbol CHAR(4) NOT NULL, xidentity CHAR(9) NULL,
idsource CHAR(5) NULL, exchange CHAR(4) NULL, type CHAR(5) NULL, subtype
CHAR(6) NULL, xname CHAR(20) NULL, seq_no INTEGER NOT NULL UNIQUE /* PRIMARY
KEY ??? */)

I guess that you actually want the to take the value from the first row
which has a *populated* value for the column:

SELECT S.symbol,
COALESCE(T.xidentity,S.xidentity), COALESCE(T.idsource,S.idsource),
COALESCE(T.exchange,S.exchange), M.type,
COALESCE(T.subtype,S.subtype), COALESCE(T.xname,S.xname)
FROM Stocks AS S
JOIN
(SELECT symbol,
CASE COUNT(DISTINCT NULLIF(xidentity,''))
WHEN 1 THEN MAX(xidentity) END,
CASE COUNT(DISTINCT NULLIF(idsource,''))
WHEN 1 THEN MAX(idsource) END,
CASE COUNT(DISTINCT NULLIF(exchange,''))
WHEN 1 THEN MAX(exchange) END,
CASE COUNT(DISTINCT NULLIF(subtype,''))
WHEN 1 THEN MAX(subtype) END,
CASE COUNT(DISTINCT NULLIF(xname,''))
WHEN 1 THEN MAX(xname) END,
MIN(CASE WHEN type>'' THEN seq_no END)
FROM Stocks
GROUP BY symbol) AS T
(symbol, xidentity, idsource, exchange, subtype, xname, seq_no)
ON S.symbol = T.symbol
LEFT JOIN Stocks AS M
ON T.seq_no = M.seq_no

--
David Portas
------------
Please reply only to the newsgroup
--

Jul 20 '05 #7

Similar topics

READING/MERGING XML DATA

by: Klatuu | last post by:

Whew, I've struggled my way through figuring out how to use XML to transport data..now I can imagine what having a baby is like :) But, I'm stuck now. I generate the XML (single table, no...

.NET Framework

Merging two datasets deletes parent elements, preserves all children

by: Walt Borders | last post by:

Hi, My problem: Merging two datasets deletes parent elements, preserves all children. I've created two dataSets. Each use the same schema, parent-child nested tables. The first dataSet is...

.NET Framework

merging multiple fields from various tables in one word document

by: svdh | last post by:

I have posed a question last saturday and have advanced alot in the meantime. But I am still not there Problem is that I try to merging various fields from various tables in one document in Word...

Microsoft Access / VBA

complex merging query in sql server 2000

by: mrclash | last post by:

Hello, I have a Database in a SQL Server 2000 where I have different users tables with equal fields like this: id (int) email (varchar) name (varchar) address (varchar) joinedon (datetime)

Microsoft SQL Server

merging data?

by: Ralph Smith | last post by:

I have two identical databases on two different servers and I need to add the data in tables from one server to the tables in the other server. Is there a way to do that in mysql? thanks, Ralph

MySQL Database

Merging two rows within same table

by: sudhaMurugesan | last post by:

Hi, I have a table like this. IOMode Date EmployeeID EmpName O 2007-02-28 16:46:00.000 FI0001258 M.MANIGANDAN I 2007-02-28...

Microsoft SQL Server

Merging Rows with Similar Values

by: dannylam4 | last post by:

Hello, I've got a question about merging/concatenating rows. There's a similar topic here: Combining Multiple Rows of one Field into One Result but I didn't know if I should hijack it. Basically, I...

Microsoft Access / VBA

dataset merging

by: Neil Chambers | last post by:

I am trying to get my head around dataset merging but despite a little research I could still use a pointer (or ten). Basically I want to perform an outer join operation on a dataset (created from...

ASP.NET

merging rows for the same time stamp

by: akdemirc | last post by:

Hi, My question is about retrieving single records based on a time column, i mean the result set should not include duplicate rows for a unique time value as an example: A B C ...

Microsoft SQL Server

Cloud Servers without Credit Card and Email Registration: A Simpler Way to Get on the Cloud

by: CloudSolutions | last post by:

Introduction: For many beginners and individual users, requiring a credit card and email registration may pose a barrier when starting to use cloud servers. However, some cloud server providers now...

General

Wordpress or something else?

by: Faith0G | last post by:

I am starting a new it consulting business and it's been a while since I setup a new website. Is wordpress still the best web based software for hosting a 5 page website? The webpages will be...

Content Management Systems

Easy Steps to Fix "Canon Printer Won't Connect to WiFi Network"

by: taylorcarr | last post by:

A Canon printer is a smart device known for being advanced, efficient, and reliable. It is designed for home, office, and hybrid workspace use and can also be used for a variety of purposes. However,...

General

Basic Javascript concepts

by: aa123db | last post by:

Variable and constants Use var or let for variables and const fror constants. Var foo ='bar'; Let foo ='bar';const baz ='bar'; Functions function $name$ ($parameters$) { } ...

Javascript

Batch import of multiple excel files into the database

by: ryjfgjl | last post by:

If we have dozens or hundreds of excel to import into the database, if we use the excel import function provided by database editors such as navicat, it will be extremely tedious and time-consuming...

Data Management

Merging data from multiple Excel files

by: ryjfgjl | last post by:

In our work, we often receive Excel tables with data in the same format. If we want to analyze these data, it can be difficult to analyze them because the data is spread across multiple Excel files...

Data Management

Navigating the Data Structures and Algorithms (DSA)

by: BarryA | last post by:

What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...

Algorithms / Advanced Math

Is that possible of reading the .csv file in column wise and the column have different lengths ?

by: Sonnysonu | last post by:

This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...

C / C++

How to build RAID in BIOS?

by: Hystou | last post by:

There are some requirements for setting up RAID: 1. The motherboard and BIOS support RAID configuration. 2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...

Computer Hardware