By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
435,462 Members | 3,388 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 435,462 IT Pros & Developers. It's quick & easy.

Stored Proc To Copy Unnormalized to Normalized Table

P: n/a
I have a "source" table that is being populated by a DTS bulk import
of a text file. I need to scrub the source table after the import
step by running appropriate stored proc(s) to copy the source data to
2 normalized tables. The problem is that table "Companies" needs to
be populated first in order to generate the Identity ID and then use
that as the foreign key in the other table.

Here is the DDL:

CREATE TABLE [dbo].[OriginalList] (
[FirstName] [varchar] (255) COLLATE SQL_Latin1_General_CP1_CI_AS NULL
,
[LastName] [varchar] (255) COLLATE SQL_Latin1_General_CP1_CI_AS NULL
,
[Company] [varchar] (255) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[Addr1] [varchar] (255) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[City] [varchar] (255) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[State] [varchar] (255) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[Zip] [varchar] (255) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[Phone] [varchar] (255) COLLATE SQL_Latin1_General_CP1_CI_AS NULL
) ON [PRIMARY]
GO

CREATE TABLE [dbo].[Companies] (
[ID] [int] IDENTITY (1, 1) NOT NULL ,
[Name] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL
) ON [PRIMARY]
GO

CREATE TABLE [dbo].[CompanyLocations] (
[ID] [int] IDENTITY (1, 1) NOT NULL ,
[CompanyID] [int] NOT NULL ,
[Addr1] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[City] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[State] [varchar] (2) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[Zip] [varchar] (10) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[Phone] [varchar] (14) COLLATE SQL_Latin1_General_CP1_CI_AS NULL
) ON [PRIMARY]
GO
This is the stored proc I have at this time that does NOT work. It
uses the last Company insert for all the CompanyLocations which is not
correct.

CREATE PROCEDURE DataScrubSP AS
Begin Transaction
insert Companies (Name) select Company from OriginalList
IF @@Error <> 0
GOTO ErrorHandler

declare @COID int
select @COID=@@identity

insert CompanyLocations (CompanyID, Addr1, City, State, Zip) select
@COID, Addr1, City, State, Zip from OriginalList
IF @@Error <> 0
GOTO ErrorHandler

COMMIT TRANSACTION

ErrorHandler:
IF @@TRANCOUNT > 0
ROLLBACK TRANSACTION
RETURN
GO

Thanks for any help.

Alex.
Jul 20 '05 #1
Share this Question
Share on Google+
3 Replies


P: n/a
On 9 Nov 2004 08:57:44 -0800, Alex wrote:
I have a "source" table that is being populated by a DTS bulk import
of a text file. I need to scrub the source table after the import
step by running appropriate stored proc(s) to copy the source data to
2 normalized tables. The problem is that table "Companies" needs to
be populated first in order to generate the Identity ID and then use
that as the foreign key in the other table.
(snip DDL)
This is the stored proc I have at this time that does NOT work. It
uses the last Company insert for all the CompanyLocations which is not
correct.


(snip code)

Hi Alex,

First some comments on the DDL. OriginalList is obviously some staging
table so I can understand the funny datatypes, but are you really sure
that all these columns can be NULL? And that no column (or even
combination of columns) can be used in a PRIMARY KEY or UNIQUE constraint?

In the Companies table, you forgot to include a UNIQUE constraint on the
Name column. Your company names are unique, aren't they? If not, you
should of course not add the UNIQUE constraint on that column only, but
add more columns, to ensure that you'll ahve a natural, verifiable key in
addition to the convenient but meaningless identity column. Remember:
without verifiable key, you'll never be able to know if the below example
data is correct or a result of udplicated data entry:
ID | Name
----+------------
17 | Microsoft
36 | Microsoft

In the CompanyLocations table, you'll gain a few bytes, a little bit of
speed and improved self-documenting if you change State to char(2). Also,
I'm surprised that you allow all columns to be NULL - most forms I have to
fill out require me to supply at least address, city, state (if US) or
country (if not US) and ZIP. Only phone is often optional.
Now, on to your question. If you have a real strong urge to use @@IDENTITY
or SCOPE_IDENTITY (), then you'll have to use a cursor or another way to
loop through the rows in OriginalList and process them one by one. But
that is not needed at all - as long as you have a good natural key (and
you always should have!), there are other ways to find the IDENTITY value
of a row: by using the natural key.

Try this snippet of (untested) code. My assumption is that Companies.Name
is indeed unique - if it isn't, you'll have to adapt the code (and your
tables).

CREATE PROCEDURE DataScrubSP AS
BEGIN TRANSACTION
-- Insert all companies from OriginalList
INSERT Companies (Name)
SELECT Company
FROM OriginalList
IF @@Error <> 0
GOTO ErrorHandler

-- Now, insert all companylocations.
-- Use companyname to find the assigned ID value.
INSERT CompanyLocations (CompanyID, Addr1, City, State, Zip)
SELECT c.ID, o.Addr1, o.City, o.State, o.Zip
FROM OriginalList AS o
INNER JOIN Companies AS c
ON c.Name = o.Company
IF @@Error <> 0
GOTO ErrorHandler

COMMIT TRANSACTION

ErrorHandler:
IF @@TRANCOUNT > 0
ROLLBACK TRANSACTION
RETURN
GO
Best, Hugo
--

(Remove _NO_ and _SPAM_ to get my e-mail address)
Jul 20 '05 #2

P: n/a
Alex (hf*****@yahoo.com) writes:
I have a "source" table that is being populated by a DTS bulk import
of a text file. I need to scrub the source table after the import
step by running appropriate stored proc(s) to copy the source data to
2 normalized tables. The problem is that table "Companies" needs to
be populated first in order to generate the Identity ID and then use
that as the foreign key in the other table.
...
This is the stored proc I have at this time that does NOT work. It
uses the last Company insert for all the CompanyLocations which is not
correct.


Remove the IDENTITY property on Companies, and roll your own:

CREATE PROCEDURE DataScrubSP AS

CREATE TABLE #temp (ident int IDENTITY,
name varchar(50) NOT NULL)

insert #temp (name) select Company from OriginalList
IF @@Error <> 0
GOTO ErrorHandler

Begin Transaction

declare @first_id int
select @first_id = colaesce(MAX(ID), 0) FROM Companies (UPDLOCK)

insert Companies(ID, name)
SELECT @first_id + ident, name FROM #temp
IF @@Error <> 0
GOTO ErrorHandler

insert CompanyLocations (CompanyID, Addr1, City, State, Zip)
select c.ID, o.Addr1, o.City, o.State, o.Zip
from OriginalList o
Join Companies C ON o.Company = C.name
IF @@Error <> 0
GOTO ErrorHandler

Actually, this example you don't even need that temp table or
@first_id. Or even to drop the IDENTITY property on Companies.
All you need the join in the INSERT statement. But I suspect
that the due to the shrubbing there is some more complexity, so
I included that technique.

--
Erland Sommarskog, SQL Server MVP, es****@sommarskog.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techinf...2000/books.asp
Jul 20 '05 #3

P: n/a
Thanks Hugo. I implemented the suggestion and it worked.

Alex.
"Hugo Kornelis" <hugo@pe_NO_rFact.in_SPAM_fo> wrote in message
news:rk********************************@4ax.com...
On 9 Nov 2004 08:57:44 -0800, Alex wrote:
I have a "source" table that is being populated by a DTS bulk import
of a text file. I need to scrub the source table after the import
step by running appropriate stored proc(s) to copy the source data to
2 normalized tables. The problem is that table "Companies" needs to
be populated first in order to generate the Identity ID and then use
that as the foreign key in the other table.


(snip DDL)
This is the stored proc I have at this time that does NOT work. It
uses the last Company insert for all the CompanyLocations which is not
correct.


(snip code)

Hi Alex,

First some comments on the DDL. OriginalList is obviously some staging
table so I can understand the funny datatypes, but are you really sure
that all these columns can be NULL? And that no column (or even
combination of columns) can be used in a PRIMARY KEY or UNIQUE constraint?

In the Companies table, you forgot to include a UNIQUE constraint on the
Name column. Your company names are unique, aren't they? If not, you
should of course not add the UNIQUE constraint on that column only, but
add more columns, to ensure that you'll ahve a natural, verifiable key in
addition to the convenient but meaningless identity column. Remember:
without verifiable key, you'll never be able to know if the below example
data is correct or a result of udplicated data entry:
ID | Name
----+------------
17 | Microsoft
36 | Microsoft

In the CompanyLocations table, you'll gain a few bytes, a little bit of
speed and improved self-documenting if you change State to char(2). Also,
I'm surprised that you allow all columns to be NULL - most forms I have to
fill out require me to supply at least address, city, state (if US) or
country (if not US) and ZIP. Only phone is often optional.
Now, on to your question. If you have a real strong urge to use @@IDENTITY
or SCOPE_IDENTITY (), then you'll have to use a cursor or another way to
loop through the rows in OriginalList and process them one by one. But
that is not needed at all - as long as you have a good natural key (and
you always should have!), there are other ways to find the IDENTITY value
of a row: by using the natural key.

Try this snippet of (untested) code. My assumption is that Companies.Name
is indeed unique - if it isn't, you'll have to adapt the code (and your
tables).

CREATE PROCEDURE DataScrubSP AS
BEGIN TRANSACTION
-- Insert all companies from OriginalList
INSERT Companies (Name)
SELECT Company
FROM OriginalList
IF @@Error <> 0
GOTO ErrorHandler

-- Now, insert all companylocations.
-- Use companyname to find the assigned ID value.
INSERT CompanyLocations (CompanyID, Addr1, City, State, Zip)
SELECT c.ID, o.Addr1, o.City, o.State, o.Zip
FROM OriginalList AS o
INNER JOIN Companies AS c
ON c.Name = o.Company
IF @@Error <> 0
GOTO ErrorHandler

COMMIT TRANSACTION

ErrorHandler:
IF @@TRANCOUNT > 0
ROLLBACK TRANSACTION
RETURN
GO
Best, Hugo
--

(Remove _NO_ and _SPAM_ to get my e-mail address)

Jul 20 '05 #4

This discussion thread is closed

Replies have been disabled for this discussion.