By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
457,695 Members | 1,317 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 457,695 IT Pros & Developers. It's quick & easy.

Data import - foreign key question

P: n/a
This is a rather abstract question about data design, but I ask it here
because a) the database is SQL Server, and b) you're such a learned
bunch!

Let's assume the classic relation of Customers and Orders, where an
Order may reference a single Customer. If I was designing such a
relation from scratch, I would create the Customer table with an
Identity column and call it CustomerID. The Order table would contain
a column called CustomerID, a foreign key to the Customer table.

So far, so unexceptional. However, in my current project I have to
work with legacy data that comes from a number of old Access systems
where the data was not normalised. I wish to normalise it.

The main table in this new system contains reports on parts. Each
report may reference a single part. However, the old data which I have
to import allowed the user to type in the part number. This has led to
dirty data (for example, '40-7889-9098' appears, as does '40-7889-
9098') so I will clean this data up. In the application, the part
number will be selected from a drop down list, though the administrator
will have access to a builder to add, amend or delete part numbers.

So, my report table needs to store a reference to a part. When I
import the data into my SQL Report table, I will initially bring across
the part number. I will then populate the Part Numbers table with all
discrete, distinct part numbers from the Report table. My question is
should I then create a PartNumberID column in both tables, and "back
populate" the Report table with the PartNumberID which corresponds with
the matching PartNumber - e.g.

UPDATE
R
SET
R.fldPartNumberID = PN.fldPartNumberID
FROM
tblReports R
INNER JOIN tblPartNumbers RN
ON R.fldPartNumber = RN.fldPartNumber

I could then drop the fldPartNumber from the tblReports table.

My question is - should I bother? Or can I just leave the actual
PartNumber in the Reports table, and leave the tblPartNumbers table
with a single column which is both Primary key and Foreign key?

Sorry if this is poorly expressed - I had a tough weekend!

Edward
--
The reading group's reading group:
http://www.bookgroup.org.uk

Jul 23 '05 #1
Share this Question
Share on Google+
4 Replies


P: n/a
I'm not sure why you say you would leave the PartNumbers table with a
single column. Would it not be better to have a Parts table, whose
primary key is PartNumber? Presumably you have things like names,
descriptions, flags etc. for each part (and if you don't, then you
probably will at some point), so you would need a Parts table anyway.

And what would the difference be between PartNumberID and PartNumber? I
would guess that PartNumber must be unique (after you clean them up,
that is), so why would you want to invent your own ID?

Simon

Jul 23 '05 #2

P: n/a
"Simon Hayes" <sq*@hayes.ch> wrote in message news:<11*********************@g14g2000cwa.googlegr oups.com>...
I'm not sure why you say you would leave the PartNumbers table with a
single column. Would it not be better to have a Parts table, whose
primary key is PartNumber? Presumably you have things like names,
descriptions, flags etc. for each part (and if you don't, then you
probably will at some point), so you would need a Parts table anyway.

And what would the difference be between PartNumberID and PartNumber? I
would guess that PartNumber must be unique (after you clean them up,
that is), so why would you want to invent your own ID?


This is exactly my point. The trouble is that I am so used to
creating tables from scratch:

fldPartNumberID
fldPartNumber
fldDescription
fldCurrent
fldOEMID
etc.

that I find myself itching to add the redundant Primary Key, even
though the PartNumber is itself unique or, if it is not, it is unique
in combination with one or more other fields.

Ah, that's the point. I knew I'd get there in the end. If the part
number is NOT unique, but in combination with, say, fldOEMID, then in
order to store a reference to a row in the parts table the Reports
table would need to have TWO foreign keys on the Parts table. So, my
rule of thumb is:

If the PartNumber is unique, it's the ID. If it isn't, it isn't the
ID.

Thanks for your help in clarifying my thoughts.

Edward
Jul 23 '05 #3

P: n/a

"Edward" <te********@hotmail.com> wrote in message
news:25**************************@posting.google.c om...
"Simon Hayes" <sq*@hayes.ch> wrote in message
news:<11*********************@g14g2000cwa.googlegr oups.com>...
I'm not sure why you say you would leave the PartNumbers table with a
single column. Would it not be better to have a Parts table, whose
primary key is PartNumber? Presumably you have things like names,
descriptions, flags etc. for each part (and if you don't, then you
probably will at some point), so you would need a Parts table anyway.

And what would the difference be between PartNumberID and PartNumber? I
would guess that PartNumber must be unique (after you clean them up,
that is), so why would you want to invent your own ID?


This is exactly my point. The trouble is that I am so used to
creating tables from scratch:

fldPartNumberID
fldPartNumber
fldDescription
fldCurrent
fldOEMID
etc.

that I find myself itching to add the redundant Primary Key, even
though the PartNumber is itself unique or, if it is not, it is unique
in combination with one or more other fields.

Ah, that's the point. I knew I'd get there in the end. If the part
number is NOT unique, but in combination with, say, fldOEMID, then in
order to store a reference to a row in the parts table the Reports
table would need to have TWO foreign keys on the Parts table. So, my
rule of thumb is:

If the PartNumber is unique, it's the ID. If it isn't, it isn't the
ID.

Thanks for your help in clarifying my thoughts.

Edward


Not exactly - in the case you describe, you would have a composite primary
key on (PartNumber, OEMID):

create table dbo.Parts (
PartNumber char(12) not null,
OEMID int not null,
...
constraint PK_Parts primary key (PartNumber, OEMID)
)

It would be then referenced by one foreign key made up of two columns:

create table dbo.Reports (
ReportID int not null,
PartNumber char(12) not null,
OEMID int not null,
...
constraint PK_Reports primary key (ReportID),
constraint FK_Reports_Parts foreign key (PartNumber, OEMID) references
dbo.Parts (PartNumber, OEMID)
)

This isn't unusual, and in general, you should always use the table's
natural primary key, even if it's made up of several columns, and only
introduce your own when you have a good reason - there's no rule to say that
a primary key must be a single column.

For example, a certain table's natural key might be made up of 10 columns,
including wide varchar columns - that would probably cause performance
problems at some point. So in a situation like that, it's not unreasonable
to introduce your own artificial key, but you would also put a UNIQUE
constraint on the natural key anyway, to make sure the data is valid.

Simon
Jul 23 '05 #4

P: n/a
te********@hotmail.com wrote:
...
So far, so unexceptional. However, in my current project I have to
work with legacy data that comes from a number of old Access systems
where the data was not normalised. I wish to normalise it.

The main table in this new system contains reports on parts. Each
report may reference a single part. However, the old data which I have to import allowed the user to type in the part number. This has led to dirty data (for example, '40-7889-9098' appears, as does '40-7889-
9098') so I will clean this data up. From my perspective, this is where your biggest potential problem lies,

although hidden. I once had to do a similar project where data from
Access tables had to be translated into a different format that could
be imported into a different system. Almost all this was done in code
since it involved complex decisions based on interrelationships of
existing data. The part I didn't like was that one field had to be put
into different categories that didn't exist in the original. The other
fields had nice concrete rules for the conversion. So I created a form
and suggested that their regular employees choose the category from a
combobox for each record similar to the way your application will
select part numbers. All the data coverted smoothly into the new
system except for one of the categories that had been miscatalogued.
The customer wanted me to reduce my invoice based on the grief caused
by having the wrong category. I pointed out that I had selected none
of the categories. Showing which computer was used for the improper
selections would have been even better. Avoid cleaning data yourself
if possible. If not, keep a backup and get the cleanup rules approved.
'Replace' can also do some pretty unexpected things if you're not
careful. Insist that they have someone else check the "clean" data for
final approval.

James A. Fortune

Jul 23 '05 #5

This discussion thread is closed

Replies have been disabled for this discussion.