473,404 Members | 2,178 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,404 software developers and data experts.

Data import - foreign key question

This is a rather abstract question about data design, but I ask it here
because a) the database is SQL Server, and b) you're such a learned
bunch!

Let's assume the classic relation of Customers and Orders, where an
Order may reference a single Customer. If I was designing such a
relation from scratch, I would create the Customer table with an
Identity column and call it CustomerID. The Order table would contain
a column called CustomerID, a foreign key to the Customer table.

So far, so unexceptional. However, in my current project I have to
work with legacy data that comes from a number of old Access systems
where the data was not normalised. I wish to normalise it.

The main table in this new system contains reports on parts. Each
report may reference a single part. However, the old data which I have
to import allowed the user to type in the part number. This has led to
dirty data (for example, '40-7889-9098' appears, as does '40-7889-
9098') so I will clean this data up. In the application, the part
number will be selected from a drop down list, though the administrator
will have access to a builder to add, amend or delete part numbers.

So, my report table needs to store a reference to a part. When I
import the data into my SQL Report table, I will initially bring across
the part number. I will then populate the Part Numbers table with all
discrete, distinct part numbers from the Report table. My question is
should I then create a PartNumberID column in both tables, and "back
populate" the Report table with the PartNumberID which corresponds with
the matching PartNumber - e.g.

UPDATE
R
SET
R.fldPartNumberID = PN.fldPartNumberID
FROM
tblReports R
INNER JOIN tblPartNumbers RN
ON R.fldPartNumber = RN.fldPartNumber

I could then drop the fldPartNumber from the tblReports table.

My question is - should I bother? Or can I just leave the actual
PartNumber in the Reports table, and leave the tblPartNumbers table
with a single column which is both Primary key and Foreign key?

Sorry if this is poorly expressed - I had a tough weekend!

Edward
--
The reading group's reading group:
http://www.bookgroup.org.uk

Jul 23 '05 #1
4 3153
I'm not sure why you say you would leave the PartNumbers table with a
single column. Would it not be better to have a Parts table, whose
primary key is PartNumber? Presumably you have things like names,
descriptions, flags etc. for each part (and if you don't, then you
probably will at some point), so you would need a Parts table anyway.

And what would the difference be between PartNumberID and PartNumber? I
would guess that PartNumber must be unique (after you clean them up,
that is), so why would you want to invent your own ID?

Simon

Jul 23 '05 #2
"Simon Hayes" <sq*@hayes.ch> wrote in message news:<11*********************@g14g2000cwa.googlegr oups.com>...
I'm not sure why you say you would leave the PartNumbers table with a
single column. Would it not be better to have a Parts table, whose
primary key is PartNumber? Presumably you have things like names,
descriptions, flags etc. for each part (and if you don't, then you
probably will at some point), so you would need a Parts table anyway.

And what would the difference be between PartNumberID and PartNumber? I
would guess that PartNumber must be unique (after you clean them up,
that is), so why would you want to invent your own ID?


This is exactly my point. The trouble is that I am so used to
creating tables from scratch:

fldPartNumberID
fldPartNumber
fldDescription
fldCurrent
fldOEMID
etc.

that I find myself itching to add the redundant Primary Key, even
though the PartNumber is itself unique or, if it is not, it is unique
in combination with one or more other fields.

Ah, that's the point. I knew I'd get there in the end. If the part
number is NOT unique, but in combination with, say, fldOEMID, then in
order to store a reference to a row in the parts table the Reports
table would need to have TWO foreign keys on the Parts table. So, my
rule of thumb is:

If the PartNumber is unique, it's the ID. If it isn't, it isn't the
ID.

Thanks for your help in clarifying my thoughts.

Edward
Jul 23 '05 #3

"Edward" <te********@hotmail.com> wrote in message
news:25**************************@posting.google.c om...
"Simon Hayes" <sq*@hayes.ch> wrote in message
news:<11*********************@g14g2000cwa.googlegr oups.com>...
I'm not sure why you say you would leave the PartNumbers table with a
single column. Would it not be better to have a Parts table, whose
primary key is PartNumber? Presumably you have things like names,
descriptions, flags etc. for each part (and if you don't, then you
probably will at some point), so you would need a Parts table anyway.

And what would the difference be between PartNumberID and PartNumber? I
would guess that PartNumber must be unique (after you clean them up,
that is), so why would you want to invent your own ID?


This is exactly my point. The trouble is that I am so used to
creating tables from scratch:

fldPartNumberID
fldPartNumber
fldDescription
fldCurrent
fldOEMID
etc.

that I find myself itching to add the redundant Primary Key, even
though the PartNumber is itself unique or, if it is not, it is unique
in combination with one or more other fields.

Ah, that's the point. I knew I'd get there in the end. If the part
number is NOT unique, but in combination with, say, fldOEMID, then in
order to store a reference to a row in the parts table the Reports
table would need to have TWO foreign keys on the Parts table. So, my
rule of thumb is:

If the PartNumber is unique, it's the ID. If it isn't, it isn't the
ID.

Thanks for your help in clarifying my thoughts.

Edward


Not exactly - in the case you describe, you would have a composite primary
key on (PartNumber, OEMID):

create table dbo.Parts (
PartNumber char(12) not null,
OEMID int not null,
...
constraint PK_Parts primary key (PartNumber, OEMID)
)

It would be then referenced by one foreign key made up of two columns:

create table dbo.Reports (
ReportID int not null,
PartNumber char(12) not null,
OEMID int not null,
...
constraint PK_Reports primary key (ReportID),
constraint FK_Reports_Parts foreign key (PartNumber, OEMID) references
dbo.Parts (PartNumber, OEMID)
)

This isn't unusual, and in general, you should always use the table's
natural primary key, even if it's made up of several columns, and only
introduce your own when you have a good reason - there's no rule to say that
a primary key must be a single column.

For example, a certain table's natural key might be made up of 10 columns,
including wide varchar columns - that would probably cause performance
problems at some point. So in a situation like that, it's not unreasonable
to introduce your own artificial key, but you would also put a UNIQUE
constraint on the natural key anyway, to make sure the data is valid.

Simon
Jul 23 '05 #4
te********@hotmail.com wrote:
...
So far, so unexceptional. However, in my current project I have to
work with legacy data that comes from a number of old Access systems
where the data was not normalised. I wish to normalise it.

The main table in this new system contains reports on parts. Each
report may reference a single part. However, the old data which I have to import allowed the user to type in the part number. This has led to dirty data (for example, '40-7889-9098' appears, as does '40-7889-
9098') so I will clean this data up. From my perspective, this is where your biggest potential problem lies,

although hidden. I once had to do a similar project where data from
Access tables had to be translated into a different format that could
be imported into a different system. Almost all this was done in code
since it involved complex decisions based on interrelationships of
existing data. The part I didn't like was that one field had to be put
into different categories that didn't exist in the original. The other
fields had nice concrete rules for the conversion. So I created a form
and suggested that their regular employees choose the category from a
combobox for each record similar to the way your application will
select part numbers. All the data coverted smoothly into the new
system except for one of the categories that had been miscatalogued.
The customer wanted me to reduce my invoice based on the grief caused
by having the wrong category. I pointed out that I had selected none
of the categories. Showing which computer was used for the improper
selections would have been even better. Avoid cleaning data yourself
if possible. If not, keep a backup and get the cleanup rules approved.
'Replace' can also do some pretty unexpected things if you're not
careful. Insist that they have someone else check the "clean" data for
final approval.

James A. Fortune

Jul 23 '05 #5

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

1
by: J Belly | last post by:
Hi, all: I'm a newbie trying to understand the concept of referential integrity and dealing with Primary and Foreign Keys. I'm sure mine is a simple problem... I've created 3 tables as...
4
by: Edward | last post by:
Access 2k -> SQL Server 2k My client has an app that is A2k FE with A2k BE. They have asked me to move the BE to SQL Server. I have a bit of experience with SQL Server, and I'm happy with...
1
by: James E | last post by:
I have a question about best practices of how to deal with lookup data from my C# apps. On a couple of occasions I have come across a problem where I have to automate inserting a record into a...
4
by: teddysnips | last post by:
This is a rather abstract question about data design, but I ask it here because a) the database is SQL Server, and b) you're such a learned bunch! Let's assume the classic relation of Customers...
0
by: Jerry | last post by:
Below is ALL the code for all the databases... Here's the problem: I callup the aspx file in IE and the form comes up just fine. When I select a person to update, I get the subject error. ...
1
by: tony.pahl | last post by:
We are converting a data warehouse to a Unicode database to get ready for multilingual support. If we will have 95% of our data in English as we currently do, and less than 5% in other foreign...
3
by: laredotornado | last post by:
Hello, My hosting company has MySQL 5 and so do I on my local Fedora Core 5 Linux machine. However, when I import a dump file from their enviornment into mine, I get this error ERROR 1064...
0
by: David Linsin | last post by:
I created a simple test case to reproduce the problem (also check Bug #15500): import java.sql.Connection; import java.sql.DriverManager; import java.sql.ResultSet; import java.sql.Statement;...
6
by: Janick Bernet | last post by:
We have a table (document_folders), which simulates a directory hierarchy, so in this table, each folder has an attribute "TopFolderID" to indicate which folder its contained in (for the root...
0
by: Charles Arthur | last post by:
How do i turn on java script on a villaon, callus and itel keypad mobile phone
0
by: emmanuelkatto | last post by:
Hi All, I am Emmanuel katto from Uganda. I want to ask what challenges you've faced while migrating a website to cloud. Please let me know. Thanks! Emmanuel
0
BarryA
by: BarryA | last post by:
What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...
1
by: Sonnysonu | last post by:
This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...
0
by: Hystou | last post by:
There are some requirements for setting up RAID: 1. The motherboard and BIOS support RAID configuration. 2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...
0
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However,...
0
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can...
0
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers,...
0
by: Hystou | last post by:
Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.