473,320 Members | 2,145 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,320 software developers and data experts.

Data import - foreign key question

This is a rather abstract question about data design, but I ask it here
because a) the database is SQL Server, and b) you're such a learned
bunch!

Let's assume the classic relation of Customers and Orders, where an
Order may reference a single Customer. If I was designing such a
relation from scratch, I would create the Customer table with an
Identity column and call it CustomerID. The Order table would contain
a column called CustomerID, a foreign key to the Customer table.

So far, so unexceptional. However, in my current project I have to
work with legacy data that comes from a number of old Access systems
where the data was not normalised. I wish to normalise it.

The main table in this new system contains reports on parts. Each
report may reference a single part. However, the old data which I have
to import allowed the user to type in the part number. This has led to
dirty data (for example, '40-7889-9098' appears, as does '40-7889-
9098') so I will clean this data up. In the application, the part
number will be selected from a drop down list, though the administrator
will have access to a builder to add, amend or delete part numbers.

So, my report table needs to store a reference to a part. When I
import the data into my SQL Report table, I will initially bring across
the part number. I will then populate the Part Numbers table with all
discrete, distinct part numbers from the Report table. My question is
should I then create a PartNumberID column in both tables, and "back
populate" the Report table with the PartNumberID which corresponds with
the matching PartNumber - e.g.

UPDATE
R
SET
R.fldPartNumberID = PN.fldPartNumberID
FROM
tblReports R
INNER JOIN tblPartNumbers RN
ON R.fldPartNumber = RN.fldPartNumber

I could then drop the fldPartNumber from the tblReports table.

My question is - should I bother? Or can I just leave the actual
PartNumber in the Reports table, and leave the tblPartNumbers table
with a single column which is both Primary key and Foreign key?

Sorry if this is poorly expressed - I had a tough weekend!

Edward
--
The reading group's reading group:
http://www.bookgroup.org.uk

Nov 13 '05 #1
4 1682
I'm not sure why you say you would leave the PartNumbers table with a
single column. Would it not be better to have a Parts table, whose
primary key is PartNumber? Presumably you have things like names,
descriptions, flags etc. for each part (and if you don't, then you
probably will at some point), so you would need a Parts table anyway.

And what would the difference be between PartNumberID and PartNumber? I
would guess that PartNumber must be unique (after you clean them up,
that is), so why would you want to invent your own ID?

Simon

Nov 13 '05 #2
"Simon Hayes" <sq*@hayes.ch> wrote in message news:<11*********************@g14g2000cwa.googlegr oups.com>...
I'm not sure why you say you would leave the PartNumbers table with a
single column. Would it not be better to have a Parts table, whose
primary key is PartNumber? Presumably you have things like names,
descriptions, flags etc. for each part (and if you don't, then you
probably will at some point), so you would need a Parts table anyway.

And what would the difference be between PartNumberID and PartNumber? I
would guess that PartNumber must be unique (after you clean them up,
that is), so why would you want to invent your own ID?


This is exactly my point. The trouble is that I am so used to
creating tables from scratch:

fldPartNumberID
fldPartNumber
fldDescription
fldCurrent
fldOEMID
etc.

that I find myself itching to add the redundant Primary Key, even
though the PartNumber is itself unique or, if it is not, it is unique
in combination with one or more other fields.

Ah, that's the point. I knew I'd get there in the end. If the part
number is NOT unique, but in combination with, say, fldOEMID, then in
order to store a reference to a row in the parts table the Reports
table would need to have TWO foreign keys on the Parts table. So, my
rule of thumb is:

If the PartNumber is unique, it's the ID. If it isn't, it isn't the
ID.

Thanks for your help in clarifying my thoughts.

Edward
Nov 13 '05 #3

"Edward" <te********@hotmail.com> wrote in message
news:25**************************@posting.google.c om...
"Simon Hayes" <sq*@hayes.ch> wrote in message
news:<11*********************@g14g2000cwa.googlegr oups.com>...
I'm not sure why you say you would leave the PartNumbers table with a
single column. Would it not be better to have a Parts table, whose
primary key is PartNumber? Presumably you have things like names,
descriptions, flags etc. for each part (and if you don't, then you
probably will at some point), so you would need a Parts table anyway.

And what would the difference be between PartNumberID and PartNumber? I
would guess that PartNumber must be unique (after you clean them up,
that is), so why would you want to invent your own ID?


This is exactly my point. The trouble is that I am so used to
creating tables from scratch:

fldPartNumberID
fldPartNumber
fldDescription
fldCurrent
fldOEMID
etc.

that I find myself itching to add the redundant Primary Key, even
though the PartNumber is itself unique or, if it is not, it is unique
in combination with one or more other fields.

Ah, that's the point. I knew I'd get there in the end. If the part
number is NOT unique, but in combination with, say, fldOEMID, then in
order to store a reference to a row in the parts table the Reports
table would need to have TWO foreign keys on the Parts table. So, my
rule of thumb is:

If the PartNumber is unique, it's the ID. If it isn't, it isn't the
ID.

Thanks for your help in clarifying my thoughts.

Edward


Not exactly - in the case you describe, you would have a composite primary
key on (PartNumber, OEMID):

create table dbo.Parts (
PartNumber char(12) not null,
OEMID int not null,
...
constraint PK_Parts primary key (PartNumber, OEMID)
)

It would be then referenced by one foreign key made up of two columns:

create table dbo.Reports (
ReportID int not null,
PartNumber char(12) not null,
OEMID int not null,
...
constraint PK_Reports primary key (ReportID),
constraint FK_Reports_Parts foreign key (PartNumber, OEMID) references
dbo.Parts (PartNumber, OEMID)
)

This isn't unusual, and in general, you should always use the table's
natural primary key, even if it's made up of several columns, and only
introduce your own when you have a good reason - there's no rule to say that
a primary key must be a single column.

For example, a certain table's natural key might be made up of 10 columns,
including wide varchar columns - that would probably cause performance
problems at some point. So in a situation like that, it's not unreasonable
to introduce your own artificial key, but you would also put a UNIQUE
constraint on the natural key anyway, to make sure the data is valid.

Simon
Nov 13 '05 #4
te********@hotmail.com wrote:
...
So far, so unexceptional. However, in my current project I have to
work with legacy data that comes from a number of old Access systems
where the data was not normalised. I wish to normalise it.

The main table in this new system contains reports on parts. Each
report may reference a single part. However, the old data which I have to import allowed the user to type in the part number. This has led to dirty data (for example, '40-7889-9098' appears, as does '40-7889-
9098') so I will clean this data up. From my perspective, this is where your biggest potential problem lies,

although hidden. I once had to do a similar project where data from
Access tables had to be translated into a different format that could
be imported into a different system. Almost all this was done in code
since it involved complex decisions based on interrelationships of
existing data. The part I didn't like was that one field had to be put
into different categories that didn't exist in the original. The other
fields had nice concrete rules for the conversion. So I created a form
and suggested that their regular employees choose the category from a
combobox for each record similar to the way your application will
select part numbers. All the data coverted smoothly into the new
system except for one of the categories that had been miscatalogued.
The customer wanted me to reduce my invoice based on the grief caused
by having the wrong category. I pointed out that I had selected none
of the categories. Showing which computer was used for the improper
selections would have been even better. Avoid cleaning data yourself
if possible. If not, keep a backup and get the cleanup rules approved.
'Replace' can also do some pretty unexpected things if you're not
careful. Insist that they have someone else check the "clean" data for
final approval.

James A. Fortune

Nov 13 '05 #5

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

1
by: J Belly | last post by:
Hi, all: I'm a newbie trying to understand the concept of referential integrity and dealing with Primary and Foreign Keys. I'm sure mine is a simple problem... I've created 3 tables as...
4
by: Edward | last post by:
Access 2k -> SQL Server 2k My client has an app that is A2k FE with A2k BE. They have asked me to move the BE to SQL Server. I have a bit of experience with SQL Server, and I'm happy with...
4
by: teddysnips | last post by:
This is a rather abstract question about data design, but I ask it here because a) the database is SQL Server, and b) you're such a learned bunch! Let's assume the classic relation of Customers...
6
by: sam | last post by:
I have to import data into a empty database, that has many tables. some tables have to be inserted first than others due to the foreign keys. How do I find out the order of the tables that I...
1
by: James E | last post by:
I have a question about best practices of how to deal with lookup data from my C# apps. On a couple of occasions I have come across a problem where I have to automate inserting a record into a...
0
by: Jerry | last post by:
Below is ALL the code for all the databases... Here's the problem: I callup the aspx file in IE and the form comes up just fine. When I select a person to update, I get the subject error. ...
1
by: tony.pahl | last post by:
We are converting a data warehouse to a Unicode database to get ready for multilingual support. If we will have 95% of our data in English as we currently do, and less than 5% in other foreign...
3
by: laredotornado | last post by:
Hello, My hosting company has MySQL 5 and so do I on my local Fedora Core 5 Linux machine. However, when I import a dump file from their enviornment into mine, I get this error ERROR 1064...
0
by: David Linsin | last post by:
I created a simple test case to reproduce the problem (also check Bug #15500): import java.sql.Connection; import java.sql.DriverManager; import java.sql.ResultSet; import java.sql.Statement;...
6
by: Janick Bernet | last post by:
We have a table (document_folders), which simulates a directory hierarchy, so in this table, each folder has an attribute "TopFolderID" to indicate which folder its contained in (for the root...
1
isladogs
by: isladogs | last post by:
The next Access Europe meeting will be on Wednesday 6 Mar 2024 starting at 18:00 UK time (6PM UTC) and finishing at about 19:15 (7.15PM). In this month's session, we are pleased to welcome back...
0
by: Vimpel783 | last post by:
Hello! Guys, I found this code on the Internet, but I need to modify it a little. It works well, the problem is this: Data is sent from only one cell, in this case B5, but it is necessary that data...
0
by: jfyes | last post by:
As a hardware engineer, after seeing that CEIWEI recently released a new tool for Modbus RTU Over TCP/UDP filtering and monitoring, I actively went to its official website to take a look. It turned...
0
by: ArrayDB | last post by:
The error message I've encountered is; ERROR:root:Error generating model response: exception: access violation writing 0x0000000000005140, which seems to be indicative of an access violation...
1
by: PapaRatzi | last post by:
Hello, I am teaching myself MS Access forms design and Visual Basic. I've created a table to capture a list of Top 30 singles and forms to capture new entries. The final step is a form (unbound)...
0
by: CloudSolutions | last post by:
Introduction: For many beginners and individual users, requiring a credit card and email registration may pose a barrier when starting to use cloud servers. However, some cloud server providers now...
0
by: Defcon1945 | last post by:
I'm trying to learn Python using Pycharm but import shutil doesn't work
0
by: af34tf | last post by:
Hi Guys, I have a domain whose name is BytesLimited.com, and I want to sell it. Does anyone know about platforms that allow me to list my domain in auction for free. Thank you
0
by: Faith0G | last post by:
I am starting a new it consulting business and it's been a while since I setup a new website. Is wordpress still the best web based software for hosting a 5 page website? The webpages will be...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.