By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
443,934 Members | 1,527 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 443,934 IT Pros & Developers. It's quick & easy.

INNER JOIN using surrogate ID, or [Date] BETWEEN?

P: n/a
{CREATE TABLEs and INSERTs follow...}

Gents,

I have a main table that is in ONE-MANY with many other tables. For example, if
the main table is named A, there are these realtionships:

A-->B
A-->C
A-->D
A-->E

With one field in Common (Person). The tables B, C, D and E are History tables,
with Start and End dates. Each person has a Program history (table B, ie), an
Experience history (table C, ie), and so on...many differernt types of
histories, and it may grow from here....table F, G, etc.

The included CREATE TABLEs and INSERTs contain tables A, B and C.

The problem: Each tblCase (table A) record has a date. When joining all of the
history tables to tblCase on Person, obviously you get a cross-product of each
history unless you specify a WHERE clause that extracts one single record from
each of the histories (duh...that's the point...to extract a single record from
each history, because there can only be one value in effect at the time of the
Case.)

QUESTION: From a performance standpoint, would it behoove me to maintain the
surrogate ***HistoryID from each history table in tblCase, or, assuming the
indexes are set up properly, would a WHERE condition for each history be
sufficient? For example, the following select works as expected:

SELECT CasePerson, CaseDate, ProCode, ExpYear
FROM tblExperienceHistory INNER JOIN (tblCase INNER JOIN tblProgramHistory ON
tblCase.CasePerson = tblProgramHistory.ProPerson) ON tblCase.CasePerson =
tblExperienceHistory.ExpPerson
WHERE CaseDate BETWEEN ProStartDate and ProEndDate
AND CaseDate BETWEEN ExpStartDate and ExpEndDate

It extracts the single record from each history for each person for each case.
But I'm afraid of performace with such a scenario.

Instead, I could store each ***HistoryID in the table tblCase, and then just
join on that...no WHERE needed. But the trade-off is that I'd have to build
processes to maintain that. ("Hey, when you insert a record into tblCase, make
sure to go get each HistoryID from the History tables!" or "If the user changes
the date ranges in one of histories, make sure to update tblCase to match the
new historyID!")

Maybe a clustered index on each ***History table on Person/StartDate combined
with the WHERE clause should perform as well as a real JOIN on surrogate
integers.

It seems cheesey to have to resort to surrogate IDs...but the performance
increase might be worth it. Also, if I go that route, whenever I add a new
history table, I'd have to change the design of tblCase AND any SPs that
reference it. With the WHERE solution, I'd only have to change the SPs.

Comments are welcome! (tblCase grows at 250,000 records per year; the history
tables will increase about 1000 records per year)

DCMFAN

CREATE TABLE [dbo].[tblCase] (
[CaseID] [char] (5) CONSTRAINT [PK_tblCase] PRIMARY KEY CLUSTERED NOT NULL ,
[CaseDate] [smalldatetime] NOT NULL ,
[CasePerson] [char] (5) NOT NULL
) ON [PRIMARY]
GO

CREATE TABLE [dbo].[tblExperienceHistory] (
[ExperienceHistID] [int] IDENTITY (1, 1) NOT NULL ,
[ExpPerson] [char] (5) NOT NULL ,
[ExpStartDate] [smalldatetime] NOT NULL ,
[ExpEndDate] [smalldatetime] NOT NULL ,
[ExpYear] [int] NOT NULL
) ON [PRIMARY]
GO

CREATE TABLE [dbo].[tblProgramHistory] (
[ProgramHistID] [int] IDENTITY (1, 1) NOT NULL ,
[ProPerson] [char] (5) NOT NULL ,
[ProStartDate] [smalldatetime] NOT NULL ,
[ProEndDate] [smalldatetime] NOT NULL ,
[ProCode] [int] NOT NULL
) ON [PRIMARY]
GO

INSERT INTO [tblCase]([CaseID], [CaseDate], [CasePerson])
VALUES('12345', '3/1/03', '00000')

INSERT INTO [tblCase]([CaseID], [CaseDate], [CasePerson])
VALUES('A1G34', '4/23/03', '00001')

INSERT INTO [tblExperienceHistory]([ExpPerson], [ExpStartDate], [ExpEndDate],
[ExpYear])
VALUES('00000', '1/1/03', '5/19/03', 1)

INSERT INTO [tblExperienceHistory]([ExpPerson], [ExpStartDate], [ExpEndDate],
[ExpYear])
VALUES('00000', '5/20/03', '12/31/03', 2)

INSERT INTO [tblExperienceHistory]([ExpPerson], [ExpStartDate], [ExpEndDate],
[ExpYear])
VALUES('00001', '4/20/03', '11/1/03', 0)

INSERT INTO [tblProgramHistory]([ProPerson], [ProStartDate], [ProEndDate],
[ProCode])
VALUES( '00000', '2/1/03', '9/30/03', '55555')

INSERT INTO [tblProgramHistory]([ProPerson], [ProStartDate], [ProEndDate],
[ProCode])
VALUES( '00000', '10/1/03', '5/1/04', '55555')

INSERT INTO [tblProgramHistory]([ProPerson], [ProStartDate], [ProEndDate],
[ProCode])
VALUES( '00001', '1/1/03', '12/31/03', '55555')

Jul 20 '05 #1
Share this Question
Share on Google+
4 Replies


P: n/a
DCM Fan (dc****@aol.comSPNOAM) writes:
With one field in Common (Person). The tables B, C, D and E are History
tables, with Start and End dates. Each person has a Program history
(table B, ie), an Experience history (table C, ie), and so on...many
differernt types of histories, and it may grow from here....table F, G,
etc.


Permit to try to understand the business rules here. Your data model
looks a little bit funny.

The way your table stands, one person can be on several cases? And these
histories are not related to the cases? That is, person 1234 can have
an experience of ZYX from 2002-12-12 to 2003-03-12. Can he also have
an experience of UHG from 2003-02-02 to 2003-08-07, or must intervals
not overlap?

If person 1234 has a case on 2003-02-27, then the experience record
that includes this date should be listed. But if the case is on
2003-08-10, and there is no experience record for this date, the
case should not be listed at all?

Maybe rather than having sparse table with start_date and end_date, you
should have history tables like:

CREATE TABLE experience_history
(person char(5) NOT NULL,
date smalldatetime NOT NULL,
exp_code char(2) NULL,
CONSRAINT pk_exphist PRIMARY KEY (person, date))

The tables will be bigger, and you will have to find a way to fill
them up, but they will be very easy to use, and queries like the
one you had will be swift. And there will not be any surrogate keys.
--
Erland Sommarskog, SQL Server MVP, so****@algonet.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techinf...2000/books.asp
Jul 20 '05 #2

P: n/a
<<The way your table stands, one person can be on several cases? >>

Certainly...a Case record is created every day of the year for every person.

<<And these histories are not related to the cases>>

Of course they are...sort of...on person! The histories have nothing to do with
the cases, other than the fact that the Person is stored in each. And try not
to think of Cases as "Cases in litigation."

Technically, you're correct in that the Cases are not related to teh history
tables in a RI sense...the true structure has ONE-MANY from a Person table to
each if the histories and the Case table. But the bottom line is that each Case
data record can only map back to one of the MANY history records for each
PERSON, based on Person(ID) and tblCase.DATE

Each Person has a "program history," an "experience history" and a few others.
Each of those histories has Effectvie dates: From date1 to date2, so-and-so is
in program X; they can't overlap. The key need is to extract the ONE record in
each history table at the time of the Case record.

<<But if the case is on
2003-08-10, and there is no experience record for this date, the
case should not be listed at all?>>

That's right! And that should come out in the Audit section of the
application...showing a "gap" in the history for the Case date.

<<
CREATE TABLE experience_history
(person char(5) NOT NULL,
date smalldatetime NOT NULL,
exp_code char(2) NULL,
CONSRAINT pk_exphist PRIMARY KEY (person, date))


Are you suggesting a record for every date? That is, if a person is in exp_code
XX all year there should be 365 records in there for 2003?? That would
certainly ease my main concern (speed), but add a bunch of application code to
maintain them. Hmmm...how would I even show the user what the intervals are
based on that data? I would need some other SET-based grouping mechanism to
keep each interval unique:

CREATE TABLE experience_history
(person char(5) NOT NULL,
date smalldatetime NOT NULL,
exp_code char(2) NULL,
IntervalSet (type) NOT NULL
CONSRAINT pk_exphist PRIMARY KEY (person, date))

That way, I could use a GROUP BY to return the list of intervals back to the
application:

SELECT MIN([date]),MAX([date]), MAX(exp_code)
FROM experience_history
GROUP BY IntervalSet
ORDER BY 1

I'll ponder that one for a little while...THANKS!

DCMFAN
Jul 20 '05 #3

P: n/a
DCM Fan (dc****@aol.comSPNOAM) writes:
Are you suggesting a record for every date? That is, if a person is in
exp_code XX all year there should be 365 records in there for 2003??
Yes.

This is an extreme denormalization, but simplifies programming. We did
this in our application for currency prices. The table grew large, and
we later had limit which currency pairs we keep prices for. Overall
programming was grossly simplified.
That would certainly ease my main concern (speed), but add a bunch of
application code to maintain them.
A bunch? Nah, some code. The good thing is that you only need that code
in one place, whereas you can benefit from for each new report you need.
Hmmm...how would I even show the user what the intervals are based on
that data? I would need some other SET-based grouping mechanism to keep
each interval unique:

CREATE TABLE experience_history
(person char(5) NOT NULL,
date smalldatetime NOT NULL,
exp_code char(2) NULL,
IntervalSet (type) NOT NULL
CONSRAINT pk_exphist PRIMARY KEY (person, date))

That way, I could use a GROUP BY to return the list of intervals back to
the application:

SELECT MIN([date]),MAX([date]), MAX(exp_code)
FROM experience_history
GROUP BY IntervalSet
ORDER BY 1


Yes, if a person can be in the same code on two distinct intervals. But
maybe rather a running number for each interval the person is in the
exp_code.

SELECT person, exp_code, MIN(date), MAX(date)
FROM experience_history
GROUP BY person, exp_code, exp_code_no
ORDER BY person, 3

--
Erland Sommarskog, SQL Server MVP, so****@algonet.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techinf...2000/books.asp
Jul 20 '05 #4

P: n/a
<<
Yes, if a person can be in the same code on two distinct intervals. But
maybe rather a running number for each interval the person is in the
exp_code.

SELECT person, exp_code, MIN(date), MAX(date)
FROM experience_history
GROUP BY person, exp_code, exp_code_no
ORDER BY person, 3


Right on. Thanks again.
Jul 20 '05 #5

This discussion thread is closed

Replies have been disabled for this discussion.