473,399 Members | 2,858 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,399 software developers and data experts.

INNER JOIN using surrogate ID, or [Date] BETWEEN?

{CREATE TABLEs and INSERTs follow...}

Gents,

I have a main table that is in ONE-MANY with many other tables. For example, if
the main table is named A, there are these realtionships:

A-->B
A-->C
A-->D
A-->E

With one field in Common (Person). The tables B, C, D and E are History tables,
with Start and End dates. Each person has a Program history (table B, ie), an
Experience history (table C, ie), and so on...many differernt types of
histories, and it may grow from here....table F, G, etc.

The included CREATE TABLEs and INSERTs contain tables A, B and C.

The problem: Each tblCase (table A) record has a date. When joining all of the
history tables to tblCase on Person, obviously you get a cross-product of each
history unless you specify a WHERE clause that extracts one single record from
each of the histories (duh...that's the point...to extract a single record from
each history, because there can only be one value in effect at the time of the
Case.)

QUESTION: From a performance standpoint, would it behoove me to maintain the
surrogate ***HistoryID from each history table in tblCase, or, assuming the
indexes are set up properly, would a WHERE condition for each history be
sufficient? For example, the following select works as expected:

SELECT CasePerson, CaseDate, ProCode, ExpYear
FROM tblExperienceHistory INNER JOIN (tblCase INNER JOIN tblProgramHistory ON
tblCase.CasePerson = tblProgramHistory.ProPerson) ON tblCase.CasePerson =
tblExperienceHistory.ExpPerson
WHERE CaseDate BETWEEN ProStartDate and ProEndDate
AND CaseDate BETWEEN ExpStartDate and ExpEndDate

It extracts the single record from each history for each person for each case.
But I'm afraid of performace with such a scenario.

Instead, I could store each ***HistoryID in the table tblCase, and then just
join on that...no WHERE needed. But the trade-off is that I'd have to build
processes to maintain that. ("Hey, when you insert a record into tblCase, make
sure to go get each HistoryID from the History tables!" or "If the user changes
the date ranges in one of histories, make sure to update tblCase to match the
new historyID!")

Maybe a clustered index on each ***History table on Person/StartDate combined
with the WHERE clause should perform as well as a real JOIN on surrogate
integers.

It seems cheesey to have to resort to surrogate IDs...but the performance
increase might be worth it. Also, if I go that route, whenever I add a new
history table, I'd have to change the design of tblCase AND any SPs that
reference it. With the WHERE solution, I'd only have to change the SPs.

Comments are welcome! (tblCase grows at 250,000 records per year; the history
tables will increase about 1000 records per year)

DCMFAN

CREATE TABLE [dbo].[tblCase] (
[CaseID] [char] (5) CONSTRAINT [PK_tblCase] PRIMARY KEY CLUSTERED NOT NULL ,
[CaseDate] [smalldatetime] NOT NULL ,
[CasePerson] [char] (5) NOT NULL
) ON [PRIMARY]
GO

CREATE TABLE [dbo].[tblExperienceHistory] (
[ExperienceHistID] [int] IDENTITY (1, 1) NOT NULL ,
[ExpPerson] [char] (5) NOT NULL ,
[ExpStartDate] [smalldatetime] NOT NULL ,
[ExpEndDate] [smalldatetime] NOT NULL ,
[ExpYear] [int] NOT NULL
) ON [PRIMARY]
GO

CREATE TABLE [dbo].[tblProgramHistory] (
[ProgramHistID] [int] IDENTITY (1, 1) NOT NULL ,
[ProPerson] [char] (5) NOT NULL ,
[ProStartDate] [smalldatetime] NOT NULL ,
[ProEndDate] [smalldatetime] NOT NULL ,
[ProCode] [int] NOT NULL
) ON [PRIMARY]
GO

INSERT INTO [tblCase]([CaseID], [CaseDate], [CasePerson])
VALUES('12345', '3/1/03', '00000')

INSERT INTO [tblCase]([CaseID], [CaseDate], [CasePerson])
VALUES('A1G34', '4/23/03', '00001')

INSERT INTO [tblExperienceHistory]([ExpPerson], [ExpStartDate], [ExpEndDate],
[ExpYear])
VALUES('00000', '1/1/03', '5/19/03', 1)

INSERT INTO [tblExperienceHistory]([ExpPerson], [ExpStartDate], [ExpEndDate],
[ExpYear])
VALUES('00000', '5/20/03', '12/31/03', 2)

INSERT INTO [tblExperienceHistory]([ExpPerson], [ExpStartDate], [ExpEndDate],
[ExpYear])
VALUES('00001', '4/20/03', '11/1/03', 0)

INSERT INTO [tblProgramHistory]([ProPerson], [ProStartDate], [ProEndDate],
[ProCode])
VALUES( '00000', '2/1/03', '9/30/03', '55555')

INSERT INTO [tblProgramHistory]([ProPerson], [ProStartDate], [ProEndDate],
[ProCode])
VALUES( '00000', '10/1/03', '5/1/04', '55555')

INSERT INTO [tblProgramHistory]([ProPerson], [ProStartDate], [ProEndDate],
[ProCode])
VALUES( '00001', '1/1/03', '12/31/03', '55555')

Jul 20 '05 #1
4 8076
DCM Fan (dc****@aol.comSPNOAM) writes:
With one field in Common (Person). The tables B, C, D and E are History
tables, with Start and End dates. Each person has a Program history
(table B, ie), an Experience history (table C, ie), and so on...many
differernt types of histories, and it may grow from here....table F, G,
etc.


Permit to try to understand the business rules here. Your data model
looks a little bit funny.

The way your table stands, one person can be on several cases? And these
histories are not related to the cases? That is, person 1234 can have
an experience of ZYX from 2002-12-12 to 2003-03-12. Can he also have
an experience of UHG from 2003-02-02 to 2003-08-07, or must intervals
not overlap?

If person 1234 has a case on 2003-02-27, then the experience record
that includes this date should be listed. But if the case is on
2003-08-10, and there is no experience record for this date, the
case should not be listed at all?

Maybe rather than having sparse table with start_date and end_date, you
should have history tables like:

CREATE TABLE experience_history
(person char(5) NOT NULL,
date smalldatetime NOT NULL,
exp_code char(2) NULL,
CONSRAINT pk_exphist PRIMARY KEY (person, date))

The tables will be bigger, and you will have to find a way to fill
them up, but they will be very easy to use, and queries like the
one you had will be swift. And there will not be any surrogate keys.
--
Erland Sommarskog, SQL Server MVP, so****@algonet.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techinf...2000/books.asp
Jul 20 '05 #2
<<The way your table stands, one person can be on several cases? >>

Certainly...a Case record is created every day of the year for every person.

<<And these histories are not related to the cases>>

Of course they are...sort of...on person! The histories have nothing to do with
the cases, other than the fact that the Person is stored in each. And try not
to think of Cases as "Cases in litigation."

Technically, you're correct in that the Cases are not related to teh history
tables in a RI sense...the true structure has ONE-MANY from a Person table to
each if the histories and the Case table. But the bottom line is that each Case
data record can only map back to one of the MANY history records for each
PERSON, based on Person(ID) and tblCase.DATE

Each Person has a "program history," an "experience history" and a few others.
Each of those histories has Effectvie dates: From date1 to date2, so-and-so is
in program X; they can't overlap. The key need is to extract the ONE record in
each history table at the time of the Case record.

<<But if the case is on
2003-08-10, and there is no experience record for this date, the
case should not be listed at all?>>

That's right! And that should come out in the Audit section of the
application...showing a "gap" in the history for the Case date.

<<
CREATE TABLE experience_history
(person char(5) NOT NULL,
date smalldatetime NOT NULL,
exp_code char(2) NULL,
CONSRAINT pk_exphist PRIMARY KEY (person, date))


Are you suggesting a record for every date? That is, if a person is in exp_code
XX all year there should be 365 records in there for 2003?? That would
certainly ease my main concern (speed), but add a bunch of application code to
maintain them. Hmmm...how would I even show the user what the intervals are
based on that data? I would need some other SET-based grouping mechanism to
keep each interval unique:

CREATE TABLE experience_history
(person char(5) NOT NULL,
date smalldatetime NOT NULL,
exp_code char(2) NULL,
IntervalSet (type) NOT NULL
CONSRAINT pk_exphist PRIMARY KEY (person, date))

That way, I could use a GROUP BY to return the list of intervals back to the
application:

SELECT MIN([date]),MAX([date]), MAX(exp_code)
FROM experience_history
GROUP BY IntervalSet
ORDER BY 1

I'll ponder that one for a little while...THANKS!

DCMFAN
Jul 20 '05 #3
DCM Fan (dc****@aol.comSPNOAM) writes:
Are you suggesting a record for every date? That is, if a person is in
exp_code XX all year there should be 365 records in there for 2003??
Yes.

This is an extreme denormalization, but simplifies programming. We did
this in our application for currency prices. The table grew large, and
we later had limit which currency pairs we keep prices for. Overall
programming was grossly simplified.
That would certainly ease my main concern (speed), but add a bunch of
application code to maintain them.
A bunch? Nah, some code. The good thing is that you only need that code
in one place, whereas you can benefit from for each new report you need.
Hmmm...how would I even show the user what the intervals are based on
that data? I would need some other SET-based grouping mechanism to keep
each interval unique:

CREATE TABLE experience_history
(person char(5) NOT NULL,
date smalldatetime NOT NULL,
exp_code char(2) NULL,
IntervalSet (type) NOT NULL
CONSRAINT pk_exphist PRIMARY KEY (person, date))

That way, I could use a GROUP BY to return the list of intervals back to
the application:

SELECT MIN([date]),MAX([date]), MAX(exp_code)
FROM experience_history
GROUP BY IntervalSet
ORDER BY 1


Yes, if a person can be in the same code on two distinct intervals. But
maybe rather a running number for each interval the person is in the
exp_code.

SELECT person, exp_code, MIN(date), MAX(date)
FROM experience_history
GROUP BY person, exp_code, exp_code_no
ORDER BY person, 3

--
Erland Sommarskog, SQL Server MVP, so****@algonet.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techinf...2000/books.asp
Jul 20 '05 #4
<<
Yes, if a person can be in the same code on two distinct intervals. But
maybe rather a running number for each interval the person is in the
exp_code.

SELECT person, exp_code, MIN(date), MAX(date)
FROM experience_history
GROUP BY person, exp_code, exp_code_no
ORDER BY person, 3


Right on. Thanks again.
Jul 20 '05 #5

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

3
by: Prem | last post by:
Hi, I am having many problems with inner join. my first problem is : 1) I want to know the precedance while evaluating query with multiple joins. eg. select Employees.FirstName,...
112
by: Andy | last post by:
Hi All! We are doing new development for SQL Server 2000 and also moving from SQL 7.0 to SQL Server 2000. What are cons and pros for using IDENTITY property as PK in SQL SERVER 2000? Please,...
2
by: psuaudi | last post by:
I have a main query that I would like to call two different subqueries. In MS Access, I usually just save the two subqueries as separate queries which are then called by a third separate and main...
3
by: chaitanya02 | last post by:
I have problem joining four tables using Inner Join in ACCESS DB, I want to join 4 tables: 1, 2, 3 , 4 ....and query data in all these to see, i get a right url associated. sql1-account id,...
12
by: Chamnap | last post by:
Hello, everyone I have one question about the standard join and inner join, which one is faster and more reliable? Can you recommend me to use? Please, explain me... Thanks Chamnap
0
by: Ironr4ge | last post by:
Hi everyone, I was wondering whether is was possible to select only the filtered records of one form with Recordsource = "C" and open a diffrent form with recordsource "L" . The way I was...
0
by: stanlew | last post by:
Happy New Year everyone! I'm new to both T-SQL and this forum. I'm currently doing an internship and my first task was to create a small program which will send an email detailing the sales of the...
11
by: YZXIA | last post by:
Is there any difference between explicit inner join and implicit inner join Example of an explicit inner join: SELECT * FROM employee INNER JOIN department ON employee.DepartmentID =...
1
by: ewokspy | last post by:
I have 2 tables that I need to use an INNER JOIN on. tblEMPLOYEES has FIRSTNAME, LASTNAME, EMPILID, DATETRAINED, DEPT tblMAIN has EMPILID, OBSERVATIONDATE In tblMAIN, the EMPILID is not the...
0
by: Charles Arthur | last post by:
How do i turn on java script on a villaon, callus and itel keypad mobile phone
0
by: emmanuelkatto | last post by:
Hi All, I am Emmanuel katto from Uganda. I want to ask what challenges you've faced while migrating a website to cloud. Please let me know. Thanks! Emmanuel
0
BarryA
by: BarryA | last post by:
What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...
1
by: nemocccc | last post by:
hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?
1
by: Sonnysonu | last post by:
This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...
0
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can...
0
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers,...
0
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven...
0
by: Hystou | last post by:
Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.