INNER JOIN using surrogate ID, or [Date] BETWEEN?

DCM Fan

{CREATE TABLEs and INSERTs follow...}

Gents,

I have a main table that is in ONE-MANY with many other tables. For example, if
the main table is named A, there are these realtionships:

A-->B
A-->C
A-->D
A-->E

With one field in Common (Person). The tables B, C, D and E are History tables,
with Start and End dates. Each person has a Program history (table B, ie), an
Experience history (table C, ie), and so on...many differernt types of
histories, and it may grow from here....table F, G, etc.

The included CREATE TABLEs and INSERTs contain tables A, B and C.

The problem: Each tblCase (table A) record has a date. When joining all of the
history tables to tblCase on Person, obviously you get a cross-product of each
history unless you specify a WHERE clause that extracts one single record from
each of the histories (duh...that's the point...to extract a single record from
each history, because there can only be one value in effect at the time of the
Case.)

QUESTION: From a performance standpoint, would it behoove me to maintain the
surrogate ***HistoryID from each history table in tblCase, or, assuming the
indexes are set up properly, would a WHERE condition for each history be
sufficient? For example, the following select works as expected:

SELECT CasePerson, CaseDate, ProCode, ExpYear
FROM tblExperienceHistory INNER JOIN (tblCase INNER JOIN tblProgramHistory ON
tblCase.CasePerson = tblProgramHistory.ProPerson) ON tblCase.CasePerson =
tblExperienceHistory.ExpPerson
WHERE CaseDate BETWEEN ProStartDate and ProEndDate
AND CaseDate BETWEEN ExpStartDate and ExpEndDate

It extracts the single record from each history for each person for each case.
But I'm afraid of performace with such a scenario.

Instead, I could store each ***HistoryID in the table tblCase, and then just
join on that...no WHERE needed. But the trade-off is that I'd have to build
processes to maintain that. ("Hey, when you insert a record into tblCase, make
sure to go get each HistoryID from the History tables!" or "If the user changes
the date ranges in one of histories, make sure to update tblCase to match the
new historyID!")

Maybe a clustered index on each ***History table on Person/StartDate combined
with the WHERE clause should perform as well as a real JOIN on surrogate
integers.

It seems cheesey to have to resort to surrogate IDs...but the performance
increase might be worth it. Also, if I go that route, whenever I add a new
history table, I'd have to change the design of tblCase AND any SPs that
reference it. With the WHERE solution, I'd only have to change the SPs.

Comments are welcome! (tblCase grows at 250,000 records per year; the history
tables will increase about 1000 records per year)

DCMFAN

CREATE TABLE [dbo].[tblCase] (
[CaseID] [char] (5) CONSTRAINT [PK_tblCase] PRIMARY KEY CLUSTERED NOT NULL ,
[CaseDate] [smalldatetime] NOT NULL ,
[CasePerson] [char] (5) NOT NULL
) ON [PRIMARY]
GO

CREATE TABLE [dbo].[tblExperienceHistory] (
[ExperienceHistID] [int] IDENTITY (1, 1) NOT NULL ,
[ExpPerson] [char] (5) NOT NULL ,
[ExpStartDate] [smalldatetime] NOT NULL ,
[ExpEndDate] [smalldatetime] NOT NULL ,
[ExpYear] [int] NOT NULL
) ON [PRIMARY]
GO

CREATE TABLE [dbo].[tblProgramHistory] (
[ProgramHistID] [int] IDENTITY (1, 1) NOT NULL ,
[ProPerson] [char] (5) NOT NULL ,
[ProStartDate] [smalldatetime] NOT NULL ,
[ProEndDate] [smalldatetime] NOT NULL ,
[ProCode] [int] NOT NULL
) ON [PRIMARY]
GO

INSERT INTO [tblCase]([CaseID], [CaseDate], [CasePerson])
VALUES('12345', '3/1/03', '00000')

INSERT INTO [tblCase]([CaseID], [CaseDate], [CasePerson])
VALUES('A1G34', '4/23/03', '00001')

INSERT INTO [tblExperienceHistory]([ExpPerson], [ExpStartDate], [ExpEndDate],
[ExpYear])
VALUES('00000', '1/1/03', '5/19/03', 1)

INSERT INTO [tblExperienceHistory]([ExpPerson], [ExpStartDate], [ExpEndDate],
[ExpYear])
VALUES('00000', '5/20/03', '12/31/03', 2)

INSERT INTO [tblExperienceHistory]([ExpPerson], [ExpStartDate], [ExpEndDate],
[ExpYear])
VALUES('00001', '4/20/03', '11/1/03', 0)

INSERT INTO [tblProgramHistory]([ProPerson], [ProStartDate], [ProEndDate],
[ProCode])
VALUES( '00000', '2/1/03', '9/30/03', '55555')

INSERT INTO [tblProgramHistory]([ProPerson], [ProStartDate], [ProEndDate],
[ProCode])
VALUES( '00000', '10/1/03', '5/1/04', '55555')

INSERT INTO [tblProgramHistory]([ProPerson], [ProStartDate], [ProEndDate],
[ProCode])
VALUES( '00001', '1/1/03', '12/31/03', '55555')

Jul 20 '05 #1

Subscribe Post Reply

8076

Erland Sommarskog

DCM Fan (dc****@aol.comSPNOAM) writes:

With one field in Common (Person). The tables B, C, D and E are History
tables, with Start and End dates. Each person has a Program history
(table B, ie), an Experience history (table C, ie), and so on...many
differernt types of histories, and it may grow from here....table F, G,
etc.

Permit to try to understand the business rules here. Your data model
looks a little bit funny.

The way your table stands, one person can be on several cases? And these
histories are not related to the cases? That is, person 1234 can have
an experience of ZYX from 2002-12-12 to 2003-03-12. Can he also have
an experience of UHG from 2003-02-02 to 2003-08-07, or must intervals
not overlap?

If person 1234 has a case on 2003-02-27, then the experience record
that includes this date should be listed. But if the case is on
2003-08-10, and there is no experience record for this date, the
case should not be listed at all?

Maybe rather than having sparse table with start_date and end_date, you
should have history tables like:

CREATE TABLE experience_history
(person char(5) NOT NULL,
date smalldatetime NOT NULL,
exp_code char(2) NULL,
CONSRAINT pk_exphist PRIMARY KEY (person, date))

The tables will be bigger, and you will have to find a way to fill
them up, but they will be very easy to use, and queries like the
one you had will be swift. And there will not be any surrogate keys.
--
Erland Sommarskog, SQL Server MVP, so****@algonet.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techinf...2000/books.asp

Jul 20 '05 #2

DCM Fan

<<The way your table stands, one person can be on several cases? >>

Certainly...a Case record is created every day of the year for every person.

<<And these histories are not related to the cases>>

Of course they are...sort of...on person! The histories have nothing to do with
the cases, other than the fact that the Person is stored in each. And try not
to think of Cases as "Cases in litigation."

Technically, you're correct in that the Cases are not related to teh history
tables in a RI sense...the true structure has ONE-MANY from a Person table to
each if the histories and the Case table. But the bottom line is that each Case
data record can only map back to one of the MANY history records for each
PERSON, based on Person(ID) and tblCase.DATE

Each Person has a "program history," an "experience history" and a few others.
Each of those histories has Effectvie dates: From date1 to date2, so-and-so is
in program X; they can't overlap. The key need is to extract the ONE record in
each history table at the time of the Case record.

<<But if the case is on
2003-08-10, and there is no experience record for this date, the
case should not be listed at all?>>

That's right! And that should come out in the Audit section of the
application...showing a "gap" in the history for the Case date.

<<
CREATE TABLE experience_history
(person char(5) NOT NULL,
date smalldatetime NOT NULL,
exp_code char(2) NULL,
CONSRAINT pk_exphist PRIMARY KEY (person, date))

Are you suggesting a record for every date? That is, if a person is in exp_code
XX all year there should be 365 records in there for 2003?? That would
certainly ease my main concern (speed), but add a bunch of application code to
maintain them. Hmmm...how would I even show the user what the intervals are
based on that data? I would need some other SET-based grouping mechanism to
keep each interval unique:

CREATE TABLE experience_history
(person char(5) NOT NULL,
date smalldatetime NOT NULL,
exp_code char(2) NULL,
IntervalSet (type) NOT NULL
CONSRAINT pk_exphist PRIMARY KEY (person, date))

That way, I could use a GROUP BY to return the list of intervals back to the
application:

SELECT MIN([date]),MAX([date]), MAX(exp_code)
FROM experience_history
GROUP BY IntervalSet
ORDER BY 1

I'll ponder that one for a little while...THANKS!

DCMFAN

Jul 20 '05 #3

Erland Sommarskog

DCM Fan (dc****@aol.comSPNOAM) writes:

Are you suggesting a record for every date? That is, if a person is in
exp_code XX all year there should be 365 records in there for 2003??
Yes.

This is an extreme denormalization, but simplifies programming. We did
this in our application for currency prices. The table grew large, and
we later had limit which currency pairs we keep prices for. Overall
programming was grossly simplified.
That would certainly ease my main concern (speed), but add a bunch of
application code to maintain them.
A bunch? Nah, some code. The good thing is that you only need that code
in one place, whereas you can benefit from for each new report you need.
Hmmm...how would I even show the user what the intervals are based on
that data? I would need some other SET-based grouping mechanism to keep
each interval unique:

CREATE TABLE experience_history
(person char(5) NOT NULL,
date smalldatetime NOT NULL,
exp_code char(2) NULL,
IntervalSet (type) NOT NULL
CONSRAINT pk_exphist PRIMARY KEY (person, date))

That way, I could use a GROUP BY to return the list of intervals back to
the application:

SELECT MIN([date]),MAX([date]), MAX(exp_code)
FROM experience_history
GROUP BY IntervalSet
ORDER BY 1

Yes, if a person can be in the same code on two distinct intervals. But
maybe rather a running number for each interval the person is in the
exp_code.

SELECT person, exp_code, MIN(date), MAX(date)
FROM experience_history
GROUP BY person, exp_code, exp_code_no
ORDER BY person, 3

--
Erland Sommarskog, SQL Server MVP, so****@algonet.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techinf...2000/books.asp

Jul 20 '05 #4

DCM Fan

<<
Yes, if a person can be in the same code on two distinct intervals. But
maybe rather a running number for each interval the person is in the
exp_code.

SELECT person, exp_code, MIN(date), MAX(date)
FROM experience_history
GROUP BY person, exp_code, exp_code_no
ORDER BY person, 3

Right on. Thanks again.

Jul 20 '05 #5

Similar topics

Need help on inner joins.

by: Prem | last post by:

Hi, I am having many problems with inner join. my first problem is : 1) I want to know the precedance while evaluating query with multiple joins. eg. select Employees.FirstName,...

Microsoft SQL Server

112

What are cons and pros for using IDENTITY property as PK in SQL SERVER 2000?

by: Andy | last post by:

Hi All! We are doing new development for SQL Server 2000 and also moving from SQL 7.0 to SQL Server 2000. What are cons and pros for using IDENTITY property as PK in SQL SERVER 2000? Please,...

Microsoft SQL Server

help with main query using two inner joined subqueries

by: psuaudi | last post by:

I have a main query that I would like to call two different subqueries. In MS Access, I usually just save the two subqueries as separate queries which are then called by a third separate and main...

Microsoft Access / VBA

Does Inner Join work in Access????

by: chaitanya02 | last post by:

I have problem joining four tables using Inner Join in ACCESS DB, I want to join 4 tables: 1, 2, 3 , 4 ....and query data in all these to see, i get a right url associated. sql1-account id,...

ASP / Active Server Pages

Performance between Standard Join and Inner Join

by: Chamnap | last post by:

Hello, everyone I have one question about the standard join and inner join, which one is faster and more reliable? Can you recommend me to use? Please, explain me... Thanks Chamnap

Microsoft SQL Server

Inner Join question!

by: Ironr4ge | last post by:

Hi everyone, I was wondering whether is was possible to select only the filtered records of one form with Recordsource = "C" and open a diffrent form with recordsource "L" . The way I was...

Microsoft Access / VBA

Help With INNER JOINS

by: stanlew | last post by:

Happy New Year everyone! I'm new to both T-SQL and this forum. I'm currently doing an internship and my first task was to create a small program which will send an email detailing the sales of the...

Microsoft SQL Server

difference between explicit inner join and implicit

by: YZXIA | last post by:

Is there any difference between explicit inner join and implicit inner join Example of an explicit inner join: SELECT * FROM employee INNER JOIN department ON employee.DepartmentID =...

Microsoft SQL Server

INNER JOIN and MAX between two tables

by: ewokspy | last post by:

I have 2 tables that I need to use an INNER JOIN on. tblEMPLOYEES has FIRSTNAME, LASTNAME, EMPILID, DATETRAINED, DEPT tblMAIN has EMPILID, OBSERVATIONDATE In tblMAIN, the EMPILID is not the...

Microsoft Access / VBA

How to turn on java script in a villaon keypad mobile phone

by: Charles Arthur | last post by:

How do i turn on java script on a villaon, callus and itel keypad mobile phone

Java

Migrating Website to Cloud - Emmanuel Katto

by: emmanuelkatto | last post by:

Hi All, I am Emmanuel katto from Uganda. I want to ask what challenges you've faced while migrating a website to cloud. Please let me know. Thanks! Emmanuel

General

Navigating the Data Structures and Algorithms (DSA)

by: BarryA | last post by:

What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...

Algorithms / Advanced Math

Looking to do Android software development, any suggestions? Is flutter better?

by: nemocccc | last post by:

hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?

General

Is that possible of reading the .csv file in column wise and the column have different lengths ?

by: Sonnysonu | last post by:

This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...

C / C++

Changing the language in Windows 10

by: Hystou | last post by:

Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can...

Windows Server

Problem With Comparison Operator <=> in G++

by: Oralloy | last post by:

Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers,...

C / C++

Maximizing Business Potential: The Nexus of Website Design and Digital Marketing

by: jinu1996 | last post by:

In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven...

Online Marketing

The easy way to turn off automatic updates for Windows 10/11

by: Hystou | last post by:

Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows...

Windows Server