473,326 Members | 2,337 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,326 software developers and data experts.

Joins: strategy and how-to approach

My SQL acumen stems from just a couple courses, and everything since from
the trenches. Fun + angst over time.

I'm needing some advice on joins. Though I understand the basics, I'm
having problems abstracting from instances where it's easy to think about
discrete key values (. . . and studentid = 1234) to entire sets of users,
with the joins doing their work.

For example, currently I'm going nuts trying to return dates for which
attendance has not been taken for students, but should have been. Students
have active and inactive periods of enrollment in our schools, so we have a
history table of when they were active and inactive -- as well as two more
tables that layer other bounds on eligible dates (what range of dates fall
within a given school's term? What of holidays and staff institute days?).
I also have a populated calendar table, and a table where students are
identified. Finally, there's a site history table which is a REAL pain in
the butt for me to think about.
CREATE TABLE Student (
StudentID int IDENTITY(1,1) NOT NULL,
CurrentStatus varchar(2) NOT NULL)

CREATE TABLE Calendar (
Dateid int NOT NULL,
Date datetime NULL,
Workday bit NULL )

CREATE TABLE DailyAttendance (
StudentID int NOT NULL,
AttendanceDate datetime NOT NULL,
SiteID varchar(6) NOT NULL,
Attend_Status varchar(2) NOT NULL)

(the last field is, e.g., present or absent)

CREATE TABLE StudentActivityHistory (
StudentID int NOT NULL,
StatusStartDate datetime NOT NULL,
StatusEndDate datetime NULL,
Activity_Status varchar(2) NULL,
StudentStatusHistoryID int IDENTITY(1,1) NOT NULL)

(the activity_status is either A or I; the important records in this table
are the 'A' records. A student's most recent status record always has an
end date of '12/31/9999 12:00:00 AM', whether that's an A or I record. No
dates not between start/end dates of students' A records would need
attendance taken. students may have many periods of activity -- A records
-- as well as many inactive periods.)

CREATE TABLE SiteTerms (
SiteID varchar(6) NOT NULL,
Term varchar(3) NOT NULL,
StartOfTerm datetime NOT NULL,
Quarter varchar(2) NOT NULL,
SchoolYear varchar(9) NOT NULL,
EndOfTerm datetime NOT NULL)

(different schools vary their term start and end dates. No dates not
between term start and end dates would need attendance taken by students
assigned to and active in that school during that period.)

CREATE TABLE SiteExceptionDays (
SiteID varchar(6) NOT NULL,
SchoolDayStartTime datetime NOT NULL,
SchoolDayEndTime datetime NOT NULL,
SchoolDayType varchar(2) NOT NULL)

(there are two kinds of days -- partial attendance, and no attendance. In
short, if the type of day is "N" no attendance needs to be taken for
students assigned to that school and active on that day)

CREATE TABLE StudentSiteHistory (
StudentID int NOT NULL,
SiteStartDate datetime NOT NULL,
SiteID varchar(6) NOT NULL,
SiteEndDate datetime NULL,
StudentSiteHistoryID int IDENTITY(1,1) NOT NULL)

(Pain. The attendance table tells which site a student was assigned when
attendance was taken. To find which school a student was a assigned to on
days attendance was NOT taken, this table's implicated 'cause it's the only
way of connecting everything else together)

Dangitall, I know this can be done but I've beat my head against the wall.
Due diligence has gotten me a headache and a hankerin' for whiskey, and I'm
not much of a drinker. Is there anyone in the group for whom this kind of
thing is a no-brainer? I'd just as soon get some tips on how to approach
this kind of thing, and figure it out myself with some guidance.

Any takers? Gotta run, dang I'm late for something.

TIA

--

Scott
Oct 5 '05 #1
4 1404
On Wed, 5 Oct 2005 17:40:58 -0500, Scott Marquardt wrote:
My SQL acumen stems from just a couple courses, and everything since from
the trenches. Fun + angst over time.

I'm needing some advice on joins. Though I understand the basics, I'm
having problems abstracting from instances where it's easy to think about
discrete key values (. . . and studentid = 1234) to entire sets of users,
with the joins doing their work.

For example, currently I'm going nuts trying to return dates for which
attendance has not been taken for students, but should have been. Students
have active and inactive periods of enrollment in our schools, so we have a
history table of when they were active and inactive -- as well as two more
tables that layer other bounds on eligible dates (what range of dates fall
within a given school's term? What of holidays and staff institute days?).
I also have a populated calendar table, and a table where students are
identified. Finally, there's a site history table which is a REAL pain in
the butt for me to think about.
(snip CREATE TABLE statements)
Dangitall, I know this can be done but I've beat my head against the wall.
Due diligence has gotten me a headache and a hankerin' for whiskey, and I'm
not much of a drinker. Is there anyone in the group for whom this kind of
thing is a no-brainer?


Hi Scott,

Are you sure that the headache is not caused by the whiskey? Or by
banging your head against the wall? (Next time, leave more room between
you and the wall when you're head-banging. Or switch from heavy metal to
something more placid - Mozart, perhaps?)

Anyway, I don't think this is a no-brainer for anyone. The number of
tables involved and the way they are related make this a tough one. And
the absence of constraints in the DDL, of INSERT statements with sample
data and expected output didn't help either (hint, hint).

Here's my attempt. It's untested. If it doesn't work as expected, then
please read www.aspfaq.com/5006 and follow the instructions carefully.

SELECT s.StudentID, c.Date
FROM Student AS s
INNER JOIN StudentActivityHistory AS sah
ON sah.StudentID = s.StudentID
AND sah.Activity_Status = 'A'
INNER JOIN Calendar AS c
ON c.Date BETWEEN sah.StatusStartDate AND sah.StatusEndDate
INNER JOIN StudentSiteHistory AS ssh
ON ssh.StudentID = s.StudentID
AND c.Date BETWEEN ssh.SiteStartDate AND ssh.SiteEndDate
INNER JOIN SiteTerms AS st
ON st.SiteID = ssh.SiteID
AND c.Date BETWEEN st.StartOfTerm AND st.EndOfTerm
WHERE NOT EXISTS
(SELECT *
FROM DailyAttendance AS da
WHERE da.StudentID = s.StudentID
AND da.AttendanceDate = c.Date)
AND NOT EXISTS
(SELECT *
FROM SiteExceptionDays AS sed
WHERE sed.SiteID = ssh.SiteID
AND sed.SchoolDayStartTime >= c.Date
AND sed.SchoolDayStartTime < DATEADD(day, 1, c.Date)
AND sed.SchoolDayType = 'N')

(Note: You can also convert the NOT EXISTS subqueries to LEFT OUTER
JOINS. Use this generic approach:

FROM a WHERE NOT EXISTS (SELECT * FROM b WHERE b.x = a.x)
is equivalent to
FROM a LEFT OUTER JOIN b ON b.x = a.x WHERE b.x IS NULL

Best, Hugo
--

(Remove _NO_ and _SPAM_ to get my e-mail address)
Oct 7 '05 #2
> I'm [..] trying to return dates for which attendance has not been taken for students, but should have been.

First, let's try to get the dates where attendance should have been
taken. For this step, we are only considering the StudentSiteHistory
table, the student's CurrentStatus (I'm assuming 'A' for active) and
the working days from the Calendar table:

SELECT SSH.StudentID, SSH.SiteID, C.Date
FROM StudentSiteHistory SSH
INNER JOIN Student S ON S.StudentID=SSH.StudentID
INNER JOIN Calendar C
ON C.Date BETWEEN SSH.SiteStartDate AND SSH.SiteEndDate
WHERE S.CurrentStatus='A' AND C.Workday<>0

Now, let's add the SiteTerms, SiteExceptionDays and
StudentActivityHistory tables:

SELECT SSH.StudentID, SSH.SiteID, C.Date
FROM StudentSiteHistory SSH
INNER JOIN Student S ON S.StudentID=SSH.StudentID
INNER JOIN Calendar C
ON C.Date BETWEEN SSH.SiteStartDate AND SSH.SiteEndDate
WHERE S.CurrentStatus='A' AND C.Workday<>0
AND EXISTS (
SELECT * FROM SiteTerms ST
WHERE ST.SiteID=SSH.SiteID
AND C.Date BETWEEN ST.StartOfTerm AND ST.EndOfTerm
) AND NOT EXISTS (
SELECT * FROM SiteExceptionDays SED
WHERE SED.SiteID=SSH.SiteID AND SED.SchoolDayType='N'
AND SED.SchoolDayStartTime>=C.Date
AND SED.SchoolDayStartTime<C.Date+1
) AND EXISTS (
SELECT * FROM StudentActivityHistory SAH
WHERE SAH.StudentID=SSH.StudentID AND SAH.Activity_Status='A'
AND C.Date BETWEEN SAH.StatusStartDate AND SAH.StatusEndDate
)

I have used subqueries to add these conditions, mainly because they are
more intuitive. I believe I could have used INNER JOIN-s instead of
EXISTS subqueries (adding a DISTINCT keyword or a GROUP BY), but
performance would have been worse.

In the SiteExceptionDays table, I'm assuming that the
SchoolDayStartTime and SchoolDayEndTime would store the date and the
time (unlike all the other datetime columns, which store only the date,
with a time of 0:00:00). Also I'm assuming that SchoolDayStartTime and
SchoolDayEndTime are always in the same day (for each row). This should
be verified with a CHECK CONSTRAINT (and it would have been useful if
you also provided all the constraints for these tables: primary keys,
foreign keys, unique constraints, check constraints).

Now, we only need to filter the days which have no attendance. Of
course, we can use a subquery (like above), but this time let's use a
LEFT JOIN:

SELECT X.StudentID, X.SiteID, X.Date
FROM (
SELECT SSH.StudentID, SSH.SiteID, C.Date
FROM StudentSiteHistory SSH
INNER JOIN Student S ON S.StudentID=SSH.StudentID
INNER JOIN Calendar C
ON C.Date BETWEEN SSH.SiteStartDate AND SSH.SiteEndDate
WHERE S.CurrentStatus='A' AND C.Workday<>0
AND EXISTS (
SELECT * FROM SiteTerms ST
WHERE ST.SiteID=SSH.SiteID
AND C.Date BETWEEN ST.StartOfTerm AND ST.EndOfTerm
) AND NOT EXISTS (
SELECT * FROM SiteExceptionDays SED
WHERE SED.SiteID=SSH.SiteID AND SED.SchoolDayType='N'
AND SED.SchoolDayStartTime>=C.Date
AND SED.SchoolDayStartTime<C.Date+1
) AND EXISTS (
SELECT * FROM StudentActivityHistory SAH
WHERE SAH.StudentID=SSH.StudentID AND SAH.Activity_Status='A'
AND C.Date BETWEEN SAH.StatusStartDate AND SAH.StatusEndDate
)
) X LEFT JOIN (
SELECT StudentID, SiteID, AttendanceDate
FROM DailyAttendance
WHERE Attend_Status='P'
) Y
ON X.StudentID=Y.StudentID
AND X.SiteID=Y.SiteID
AND X.Date=Y.AttendanceDate
WHERE Y.StudentID IS NULL

After I wrote these queries, I also saw Hugo's response and I was
amazed (again) how similar our queries are.

Razvan

Oct 8 '05 #3
You have tables without keys, use BIT and IDENTITY. Can you explain
what the heck a date_id would mean? Why do you think that a date is
not unique in itself? You also keep saying "fields" and
"records" so we know that you are designing a file system and not
an RDBMS.
A student's most recent status record [sic] always has an end date of '12/31/9999 12:00:00 AM' <<


This is a really bad idea for a lot of reasons. Ignoring the use of
"record", a status is an attribute and not an entity. Cleaning up just
a little, we have:

CREATE TABLE Calendar -- 10-20 years is usually enough
(cal_date DATETIME NOT NULL PRIMARY KEY,
date_type CHAR(1) NOT NULL);

CREATE TABLE Students
(student_id INTEGER NOT NULL PRIMARY KEY,
student_status CHAR(2) NOT NULL);

CREATE TABLE DailyAttendance
(student_id INTEGER NOT NULL
REFERENCES Students(student_id),
attendance_date DATETIME NOT NULL,
site_id VARCHAR(6) NOT NULL,
attend_code CHAR(2) NOT NULL);

History tables are keyed with the entity and the start date of an
event, not with IDENTITY unless you meant to destroy data integrity.

CREATE TABLE StudentActivityHistory
(student_id INTEGER NOT NULL,
status_startdate DATETIME NOT NULL,
status_enddate DATETIME,
activity_status CHAR(2) NOT NULL,
PRIMARY KEY (student_id, status_startdate))

CREATE TABLE StudentSiteHistory
(student_id INTEGER NOT NULL,
site_id VARCHAR(6) NOT NULL,
site_startdate DATETIME NOT NULL,
site_enddate DATETIME,
PRIMARY KEY (student_id, site_startdate));

Why is there not a general table like this instead?

CREATE TABLE StudentHistory
(student_id INTEGER NOT NULL,
startdate DATETIME DEFAULT CURRENT_TIMESTAMP NOT NULL,
enddate DATETIME, -- null means current
CHECK (startdate < enddate),
activity_code CHAR(2) NOT NULL
CHECK (activity_code IN (..)),
site_id VARCHAR(6) NOT NULL
REFERENCES Sites(site_id)
ON UPDATE CASCADE,
PRIMARY KEY (student_id, startdate));

Now your question is easy -- you have an event anchored in both time
and space, like it should be. And the DailyAttendance table is
redundant; the current day has (enddate IS NULL).

Oct 8 '05 #4
--CELKO-- opined thusly on Oct 8:
You have tables without keys, use BIT and IDENTITY. Can you explain
what the heck a date_id would mean?


Probably, but I won't bother. What I posted is what I inherited, not what I
designed. That said, with respect to the remaining information I'd probably
do little better myself -- or possibly worse. Bear in mind my opening
disclaimer in the post.

I can't very well turn the rudder on a rather intractable ship (the app is
pretty embedded just now, so I'm not at liberty to make the changes you
suggest). I'll try the other respondents' ideas with what I actually have,
and (for the second time in several weeks) post my progress/result. Thanks
to all.

Just now, though, I go a-bed. Dang I'm tired.

- Scott
Oct 9 '05 #5

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

3
by: Ralph Freshour | last post by:
I am having a hard time with joins - my following code displays: ..member_name .gender instead of the actual data - I've been reading through my PHP and MySQL manuals - the MySQL manual tells...
2
by: raulgz | last post by:
Hi I work ith sql server 2000 and i need know the diferent of joins in format not ansi ( with * ) and joins in format ansi ( with 'outher join on' ). Two format work equal ???
3
by: jakelake | last post by:
Help....I have a DB I'm working with that I know doesn't work with the ANSI-92 JOIN SYNTAX....I'm not sure how much this limits my ability to deal with the following situation, so I'm soliciting...
4
by: michaelnewport | last post by:
Greetings, I like to write my inner joins as below, but someone at work tells me its not as 'performant' as using the 'inner join' statement. Is this true ? Is there a better way to write it...
0
by: DG | last post by:
Can all "joins" be mimicked via nested selects?
3
by: orekinbck | last post by:
Hi There What are the best value practice exam engines for 70-316? How does this strategy sound: * First iteration - Go through Kalani's training guide + MSDN +...
1
by: Nick | last post by:
Hi, I read somewhere recently that the strategy pattern could be used instead of using switch command on an enum (for example). How would this work? A small example would be great. Thanks,...
1
by: prileep | last post by:
I have two tables Users and UserLogin. Here i will use two methods of table design. and which query will return me the result more fast. The table size will be large that it may contain records in...
1
by: rneel | last post by:
i have two tables a1 and b1. they have a common column which is not having any constraints... they have null values... when i use a join between the two columns the null values are neglected... how...
0
by: satishreddy | last post by:
how many types of joins are there in oracle
0
by: ryjfgjl | last post by:
ExcelToDatabase: batch import excel into database automatically...
0
isladogs
by: isladogs | last post by:
The next Access Europe meeting will be on Wednesday 6 Mar 2024 starting at 18:00 UK time (6PM UTC) and finishing at about 19:15 (7.15PM). In this month's session, we are pleased to welcome back...
1
isladogs
by: isladogs | last post by:
The next Access Europe meeting will be on Wednesday 6 Mar 2024 starting at 18:00 UK time (6PM UTC) and finishing at about 19:15 (7.15PM). In this month's session, we are pleased to welcome back...
0
by: Vimpel783 | last post by:
Hello! Guys, I found this code on the Internet, but I need to modify it a little. It works well, the problem is this: Data is sent from only one cell, in this case B5, but it is necessary that data...
0
by: ArrayDB | last post by:
The error message I've encountered is; ERROR:root:Error generating model response: exception: access violation writing 0x0000000000005140, which seems to be indicative of an access violation...
1
by: PapaRatzi | last post by:
Hello, I am teaching myself MS Access forms design and Visual Basic. I've created a table to capture a list of Top 30 singles and forms to capture new entries. The final step is a form (unbound)...
1
by: CloudSolutions | last post by:
Introduction: For many beginners and individual users, requiring a credit card and email registration may pose a barrier when starting to use cloud servers. However, some cloud server providers now...
0
by: af34tf | last post by:
Hi Guys, I have a domain whose name is BytesLimited.com, and I want to sell it. Does anyone know about platforms that allow me to list my domain in auction for free. Thank you
0
isladogs
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 3 Apr 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome former...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.