behavior of SQL on joined queries

Hi all,

Currently our product has a setup that stores information about
transactions in a transaction table. Additionally, certain transactions
pertain to specific people, and extra information is stored in another
table. So for good or ill, things look like this right now:

create table TransactionHistory (
TrnID int identity (1,1),
TrnDT datetime,
--other information about a basic transaction goes here.
--All transactions have this info
Primary Key Clustered (TrnID)
)

Create Index TrnDTIndex on TransactionHistory(TrnDT)

create table PersonTransactionHistory (
TrnID int,
PersonID int,
--extended data pertaining only to "person" transactions goes
--here. only Person transactions have this
Primary Key Clustered(TrnID),
Foreign Key (TrnID) references TransactionHistory (TrnID)
)

Create Index TrnPersonIDIndex on PersonTransactionHistory(PersonID)

A query about a group of people over a certain date range might fetch
information like so:
select * from TransactionHistory TH
inner join PersonTransactionHistory PTH
on TH.TrnID = PTH.TrnID
where PTH.PersonID in some criteria
and TH.TrnDT between some date and some date

In my experience, this poses a real problem when trying to run queries
that use both date and PersonID criteria. If my guesses are correct, this
is because SQL Server is forced to do one of two things:

1 - Use TrnPersonIDIndex to find all transactions which match the person
criteria, then for each do a lookup in the PersonTransactionHistory to
fetch the TrnID and subsequently do a lookup of the TrnID in the clustered
index of the TransactionHistory Table, and finally determine if a given
transaction also matches the date time criteria.

2 - Use TrnDTIndex to find all transactions matching the date criteria,
and then perform lookups similar to the above, except for PersonID instead
of datetime.

Compounding this is my suspicion (based on comparing performance when
I specify which indexes to use in the query vs. when I let SQL Server
decide for itself) that SQL Server sometimes chooses a far from optimal
course. (Of course, sometimes it chooses a better course than I do - the
point is I want it to always be able to pick a good enough course that I
don't have to bother specifying.) Perhaps the table layout is making it
difficult for SQL Server to find a good query plan in all cases.
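For anyone wanting to repeat that comparison, a minimal sketch of the two strategies forced by hand might look like this. The PersonID values and dates are made-up placeholders, and it assumes TrnPersonIDIndex is on PersonID. The hints pin only the index and the join order; the optimizer still picks the join algorithm. STATISTICS IO reports the logical reads for each variant.

```sql
-- Sketch only: the PersonID values and dates below are placeholders.
SET STATISTICS IO ON

-- Strategy 1: start from the person index, then look up each TrnID in TH.
SELECT *
FROM PersonTransactionHistory PTH WITH (INDEX(TrnPersonIDIndex))
INNER JOIN TransactionHistory TH
    ON TH.TrnID = PTH.TrnID
WHERE PTH.PersonID IN (101, 102, 103)
  AND TH.TrnDT BETWEEN '20040101' AND '20040331'
OPTION (FORCE ORDER)

-- Strategy 2: start from the date index, then look up each TrnID in PTH.
SELECT *
FROM TransactionHistory TH WITH (INDEX(TrnDTIndex))
INNER JOIN PersonTransactionHistory PTH
    ON PTH.TrnID = TH.TrnID
WHERE PTH.PersonID IN (101, 102, 103)
  AND TH.TrnDT BETWEEN '20040101' AND '20040331'
OPTION (FORCE ORDER)

SET STATISTICS IO OFF
```

Running the same pair without the hints shows which of the two the optimizer leans toward on its own.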

Basically I'm trying to determine ways to improve our table design here to
make reporting easier, as this gets painful when running reports for
large groups of people over large date ranges. I see a few options based
on my above hypothesis, and am looking for comments and/or corrections.

1 - Add the TrnDT column to the PersonTransactionHistory Table as
well. Then create a foreign key relationship of PersonTransactionHistory
(TrnID, TrnDT) references TransactionHistory (TrnID, TrnDT) and create
indexes on PersonTransactionHistory with (TrnDT, PersonID) and
(PersonID, TrnDT). This seems like it would let SQL Server make
much more efficient execution plans. However, I am unsure if SQL server
can leverage the FK on TrnDT to use those new indexes if I give it a query
like:

select * from TransactionHistory TH
inner join PersonTransactionHistory PTH
on TH.TrnID = PTH.TrnID
where PTH.PersonID in some criteria
and TH.TrnDT between some date and some date

The trick being that SQL Server would know that it can use PTH.TrnDT and
TH.TrnDT interchangeably because of the foreign key (this would support all
the preexisting queries that explicitly named TH.TrnDT - any that
didn't explicitly specify the table would now have ambiguous column
names...)
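A rough sketch of what option 1 could look like in DDL, under some assumptions: the constraint and index names are invented for illustration, a unique constraint on (TrnID, TrnDT) is added because a composite foreign key needs a unique key to reference, and GO is the batch separator used by client tools such as Query Analyzer. The final query names PTH.TrnDT explicitly, which sidesteps the question of whether the optimizer would infer the equivalence from the foreign key.

```sql
-- Sketch only: constraint and index names are hypothetical.
ALTER TABLE TransactionHistory
    ADD CONSTRAINT UQ_TH_TrnID_TrnDT UNIQUE (TrnID, TrnDT)

ALTER TABLE PersonTransactionHistory ADD TrnDT datetime NULL
GO

-- Backfill the new column from the parent table.
UPDATE PTH
SET TrnDT = TH.TrnDT
FROM PersonTransactionHistory PTH
INNER JOIN TransactionHistory TH ON TH.TrnID = PTH.TrnID

ALTER TABLE PersonTransactionHistory
    ADD CONSTRAINT FK_PTH_TH_TrnID_TrnDT
    FOREIGN KEY (TrnID, TrnDT) REFERENCES TransactionHistory (TrnID, TrnDT)

CREATE INDEX PTH_TrnDT_PersonID ON PersonTransactionHistory (TrnDT, PersonID)
CREATE INDEX PTH_PersonID_TrnDT ON PersonTransactionHistory (PersonID, TrnDT)
GO

-- Both predicates now sit on PTH, so one PTH index can cover them.
-- The PersonID values and dates are placeholders.
SELECT *
FROM TransactionHistory TH
INNER JOIN PersonTransactionHistory PTH ON TH.TrnID = PTH.TrnID
WHERE PTH.PersonID IN (101, 102, 103)
  AND PTH.TrnDT BETWEEN '20040101' AND '20040331'
```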

2 - Just coalesce the two tables into one. The original intent was to save
space by not requiring extra columns about Persons for all rows, many of
which did not have anything to do with a particular person (for instance a
contact point going active). In my experience with our product, the end
user's decisions about archiving and purging have a much bigger impact
than this, so in my opinion efficient querying is more important than
space. However I'm not sure if this is an elegant solution either. It also
might require more changes to existing code, although the use of views
might help.
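As a sketch of option 2 with views, assuming a person transaction can be recognized by a non-NULL PersonID: the combined table, index, and view names below are hypothetical, and the real tables carry more columns than shown.

```sql
-- Sketch only: table, index, and view names are hypothetical.
CREATE TABLE TransactionHistoryCombined (
    TrnID int IDENTITY (1,1) NOT NULL,
    TrnDT datetime,
    PersonID int NULL,   -- NULL for transactions not tied to a person
    -- other basic and person-only columns would go here
    PRIMARY KEY CLUSTERED (TrnID)
)

CREATE INDEX CombinedPersonDTIndex
    ON TransactionHistoryCombined (PersonID, TrnDT)
GO

-- Views that present the old two-table shape to existing client code.
CREATE VIEW TransactionHistoryView AS
SELECT TrnID, TrnDT
FROM TransactionHistoryCombined
GO

CREATE VIEW PersonTransactionHistoryView AS
SELECT TrnID, PersonID
FROM TransactionHistoryCombined
WHERE PersonID IS NOT NULL
GO
```

If the original tables were dropped or renamed, the views could take over the original names, which would limit the changes needed in client queries.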

We also run reports based on other criteria (columns I replaced with
comments above) but none of them are as problematic as the situation
above. However, it seems that if I can understand the best way to solve
this, I will be able to leverage that approach if other types of reports
become problematic.

Any opinions would be greatly appreciated. Also any references to good
sources regarding table and index design would be helpful as well (online
or offline references...)

thanks,
Dave

Jul 20 '05 #1
Metal Dave (me***@spam.spam) writes:
> create table TransactionHistory (
> TrnID int identity (1,1),
> TrnDT datetime,
> --other information about a basic transaction goes here.
> --All transactions have this info
> Primary Key Clustered (TrnID)
> )
>
> Create Index TrnDTIndex on TransactionHistory(TrnDT)
>
> create table PersonTransactionHistory (
> TrnID int,
> PersonID int,
> --extended data pertaining only to "person" transactions goes
> --here. only Person transactions have this
> Primary Key Clustered(TrnID),
> Foreign Key (TrnID) references TransactionHistory (TrnID)
> )
>
> Create Index TrnPersonIDIndex on PersonTransactionHistory(PersonID)

Given your query, it could be a good idea to have the clustered indexes
on TrnDT (in TransactionHistory) and PersonID (in PersonTransactionHistory)
instead. The main problem now with the queries is that SQL Server will
have to make a choice between Index Seek + Bookmark Lookup on the one
hand, and Clustered Index Scan on the other. This is a guessing game that
does not always end up the best way.
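A minimal sketch of that layout, assuming the tables are created fresh (on an existing database the primary key and foreign key would have to be dropped and recreated). The primary keys become nonclustered and the existing index names are reused for the clustered indexes.

```sql
-- Sketch only: same tables as in the question, with the clustering moved.
CREATE TABLE TransactionHistory (
    TrnID int IDENTITY (1,1) NOT NULL,
    TrnDT datetime,
    -- other basic transaction columns would go here
    PRIMARY KEY NONCLUSTERED (TrnID)
)

CREATE CLUSTERED INDEX TrnDTIndex ON TransactionHistory (TrnDT)

CREATE TABLE PersonTransactionHistory (
    TrnID int NOT NULL,
    PersonID int,
    -- person-only columns would go here
    PRIMARY KEY NONCLUSTERED (TrnID),
    FOREIGN KEY (TrnID) REFERENCES TransactionHistory (TrnID)
)

CREATE CLUSTERED INDEX TrnPersonIDIndex ON PersonTransactionHistory (PersonID)
```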

Of course, you may have other queries that are best off with clustering
on the Pkey, but this does not seem likely. (Insertion may however
benefit from a monotonically increasing index. A clustered index on
PersonID may cause fragmentation.)

> 1 - Add the TrnDT column to the PersonTransactionHistory Table as
> well. Then create a foreign key relationship of PersonTransactionHistory
> (TrnID, TrnDT) references TransactionHistory (TrnID, TrnDT) and create
> indexes on PersonTransactionHistory with (TrnDT, PersonID) and
> (PersonID, TrnDT). This seems like it would let SQL Server make
> much more efficient execution plans. However, I am unsure if SQL server
> can leverage the FK on TrnDT to use those new indexes if I give it a query
> like:
>
> select * from TransactionHistory TH
> inner join PersonTransactionHistory PTH
> on TH.TrnID = PTH.TrnID
> where PTH.PersonID in some criteria
> and TH.TrnDT between some date and some date

Well, take a copy of the database and try it!

(But first try changing the clustered index.)

> 2 - Just coalesce the two tables into one. The original intent was to save
> space by not requiring extra columns about Persons for all rows, many of
> which did not have anything to do with a particular person (for instance a
> contact point going active).

Depends a little on the ratio. If PersonTransactionHistory holds 50%
of all rows in the main table, collapsing into one is probably the best.
If it's 5%, I don't think it is.
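To see where a given installation falls, a quick count along these lines should do (a sketch; PersonTrnPercent is just an illustrative column alias):

```sql
-- Percentage of transactions that have a PersonTransactionHistory row.
-- NULLIF avoids a divide-by-zero error on an empty TransactionHistory.
SELECT 100.0 * (SELECT COUNT(*) FROM PersonTransactionHistory)
       / NULLIF((SELECT COUNT(*) FROM TransactionHistory), 0) AS PersonTrnPercent
```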


--
Erland Sommarskog, SQL Server MVP, es****@sommarskog.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techinf...2000/books.asp
Jul 20 '05 #2
On Tue, 26 Oct 2004, Erland Sommarskog wrote:
> Given your query, it could be a good idea to have the clustered index
> on TrnDT and PersonID instead. The main problem now with the queries
> is that SQL Server will have to make a choice between Index Seek +
> Bookmark Lookup on the one hand, and Clustered Index Scan on the other.
> This is a guessing game that does not always end up the best way.
>
> Of course, you may have other queries that are best off with clustering
> on the Pkey, but this does not seem likely. (Insertion may however
> benefit from a monotonically increasing index. A clustered index on
> PersonID may cause fragmentation.)

My intuition agrees with you regarding the index in this case. I'm
pretty sure the clustered bookmark scan kills us on many reports. However
I haven't looked in enough depth at the wide variety of queries we use
to know for sure where I should put the clustered index, so I'm reserving
judgement for now. I'd also like to study a bit more first so that I
don't replace one hasty decision with another - it might solve an
individual problem but exacerbate others.

For instance, I think

select * from PersonTransactionHistory PTH
inner join TransactionHistory TH on PTH.TrnID = TH.TrnID
where PTH.PersonID = 12345

would be harmed by moving the TH clustered index from TH.TrnID to
TH.TrnDT, as it would now have to make the same lookup vs scan choice in
order to perform the join. Does that sound reasonable? And since it's
rare for us to access PTH without the inner join to TH, there are probably
many queries like this.

>> 1 - Add the TrnDT column to the PersonTransactionHistory Table as
>> well. Then create a foreign key relationship of PersonTransactionHistory
>> (TrnID, TrnDT) references TransactionHistory (TrnID, TrnDT) and create
>> indexes on PersonTransactionHistory with (TrnDT, PersonID) and
>> (PersonID, TrnDT). This seems like it would let SQL Server make
>> much more efficient execution plans. However, I am unsure if SQL server
>> can leverage the FK on TrnDT to use those new indexes if I give it a query
>> like:
>>
>> select * from TransactionHistory TH
>> inner join PersonTransactionHistory PTH
>> on TH.TrnID = PTH.TrnID
>> where PTH.PersonID in some criteria
>> and TH.TrnDT between some date and some date
>
> Well, take a copy of the database and try it!

I appreciate the value of experimentation and normally would do that, but
if it didn't work, that wouldn't necessarily prove to me that I wasn't
simply doing something wrong, like not making the foreign key specific
enough or putting something in my query which made SQL Server ignore this
potentially valuable relationship. So I was basically wondering if there
were any good docs regarding what types of information SQL Server will and
will not leverage in its choices, or whether someone familiar with those
rules had some feedback off the top of their head.

>> 2 - Just coalesce the two tables into one. The original intent was to save
>> space by not requiring extra columns about Persons for all rows, many of
>> which did not have anything to do with a particular person (for instance a
>> contact point going active).
>
> Depends a little on the ratio. If the PersonTransactionHistory is 50%
> of all rows in the main table, collapsing into one is probably the best.
> If it's 5%, I don't think it is.


It's probably between 20% and 40% depending on the particular
installation. Is it your rationale that for 50% the space saved is
negligible whereas for 5% it is not? For me it's more about limiting
the changes to the client software (definitely keeping the tables
separate) vs speeding up queries (possibly coalescing) rather than a space
consideration. I did a test once and recall discovering we took up nearly
as much or more space with our indexes than our tables anyway, so
coalescing might not make a big space difference anyway. (This amount of
index space surprised me, but I'm not sure if there is a good rule of
thumb for how much space indexes should take.)
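For reference, that kind of data vs. index comparison can be repeated with the sp_spaceused system procedure, which reports the data and index_size figures per table:

```sql
-- Reports rows, reserved, data, index_size and unused for each table.
EXEC sp_spaceused 'TransactionHistory'
EXEC sp_spaceused 'PersonTransactionHistory'
```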

Rereading the post I probably should have just asked for good table design
references right up front. Any takers?

Thanks for the feedback.

Dave

Jul 20 '05 #3
Metal Dave (me***@spam.spam) writes:
> For instance, I think
>
> select * from PersonTransactionHistory PTH
> inner join TransactionHistory TH on PTH.TrnID = TH.TrnID
> where PTH.PersonID = 12345
>
> would be harmed by moving the TH clustered index from TH.TrnID to
> TH.TrnDT, as it would now have to make the same lookup vs scan choice in
> order to perform the join. Does that sound reasonable? And since it's
> rare for us to access PTH without the inner join to TH, there are probably
> many queries like this.

Let's assume for the example that the clustered index in PTH is on PersonID.
Then the join against TH on TrnID will be akin to Index Seek + Bookmark
Lookup, no matter if the index on TrnID is clustered or not. In both
cases you would expect a plan with a Nested Loop join, which means that
for each row in PTH you look up a row in TH. The only difference if the
index on TrnID is non-clustered is that you will get a few more reads for
each access. Which indeed is not negligible, since it multiplies with the
number of rows for PersonID.

And just like "SELECT * FROM tbl WHERE nonclusteredcol = @val" has a
choice between index seek and scan, so has this query. Rather than
nested loop, the optimizer could go for hash or merge join which would
mean a single scan of TH. I would guess that the probability for this is
somewhat higher with a NC index on TrnID.
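The two alternatives can be compared directly with join hints. This is a sketch for experimentation only, not a suggestion to leave hints in production queries; note that a join hint also fixes the join order to the order written in the query.

```sql
SET STATISTICS IO ON

-- Nested loop: one lookup in TH per matching PTH row.
SELECT *
FROM PersonTransactionHistory PTH
INNER LOOP JOIN TransactionHistory TH ON PTH.TrnID = TH.TrnID
WHERE PTH.PersonID = 12345

-- Hash join: no per-row lookups into TH.
SELECT *
FROM PersonTransactionHistory PTH
INNER HASH JOIN TransactionHistory TH ON PTH.TrnID = TH.TrnID
WHERE PTH.PersonID = 12345

SET STATISTICS IO OFF
```

Comparing the logical reads against TransactionHistory for the two variants, with the index on TrnID first clustered and then non-clustered, shows how much the extra reads per access actually cost.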

Of course, you can opt to change only PTH, if you like.

> I appreciate the value of experimentation and normally would do that, but
> if it didn't work, that wouldn't necessarily prove to me that I wasn't
> simply doing something wrong, like not making the foreign key specific
> enough or putting something in my query which made SQL Server ignore this
> potentially valuable relationship. So I was basically wondering if there
> were any good docs regarding what types of information SQL Server will and
> will not leverage in its choices, or whether someone familiar with those
> rules had some feedback off the top of their head.

SQL Server does look at constraints, but how intelligent it really is,
I have not dug into. Thus, my encouragement of experimentation.

> It's probably between 20% and 40% depending on the particular
> installation. Is it your rationale that for 50% the space saved is
> negligible whereas for 5% it is not?


Actually, I was thinking more in terms of performance, but space and
performance are related. My idea was that with 50%, the space saved is not
worth the extra complexity, and performance may suffer. With 5%, you save a
lot of space, since PTH would be a small table.

Your concern of having to change the client is certainly not one to be
neglected, and if this is costly in development time, I don't think it's
worth it.
--
Erland Sommarskog, SQL Server MVP, es****@sommarskog.se

Jul 20 '05 #4
