Will a lookup table improve performance?

Paul H

Suppose I have a table called tblPeople and I want a field to illustrate
whether each person prefers cats or dogs. I could do it one of three ways.

1. A plain text field
Create a text field in tblPeople called PreferredPet.

2. A lookup field that stores text values.
Create a text field in tblPeople called PreferredPetID and use it to lookup
an identical text field in tblPreferredPets.

3. A lookup field that stores an integer.
Create a byte field in tblPeople called PreferredPetID and use it to lookup
an identical field in tblPreferredPets. There will also be an additional
filed in tblPreferredPets called PetType that will store the actual values
(Cat, Dog, Mouse, etc..). This way only byte values will be stored in
tblPeople instead of text strings. This method I would have thought would
benefit performance if there were hundreds of thousands of rows in
tblPeople.

Although option three makes form filters, reports and query design a little
more involved is that the "correct" method for storing a fixed range of
values?

Paul

Jan 6 '07 #1

Subscribe Post Reply

2622

Rick Brandt

Paul H wrote:

Suppose I have a table called tblPeople and I want a field to
illustrate whether each person prefers cats or dogs. I could do it
one of three ways.
1. A plain text field
Create a text field in tblPeople called PreferredPet.

2. A lookup field that stores text values.
Create a text field in tblPeople called PreferredPetID and use it to
lookup an identical text field in tblPreferredPets.

3. A lookup field that stores an integer.
Create a byte field in tblPeople called PreferredPetID and use it to
lookup an identical field in tblPreferredPets. There will also be an
additional filed in tblPreferredPets called PetType that will store
the actual values (Cat, Dog, Mouse, etc..). This way only byte values
will be stored in tblPeople instead of text strings. This method I
would have thought would benefit performance if there were hundreds
of thousands of rows in tblPeople.

Although option three makes form filters, reports and query design a
little more involved is that the "correct" method for storing a fixed
range of values?

Method 3 would be "more preferred" as the text entries get longer and the more
likely they are to be changing down the road. When small and stable storing the
text is fine. The speed and space differences would not be noticeable on modern
hardware.

--
Rick Brandt, Microsoft Access MVP
Email (as appropriate) to...
RBrandt at Hunter dot com

Jan 6 '07 #2

Bernard Peek

In message <hw******************@newssvr27.news.prodigy.net >, Rick
Brandt <ri*********@hotmail.comwrites

>Although option three makes form filters, reports and query design a
little more involved is that the "correct" method for storing a fixed
range of values?

Method 3 would be "more preferred" as the text entries get longer and the more
likely they are to be changing down the road. When small and stable
storing the
text is fine. The speed and space differences would not be noticeable
on modern
hardware.

Just to add a little more detail. What the "correct" method is depends
on context. Most of all it depends on the data, which is why generic
answers have to be hedged around with caveats and exceptions.

In general if you alter the structure of a database away from the
logical data structure inherent in the data there will be a price to
pay. If you understand the structure of the data and you know the other
constraints on the design then you can assess the costs and the benefits
of any particular design, and choose the best one. Any non-trivial
database application can usually be implemented in a number of different
ways.

Some of the factors you might need to consider are:

- Is this a use-once quick fix (if so it may not justify lots of design
effort)
- How well-trained are the users
- What is the cardinality of the relationships between entities
- Does the cardinality change over time
- How many records will the system hold
- How many inserts/updates will there be
- How fast do updates need to be made
- How many users will need simultaneous access
- Is the database distributed or single location
- How clean is the data
- How well does the designer understand the data
- How good is the designer

I'm sure I've missed out lots here.

--
Bernard Peek
back in search of cognoscenti

Jan 6 '07 #3

(PeteCresswell)

Per Paul H:

>
Although option three makes form filters, reports and query design a little
more involved is that the "correct" method for storing a fixed range of
values?

I'm not up to discussing the technicalities, but for me option 3 is almost
always the no-brainer. Only exception I can think of in recent experience is
USA state codes. Even there, I have a tlkpState - but the PK is the two-letter
state code so that's what gets stored in the parent record.

I guess there's an argument for having as much person-readable data in the
parent table as possible - so one might take the state code approach and apply
it more broadly - using the short name of whatever is being stored as the lookup
table's PK but in the end it seems to me like you need to control what's used
and what isn't and the lookup table and the appropriate relationship seem like
the logical tools for that.
--
PeteCresswell

Jan 6 '07 #4

Lyle Fairfield

Paul H wrote:

Suppose I have a table called tblPeople and I want a field to illustrate
whether each person prefers cats or dogs. I could do it one of three ways.

1. A plain text field
Create a text field in tblPeople called PreferredPet.

2. A lookup field that stores text values.
Create a text field in tblPeople called PreferredPetID and use it to lookup
an identical text field in tblPreferredPets.

3. A lookup field that stores an integer.
Create a byte field in tblPeople called PreferredPetID and use it to lookup
an identical field in tblPreferredPets. There will also be an additional
filed in tblPreferredPets called PetType that will store the actual values
(Cat, Dog, Mouse, etc..). This way only byte values will be stored in
tblPeople instead of text strings. This method I would have thought would
benefit performance if there were hundreds of thousands of rows in
tblPeople.

Although option three makes form filters, reports and query design a little
more involved is that the "correct" method for storing a fixed range of
values?

Paul

The nice thing about using a pointer to another table is that we can
use joins to query our information; there maybe some cases where joins
will be thousands of times faster than ins or wheres with subqueries.
Another nice thing is that a null entry pretty much tells use we don't
know, and it's easy to change an entry to a null when we need to do
that.

We could use booleans. My problem with booleans is that MS has not,
TTBOMK, clearly delineated how JET deals with booleans and the third
state (not yet entered). Is it another bit in the same byte? Is it a
bit in another byte? Does it do it at all? Is it identical with
MS-SQLserver? These questions may seem pedantic and probably were ten
years ago, but now we may be moving data among various db storage
systems frequently. It wouldn't hurt for us to know what's going on.
(MS never seems to care if we know what's going on).
A poster here, TC, has indicated that JET 4.0 can store 8 booleans in
the equivalent of two byte fields. I assume one byte keeps track of the
true-false, on-off condition of the boolean and the other keeps track
oft whether the field has actually had anything at all stored in it, or
is just in its default zero state. I've studied mdb files and can find
no reason to doubt him. But I don't KNOW 100% that this is so. And I
don't know about JET 3.5; the documentation for 3.5 seems to state
clearly that one boolean field = one byte field.
And what about ACE? Is it really just JET localized to Access with a
new name?

And what about 64 bits which seems to be hovering around the corner?

I guess because of these concenrs I would not use booleans but I might
use my own bytes as described above.
So bit one of byte A would indicate True of false.
ans bit one of byte B would indicate whether the Booelan had actually
ever been assigned a value or was still in its default state.
I could handle 8 booleans in two bytes this way.
One of the innovations of JET 4 was/is to allow bit wise operations in
our queries if we use an OLEDB connection eg:
SELECT ID, ID BXOR 2 as IDBXORedWith2 from Table1
I am guessing this should be very fast.

Blah blah blah

Enough ...

Jan 7 '07 #5

Tim Marshall

Paul H wrote:

1. A plain text field
Create a text field in tblPeople called PreferredPet.

This route is practical and an acceptable way to do things if, AND ONLY
IF, you establish table level constraints at tblPeople. This means use
the Validation Rule with something like:

"Cat" or "Dog" or "Parrot" etc.

This is practical if:

1) You are certain that the list you develop is not likely going to
change very often;

2) There aren't too many values;

3) The spelling of a choice is not going to change once it's established
(for instance, you decide that "Parrot" should be changed to "Parrot -
means a superior pet owner"); and

4) There's no other data to be associated with the choice.

If all four issues are not satisfied, then one of 2 or 3 is the better
way to with a look up table.

2. A lookup field that stores text values.

3. A lookup field that stores an integer.

2 or 3 is fine as long as the stored value is an unchanging primary key
from the look up table.
--
Tim http://www.ucs.mun.ca/~tmarshal/
^o<
/#) "Burp-beep, burp-beep, burp-beep?" - Quaker Jake
/^^ "Whatcha doin?" - Ditto "TIM-MAY!!" - Me

Jan 7 '07 #6

Allen Browne

Paul, I'm pleased you got different response, and I'm voting for #2.
(This assumes the person is allowed only one preference, i.e. can't be
both.)

Advantages of using a text-based key:
===========================
1. Since there is a simple, obvious, brief, natural key, the artificial key
is redundant.

2. Humans can read the foreign key value, which avoids the newbie's
temptation to put lookup fields in tables:
http://www.mvps.org/access/lookupfields.htm

3. It simplifies queries. Preparing data for a report, you may not need the
lookup table at all (since you have a readable value), and it may even get
you out of a spot involving "ambiguous outer joins."

4. If you filter the RowSource of the foreign key's combo so that it does
not show all values from the lookup table, the entries display as blank if
the bound column is hidden. This suggests to the user that the value is
Null, when in fact it is not. You avoid this misrepresentation of the data
if the bound column is visible, since Access can display the bound column
value, even if it is not in the list.

Disadvantages of using a text-based key:
=============================
1. When you come to write code later, you need to remember to add extra
quotes to criteria strings. It's not obvious how to write the code without
looking up the data type of this particular key. (Workaround: consider some
kind of naming convention for fields to indicate the data type.)

2. For the same reason, it is harder to write generic code that accepts a
key value, because the data type is inconsistent. (Workaround: pass the
whole Where clause instead of the key value. This copes with multi-field
keys as well.)

That should get some more discussion going about preferred styles. :-)

--
Allen Browne - Microsoft MVP. Perth, Western Australia
Tips for Access users - http://allenbrowne.com/tips.html
Reply to group, rather than allenbrowne at mvps dot org.

"Paul H" <pa**@nospam.comwrote in message
news:0e*********************@eclipse.net.uk...

Suppose I have a table called tblPeople and I want a field to illustrate
whether each person prefers cats or dogs. I could do it one of three ways.

1. A plain text field
Create a text field in tblPeople called PreferredPet.

2. A lookup field that stores text values.
Create a text field in tblPeople called PreferredPetID and use it to
lookup an identical text field in tblPreferredPets.

3. A lookup field that stores an integer.
Create a byte field in tblPeople called PreferredPetID and use it to
lookup an identical field in tblPreferredPets. There will also be an
additional filed in tblPreferredPets called PetType that will store the
actual values (Cat, Dog, Mouse, etc..). This way only byte values will be
stored in tblPeople instead of text strings. This method I would have
thought would benefit performance if there were hundreds of thousands of
rows in tblPeople.

Although option three makes form filters, reports and query design a
little more involved is that the "correct" method for storing a fixed
range of values?

Paul

Jan 7 '07 #7

Tom van Stiphout

On 6 Jan 2007 15:58:59 -0800, "Lyle Fairfield" <ly***********@aim.com>
wrote:

I rarely use nullable yes/no fields. Most of the time there is a
reasonable default that can be chosen. To me bit fields have 2 states,
not 3. I would trust a good designer to make sensible exceptions, but
for rookies this is the guideline.

-Tom.

>
Paul H wrote:
>Suppose I have a table called tblPeople and I want a field to illustrate
whether each person prefers cats or dogs. I could do it one of three ways.

1. A plain text field
Create a text field in tblPeople called PreferredPet.

2. A lookup field that stores text values.
Create a text field in tblPeople called PreferredPetID and use it to lookup
an identical text field in tblPreferredPets.

3. A lookup field that stores an integer.
Create a byte field in tblPeople called PreferredPetID and use it to lookup
an identical field in tblPreferredPets. There will also be an additional
filed in tblPreferredPets called PetType that will store the actual values
(Cat, Dog, Mouse, etc..). This way only byte values will be stored in
tblPeople instead of text strings. This method I would have thought would
benefit performance if there were hundreds of thousands of rows in
tblPeople.

Although option three makes form filters, reports and query design a little
more involved is that the "correct" method for storing a fixed range of
values?

Paul

The nice thing about using a pointer to another table is that we can
use joins to query our information; there maybe some cases where joins
will be thousands of times faster than ins or wheres with subqueries.
Another nice thing is that a null entry pretty much tells use we don't
know, and it's easy to change an entry to a null when we need to do
that.

We could use booleans. My problem with booleans is that MS has not,
TTBOMK, clearly delineated how JET deals with booleans and the third
state (not yet entered). Is it another bit in the same byte? Is it a
bit in another byte? Does it do it at all? Is it identical with
MS-SQLserver? These questions may seem pedantic and probably were ten
years ago, but now we may be moving data among various db storage
systems frequently. It wouldn't hurt for us to know what's going on.
(MS never seems to care if we know what's going on).
A poster here, TC, has indicated that JET 4.0 can store 8 booleans in
the equivalent of two byte fields. I assume one byte keeps track of the
true-false, on-off condition of the boolean and the other keeps track
oft whether the field has actually had anything at all stored in it, or
is just in its default zero state. I've studied mdb files and can find
no reason to doubt him. But I don't KNOW 100% that this is so. And I
don't know about JET 3.5; the documentation for 3.5 seems to state
clearly that one boolean field = one byte field.
And what about ACE? Is it really just JET localized to Access with a
new name?

And what about 64 bits which seems to be hovering around the corner?

I guess because of these concenrs I would not use booleans but I might
use my own bytes as described above.
So bit one of byte A would indicate True of false.
ans bit one of byte B would indicate whether the Booelan had actually
ever been assigned a value or was still in its default state.
I could handle 8 booleans in two bytes this way.
One of the innovations of JET 4 was/is to allow bit wise operations in
our queries if we use an OLEDB connection eg:
SELECT ID, ID BXOR 2 as IDBXORedWith2 from Table1
I am guessing this should be very fast.

Blah blah blah

Enough ...

Jan 7 '07 #8

Allen Browne

Lyle, I can't comment on the storage mechanism for booleans in JET, but it
does seem that the JET logic is seriously flawed when it comes to null
values in boolean fields.

Of course, JET can't store Null in a Yes/No field. It seems like the
designers at Microsoft figured they didn't have to build JET to handle Nulls
in boolean fields. That was a bad mistake: you do get Nulls in yes/no
fields, e.g. in a query with an outer join.

Such a query is likely to fail. If you're lucky, you get a senseless error
message such as, "No current record." If you're unlucky, JET crashes.

There's a demo of the first of those two here:
http://allenbrowne.com/bug-14.html
I'm not going to post a demo that crashes Access.

--
Allen Browne - Microsoft MVP. Perth, Western Australia
Tips for Access users - http://allenbrowne.com/tips.html
Reply to group, rather than allenbrowne at mvps dot org.

"Lyle Fairfield" <ly***********@aim.comwrote in message
news:11**********************@11g2000cwr.googlegro ups.com...

>
We could use booleans. My problem with booleans is that MS has not,
TTBOMK, clearly delineated how JET deals with booleans and the third
state (not yet entered). Is it another bit in the same byte? Is it a
bit in another byte? Does it do it at all? Is it identical with
MS-SQLserver? These questions may seem pedantic and probably were ten
years ago, but now we may be moving data among various db storage
systems frequently. It wouldn't hurt for us to know what's going on.
(MS never seems to care if we know what's going on).
A poster here, TC, has indicated that JET 4.0 can store 8 booleans in
the equivalent of two byte fields. I assume one byte keeps track of the
true-false, on-off condition of the boolean and the other keeps track
oft whether the field has actually had anything at all stored in it, or
is just in its default zero state. I've studied mdb files and can find
no reason to doubt him. But I don't KNOW 100% that this is so. And I
don't know about JET 3.5; the documentation for 3.5 seems to state
clearly that one boolean field = one byte field.
And what about ACE? Is it really just JET localized to Access with a
new name?

And what about 64 bits which seems to be hovering around the corner?

I guess because of these concenrs I would not use booleans but I might
use my own bytes as described above.
So bit one of byte A would indicate True of false.
ans bit one of byte B would indicate whether the Booelan had actually
ever been assigned a value or was still in its default state.
I could handle 8 booleans in two bytes this way.
One of the innovations of JET 4 was/is to allow bit wise operations in
our queries if we use an OLEDB connection eg:
SELECT ID, ID BXOR 2 as IDBXORedWith2 from Table1
I am guessing this should be very fast.

Jan 8 '07 #9

Paul H

Thanks Allen, #2 certainly makes the developers life easier. But will
storing a text string (Cat, Dog, Parrot, etc..) have a noticeably
detrimental effect on performance and bloat compared to storing a byte
value?

In this specific project, although a very simple (3-5 tables) database there
will be hundreds of thousand of records to sort and report on. It is also
worth bearing in mind that the database will probably be ported to SQL at
some point in the near future.

Paul

Paul, I'm pleased you got different response, and I'm voting for #2.
(This assumes the person is allowed only one preference, i.e. can't be
both.)

Advantages of using a text-based key:
===========================
1. Since there is a simple, obvious, brief, natural key, the artificial
key is redundant.

2. Humans can read the foreign key value, which avoids the newbie's
temptation to put lookup fields in tables:
http://www.mvps.org/access/lookupfields.htm

3. It simplifies queries. Preparing data for a report, you may not need
the lookup table at all (since you have a readable value), and it may even
get you out of a spot involving "ambiguous outer joins."

4. If you filter the RowSource of the foreign key's combo so that it does
not show all values from the lookup table, the entries display as blank if
the bound column is hidden. This suggests to the user that the value is
Null, when in fact it is not. You avoid this misrepresentation of the data
if the bound column is visible, since Access can display the bound column
value, even if it is not in the list.

Disadvantages of using a text-based key:
=============================
1. When you come to write code later, you need to remember to add extra
quotes to criteria strings. It's not obvious how to write the code without
looking up the data type of this particular key. (Workaround: consider
some kind of naming convention for fields to indicate the data type.)

2. For the same reason, it is harder to write generic code that accepts a
key value, because the data type is inconsistent. (Workaround: pass the
whole Where clause instead of the key value. This copes with multi-field
keys as well.)

That should get some more discussion going about preferred styles. :-)

--
Allen Browne - Microsoft MVP. Perth, Western Australia
Tips for Access users - http://allenbrowne.com/tips.html
Reply to group, rather than allenbrowne at mvps dot org.

"Paul H" <pa**@nospam.comwrote in message
news:0e*********************@eclipse.net.uk...
>Suppose I have a table called tblPeople and I want a field to illustrate
whether each person prefers cats or dogs. I could do it one of three
ways.

1. A plain text field
Create a text field in tblPeople called PreferredPet.

2. A lookup field that stores text values.
Create a text field in tblPeople called PreferredPetID and use it to
lookup an identical text field in tblPreferredPets.

3. A lookup field that stores an integer.
Create a byte field in tblPeople called PreferredPetID and use it to
lookup an identical field in tblPreferredPets. There will also be an
additional filed in tblPreferredPets called PetType that will store the
actual values (Cat, Dog, Mouse, etc..). This way only byte values will be
stored in tblPeople instead of text strings. This method I would have
thought would benefit performance if there were hundreds of thousands of
rows in tblPeople.

Although option three makes form filters, reports and query design a
little more involved is that the "correct" method for storing a fixed
range of values?

Paul

Jan 8 '07 #10

Terry Kreft

#1 is wide open to user entry erors (errors <g>).
#2 Means you have a 1 field 1 key table, the trouble with this is that if
you change an entry in the lookup table you have to cascade that change down
to the fields in all the other tables which use it.
#3 Is the most flexible for accurate data-entry with the least impact for
maintenance. i.e. it is easy to add a new option to the table but amending
an option has no impact on related tables.

Personally I would go for #3 every time.

--

Terry Kreft
"Paul H" <pa**@nospam.comwrote in message
news:0e*********************@eclipse.net.uk...

Suppose I have a table called tblPeople and I want a field to illustrate
whether each person prefers cats or dogs. I could do it one of three ways.

1. A plain text field
Create a text field in tblPeople called PreferredPet.

2. A lookup field that stores text values.
Create a text field in tblPeople called PreferredPetID and use it to

lookup

an identical text field in tblPreferredPets.

3. A lookup field that stores an integer.
Create a byte field in tblPeople called PreferredPetID and use it to

lookup

an identical field in tblPreferredPets. There will also be an additional
filed in tblPreferredPets called PetType that will store the actual values
(Cat, Dog, Mouse, etc..). This way only byte values will be stored in
tblPeople instead of text strings. This method I would have thought would
benefit performance if there were hundreds of thousands of rows in
tblPeople.

Although option three makes form filters, reports and query design a

little

more involved is that the "correct" method for storing a fixed range of
values?

Paul

Jan 8 '07 #11

Allen Browne

The only way to know what the performance penalty would be will be to test
it. My guess is the the penalty will be negligable: since both the primary
key and foreign key fields are indexed, I would expect Access to use the
indexes rather than performing any string matching.

The text-based key will use a few more bytes than the numeric key. That in
turn means more bytes of disk reads/writes, so there might be a penalty
there.

If you envision hundreds of users on this app simultaneously, the concept of
cascading updates to correct a spelling mistake in the key field might also
be unattractive.

The only way to know for sure would be to run some tests on the actual data
you anticipate. Or perhaps someone who has tested this kind of thing on SQL
Server might like to comment.

--
Allen Browne - Microsoft MVP. Perth, Western Australia
Tips for Access users - http://allenbrowne.com/tips.html
Reply to group, rather than allenbrowne at mvps dot org.

"Paul H" <pa**@nospam.comwrote in message
news:D5*********************@eclipse.net.uk...

Thanks Allen, #2 certainly makes the developers life easier. But will
storing a text string (Cat, Dog, Parrot, etc..) have a noticeably
detrimental effect on performance and bloat compared to storing a byte
value?

In this specific project, although a very simple (3-5 tables) database
there will be hundreds of thousand of records to sort and report on. It is
also worth bearing in mind that the database will probably be ported to
SQL at some point in the near future.

Paul

>Paul, I'm pleased you got different response, and I'm voting for #2.
(This assumes the person is allowed only one preference, i.e. can't be
both.)

Advantages of using a text-based key:
===========================
1. Since there is a simple, obvious, brief, natural key, the artificial
key is redundant.

2. Humans can read the foreign key value, which avoids the newbie's
temptation to put lookup fields in tables:
http://www.mvps.org/access/lookupfields.htm

3. It simplifies queries. Preparing data for a report, you may not need
the lookup table at all (since you have a readable value), and it may
even get you out of a spot involving "ambiguous outer joins."

4. If you filter the RowSource of the foreign key's combo so that it does
not show all values from the lookup table, the entries display as blank
if the bound column is hidden. This suggests to the user that the value
is Null, when in fact it is not. You avoid this misrepresentation of the
data if the bound column is visible, since Access can display the bound
column value, even if it is not in the list.

Disadvantages of using a text-based key:
=============================
1. When you come to write code later, you need to remember to add extra
quotes to criteria strings. It's not obvious how to write the code
without looking up the data type of this particular key. (Workaround:
consider some kind of naming convention for fields to indicate the data
type.)

2. For the same reason, it is harder to write generic code that accepts a
key value, because the data type is inconsistent. (Workaround: pass the
whole Where clause instead of the key value. This copes with multi-field
keys as well.)

That should get some more discussion going about preferred styles. :-)

"Paul H" <pa**@nospam.comwrote in message
news:0e*********************@eclipse.net.uk...
>>Suppose I have a table called tblPeople and I want a field to illustrate
whether each person prefers cats or dogs. I could do it one of three
ways.

1. A plain text field
Create a text field in tblPeople called PreferredPet.

2. A lookup field that stores text values.
Create a text field in tblPeople called PreferredPetID and use it to
lookup an identical text field in tblPreferredPets.

3. A lookup field that stores an integer.
Create a byte field in tblPeople called PreferredPetID and use it to
lookup an identical field in tblPreferredPets. There will also be an
additional filed in tblPreferredPets called PetType that will store the
actual values (Cat, Dog, Mouse, etc..). This way only byte values will
be stored in tblPeople instead of text strings. This method I would have
thought would benefit performance if there were hundreds of thousands of
rows in tblPeople.

Although option three makes form filters, reports and query design a
little more involved is that the "correct" method for storing a fixed
range of values?

Paul

Jan 8 '07 #12

Will a lookup table improve performance?

Similar topics