Bytes IT Community

Linking an Mdb to SQL Server

Hi!!

Years ago I built a database to control production at a little
factory.
The users wanted to work in a Windows workgroup, so I created an
mdb with all the tables and data, and after that every user had his own
mde linked to the main mdb.
Year by year the number of records has grown, and the speed of the
application has decreased badly because of the great number of records
stored.
In order to improve the speed I decided to try to export the main mdb
with the data to SQL Server, but I don't know how to do it.

After that I would like to know how to link the mde's to the SQL
Server database.

Does anyone know of an online manual, or how to do what I'm trying?

Thanks a lot in advance!!!
Nov 12 '05 #1
14 Replies


Your story sounds most familiar. The little system that grows and grows!

You should be happy, and congratulated, when that happens! These kinds of
stories are why ms-access is so popular!

You don't actually mention how large your data tables are. However, the fact
that you use linked mde front ends does hint you did some homework here (good
for you!).

You should remember that moving to sql server will not always cause an
instant increase in performance unless your designs take this into account.
There are weekly posts in the sql newsgroups about how data from a JET mdb
file was moved to sql server, and the application now runs slower. So, you
do have to remember that sql server is not a magic bullet, and does not
always cure performance problems. You have to *design* your system to
minimize the amount of data transferred down the wire.
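As a small illustration of that principle, open forms against just the record the user asked for instead of binding them to a whole linked table. A sketch, with made-up form and field names:

```vba
' Fetch only the requested record instead of binding the form
' to the entire linked table. Form/field names are hypothetical.
Public Sub OpenOneOrder(ByVal lngOrderID As Long)
    DoCmd.OpenForm "frmOrder", acNormal, , "OrderID = " & lngOrderID
End Sub
```

With a where-clause restriction like this, only the matching rows come down the wire, regardless of whether the back end is JET or sql server.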

For example, I find that a JET file share (no sql server) with 5 users and
tables in the 50,000 record range is very sweet. In other words, with such
small tables, ms-access really performs well, and one is hard pressed to
find any form that takes more than 1 second to load. So, JET is remarkable
in terms of speed. And, with indexing, it does not pull much more data than
what sql server will send (a JET file share with an index only pulls the
data you need down the wire. JET is very smart, and that is why so many
applications run so well without sql server).

So, taking into account the above, the general approach to converting is:

Move your data from that back end to the sql server. You can use the DTS
tools that come with sql server to do this. You can create and link one
table at a time, but that is a lot of work. In fact, you should give the
upsizing wizard a try, as it will move the tables, and even keep/create your
existing relations.

In your front end, you will have to re-link the tables to the sql server
now.

I'm not going to suggest that you un-link the front end and the back end.
But the upsizing wizards kind of assume NON linked tables. The resulting
up-sized file will have both links to the sql server and the old tables
(so, of course, you run the up-sizer on the back end). You can then
grab/import the table links into the front end (gee, I guess you have to
delete the front end links first...).
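If you would rather script the re-link than do it by hand, the DAO table links can be dropped and re-created in code. This is only a sketch; the server and database names are made-up, and it assumes a DSN-less SQL Server ODBC connect string:

```vba
' Drop the old JET links and re-create them as SQL Server ODBC links.
' Server/database names below are hypothetical - adjust to suit.
Public Sub RelinkToSqlServer()
    Const cnn As String = "ODBC;DRIVER={SQL Server};" & _
        "SERVER=MyServer;DATABASE=Factory;Trusted_Connection=Yes"
    Dim db As DAO.Database
    Dim tdf As DAO.TableDef
    Dim colNames As New Collection
    Dim v As Variant

    Set db = CurrentDb()

    ' First pass: note every table still linked to the old mdb back end.
    For Each tdf In db.TableDefs
        If Len(tdf.Connect) > 0 And Left$(tdf.Connect, 5) <> "ODBC;" Then
            colNames.Add tdf.Name
        End If
    Next tdf

    ' Second pass: delete each JET link and re-create it as an ODBC link.
    For Each v In colNames
        db.TableDefs.Delete CStr(v)
        Set tdf = db.CreateTableDef(CStr(v))
        tdf.Connect = cnn
        tdf.SourceTableName = "dbo." & CStr(v)
        db.TableDefs.Append tdf
    Next v
    db.TableDefs.Refresh
End Sub
```

The two-pass approach avoids deleting members of the TableDefs collection while iterating over it.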

Once that is done, you will have to start looking at fixing performance
problems, and just things that don't work. You should note that a good
number of forms will actually load SLOWER if you do nothing. In fact, the
majority of your forms will now load slower, as your designs are not set up
to take advantage of sql server. You will now have to start changing your
designs to work better with sql server.

Note that you should follow some guidelines when linking to sql server
tables. You want to have timestamp fields in the sql server tables, and those
timestamp fields SHOULD BE exposed to jet. This is especially so for stuff
like sub-forms.
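Adding such a timestamp column can be done from the Access side with a pass-through query. A sketch; the table, server, and database names are hypothetical:

```vba
' Add a timestamp column to a server table via a pass-through query.
' Table/server/database names are hypothetical.
Public Sub AddTimestampColumn()
    Dim qdf As DAO.QueryDef
    Set qdf = CurrentDb().CreateQueryDef("")   ' unnamed = temporary
    qdf.Connect = "ODBC;DRIVER={SQL Server};" & _
        "SERVER=MyServer;DATABASE=Factory;Trusted_Connection=Yes"
    qdf.ReturnsRecords = False
    qdf.SQL = "ALTER TABLE dbo.Orders ADD upsize_ts timestamp"
    qdf.Execute dbFailOnError
End Sub
```

After the column is added, re-link the table so that Jet sees the new timestamp field.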

I would also avoid any sql that tries to "join" those linked tables. In
those cases, create a view on the sql server, and link to that (a view in
sql server is the same idea as a saved query in ms-access).
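A sketch of that approach; all object names here are made-up. The view is created with a pass-through query, then linked just like a table:

```vba
' Move a two-table join server side as a view, then link the view.
' All object names are hypothetical.
Public Sub CreateAndLinkOrderView()
    Const cnn As String = "ODBC;DRIVER={SQL Server};" & _
        "SERVER=MyServer;DATABASE=Factory;Trusted_Connection=Yes"
    Dim qdf As DAO.QueryDef

    Set qdf = CurrentDb().CreateQueryDef("")   ' unnamed = temporary
    qdf.Connect = cnn
    qdf.ReturnsRecords = False
    qdf.SQL = "CREATE VIEW dbo.vOrderDetail AS " & _
              "SELECT o.OrderID, o.OrderDate, c.CompanyName " & _
              "FROM dbo.Orders AS o INNER JOIN dbo.Customers AS c " & _
              "ON c.CustomerID = o.CustomerID"
    qdf.Execute dbFailOnError

    ' Link the view like any table; the join now runs on the server.
    DoCmd.TransferDatabase acLink, "ODBC Database", cnn, acTable, _
        "dbo.vOrderDetail", "vOrderDetail"
End Sub
```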

Here is some more reading:
ACC2000: "Access 2000 Upsizing Tools" White Paper Available in Download
Center
http://support.microsoft.com/?id=241743

ACC2002: "Access 2002 Upsizing Tools" White Paper Available in Download
Center
http://support.microsoft.com/?id=294407

ACC2000: Optimizing for Client/Server Performance (odbc)
http://support.microsoft.com/?id=208858

ACC: "Upsizing to Microsoft SQL Server" White Paper Available in Download
Center (a95, and a97)
http://support.microsoft.com/?id=175619

HOW TO: Convert an Access Database to SQL Server (a97,a2000)
http://support.microsoft.com/?id=237980

ACC: Choosing Database Tools White Paper Available in Download Center

The Choose.exe file contains a document called "Choosing the Right Database
Tools" that discusses Microsoft's database products: Microsoft Access,
Microsoft FoxPro, Microsoft SQL Server, Microsoft Visual Basic, and Open
Database Connectivity (ODBC). Use this document to decide which database
tool is right for you.

http://support.microsoft.com/?id=128384

ACC: Tips for Optimizing Queries on Attached SQL Tables
http://support.microsoft.com/?id=99321
--
Albert D. Kallal (Access MVP)
Edmonton, Alberta Canada
No************@msn.com
http://www.attcanada.net/~kallal.msn

Nov 12 '05 #2

"Albert D. Kallal" <pl********************@msn.com> wrote:
For example, I find that when using a JET file share (no sql server), 5
users, and tables in the 50,000 record range is very sweet.


And I can say the same for 25 users and 200K to 400K records.

Tony
--
Tony Toews, Microsoft Access MVP
Please respond only in the newsgroups so that others can
read the entire thread of messages.
Microsoft Access Links, Hints, Tips & Accounting Systems at
http://www.granite.ab.ca/accsmstr.htm
Nov 12 '05 #3

I typed "sql" into the Access 2000 help system; it explains what you are looking for.
Nov 12 '05 #4

"Tony Toews" <tt****@telusplanet.net> wrote in message

news:6h********************************@4ax.com...
"Albert D. Kallal" <pl********************@msn.com> wrote:
For example, I find that when using a JET file share (no sql server), 5
users, and tables in the 50,000 record range is very sweet.


And I can say the same for 25 users and 200K to 400K records.


Gee, that is rather large to say "sweet". When I say sweet, I mean that
there is not a hint of performance problems!

If you say that your application with that user count and that many records
in some tables runs sweet, then I am impressed.

On the other hand, it is rapidly becoming apparent that a well written JET
application performs very well, and often EASILY gives sql server a run for
its money. When I think about this, I see little reason why a table with
200k records does not perform well on a network. I mean, the amount of
traffic to retrieve the one record you want to edit is not going to be that
large at all!

There is never a doubt as to the increased security, reliability, and
connectivity of sql server. Further, I am now starting to see some posts
where people who use adp projects state it is *EASIER* to get better
performance than with a JET file share.

I am talking on the level of an automatic car vs. a standard car. For most
users, the automatic transmission is better, but with more experience, the
standard is preferred.

If you just start working with an adp project, then there is no worry about
a persistent connection, no splitting, etc. The user will NOT experience
much of that "nice list" of performance problems. You also eliminate any
problems of service updates for JET. Thus, moving to an adp eliminates the
user having to learn about splitting, or even keeping a persistent
connection, and even updating JET.

Thus, you can have read and learned less about ms-access, and it probably
will work better the first time with an adp project!

So, it would seem that a JET file share in the hands of an experienced
developer is really remarkable, but sql server and an adp project just might
be the answer for some developers who do not want to take the time to get
everything just right, or read your performance faq! Hmm... just like an
automatic transmission....

--
Albert D. Kallal (MVP)
Edmonton, Alberta Canada
No************@msn.com
http://www.attcanada.net/~kallal.msn

Nov 12 '05 #5

pl********************@msn.com (Albert D. Kallal) wrote in
<ty5Nb.86929$ts4.46467@pd7tw3no>:
"Tony Toews" <tt****@telusplanet.net> wrote in message

news:6h********************************@4ax.com.. .
"Albert D. Kallal" <pl********************@msn.com> wrote:
>For example, I find that when using a JET file share (no sql
>server), 5 users, and tables in the 50,000 record range is very
>sweet.
And I can say the same for 25 users and 200K to 400K records.


Gee, that is rather large to say "sweet". When I say sweet, I mean
that there is not a hint of performance problems!


Well, I have a client with 12-15 simultaneous users whose main form
displays data from a table with almost 350K records, and the
main child table (displayed in a subform) has over 500K records
(another child table has nearly 70K records). Generally,
performance is *very* good. Retrieval of 1 record or small groups
of records is virtually instantaneous once the main form is
initialized.

The form initialization, though, takes several seconds because I
set up some large recordsets in memory when the form opens. If I
were running a server back end, I wouldn't do that, because the
calculations those recordsets represent could be done server side
on request.
If you say that your application with that user count and that
many records in some tables runs sweet, then I am impressed.
I never would have forecast that data sets this large would work
this well in Access.
On the other hand, it is rapidity becoming apparent that a well
written JET application performs very well, and often EASILY gives
sql server a run for the money. When I think about this, I see
little reason why a table with 200k records does not perform well
on a network. I mean, the amount of traffic to retrieve that one
record to edit is not going to be that large at all!


The big difference is the index, not the data retrieved, because in
an app that runs well with large tables, you're retrieving only one
record (or small groups of records) at a time.

But as the PK index doesn't change a huge amount, I expect that
most of it gets retrieved a few pages at a time as needed and is
then cached. The last pages are the only ones that are going to
change and, perhaps be requested a second time, so I don't see that
there is much likelihood that the retrieval time is going to
increase noticeably at all, even with 10 times the number of
records Tony and I are talking about.

The only thing that would change this mix is if there are 10s of
thousands of records being added in batches while other users are
using the database. This could cause the need to retrieve the last
pages of the table's index repeatedly.

At least, that's the way I see it insofar as I understand how Jet
handles indexes and data pages.

--
David W. Fenton http://www.bway.net/~dfenton
dfenton at bway dot net http://www.bway.net/~dfassoc
Nov 12 '05 #6


On Wed, 14 Jan 2004 17:08:38 GMT, dX********@bway.net.invalid (David
W. Fenton) wrote in comp.databases.ms-access:
The big difference is the index, not the data retrieved, because in
an app that runs well with large tables, you're retrieving only one
record (or small groups of records) at a time.
True (about the record retrieval).
But as the PK index doesn't change a huge amount, I expect that
most of it gets retrieved a few pages at a time as needed and is
then cached. The last pages are the only ones that are going to
change and, perhaps be requested a second time, so I don't see that
there is much likelihood that the retrieval time is going to
increase noticeably at all, even with 10 times the number of
records Tony and I are talking about.
I disagree with the logic here, but agree with the conclusion. The
index is not sequential, but rather a tree type structure
(particularly for tables with many records). Jet doesn't suck in the
index sequentially, but rather reads through it. As data is added, it
is not just the last index page that gets updated. Higher levels of
the index tree will also be updated. But yes, substantially
increasing the number of records does not affect index performance in
a directly linear fashion.
The only thing that would change this mix is if there are 10s of
thousands of records being added in batches while other users are
using the database. This could cause the need to retrieve the last
pages of the table's index repeatedly.


And here I agree with the first line but not the second. It is not
the last index pages that are thrashed when many records are added,
but it will indeed slow down performance substantially.

Peter Miller
__________________________________________________ __________
PK Solutions -- Data Recovery for Microsoft Access/Jet/SQL
Free quotes, Guaranteed lowest prices and best results
www.pksolutions.com 1.866.FILE.FIX 1.760.476.9051
Nov 12 '05 #7

pm*****@pksolutions.com (Peter Miller) wrote in
<lu********************************@4ax.com>:

On Wed, 14 Jan 2004 17:08:38 GMT, dX********@bway.net.invalid
(David W. Fenton) wrote in comp.databases.ms-access:
The big difference is the index, not the data retrieved, because
in an app that runs well with large tables, you're retrieving
only one record (or small groups of records) at a time.


True (about the record retrieval).
But as the PK index doesn't change a huge amount, I expect that
most of it gets retrieved a few pages at a time as needed and is
then cached. The last pages are the only ones that are going to
change and, perhaps be requested a second time, so I don't see
that there is much likelihood that the retrieval time is going to
increase noticeably at all, even with 10 times the number of
records Tony and I are talking about.


I disagree with the logic here, but agree with the conclusion.
The index is not sequential, but rather a tree type structure
(particularly for tables with many records). Jet doesn't suck in
the index sequentially, but rather reads through it. As data is
added, it is not just the last index page that gets updated.
Higher levels of the index tree will also be updated. But yes,
substantially increasing the number of records does not affect
index performance in a directly linear fashion.


Say you have an index in a compacted database that takes up 10 data
pages to store.

Say you add records. The original 10 data pages are *not* updated.

But tell us, Peter, are the new entries in the index all in a new
data page, or are new data pages written to replace the old ones? I
can't really see how the latter would work, because, assuming data
page 4 is already filled up, that means data page 4 gets replaced
with data page 11 and 12. That would basically mean very quick
fragmentation of the index, and a proliferation of incomplete data
pages.

I have always assumed (perhaps wrongly) that additions to the index
are in new data pages that are only merged into the full index tree
at compact.
The only thing that would change this mix is if there are 10s of
thousands of records being added in batches while other users are
using the database. This could cause the need to retrieve the
last pages of the table's index repeatedly.


And here I agree with the first line but not the second. It is
not the last index pages that are thrashed when many records are
added, but it will indeed slow down performance substantially.


How does it actually work, in terms of writes to the index pages?

--
David W. Fenton http://www.bway.net/~dfenton
dfenton at bway dot net http://www.bway.net/~dfassoc
Nov 12 '05 #8


David,

On Wed, 14 Jan 2004 22:29:23 GMT, dX********@bway.net.invalid (David
W. Fenton) wrote in comp.databases.ms-access:
Say you have an index in a compacted database that takes up 10 data
pages to store.
OK.
Say you add records. The original 10 data pages are *not* updated.
Before going further, let's clarify definitions. I use data pages to
refer to pages that store data, and index pages to refer to pages that
store indexes. Pages are, of course, generic, and as such, a given
page is the same size whether it stores data or indexes (or anything
else) and can be used for a variety of purposes (although only one
purpose at any given time).

So I'll assume in your first sentence above, you mean index and not
data pages, and likewise through most of the rest of the post below.
Correct me if I'm wrong.
But tell us, Peter, are the new entries in the index all in a new
data page, or are new data pages written to replace the old ones?
Actually, you're leaving out the third option, which is that pages can
be updated.
I
can't really see how the latter would work, because, assuming data
page 4 is already filled up, that means data page 4 gets replaced
with data page 11 and 12. That would basically mean very quick
fragmentation of the index, and a proliferation of incomplete data
pages.
Huh? While this is in fact not how things work (in my understanding)
it doesn't follow that (a) fragmentation poses a problem or (b)
incomplete index (I take it that's what you mean when you say data)
pages do either.
I have always assumed (perhaps wrongly) that additions to the index
are in new data pages that are only merged into the full index tree
at compact.


Indexes are not required to fill their page, and indexing works even
for non-compacted databases. Think about what you are saying. You
are suggesting that not only is the index not optimal after data
entry, but that it is largely unused (for newly added data). Of
course, whatever cache you are assuming is used for indexing this
newly added data would have some structure presumably, but not the
advantages of a full tree structure that exists for data added prior
to the last compact. What benefit would such a system have? Well,
the number one, far and away, benefit is that appending new records
requires no significant performance hit, because the tree is only
adjusted at compact. Try this out. Add 10,000 records to a 100,000
record indexed table. These 10,000 recs should have pk values that
fall throughout the existing range to really make the case clearer.
If you were right, this should not take much longer than adding them
to a table with no records, or to one with 100K existing records, but
no pk. Of course, the results will show otherwise.
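That test is easy to script. A rough sketch (table names are made-up; it assumes the existing 100K rows use key values that are multiples of 10, so the new keys land in the gaps throughout the range):

```vba
' Time appending 10,000 rows into an indexed vs. a non-indexed table.
' tblWithPK / tblNoPK are hypothetical; both are assumed to already
' hold 100K rows whose ID values are multiples of 10, so the new
' keys (i * 10 + 5) fall in the gaps throughout the existing range.
Public Sub TimeIndexedInserts()
    Dim db As DAO.Database
    Dim rs As DAO.Recordset
    Dim t As Single
    Dim i As Long

    Set db = CurrentDb()

    t = Timer
    Set rs = db.OpenRecordset("tblWithPK", dbOpenTable)
    For i = 1 To 10000
        rs.AddNew
        rs!ID = i * 10 + 5
        rs.Update
    Next i
    rs.Close
    Debug.Print "With PK: "; Timer - t; "seconds"

    t = Timer
    Set rs = db.OpenRecordset("tblNoPK", dbOpenTable)
    For i = 1 To 10000
        rs.AddNew
        rs!ID = i * 10 + 5
        rs.Update
    Next i
    rs.Close
    Debug.Print "No PK:   "; Timer - t; "seconds"
End Sub
```

If indexes were only merged at compact time, the two timings should be close; a consistently slower indexed run points to the index tree being updated on every insert.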

I'm not knocking the idea of caching new data entry and postponing
index updates - it is a technique used in many rdbms and can work
well. But I am telling you that in general, Jet has a tighter
integration of data-entry and index updates. Index optimization is a
benefit of compacting, but indexes are updated at data entry time, and
even if you never compact your file, index pages are updated and
written all the time.
The only thing that would change this mix is if there are 10s of
thousands of records being added in batches while other users are
using the database. This could cause the need to retrieve the
last pages of the table's index repeatedly.


And here I agree with the first line but not the second. It is
not the last index pages that are thrashed when many records are
added, but it will indeed slow down performance substantially.


How does it actually work, in terms of writes to the index pages?


Well, that's for someone at Microsoft to answer. I'm just a curious
bystander. I would however, stress again, that many assumptions one
may have about how indexes, or other db mechanisms for that matter,
are handled are easily confirmed or nullified by simply conceiving of
tests that would show whether the assumptions are merited, and then
running such tests.
Peter Miller
__________________________________________________ __________
PK Solutions -- Data Recovery for Microsoft Access/Jet/SQL
Free quotes, Guaranteed lowest prices and best results
www.pksolutions.com 1.866.FILE.FIX 1.760.476.9051
Nov 12 '05 #9

pm*****@pksolutions.com (Peter Miller) wrote in
<dj********************************@4ax.com>:
On Wed, 14 Jan 2004 22:29:23 GMT, dX********@bway.net.invalid
(David W. Fenton) wrote in comp.databases.ms-access:
Say you have an index in a compacted database that takes up 10
data pages to store.
OK.
Say you add records. The original 10 data pages are *not*
updated.


Before going further, let's clarify definitions. I use data pages
to refer to pages that store data, and index pages to refer to
pages that store indexes. Pages are, of course, generic, and as
such, a given page is the same size whether it stores data or
indexes (or anything else) and can be used for a variety of
purposes (although only one purpose at any given time).

So I'll assume in your first sentence above, you mean index and
not data pages, and likewise through most of the rest of the post
below. Correct me if I'm wrong.


You read me correctly.
But tell us, Peter, are the new entries in the index all in a new
data page, or are new data pages written to replace the old ones?


Actually, you're leaving out the third option, which is that pages
can be updated.


Well, I'm assuming certain things about the PK index (which is the
one that matters in this scenario):

1. it's an autonumber.

2. it can't be updated, because of 1).

So, the index page is not updated for existing records, unless they
are deleted. I don't know how index pages work, but data pages
don't delete the data, just mark it deleted. I'd assume that for
performance purposes, there'd be something similar.

An update, then, could occur only on an insert. But if you start
with a compacted file, the index pages are filled, no? And,
therefore, there's no room to add new data.

Or are index pages written differently, for example, new pages are
allocated when only 1/2 filled, leaving plenty of room for
additions to that leaf of the tree?

I can't quite see, though, how this could be with an Autonumber,
unless it's random.

I guess I don't know how the nodes of the index tree are allocated
in a sequential numeric index, assuming that any gaps are never
filled (as should be the case with any Autonumber PK).
I
can't really see how the latter would work, because, assuming
data page 4 is already filled up, that means data page 4 gets
replaced with data page 11 and 12. That would basically mean very
quick fragmentation of the index, and a proliferation of
incomplete data pages.


Huh? While this is in fact not how things work (in my
understanding) it doesn't follow that (a) fragmentation poses a
problem or (b) incomplete index (I take it that's what you mean
when you say data) pages do either.


Perhaps I wasn't clear in what I was talking about, Autonumber
primary keys, exclusively, as this is what really has the greatest
impact in an application where you're retrieving single records by
PK. If you're retrieving by other data, the PK index is involved
secondarily, since it's what's used to find the data pages, right?
I have always assumed that non-PK indexes are keyed not to the
original data pages but to the PK index -- perhaps that was a wrong
assumption?
I have always assumed (perhaps wrongly) that additions to the
index are in new data pages that are only merged into the full
index tree at compact.


Indexes are not required to fill their page, and indexing works
even for non-compacted databases. Think about what you are
saying. You are suggesting that not only is the index not optimal
after data entry, but that it is largely unused (for newly added
data). Of course, whatever cache you are assuming is used for
indexing this newly added data would have some structure
presumably, but not the advantages of a full tree structure that
exists for data added prior to the last compact. . . .


I would have assumed two tree structures, just as we have for data
pages.
. . . What benefit
would such a system have? Well, the number one, far and away,
benefit is that appending new records requires no significant
performance hit, because the tree is only adjusted at compact.
Try this out. Add 10,000 records to a 100,000 record indexed
table. These 10,000 recs should have pk values that fall
throughout the existing range to really make the case clearer. . .
But that will *not* be the case with the implicit scenario I was
thinking of, unless your Autonumber PK is random. I have
occasionally thought that a random Autonumber PK could improve
concurrency.
. . . If
you were right, this should not take much longer than adding them
to a table with no records, or to one with 100K existing records,
but no pk. Of course, the results will show otherwise.
I'm not sure how your answer applies to the scenario I was thinking
of.

You should learn to read my mind! :)
I'm not knocking the idea of caching new data entry and postponing
index updates - it is a technique used in many rdbms and can work
well. But I am telling you that in general, Jet has a tighter
integration of data-entry and index updates. Index optimization
is a benefit of compacting, but indexes are updated at data entry
time, and even if you never compact your file, index pages are
updated and written all the time.


Can you draw this out for three kinds of indexes:

1. non-random Autonumber PK.

2. alphanumeric non-PK index (say, on LastName).

3. random Autonumber PK.
The only thing that would change this mix is if there are 10s
of thousands of records being added in batches while other
users are using the database. This could cause the need to
retrieve the last pages of the table's index repeatedly.

And here I agree with the first line but not the second. It is
not the last index pages that are thrashed when many records are
added, but it will indeed slow down performance substantially.


How does it actually work, in terms of writes to the index pages?


Well, that's for someone at Microsoft to answer. I'm just a
curious bystander. I would however, stress again, that many
assumptions one may have about how indexes, or other db mechanisms
for that matter, are handled are easily confirmed or nullified by
simply conceiving of tests that would show whether the assumptions
are merited, and then running such tests.


I'm working from my experience with how quickly and efficiently Jet
manages to retrieve small sets of records from relatively large
data tables. I'm trying to figure out why that should be, and I can
only assuming that caching of the indexes is why it works so well.
For that to work, the cache can't need to be repopulated after
every update.

Of course, server-side caching might be helping, too.

--
David W. Fenton http://www.bway.net/~dfenton
dfenton at bway dot net http://www.bway.net/~dfassoc

Nov 12 '05 #10


On Thu, 15 Jan 2004 00:04:52 GMT, dX********@bway.net.invalid (David
W. Fenton) wrote in comp.databases.ms-access:
Actually, you're leaving out the third option, which is that pages
can be updated.
Well, I'm assuming certain things about the PK index (which is the
one that matters in this scenario):

1. it's an autonumber.

2. it can't be updated, because of 1).


Ah, I was not at all assuming (1), since it is certainly not an
inherent characteristic of a pk, but ok, let's go with that.
So, the index page is not updated for existing records, unless they
are deleted.
OK, and that's another assumption I wasn't making (ie, no deletes
occur). This also seems to make the example a more specialized case,
since pk's must certainly expect and handle record deletions, but if
this is the example that we are to work with, then ok, so be it. An
autonumber field for the pk, and no record deletions. I'm with you
so far.
I don't know how index pages work, but data pages
don't delete the data, just mark it deleted.
This is not true. I know I've posted on this many times, but to recap
quickly: Jet doesn't simply flag deleted data like some other desktop
dbms's. Jet copies the first deleted record on a data page over all
other deleted records on that page. This destroys all but the first
deleted record on each page. If only one record fit on each 4kb data
page, then undeletion would be trivial, and what you say would be
true. But typically, several dozen or more records exist on a given
data page, so assuming just 20 recs per page, undeletion can only
bring back 5% of the original data (ie, one in every twenty records).
It's true that record deletion does not result in a reduction in the size
of the database, but it very much does result in deletion (or erasure) of
the vast majority of the deleted records.

Again, this is trivial to test. Create a table. Add a dozen records
with distinct text values. Close the db and open it in a hex editor
(or a text editor if no hex editor is available). The file contents
will look like gibberish, but that's ok. Search for the first text
value entered. You'll then see the other eleven close by, with some
misc bytes amongst them. Now close the editor, open the db, delete
the twelve records, close the db, and re-open it in the editor.
Again, search for the first value. You will find it, but you will
also find that the eleven other text values are gone, and that this
first deleted value is repeated twelve times.

So there's proof of what I'm saying.
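The database-side steps of that test can be scripted (the hex-editor inspection is still manual). Table and field names here are made-up:

```vba
' Set up the deletion test: create a table and add twelve distinct
' text values. After inspecting the mdb in a hex editor, re-open it
' and run DeleteTestRows, then inspect again. Names are hypothetical.
Public Sub SetUpDeletionTest()
    Dim db As DAO.Database
    Dim i As Long
    Set db = CurrentDb()
    db.Execute "CREATE TABLE tblDelTest (TestValue TEXT(50))"
    For i = 1 To 12
        db.Execute "INSERT INTO tblDelTest (TestValue) " & _
            "VALUES ('distinct-value-" & Format$(i, "00") & "')"
    Next i
End Sub

Public Sub DeleteTestRows()
    CurrentDb().Execute "DELETE FROM tblDelTest"
End Sub
```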

Now consider why Msft does this. It takes time to write the first
deleted record over the other deleted records, so it is not for
efficiency's sake. It doesn't provide security (if that were the
goal, the page would be erased rather than some data duplicated). It
also prevents undeletion (but not because security is a priority!).
In other words, it's stupid. But that's how it is.
I'd assume that for
performance purposes, there'd be something similar.
Since the former is false, the latter assumption doesn't follow.
An update, then, could occur only on an insert. But if you start
with a compacted file, the index pages are filled, no? And,
therefore, there's no room to add new data.

Or are index pages written differently, for example, new pages are
allocated when only 1/2 filled, leaving plenty of room for
additions to that leaf of the tree?
Remember that there are leaves and branches (or at least there are if
there are enough records in the table to justify branches). Even with
a compacted database, it is not required that there are one or fewer
partially full branch pages, or leaf pages.
I can't quite see, though, how this could be with an Autonumber,
though, unless it's random.
AFAIK, indexes don't treat autonumbers as a special case. They are
simply long integer pk fields. Autonumber fields only have special
meaning when it comes to populating them for the first time, or when
attempting to update them.
Huh? While this is in fact not how things work (in my
understanding) it doesn't follow that (a) fragmentation poses a
problem or (b) incomplete index (I take it that's what you mean
when you say data) pages do either.


Perhaps I wasn't clear in what I was talking about, Autonumber
primary keys, exclusively, as this is what really has the greatest
impact in an application where you're retrieving single records by
PK.


I'm not sure what you mean here. Whether a pk is an autonumber field
or not really has no impact on pk maintenance, storage, etc. I was
assuming we were talking about pulling records solely by pk usage, but
as I mentioned above, I was not assuming the pk must be an autonumber
field, but then it is irrelevant whether the pk field is autonumber or
not for index purposes.
If you're retrieving by other data, the PK index is involved
secondarily, since it's what's used to find the data pages, right?
Well, in a sense, but not really. If you pulled records by values in
other fields, then those fields were either indexed or not. If they
were indexed, then those indexes are used INSTEAD of the pk index, but
if these other fields were not indexed at all, then the retrieval uses
the pk index since it is the primary means to navigate the table's
data. So in these other cases, either the pk isn't used, or it is
used as the primary index. It's only really used as a secondary index
if both pk and non-pk-but-otherwise-indexed fields are used for
identifying records to be retrieved, and only then under certain
circumstances.
I have always assumed that non-PK indexes are keyed not to the
original data pages but to the PK index -- perhaps that was a wrong
assumption?
They are keyed to the pk, but remember that the pk has its values
stored twice, once in the actual data pages and once in the index
pages. This, incidentally, is why it is usually ill-advised to
construct a multi-text-field pk, as it results in a lot of extra
overhead and duplication.
I have always assumed (perhaps wrongly) that additions to the
index are in new data pages that are only merged into the full
index tree at compact.


Indexes are not required to fill their page, and indexing works
even for non-compacted databases. Think about what you are
saying. You are suggesting that not only is the index not optimal
after data entry, but that it is largely unused (for newly added
data). Of course, whatever cache you are assuming is used for
indexing this newly added data would have some structure
presumably, but not the advantages of a full tree structure that
exists for data added prior to the last compact. . . .


I would have assumed two tree structures, just as we have for data
pages.


Huh? Tree structures exist for indexes. There is no reason why they
should exist for data pages.
. . . What benefit
would such a system have? Well, the number one, far and away,
benefit is that appending new records requires no significant
performance hit, because the tree is only adjusted at compact.
Try this out. Add 10,000 records to a 100,000 record indexed
table. These 10,000 recs should have pk values that fall
throughout the existing range to really make the case clearer. . .


But that will *not* be the case with the implicit scenario I was
thinking of, unless your Autonumber PK is random.


Yes, we were not on the same page (no pun intended) when I wrote that.
I have
occasionally thought that a random Autonumber PK could improve
concurrency.
This is precisely so. Look at replication identifiers. They are
large and random.
. . . If
you were right, this should not take much longer than adding them
to a table with no records, or to one with 100K existing records,
but no pk. Of course, the results will show otherwise.


I'm not sure how your answer applies to the scenario I was thinking
of.

You should learn to read my mind! :)


It just shows that this index-cache idea isn't really applicable to
Jet.
I'm not knocking the idea of caching new data entry and postponing
index updates - it is a technique used in many rdbms and can work
well. But I am telling you that in general, Jet has a tighter
integration of data-entry and index updates. Index optimization
is a benefit of compacting, but indexes are updated at data entry
time, and even if you never compact your file, index pages are
updated and written all the time.


Can you draw this out for three kinds of indexes:

1. non-random Autonumber PK.

2. alphanumeric non-PK index (say, on LastName).

3. random Autonumber PK.


Well, the three are treated similarly. Remember that Jet doesn't
anticipate pk value usage. It simply works with what it gets.
Numeric pks work better than text ones (in my understanding), but
autonumber or non-autonumber is beside the point, as is random or
non-random.
I'm working from my experience with how quickly and efficiently Jet
manages to retrieve small sets of records from relatively large
data tables. I'm trying to figure out why that should be,
Let's assume that 20 records fit on a data page (this is a low
estimate) and that there are one million records in a table (this is a
large estimate), and that we wish to pull one randomly chosen record
by its id. Since index pages store more pointers than data pages
store data, let's assume that there are 100 pk values stored per index
page (this is a very low estimate). Each estimate we've made has made
the task much more difficult than would normally be the case. How
many index and data pages does the jet engine need to pull over the
wire to get this record? Four. That's 16kb. Two branches, a leaf
and a data page. That's pretty easy stuff.
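A quick sketch of that arithmetic (the 100-key fanout and 4KB page size are the estimates above, not measured Jet internals):

```python
import math

PAGE_SIZE = 4096             # classic Jet page size, 4 KB
RECORDS = 1_000_000          # rows in the table
KEYS_PER_INDEX_PAGE = 100    # conservative fanout estimate from above

# Depth of the index tree: leaves hold keys, branches hold pointers
# to lower pages. Count levels until a single root page remains.
leaf_pages = math.ceil(RECORDS / KEYS_PER_INDEX_PAGE)   # 10,000 leaves
index_levels = 1
pages = leaf_pages
while pages > 1:
    pages = math.ceil(pages / KEYS_PER_INDEX_PAGE)
    index_levels += 1        # one more branch level above

# One page read per index level, plus the data page itself.
pages_read = index_levels + 1
print(index_levels, pages_read, pages_read * PAGE_SIZE)
# 3 index levels; 4 pages; 16384 bytes over the wire
```

Root, branch, leaf, data page: four reads, 16KB, exactly as above.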
and I can
only assume that caching of the indexes is why it works so well.


It would be lightning fast without any cache, be it within jet or at
the o/s level.

Peter Miller
__________________________________________________ __________
PK Solutions -- Data Recovery for Microsoft Access/Jet/SQL
Free quotes, Guaranteed lowest prices and best results
www.pksolutions.com 1.866.FILE.FIX 1.760.476.9051
Nov 12 '05 #11


On Thu, 15 Jan 2004 00:52:22 GMT, Peter Miller
<pm*****@pksolutions.com> wrote in comp.databases.ms-access:
Huh? Tree structures exist for indexes. There is no reason why they
should exist for data pages.


I should clarify that by indexes above, I mean to include the pk. One
could argue that a tree for data pages would be the way to implement
the pk. Jet doesn't work that way, though.
Peter Miller
__________________________________________________ __________
PK Solutions -- Data Recovery for Microsoft Access/Jet/SQL
Free quotes, Guaranteed lowest prices and best results
www.pksolutions.com 1.866.FILE.FIX 1.760.476.9051
Nov 12 '05 #12

pm*****@pksolutions.com (Peter Miller) wrote in
<li********************************@4ax.com>:
On Thu, 15 Jan 2004 00:04:52 GMT, dX********@bway.net.invalid
(David W. Fenton) wrote in comp.databases.ms-access:
[]
I can't quite see how this could be with an Autonumber,
though, unless it's random.


AFAIK, indexes don't treat autonumbers as a special case. They
are simply long integer pk fields. Autonumber fields only have
special meaning when it comes to populating them for the first
time, or attempting to update them.


Yes, an AutoNumber is really only a special kind of default value.

But my point is that a random AutoNumber will be evenly distributed
throughout the index tree (well, randomly distributed, which is,
over time, going to be relatively evenly distributed), whereas
sequential Autonumbers are not randomly distributed at all. The
next one will always be added in a predictable part of the index
tree, and the one after that, etc.
Huh? While this is in fact not how things work (in my
understanding) it doesn't follow that (a) fragmentation poses a
problem or (b) incomplete index (I take it that's what you mean
when you say data) pages do either.


Perhaps I wasn't clear in what I was talking about, Autonumber
primary keys, exclusively, as this is what really has the
greatest impact in an application where you're retrieving single
records by PK.


I'm not sure what you mean here. Whether a pk is an autonumber
field or not really has no impact on pk maintenance, storage, etc.


But it certainly does on what values new records are going to have
in the PK index.
I was assuming we were talking about pulling records solely by pk
usage, but as I mentioned above, I was not assuming the pk must be
an autonumber field, but then it is irrelevant whether the pk
field is autonumber or not for index purposes.


It's not irrelevant when the subject is what happens to index
retrieval during data entry. An alphanumeric primary key is going
to be distributed differently than a sequential Autonumber.

[]
I have always assumed that non-PK indexes are keyed not to the
original data pages but to the PK index -- perhaps that was a
wrong assumption?


They are keyed to the pk, but remember that the pk has its values
stored twice, once in the actual data pages and once in the index
pages. This, incidentally, is why it is usually ill-advised to
construct a multi-text-field pk, as it results in a lot of extra
overhead and duplication.


I don't quite follow.

My understanding is if I look for "Fenton" in the LastName field,
the index for LastName will have an entry that amounts to:

Fenton 12345

where 12345 is a pointer to the PK (whether literally so or not),
and this will then be used to look up the data page location in the
primary key index.

If not, then I have to assume that the LastName index refers to the
data page/offset in data page, not to the PK. That must be the case
if the PK index is not used.

And I thought this was why indexed joins are so fast, because I
assumed it was a fundamental part of the underlying Jet engine's
interaction with its data store.
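The two-hop model I'm assuming can be sketched like this (dictionaries standing in for B-tree index pages; every name and value here is invented for illustration, not Jet's actual on-disk format):

```python
# Hypothetical two-step lookup: secondary index -> pk, pk index -> data page.
pk_index = {12345: ("page 871", "offset 3")}   # pk -> data page location
lastname_index = {"Fenton": [12345]}           # indexed value -> matching pk(s)

def lookup_by_lastname(name):
    """Resolve an indexed value to data page locations via the pk index."""
    rows = []
    for pk in lastname_index.get(name, []):
        rows.append(pk_index[pk])              # second hop through the pk index
    return rows

print(lookup_by_lastname("Fenton"))   # [('page 871', 'offset 3')]
print(lookup_by_lastname("Smith"))    # []
```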
I have always assumed (perhaps wrongly) that additions to the
index are in new data pages that are only merged into the full
index tree at compact.

Indexes are not required to fill their page, and indexing works
even for non-compacted databases. Think about what you are
saying. You are suggesting that not only is the index not
optimal after data entry, but that it is largely unused (for
newly added data). Of course, whatever cache you are assuming
is used for indexing this newly added data would have some
structure presumably, but not the advantages of a full tree
structure that exists for data added prior to the last compact.
. . .


I would have assumed two tree structures, just as we have for
data pages.


Huh? Tree structures exist for indexes. There is no reason why
they should exist for data pages.


No, there are two sets of data pages, the ones that are contiguously
stored, and the ones that are added wherever. This is kept track of
somehow so that all the data can be found. I assumed the same for
the index, the existing index pages and then the new index pages.

[]
I have
occasionally thought that a random Autonumber PK could improve
concurrency.


This is precisely so. Look at replication identifiers. They are
large and random.


Well, I never assumed they were large and random to improve
concurrency, but in order to prevent collisions between records
added in multiple locations.
. . . If
you were right, this should not take much longer than adding
them to a table with no records, or to one with 100K existing
records, but no pk. Of course, the results will show otherwise.


I'm not sure how your answer applies to the scenario I was
thinking of.

You should learn to read my mind! :)


It just shows that this index-cache idea isn't really applicable
to Jet.


But there is *some* caching, surely? It doesn't go back to the
server and retrieve the index for every record retrieval, does it?
I'm not knocking the idea of caching new data entry and
postponing index updates - it is a technique used in many rdbms
and can work well. But I am telling you that in general, Jet
has a tighter integration of data-entry and index updates.
Index optimization is a benefit of compacting, but indexes are
updated at data entry time, and even if you never compact your
file, index pages are updated and written all the time.


Can you draw this out for three kinds of indexes:

1. non-random Autonumber PK.

2. alphanumeric non-PK index (say, on LastName).

3. random Autonumber PK.


Well, the three are treated similarly. Remember that Jet doesn't
anticipate pk value usage. It simply works with what it gets.
Numeric pks work better than text ones (in my understanding), but
autonumber or non-autonumber is beside the point, as is random or
non-random.


It's not Jet that is important, but the caching mechanism.
Non-random Autonumbers are always going to be added to a particular
page (whatever the active unfilled index page is); they are not
going to be going into any old page anywhere in the index. Thus, a
smaller number of index pages is going to be altered with
additions.

With random AutoNumbers, and to a lesser extent, alphanumeric
indexes, any index page could be updated.

That makes caching much less efficient, because if randomly
distributed, all index pages will be updated.
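A toy model makes the difference concrete: map each new key to the index leaf page it would land on, and count how many distinct pages get dirtied (the flat fanout of 100 keys per leaf is an invented round number, not a Jet constant):

```python
import random

FANOUT = 100          # assumed keys per leaf page
EXISTING = 100_000    # keys already in the index
NEW = 1_000           # keys being inserted

# Sequential autonumbers only ever extend the rightmost leaves.
seq_pages = {k // FANOUT for k in range(EXISTING, EXISTING + NEW)}

# Random keys land anywhere in the existing range.
random.seed(1)
rnd_pages = {random.randrange(EXISTING) // FANOUT for _ in range(NEW)}

print(len(seq_pages), len(rnd_pages))
# 10 sequential pages dirtied, versus several hundred random ones
```

So the same thousand inserts touch an order of magnitude more index pages when the keys are random, which is exactly why caching pays off less there.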
I'm working from my experience with how quickly and efficiently
Jet manages to retrieve small sets of records from relatively
large data tables. I'm trying to figure out why that should be,


Let's assume that 20 records fit on a data page (this is a low
estimate) and that there are one million records in a table (this
is a large estimate), and that we wish to pull one randomly chosen
record by its id. Since index pages store more pointers than data
pages store data, let's assume that there are 100 pk values stored
per index page (this is a very low estimate). Each estimate we've
made has made the task much more difficult than would normally be
the case. How many index and data pages does the jet engine need
to pull over the wire to get this record? Four. That's 16kb.
Two branches, a leaf and a data page. That's pretty easy stuff.


But, assuming a 350K-record table with an AutoNumber PK index, the
size of the index alone is pretty large. Assuming random requests
for AutoNumbers, why wouldn't caching of the PK index improve
performance, by eliminating the need for the re-retrieval of pages
that have already been retrieved earlier and have not been updated?

Perhaps it's disk caching that is doing the job here, and not Jet
caching?
and I can
only assume that caching of the indexes is why it works so
well.


It would be lightning fast without any cache, be it within jet or
at the o/s level.


How large would a PK index for a 350K table be?

OK, I guess looking at it a different way, it really doesn't
matter, since what really matters is how much data is required to
retrieve the requested record/records. That isn't going to go up a
lot as the index gets larger (it will go up some as you get more
branches to get to the leaf, right?).
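That intuition checks out in a toy calculation: with a fanout of, say, 100 keys per page (an assumed round number, not a Jet constant), the number of index levels only grows with the logarithm of the row count:

```python
import math

FANOUT = 100   # assumed keys per index page

def index_levels(rows):
    """Levels in a B-tree-style index holding `rows` keys."""
    levels, pages = 1, math.ceil(rows / FANOUT)
    while pages > 1:
        pages = math.ceil(pages / FANOUT)   # one more branch level
        levels += 1
    return levels

for n in (10_000, 350_000, 1_000_000, 100_000_000):
    print(n, index_levels(n))
# 10K rows -> 2 levels, 350K -> 3, 1M -> 3, even 100M -> only 4
```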

The scenario my largest app runs in is:

Lookup on LastName and FirstName, with LIKE name* (i.e., starts
with). In most cases, the data retrieved is from 0 to 50 records,
with most retrievals being 10 or fewer. The return time is simply
lightning fast after the first few retrievals. So, it must be disk
caching that's giving the benefit, is all I can figure, based on
your debunking of my existing theories.

--
David W. Fenton http://www.bway.net/~dfenton
dfenton at bway dot net http://www.bway.net/~dfassoc
Nov 12 '05 #13


On Thu, 15 Jan 2004 02:42:53 GMT, dX********@bway.net.invalid (David
W. Fenton) wrote in comp.databases.ms-access:
It's not irrelevant when the subject is what happens to index
retrieval during data entry. An alphanumeric primary key is going
to be distributed differently than a sequential Autonumber.
I wasn't saying it's irrelevant in terms of performance, but rather
that, as I understand it, the jet engine does not treat such cases
specially.
They are keyed to the pk, but remember that the pk has its values
stored twice, once in the actual data pages and once in the index
pages. This, incidentally, is why it is usually ill-advised to
construct a multi-text-field pk, as it results in a lot of extra
overhead and duplication.


I don't quite follow.


If you use, say, lastname, streetaddress, city and zip as the pk in a
contact table (stupid, yes, but doable) then these precise values are
stored twice. Since they are strings, and possibly unicode strings at
that, you're talking about a lot of duplication of values. A long
integer pk would have only required four bytes to store the pk value.
You obviously need these text fields stored in the contact table, but
there is no good reason for also storing them in the index pages.
That was my point.
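To put rough numbers on that duplication (the character counts come from an invented example row, and I'm ignoring Jet's per-entry overhead and any unicode compression):

```python
# Bytes of key data duplicated into the index pages, per row.
# Unicode text: 2 bytes per character.
lastname, street, city, zipcode = "Fenton", "123 Anywhere Ave", "New York", "10001"
text_pk_bytes = 2 * sum(len(s) for s in (lastname, street, city, zipcode))
long_pk_bytes = 4                      # a Long Integer pk

rows = 350_000
print(text_pk_bytes, rows * text_pk_bytes, rows * long_pk_bytes)
# 70 bytes per row vs 4: roughly 24.5MB vs 1.4MB duplicated across 350K rows
```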
My understanding is if I look for "Fenton" in the LastName field,
the index for LastName will have an entry that amounts to:

Fenton 12345

where 12345 is a pointer to the PK (whether literally so or not),
and this will then be used to look up the data page location in the
primary key index.

If not, then I have to assume that the LastName index refers to the
data page/offset in data page, not to the PK. That must be the case
if the PK index is not used.

And I thought this was why indexed joins are so fast, because I
assumed it was a fundamental part of the underlying Jet engine's
interaction with its data store.
Yes, but I wasn't disputing that. If indexed non-pk fields are used
for joins, and combined with say a where clause, a very efficient join
and data slicing operation is possible.
>I have always assumed (perhaps wrongly) that additions to the
>index are in new data pages that are only merged into the full
>index tree at compact.

Indexes are not required to fill their page, and indexing works
even for non-compacted databases. Think about what you are
saying. You are suggesting that not only is the index not
optimal after data entry, but that it is largely unused (for
newly added data). Of course, whatever cache you are assuming
is used for indexing this newly added data would have some
structure presumably, but not the advantages of a full tree
structure that exists for data added prior to the last compact.
. . .

I would have assumed two tree structures, just as we have for
data pages.


Huh? Tree structures exist for indexes. There is no reason why
they should exist for data pages.


No, there's two sets of data pages, the ones that are contiguously
stored, and the ones that are added wherever. This is kept track of
somehow so that all the data can be found.


OK, I'm not sure why you say this. Jet works on a paging system. The
model does not require contiguous pages for a given table, nor does it
treat contiguous pages specially. Sure, when you compact a database,
data pages are (more or less) stored contiguously, but this has
nothing to do with how jet manages pages in general. Specifically,
there is no meaning to the notion of contiguous pages versus ad-hoc
pages.
I assumed the same for
the index, the existing index pages and then the new index pages.
I disagree with the way you suggest data pages are handled.
Extrapolating to index pages doesn't work for me.
I have
occasionally thought that a random Autonumber PK could improve
concurrency.


This is precisely so. Look at replication identifiers. They are
large and random.


Well, I never assumed they were large and random to improve
concurrency, but in order to prevent collisions between records
added in multiple locations.


Well, both really, no?
. . . If
you were right, this should not take much longer than adding
them to a table with no records, or to one with 100K existing
records, but no pk. Of course, the results will show otherwise.

I'm not sure how your answer applies to the scenario I was
thinking of.

You should learn to read my mind! :)


It just shows that this index-cache idea isn't really applicable
to Jet.


But there is *some* caching, surely? It doesn't go back to the
server and retrieve the index for every record retrieval, does it?


Well, at some level, it needs to check whether the file on the server
has been modified by others, and since there is no server process to
handle such things for Jet, essentially, yes, Jet does need to check
the database file to determine whether it has changed.
That makes caching much less efficient, because if randomly
distributed, all [any] index pages will be updated.
Let's assume that 20 records fit on a data page (this is a low
estimate) and that there are one million records in a table (this
is a large estimate), and that we wish to pull one randomly chosen
record by its id. Since index pages store more pointers than data
pages store data, let's assume that there are 100 pk values stored
per index page (this is a very low estimate). Each estimate we've
made has made the task much more difficult than would normally be
the case. How many index and data pages does the jet engine need
to pull over the wire to get this record? Four. That's 16kb.
Two branches, a leaf and a data page. That's pretty easy stuff.


But, assuming a 350K-record table with an AutoNumber PK index, the
size of the index alone is pretty large.


That's relative. It'd be 1.4MB for the pk values alone, plus record
pointers, and branch data. Still, all in all, less than 5MB.
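As a back-of-the-envelope check on those figures (the per-entry pointer size and the branch overhead fraction are my assumptions, not Jet constants):

```python
rows = 350_000
pk_bytes = rows * 4            # 4-byte Long Integer keys: 1.4MB of key values
entry_overhead = rows * 4      # assume ~4 bytes of record pointer per entry
branch_fraction = 0.01         # branch levels are ~1% of leaf volume at fanout 100

total = (pk_bytes + entry_overhead) * (1 + branch_fraction)
print(pk_bytes / 1e6, total / 1e6)
# 1.4 MB of key values; about 2.8 MB in total, comfortably under 5 MB
```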
Assuming random requests
for AutoNumbers, why wouldn't caching of the PK index improve
performance, by eliminating the need for the re-retrieval of pages
that have already been retrieved earlier and have not been updated?
Because you'd (a) need to pull 5MB over the wire instead of <16kb, and
(b) you'd need to do more i/o than just grab the full pk index anyway
because you'd need to make sure no new records had been added or old
ones deleted by others.
Perhaps it's disk caching that is doing the job here, and not Jet
caching?
Disk caching *does* help, but just in the generic normal o/s sense.
and I can
only assume that caching of the indexes is why it works so
well.


It would be lightning fast without any cache, be it within jet or
at the o/s level.


How large would a PK index for a 350K table be?


<above>
OK, I guess looking at it a different way, it really doesn't
matter, since what really matters is how much data is required to
retrieve the requested record/records. That isn't going to go up a
lot as the index gets larger (it will go up some as you get more
branches to get to the leaf, right?).
Agreed.
The scenario my largest app runs in is:

Lookup on LastName and FirstName, with LIKE name* (i.e., starts
with). In most cases, the data retrieved is from 0 to 50 records,
with most retrievals being 10 or fewer. The return time is simply
lightning fast after the first few retrievals. So, it must be disk
caching that's giving the benefit, is all I can figure, based on
your debunking of my existing theories.


I agree. I'm just additionally pointing out that there is no need to
pull across a complete index in order to use it efficiently. The more
of it you pull, the faster the engine can work. But that doesn't mean
that pulling it all over before first use is a good idea.
Peter Miller
__________________________________________________ __________
PK Solutions -- Data Recovery for Microsoft Access/Jet/SQL
Free quotes, Guaranteed lowest prices and best results
www.pksolutions.com 1.866.FILE.FIX 1.760.476.9051
Nov 12 '05 #14

starwars <no****@tatooine.homelinux.net> wrote:
Comments: This message did not originate from the Sender address above.
It was remailed automatically by anonymizing remailer software.
Anonymous posting.
Hello David. Because of your killfile filters you apparently missed


As well David should've missed your crap.

Go away. Get lost. We don't want you here.

Tony
--
Tony Toews, Microsoft Access MVP
Please respond only in the newsgroups so that others can
read the entire thread of messages.
Microsoft Access Links, Hints, Tips & Accounting Systems at
http://www.granite.ab.ca/accsmstr.htm
Nov 12 '05 #15

This discussion thread is closed

Replies have been disabled for this discussion.