
20 million records: Too many for an MS Access app?

This is pursuant to another thread I tried to start, but can't find on
my server.

I've got to store rolling rates of return for investment funds -
calendar year, quarterly, and monthly.

To cut to the chase, with a normal data structure, my estimates are
coming out around 20 million recs max.

They'd be indexed on FundID, BeginDate, EndDate, and ReturnType
(Monthly or Quarterly).
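For concreteness, here's a minimal sketch of what that normalized table and its compound index might look like as Jet DDL run from VBA. The table and field names (tblRollingReturn and friends) are my own placeholders, not anything settled in this thread:

Sub CreateRollingReturnTable()
    With CurrentDb
        ' One row per fund, period, and return type - roughly 20 million of them.
        .Execute "CREATE TABLE tblRollingReturn (" & _
                 "FundID LONG, BeginDate DATETIME, EndDate DATETIME, " & _
                 "ReturnType TEXT(1), RateOfReturn DOUBLE)"
        ' The compound index the lookups would seek on.
        .Execute "CREATE INDEX ixReturnLookup ON tblRollingReturn " & _
                 "(FundID, BeginDate, EndDate, ReturnType)"
    End With
End Sub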

Seems to me like I read somewhere that when indexing is used, the
number of records isn't all that big a performance factor.

Can anybody agree or disagree with this?

My alternative is a Pivot Table strategy.

For annual returns, we'd have a record for each Fund/Year and 9
RateOfReturn columns - one for each of the rolling periods we need:

- This year and last year
- This year and the previous 2 years
- This year and the previous 3 years
- This year and the previous 4 years
- This year and the previous 5 years
- This year and the previous 6 years
- This year and the previous 7 years
- This year and the previous 8 years
- This year and the previous 9 years
For quarterlies, we'd have a record for each Fund/Year/Quarter and 10
rate of return columns:

- Current quarter back 4 quarters
- Current quarter back 8 quarters
- Current quarter back 12 quarters
- Current quarter back 16 quarters
- Current quarter back 20 quarters
- Current quarter back 24 quarters
- Current quarter back 28 quarters
- Current quarter back 32 quarters
- Current quarter back 36 quarters
- Current quarter back 40 quarters
For monthlies, we'd have a record for each Fund/Year/Month and 8
columns:

- Previous 3 months
- Previous 6 months
- YTD
- Previous 12 months
- Previous 24 months
- Previous 36 months
- Previous 48 months
- Previous 60 months

This would cut the number of records down by a factor of 8-10, giving
us more like 2 million instead of 20 million records.

But would it be worth the loss in flexibility/aesthetics?
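For comparison, here's a sketch of the wide annual record under the pivot strategy, with one column per rolling period in the list above. Again, the names (tblAnnualReturn, Ror2Yr...Ror10Yr) are assumptions of mine:

Sub CreateAnnualPivotTable()
    ' One row per Fund/Year; Ror2Yr..Ror10Yr hold the 2- through 10-year rolling returns.
    CurrentDb.Execute "CREATE TABLE tblAnnualReturn (" & _
        "FundID LONG, ReturnYear LONG, " & _
        "Ror2Yr DOUBLE, Ror3Yr DOUBLE, Ror4Yr DOUBLE, Ror5Yr DOUBLE, " & _
        "Ror6Yr DOUBLE, Ror7Yr DOUBLE, Ror8Yr DOUBLE, Ror9Yr DOUBLE, Ror10Yr DOUBLE)"
    CurrentDb.Execute "CREATE UNIQUE INDEX pkAnnualReturn " & _
        "ON tblAnnualReturn (FundID, ReturnYear)"
End Sub

The quarterly and monthly tables would follow the same pattern with 10 and 8 rate-of-return columns respectively.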
Nov 13 '05 #1
On Wed, 16 Feb 2005 14:27:20 -0500, PeteCresswell <x@y.z> wrote:

It is true: indexing essentially makes performance independent of
record count for some kinds of queries. For example, selecting one
record out of 2E4 records or out of 2E7 records takes about the same
amount of time: way less than a second.
But this is not true for all queries, e.g. those that require a
table scan.
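To make that distinction concrete, here is a small VBA sketch assuming the tblRollingReturn layout sketched in the first post (field names are placeholders); the first query can seek on the compound index, the second has to read every row:

Sub CompareQueryShapes()
    Dim rs As DAO.Recordset
    ' Indexed lookup: Jet seeks on (FundID, BeginDate, EndDate, ReturnType),
    ' so 2E4 vs 2E7 rows makes little practical difference.
    Set rs = CurrentDb.OpenRecordset( _
        "SELECT RateOfReturn FROM tblRollingReturn " & _
        "WHERE FundID = 1234 AND BeginDate = #1/1/2004# " & _
        "AND EndDate = #12/31/2004# AND ReturnType = 'M'")
    If Not rs.EOF Then Debug.Print rs!RateOfReturn
    rs.Close
    ' Criterion on an unindexed column: a full table scan,
    ' and that is where 20 million rows start to hurt.
    Set rs = CurrentDb.OpenRecordset( _
        "SELECT Count(*) FROM tblRollingReturn WHERE RateOfReturn > 0.1")
    Debug.Print rs.Fields(0)
    rs.Close
End Sub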
Sounds like you are working on a serious project. I would spend some
time on a mockup of the database, filled with 2E7 fake records. It
takes little time to write some code to fill a table with
pseudo-random records mimicking the records that will actually happen.
For a similar situation I wrote functions like GetRandomNumber(intLow,
intHigh) and GetRandomText(intLength). It probably needs to run
overnight to fill with 2E7 records.
There is nothing like running some performance stats on the real
network using the real database (or as real as it gets).
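A rough sketch of that kind of fill routine, again assuming the tblRollingReturn layout from the first post. GetRandomNumber is the helper named above; its body and the value ranges here are my guesses:

Function GetRandomNumber(intLow As Integer, intHigh As Integer) As Integer
    GetRandomNumber = Int((intHigh - intLow + 1) * Rnd) + intLow
End Function

Sub FillFakeReturns(lngHowMany As Long)
    Dim rs As DAO.Recordset
    Dim i As Long
    Dim dtBegin As Date
    Randomize
    Set rs = CurrentDb.OpenRecordset("tblRollingReturn", dbOpenTable)
    For i = 1 To lngHowMany
        dtBegin = DateSerial(GetRandomNumber(1985, 2004), GetRandomNumber(1, 12), 1)
        rs.AddNew
        rs!FundID = GetRandomNumber(1, 5000)
        rs!BeginDate = dtBegin
        rs!EndDate = DateAdd("m", GetRandomNumber(1, 120), dtBegin)
        rs!ReturnType = IIf(GetRandomNumber(0, 1) = 0, "M", "Q")
        rs!RateOfReturn = GetRandomNumber(-500, 500) / 1000
        rs.Update
    Next i
    rs.Close
End Sub

Kick it off with FillFakeReturns 20000000 from the Immediate window and, as noted, expect it to run overnight; loading the data first and building the indexes afterwards will also speed up the bulk insert.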

When designing a db, I always gravitate to 3rd normal form first, and
will only deviate from that once I prove to myself that the structure
is time-prohibitive. In the past I have made the mistake of guessing
that some query would be too slow and that a denormalized design was
therefore in order. I'm not making that mistake anymore; I let
performance speak for itself.

Another rule of thumb: with every factor of 1000 more records, all the
rules change. So if you know how to deal with thousands of records, you
know nothing about dealing with millions. If you know millions, nothing
about billions. Anecdote: I once exported 35E7 records to text using
BCP, and ran a custom multithreaded C application on it to manipulate
the data on a 4-way processor machine, and then BCP'ed it back into
the table. That was faster than having SQL Server 4.2 (?) do it.

-Tom.
Nov 13 '05 #2
Per Tom van Stiphout:
Sounds like you are working on a serious project.
Very serious. New client. Beeeeeg client. If I can make these guys happy,
I'm "in"....if not....well, windsurfing season is just around the corner...
I would spend some
time on a mockup of the database, filled with 2E7 fake records.
That was my initial feeling. Now that you've said it, I guess I'll get on with
it.

When designing a db, I always gravitate to 3rd normal form first, and
will only deviate from that once I prove to myself that this structure
is time-prohibitive. I have made the mistake in the past to guess
that some query would be too slow and thus a denormalized design was
in order. I'm not making that mistake anymore. I let performance speak
for itself.


Thanks for the insights....
--
PeteCresswell
Nov 13 '05 #3


