VB.NET Fastest Data Storage/Access Method

I wrote a program in VB.NET that uses a database of 120,000+ records. Each record has 12 fields and can contain quite a bit of data. Currently I have it set up to pull from an Access database, but it is SLOW: searching and pulling up to 30,000 records at a time can take an hour or more.

My question is: what is the fastest alternative to using an Access DB? I cannot use a server-based SQL database, as the end users will be pulling the data locally, not from a server. The users also need to be able to edit and delete records without it taking forever.

Any help is greatly appreciated.
Sep 19 '08 #1
mldisibio
Do a search on standalone database file formats that give users a local database, such as SQLite.

However, the key to any database performance is normalization and indexing.
120,000 is not a lot of records. If they were indexed, even a plain file routine could probably load, read, and search them from a text file within a few seconds.

Access is not a great database, but since you are already using it, I would suggest reviewing the table structure, normalization, and indexes. That should increase performance dramatically.

After that step, you might look into competing database engines.
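For illustration, something like this (untested; the file path, table, and field names are placeholders, since I don't know your schema) adds a composite index to the existing Access file from VB.NET over ADO.NET:

Imports System
Imports System.Data.OleDb

Module IndexSketch
    Sub Main()
        ' Placeholder path, table and column names.
        Dim connStr As String = _
            "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\data\reviews.mdb"
        Using conn As New OleDbConnection(connStr)
            conn.Open()
            ' Jet SQL supports CREATE INDEX; index the columns that actually
            ' appear in your slow WHERE clauses.
            Dim ddl As String = _
                "CREATE INDEX idxSearch ON tblRecords (SearchField1, SearchField2)"
            Using cmd As New OleDbCommand(ddl, conn)
                cmd.ExecuteNonQuery()
            End Using
        End Using
    End Sub
End Module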
Sep 19 '08 #2
Curtis Rutland
I'll just confirm what the previous poster said: 120,000 records is not a lot. We work on tables with 2.5 million records and, realistically, even that's not a lot. Indexing is likely a large part of your problem.
Sep 19 '08 #3
csin
They are indexed...


My issue is more prevalent when running a deduplication routine, where the app needs to find records that have fields 2, 4, 5, and 12 the same, but the others can be different...

For example (field numbers on the first row, then two records):

Field:     1   2   3   4   5   6   7   8   9  10  11  12
Record A:  2   2  34   4   5  46   7  81   3   1  11  12
Record B: 14   2  16   4   5  19  20  21  22  23  24  12

The real data would be much longer, but for simplicity use the above. The routine would delete Record B, because Record A already has the same values in fields 2, 4, 5, and 12; it would not delete both, and it would repeat this check so that only one record remains for each distinct combination of fields 2, 4, 5, and 12.

Would it work to make a 13th field that combines fields 2, 4, 5, and 12 and then do a SELECT DISTINCT(field13) into a new table?
Sep 20 '08 #4
balabaster
I'd say SQL Server or Oracle would be the ideal solutions here, but as they're not possible, you're left with some form of flat-file configuration that allows random access... It's a good question that raises a whole host of theoretical answers.

Fixed-length rows would let you simulate that.
You can query Excel documents using ADO, so using Excel's native document format could be a possibility...
I believe you can also query CSV files using ADO, although I've never done it or seen it done.
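As a rough, untested sketch of the text-file idea (the folder and file names are placeholders), the Jet provider treats the folder as the database and each delimited file in it as a table:

Imports System
Imports System.Data
Imports System.Data.OleDb

Module CsvQuerySketch
    Sub Main()
        ' Placeholder folder and file names; HDR=Yes means the first row of the
        ' CSV holds the column names.
        Dim connStr As String = _
            "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\data\;" & _
            "Extended Properties=""text;HDR=Yes;FMT=Delimited"""
        Using conn As New OleDbConnection(connStr)
            conn.Open()
            Dim da As New OleDbDataAdapter("SELECT * FROM [records.csv]", conn)
            Dim dt As New DataTable()
            da.Fill(dt)
            Console.WriteLine(dt.Rows.Count & " rows loaded.")
        End Using
    End Sub
End Module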
Sep 20 '08 #5
mldisibio
A 13th field is a possibility.
But again, a correctly created index would prevent duplicates in the first place.
If you were to create a unique index composed of the fields in question (fields 2, 4, 5, and 12, if I understand correctly that you mean field positions rather than values), then
- duplicate records would never be allowed in the first place
- or, if you needed to allow them for some reason, the index would not be unique, but searching for that combination of fields should be much quicker.

However, I understand that then you may need to add application code to explain to the user why their entry failed, and maybe you cannot do that now.

You may also need to do a one time clean-up, in which case the 13th field or temporary pseudo-index tables can help.
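As a sketch of that one-time clean-up (untested; the table and column names are placeholders, and I am assuming an autonumber id column): delete every row whose id is not the lowest in its duplicate group, and optionally add the unique index afterwards so duplicates cannot creep back in.

Imports System
Imports System.Data.OleDb

Module CleanupSketch
    Sub Main()
        Dim connStr As String = _
            "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\data\reviews.mdb"
        Using conn As New OleDbConnection(connStr)
            conn.Open()

            ' Keep the lowest id in each (Field2, Field4, Field5, Field12) group
            ' and delete everything else.
            Dim dedupe As String = _
                "DELETE FROM tblRecords WHERE id NOT IN " & _
                "(SELECT MIN(id) FROM tblRecords GROUP BY Field2, Field4, Field5, Field12)"
            Using cmd As New OleDbCommand(dedupe, conn)
                Console.WriteLine(cmd.ExecuteNonQuery() & " duplicate rows removed.")
            End Using

            ' If a permanent fix is acceptable (see my question below), a unique
            ' index keeps the table clean from here on.
            Dim ddl As String = _
                "CREATE UNIQUE INDEX uqKey ON tblRecords (Field2, Field4, Field5, Field12)"
            Using cmd As New OleDbCommand(ddl, conn)
                cmd.ExecuteNonQuery()
            End Using
        End Using
    End Sub
End Module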

Is a permanent fix such as not allowing such duplicates a possibility for you?

Again, not to criticize, but if you have data that requires a painfully slow search to eliminate odd combinations of duplicates, I would strongly recommend reviewing your data model to see if it can be normalized. For example, if fields 2, 4, 5, and 12 represent some kind of unique key, why not have a table with just those fields, where all of them together make up the primary key, and then have your (current) 12-field table (now reduced to 8 fields) simply reference that unique key set with a foreign key?
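In Jet DDL that could look roughly like this (all table, column, and constraint names are invented for illustration); the four identifying fields become their own table whose combination is the primary key, and the detail table carries those four fields only as a foreign key back to it:

Imports System
Imports System.Data.OleDb

Module NormalizeSketch
    Sub Main()
        ' Invented names and column types; adjust to the real schema.
        Dim ddl() As String = { _
            "CREATE TABLE tblRecordKeys (" & _
            "Field2 LONG, Field4 TEXT(100), Field5 TEXT(50), Field12 TEXT(50), " & _
            "CONSTRAINT pkRecordKey PRIMARY KEY (Field2, Field4, Field5, Field12))", _
            "CREATE TABLE tblRecordDetails (" & _
            "Field2 LONG, Field4 TEXT(100), Field5 TEXT(50), Field12 TEXT(50), " & _
            "Field1 TEXT(255), Field3 TEXT(255), " & _
            "CONSTRAINT fkRecordKey FOREIGN KEY (Field2, Field4, Field5, Field12) " & _
            "REFERENCES tblRecordKeys (Field2, Field4, Field5, Field12))" _
        }
        Using conn As New OleDbConnection( _
            "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\data\reviews.mdb")
            conn.Open()
            For Each stmt As String In ddl
                Using cmd As New OleDbCommand(stmt, conn)
                    cmd.ExecuteNonQuery()
                End Using
            Next
        End Using
    End Sub
End Module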
Sep 20 '08 #6
csin
Okay, the program I'm working on is an internal app for reviews of other applications. Each month an update comes out with newly reviewed applications.

The way I have it now is like this:

tblPrograms
-id (index)
-lngPlatformFK
-strCompany
-strProgramName
-strVersion
-strInfo
-strURL
-strLetter
-strNew
-strUpdate
-strType
-strMU

tblPlatforms
-strPlatform
-strMU

tblLetters
-lngPlatformFK
-strLetter
-strMU

tblUpdates
-strUpdate


strMU designates "(U)ser Database" or "(M)ain Database".

The procedure I'm working on needs to make sure there are no duplicates in each of the two databases... Meaning it's okay to have one PC version of Adobe Acrobat 6 in the MainDB and one PC version of Adobe Acrobat 6 in the UserDB, but not two in the same database.

And even though I have checks in place to make sure duplicates are not imported, I need to create a procedure to remove them in the event they still get in there somehow.
Sep 21 '08 #7
mldisibio
Couple of comments:
1. Having an ID indexed does not enhance any query performance (your original issue) if you are searching non-indexed fields. If you have a particularly slow search with four specific fields, then you need an index with just those four fields and make sure each of the four fields is in your WHERE clause (so the optimizer knows which index to use).
2. You have strMU in three tables, yet you say it is a defining characteristic of a unique record. I don't understand your structure completely, but I would aim for a structure where strMU is only in one table and is part of a unique index.
3. I am not sure how much control you have over the incoming data. If you are importing some kind of data dumps over which you have no control, and then are responsible for sorting out duplicates, then yes, you need some different data tables and some good duplicate removal procedures.
If, however, you control the incoming data at its origin, then you would be able to eliminate duplicates with properly created keys/indexes, and there would be no need for clean-up. However, I suspect your case is the first one; am I correct?
4. One suggestion would be to revamp your import process. Perhaps set up two tables similar to what you have, but one for M and one for U, with a unique index on ALL the fields which constitute a unique record. Then import your data dumps, and handle "duplicate key" exceptions by tossing out those records. Finally, you can merge the M and U tables back into a reporting table similar to what you have. Separate good "normalized" tables which you control from "reporting" tables which your end-users (or apps) want to see. A rough sketch of such an import loop is below.
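Sketch only: the table, field, and routine names are placeholders, the staging table is assumed to already have that unique index across its defining fields, and each incoming row array is assumed to hold exactly the four defining values in order.

Imports System
Imports System.Collections.Generic
Imports System.Data.OleDb

Module ImportSketch
    ' Each incoming row is inserted into the staging table; a row that
    ' violates the unique index fails its INSERT and is simply discarded.
    Sub ImportRows(ByVal rows As List(Of String()), ByVal conn As OleDbConnection)
        Dim skipped As Integer = 0
        For Each fields As String() In rows
            Try
                Using cmd As New OleDbCommand( _
                    "INSERT INTO tblImportM (Field2, Field4, Field5, Field12) " & _
                    "VALUES (?, ?, ?, ?)", conn)
                    For Each value As String In fields
                        cmd.Parameters.AddWithValue("?", value)
                    Next
                    cmd.ExecuteNonQuery()
                End Using
            Catch ex As OleDbException
                ' Most likely a unique-index (duplicate key) violation: toss the row.
                skipped += 1
            End Try
        Next
        Console.WriteLine(skipped & " duplicate rows discarded.")
    End Sub
End Module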
Sep 21 '08 #8
csin
strMU is in 3 of the tables because there are letter records for each of the two databases separately, there are programs for each of them separately, and there are platforms for each of them separately...

A big part of the reason I need to come up with a dup checker in the first place is that the DB they gave me to start with has 124,000 records, about 25%+ of which are already duplicates, and I need to filter them out.

Any other suggestions on the structure?
Sep 21 '08 #9
