
Preventing MDB bloat when adding to linked table

Expert 5K+
P: 8,434
Hi all.

I won't go into too much detail at this point, as I have a fairly complex (and probably quite ugly) setup. The relevant details are that I have a front-end MDB, and I'm using ADO to add records into a linked table.

The front-end MDB keeps blowing out to a ridiculous size, so I have to keep compacting it. It compacts back to a couple of hundred KB, so obviously this is all temporary stuff used "internally" and discarded.

I'm not looking for detailed help at this point, would just like to get some pointers on what sort of things to look out for, or different ideas to try, to see whether I can alleviate the situation. I have been wondering what the "batch" related ADO methods and parameters are all about. Should I be looking into them further?

One thing I've considered (but haven't yet tried) is working directly with the table in the "back end" MDB, instead of always opening it through the front-end. I had hoped to avoid this sort of thing, as the front-end MDB is a nice, tidy place to pull everything together. The back-end tables may reside all over the place.

As for my code, I'm just doing an AddNew, setting field values, then an Update.

Normally I wouldn't be especially concerned about this, as I don't mind compacting after I load the data. But I have 4-5 years worth of data to load, and the front-end database approaches the 2GB limit after about one month-worth. As you can imagine, this makes it a rather labour-intensive project. I want to be able to "set and forget" on a Friday, and find it all loaded on Monday.
Aug 16 '07 #1
47 Replies


MMcCarthy
Expert Mod 10K+
P: 14,534
Instead of AddNew try using an INSERT statement

DoCmd.RunSQL "INSERT INTO .....etc"
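For example, something along these lines - the table and field names here are invented, and this assumes the code runs inside Access, where DoCmd is available:

Dim strSQL As String
Dim strDesc As String
Dim curAmount As Currency

strDesc = "Sample job"      ' values would come from your own variables
curAmount = 123.45

' Quote text values (doubling any embedded quotes) and wrap dates in #...#.
strSQL = "INSERT INTO tblJobs (JobDate, Description, Amount) " & _
         "VALUES (#" & Format$(Date, "mm\/dd\/yyyy") & "#, '" & _
         Replace$(strDesc, "'", "''") & "', " & curAmount & ")"

DoCmd.SetWarnings False   ' suppress the "You are about to append..." prompt
DoCmd.RunSQL strSQL
DoCmd.SetWarnings True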
Aug 20 '07 #2

Expert 5K+
P: 8,434
Instead of AddNew try using an INSERT statement

DoCmd.RunSQL "INSERT INTO .....etc"
Without intending to sound rude... why?


Incidentally, it's interesting to see this thing running - certainly the first VB application I've seen using around 1.5GB of RAM. :)
I do plan to do something about the memory usage - it's not a big deal - pretty much just lazy programming.
Aug 20 '07 #3

MMcCarthy
Expert Mod 10K+
P: 14,534
Without intending to sound rude... why?


Incidentally, it's interesting to see this thing running - certainly the first VB application I've seen using around 1.5GB of RAM. :)
I do plan to do something about the memory usage - it's not a big deal - pretty much just lazy programming.
The INSERT INTO method is more efficient because you don't need to open a recordset at all.
Aug 20 '07 #4

MMcCarthy
Expert Mod 10K+
P: 14,534
The INSERT INTO method is more efficient because you don't need to open a recordset at all.
Just out of curiosity, are you closing your recordset and setting it to Nothing?

rst.Close
Set rst = Nothing

This releases the memory the recordset was using.
Aug 20 '07 #5

NeoPa
Expert Mod 15k+
P: 31,660
Registering interest.
Keep us apprised of how you're getting on, Killer. If you need more involvement just ask.

I did find that early versions of A2K suffered horribly from data bloat. Office SPs fixed that (didn't make it good - just not AS bad). MDAC updates are also worth applying and are easily found on the web. Office should be at least SR1 SP3, which is the latest for 2K (I think). MDAC is v2.8 SP1 I think.
Aug 20 '07 #6

Expert 5K+
P: 8,434
Registering interest.
Keep us apprised of how you're getting on, Killer. If you need more involvement just ask.

I did find that early versions of A2K suffered horribly from data bloat. Office SPs fixed that (didn't make it good - just not AS bad). MDAC updates are also worth applying and are easily found on the web. Office should be at least SR1 SP3, which is the latest for 2K (I think). MDAC is v2.8 SP1 I think.
I'll address the last few posts in one go, here.
  • Mary...
    • Thanks for the tip, I'll see what I can do with INSERT.
    • DoCmd may be a problem - I'm working in VB6, not VBA. However, I imagine I can use the ADO Execute?
    • Does "more efficient" mean less memory, less time or both? Time is a bit of an issue in this case, though not end-of-the-world critical.
    • I am closing and "Nothing-ing" (releasing?) everything I open, but not until the end of the process. One thing I'm planning to do is only open them as required, rather than opening everything up front. Like I said, lazy programming. :)
    • I did find a few days ago that I had forgotten to close/release one recordset I was just reading at startup. Doing so reclaimed at least a couple of hundred MB of memory.
  • NeoPa...
    • I'm using Access 2003 SP1. I have used Access 2000 at home from time to time, to play with it, but not lately.
    • I don't think I'm allowed to apply updates to the software here.
    • And yes, I'll let you know how I get on with this. (You may need to remind me from time to time - once the thread scrolls off, well, you know...)
Aug 20 '07 #7

MMcCarthy
Expert Mod 10K+
P: 14,534
DoCmd may be a problem - I'm working in VB6, not VBA. However, I imagine I can use the ADO Execute?
Yes that will be fine.

Does "more efficient" mean less memory, less time or both? Time is a bit of an issue in this case, though not end-of-the-world critical.
Both

I am closing and "Nothing-ing" (releasing?) everything I open, but not until the end of the process. One thing I'm planning to do is only open them as required, rather than opening everything up front. Like I said, lazy programming. :)
This should help. Closing and "Nothing-ing" recordsets once they are no longer required is more efficient.
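In VB6 it might look something like this - the connection string, table and field names are only placeholders, and you'll need a reference to the Microsoft ActiveX Data Objects library:

Dim cn As ADODB.Connection
Dim strSQL As String
Dim lngAffected As Long

Set cn = New ADODB.Connection
cn.Open "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\Data\FrontEnd.mdb"

strSQL = "INSERT INTO tblJobs (JobDate, DescID, Amount) " & _
         "VALUES (#08/20/2007#, 42, 123.45)"

' adExecuteNoRecords stops ADO building a recordset for the result.
cn.Execute strSQL, lngAffected, adCmdText + adExecuteNoRecords
Debug.Print lngAffected & " record(s) added"

cn.Close
Set cn = Nothing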
Aug 20 '07 #8

Expert 5K+
P: 8,434
I've modified my code to just skip opening a huge series of recordsets at the start. Since I'm only inserting data into them, and now executing an SQL "INSERT" command against the connection, I commented them out.

Haven't noticed any improvement in speed (well, maybe just a smidgeon...) but the memory usage is down by about 1.2GB. :)
Aug 21 '07 #9

MMcCarthy
Expert Mod 10K+
P: 14,534
I've modified my code to just skip opening a huge series of recordsets at the start. Since I'm only inserting data into them, and now executing an SQL "INSERT" command against the connection, I commented them out.

Haven't noticed any improvement in speed (well, maybe just a smidgeon...) but the memory usage is down by about 1.2GB. :)
That's great. Glad it worked for you.
Aug 21 '07 #10

Expert 5K+
P: 8,434
That's great. Glad it worked for you.
Further news.

It looks as though the memory improvement is "all she wrote". The speed seems about the same, possibly ever so slightly improved. And the bloat is still happening.

What I'm thinking of now is creating a separate connection for each linked table, connecting to the "real" MDB that it lives in. That way, any extra temp data left lying around like this can be dealt with by compacting them all afterward (if I bother at all).

Alternatively, now that I don't have dozens of recordsets open, perhaps I can periodically do a "compact and repair" on the front-end DB. It was in the too-hard basket before, due to the aforementioned dozens of recordsets. At this point, I think I have one or two open. I can easily set up a routine which will close them, do the compact (hm... must track down that old post from NeoPa...) and reopen them.
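(In case it's useful to anyone later, the compact-from-code part I have in mind would be roughly the following, using JRO - which needs a reference to "Microsoft Jet and Replication Objects". The paths are made up, and every connection and recordset on the MDB has to be closed before this will work.)

Dim je As JRO.JetEngine

Set je = New JRO.JetEngine

' Compact to a temporary file, then swap it in place of the original.
je.CompactDatabase _
    "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\Data\FrontEnd.mdb", _
    "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\Data\FrontEnd_tmp.mdb"

Kill "C:\Data\FrontEnd.mdb"
Name "C:\Data\FrontEnd_tmp.mdb" As "C:\Data\FrontEnd.mdb"

Set je = Nothing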

This isn't really urgent, as it's sort of a side-project. But I do need to do something about the bloat. When I do a complete load like this, which may happen from time to time, I have to load something like 1,700 files (and counting). The night before last, I tried to load about 30, and hit the 2GB wall. I loaded 20 of them last night, and reached 1.5GB. So unless I prevent or overcome the bloat, it will remain a labour-intensive task. Load a few, compress, load a few more, compress... the ideal is to place all the data there, start the program then come back in a week when it's all loaded.

(Of course, the next task will be to address the speed. But that's another story.)
Aug 21 '07 #11

Expert 5K+
P: 8,434
Just to keep everyone up to date...

To recap, I was previously opening all the tables as a collection of ADO recordsets at startup, but reclaimed a huge amount of memory by skipping that and just using an Execute against the Connection object to INSERT records to the various tables, which are all linked to the main MDB.

Well, now I'm creating a collection of ADO Connection objects at startup, each connecting to the "back-end" MDB which holds the table. (For those who don't already know, I didn't have a choice about splitting it up - altogether the tables come to around 7½GB, and increasing).

Anyway, where opening all the tables at startup took a fair bit of time as well as a lot of memory, creating the connections at startup is very quick and doesn't seem to use much memory.

So now, where I'm storing my info, I'm both selecting a connection and inserting the appropriate table name into the SQL - all of which is pretty simple. Time will tell, whether I still have a "bloat" problem. Even if it is now distributed among the various back-end databases, it may still be a problem. Hm... actually, it may be more of a problem. To compact the front-end MDB is very quick, as it only contains a tiny amount of "real" data. The back-end ones, on the other hand, do contain millions of records and will take much longer to compact.
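In case it helps anyone following along, the shape of it is roughly this - the paths and table names are invented for the example, and the real list comes from my configuration:

Private Connections As New Collection   ' one ADODB.Connection per back-end MDB, keyed by table name

Private Sub OpenBackEnds()
    AddBackEnd "tblJobs2007Q1", "C:\Data\Jobs2007Q1.mdb"
    AddBackEnd "tblJobs2007Q2", "C:\Data\Jobs2007Q2.mdb"
    ' ...and so on for each linked table
End Sub

Private Sub AddBackEnd(ByVal TableName As String, ByVal MdbPath As String)
    Dim cn As ADODB.Connection
    Set cn = New ADODB.Connection
    cn.Open "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" & MdbPath
    Connections.Add cn, TableName          ' keyed by table name for easy lookup
End Sub

Private Sub InsertRecord(ByVal TableName As String, ByVal SQL As String)
    ' Pick the right connection and run the INSERT directly against it.
    Connections(TableName).Execute SQL, , adCmdText + adExecuteNoRecords
End Sub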

I'll have to think it over. But in the meantime, I'll be running some tests to see how it goes. So far it looks good, but I've only loaded a fairly small amount.
Aug 24 '07 #12

NeoPa
Expert Mod 15k+
P: 31,660
Sounds like you've made some real progress there Killer. Nice work.
Aug 24 '07 #13

Expert 5K+
P: 8,434
Sounds like you've made some real progress there Killer. Nice work.
Thanks.

The bloating problem did seem to be significantly reduced. I've left it running for the weekend, and from the small sample I saw on Friday, I think it may actually cope on its own now. Time will tell.
Aug 25 '07 #14

Expert 5K+
P: 8,434
Yeeha!

As mentioned, I left it running over the weekend. Not sure how much detail I've provided, but I'm loading info from daily log files. The database is split into quarters (no, not four parts - I mean 3 months per MDB). I loaded six months' data over the weekend. While I'm still not overjoyed by the speed, it was at least able to complete successfully. Each three-month MDB ended up around 650MB (when I then compacted them, they came down to about 460MB). This hasn't just distributed the "bloat", it has definitely reduced it. Previously, I couldn't load a single month of data without blowing the 2GB limit.

So, while I still need to address the loading speed (40 hours for 169 days' worth of data; average a little over 14 minutes per day) at least it works.

Um... could someone give me a really brief run-down on what the deal is with "batch" processing in ADO? And can I use it with Access, or is it SQL Server only, or what? I'm hoping it might be a tool I can use to improve the loading speed.
Aug 27 '07 #15

MMcCarthy
Expert Mod 10K+
P: 14,534
Um... could someone give me a really brief run-down on what the deal is with "batch" processing in ADO? And can I use it with Access, or is it SQL Server only, or what? I'm hoping it might be a tool I can use to improve the loading speed.
Have a look at these two articles by ADezii

DAO Transaction Processing - What is it?
ADO Transaction Processing

I think he's away this week but he's the man you need to talk to about this stuff.

Mary
Aug 27 '07 #16

Expert 5K+
P: 8,434
Have a look at these two articles by ADezii ... I think he's away this week but he's the man you need to talk to ...
Thanks Mary, I'll check it out.
Aug 27 '07 #17

Expert 5K+
P: 8,434
Thanks Mary, I'll check it out.
Finally found a minute to have a look. It turned out not to be what I was looking for.

I'm already using BeginTrans and CommitTrans to batch a bunch of updates into a transaction. This is not what I was curious about. I have seen occasional references to "batch" methods in the DAO or ADO documentation, but can't remember exactly what. Guess I'll have to do a bit of searching and get back to you.
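(For anyone who lands on this thread later: the transaction wrapping I mean is just the standard ADO pattern, roughly as below, where cn is the ADO Connection and the two helper functions are stand-ins for my real file-reading and SQL-building code.)

Dim lngCount As Long

cn.BeginTrans
Do While Not EndOfLogFile()                  ' stand-in for "more lines to load"
    cn.Execute BuildInsertSQL(), , adCmdText + adExecuteNoRecords
    lngCount = lngCount + 1
    If lngCount Mod 1000 = 0 Then            ' commit in chunks rather than per row
        cn.CommitTrans
        cn.BeginTrans
    End If
Loop
cn.CommitTrans                               ' commit whatever is left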
Aug 28 '07 #18

MMcCarthy
Expert Mod 10K+
P: 14,534
Finally found a minute to have a look. It turned out not to be what I was looking for.

I'm already using BeginTrans and CommitTrans to batch a bunch of updates into a transaction. This is not what I was curious about. I have seen occasional references to "batch" methods in the DAO or ADO documentation, but can't remember exactly what. Guess I'll have to do a bit of searching and get back to you.
I'll ask ADezii to take a look.
Aug 28 '07 #19

Expert 5K+
P: 8,434
I'll ask ADezii to take a look.
I managed to track down UpdateBatch and CancelBatch methods, and from their doco I'm starting to get some idea of how it works. I doubt it's going to help me, as I suspect it will not apply to Access.
Aug 28 '07 #20

ADezii
Expert 5K+
P: 8,669
I managed to track down UpdateBatch and CancelBatch methods, and from their doco I'm starting to get some idea of how it works. I doubt it's going to help me, as I suspect it will not apply to Access.
Hello Killer:
I put some code together for you that will illustrate Batch Updates for Linked Tables within the context of ADO. It may be worth it to have a look at the code and make your own substitutions. I ran the code for 50,000 and 100,000 Records, the process times were printed to the Debug Window, and will be listed below the code segment. I placed the calculation within the Addition Loop to actually slow the process down, and roughly simulate the addition of some other Fields, since I do not know the specifics. The code itself is self-documenting, and in my estimation is efficient. The Recordset is closed and the Object Variable is set to Nothing within the Procedure to ensure the closing of the Recordset and the releasing of the memory assigned to the Object Variable. If you don't mind, let me know how you make out, since I am interested myself in the outcome. See you around.
Private Sub Command6_Click()
  Dim rst As ADODB.Recordset, lngRecNo As Long
  Dim sngStartTime As Single

  sngStartTime = Timer

  Set rst = New ADODB.Recordset

  DoCmd.Hourglass True

  With rst
    Set .ActiveConnection = CurrentProject.Connection
    .Source = "tblTestBatchAdditions"
    .CursorLocation = adUseClient     'required for Batch Updates
    .LockType = adLockBatchOptimistic 'required for Batch Updates
    .CursorType = adOpenKeyset
    .Open
    For lngRecNo = 1 To 100000
      .AddNew
      rst![Counter] = lngRecNo
      rst![Calculation] = lngRecNo / 2
      .Update     'Additions are cached locally and the new record is
                  'added to the current Recordset. Changes are not posted to
                  'the underlying Database until UpdateBatch is invoked.
    Next
    'Commit Additions, Now!
    .UpdateBatch  'Changes now written to the Database
  End With

  DoCmd.Hourglass False

  Debug.Print "Elapsed Time: " & Format$((Timer - sngStartTime) / 60, "#,#,#.00") & _
              " minutes for " & CStr(lngRecNo - 1) & " Record Additions"

  rst.Close
  Set rst = Nothing
End Sub
OUTPUT:
Elapsed Time: 10.19 minutes for 50000 Record Additions
Elapsed Time: 20.51 minutes for 100000 Record Additions
Aug 28 '07 #21

MMcCarthy
Expert Mod 10K+
P: 14,534
I would substitute an INSERT statement run via Execute for the AddNew, as before.

Mary
Aug 28 '07 #22

ADezii
Expert 5K+
P: 8,669
I would substitute an INSERT statement run via Execute for the AddNew, as before.

Mary
I don't think you can use the UpdateBatch Method in conjunction with the INSERT INTO SQL Statement.
Aug 28 '07 #23

MMcCarthy
Expert Mod 10K+
P: 14,534
I don't think you can use the UpdateBatch Method in conjunction with the INSERT INTO SQL Statement.
OK, I'm sure killer will figure it out. Let us know how you get on killer.
Aug 28 '07 #24

ADezii
Expert 5K+
P: 8,669
OK, I'm sure killer will figure it out. Let us know how you get on killer.
If feasible, it may be a very good idea to migrate the Back End Data to SQL Server, and access it from an Access Project interface. In this scenario, I think several problems would be eliminated, one for sure would be the Database Size restriction. Just a thought.
Aug 28 '07 #25

Expert 5K+
P: 8,434
OK, I'm sure killer will figure it out. Let us know how you get on killer.
Thanks Mary, I will.

If feasible, it may be a very good idea to migrate the Back End Data to SQL Server, and access it from an Access Project interface. In this scenario, I think several problems would be eliminated, one for sure would be the Database Size restriction. Just a thought.
Not feasible, unfortunately. Question, though - is the size unlimited under SQL Server, or is it simply a higher limit?

I'm convinced I will be able to improve the speed, simply because it used to be faster.

You see, I was using a VB6 program to extract the info from the big log files into a tab-delimited text file. Then I was running a VBA routine in Access to import it (as you might guess, it involved quite a bit more than just an import). Now I've developed a VB6 program to handle the entire thing, reading the log files, extracting the info and plugging it into the database.

While I don't have any figures, my feeling is that the old VB and VBA processing added together probably took less than half as long as the new combined process. I don't think there's much I can do on the extract side, and it's the same code as before, anyway. So the import to the database must be the bottleneck. Not too surprising, really. Previously I was telling Access to import a text file, then running a couple of queries to copy information around. Now I'm throwing individual records at Access.

So as I see it, the main points of attack will be...
  • Try to tweak the performance of the record additions (possibly involving UpdateBatch and so on), and/or
  • Consider the possibility of extracting to a temp file and using the text-import function again. Um... this is assuming I can trigger a text import from my VB6 code? (A rough idea of how that might look is sketched just below this list.)
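(The rough idea, in case it isn't clear: Jet's text driver can treat a delimited file as a table, so the whole import could be a single INSERT INTO ... SELECT run through the connection. The file name, folder and field names below are invented, and for a tab-delimited layout a schema.ini in the same folder would be needed to describe the columns.)

Dim strSQL As String

strSQL = "INSERT INTO tblJobs (JobDate, DescID, Amount) " & _
         "SELECT JobDate, DescID, Amount " & _
         "FROM [Text;HDR=Yes;DATABASE=C:\Temp\].[extract#txt]"

cn.Execute strSQL, , adCmdText + adExecuteNoRecords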

Oh! Forgot to mention, I haven't had a chance to read your earlier message (the timing results) in detail. Will do so later today.
Aug 29 '07 #26

Expert 5K+
P: 8,434
News flash!

It wasn't adding the new records which was taking the time. It was a Find. Who'd have thunk it.

I have a 20 character "description" field which I long ago pulled out to a lookup table. It contains a lot of duplicates, so this saves me quite a bit of space in the main data table(s).

Well, as a result, before adding each record I do...
With tbl_Desc
  .Find "Description = " & TempDesc, , adSearchForward, One
  If .AbsolutePosition < Zero Then
    .AddNew "Description", TheJob.Desc
    .Update
  End If
  TheJob.DescID = .Fields("ID").Value
End With
The idea, of course, is to end up with the ID in TheJob.DescID, regardless of whether I found it already there or had to store it.

Well, after commenting out the adding of the new records (the "real" data records, not these lookup ones) I found it was still just about as slow. So I commented out the above Find, and suddenly it was flying.

So, it looks as though I know where to focus my efforts now. Hm... probably I'll load the lookup table into an array (um... collection, maybe) at startup. That should allow me to do the check much quicker, and whenever I come across a new entry, I'll add it to both the array/collection and the table.
Aug 29 '07 #27

Expert 5K+
P: 8,434
Quick update - I had trouble getting the collection to work. I always have trouble with those damn things.

I'll either thrash out what's going on with the collection (Error 5, I forget the description) or switch to using an array. Was hoping to avoid that, since it means I'll need to code a search.

I'm at home now, so don't have the stuff here to reproduce the error.
Aug 29 '07 #28

MMcCarthy
Expert Mod 10K+
P: 14,534
Quick update - I had trouble getting the collection to work. I always have trouble with those damn things.

I'll either thrash out what's going on with the collection (Error 5, I forget the description) or switch to using an array. Was hoping to avoid that, since it means I'll need to code a search.

I'm at home now, so don't have the stuff here to reproduce the error.
Throw up the code you are using for the collection when you are ready and we'll have a look.
Aug 29 '07 #29

Expert 5K+
P: 8,434
Throw up the code you are using for the collection when you are ready and we'll have a look.
Can do, when I get back to work. (At home now).

Off the top of my head though, I just did:
Public Descriptions As New Collection

' And to load the descriptions with ID from ADO recordset RS, I think it was...
Descriptions.Add CStr(RS.Fields("ID")), CStr(RS.Fields("Desc"))
So far so good, I ended up with a few hundred thousand items in there. But when I try to access the entries by key (description) I get error 5. Hang on, I'll check... Ok, that's "Invalid procedure call or argument". I'm almost certain the entry I requested was in the collection, but haven't had a chance to double-check as yet. I will in the morning.

I'm afraid I always seem to have trouble when it comes to collections. I usually end up just using an array, but with hundreds of thousands of entries I don't want to have to search an array.
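For what it's worth, the shape I'm aiming for is below (and incidentally, error 5 is also exactly what a Collection raises when the key simply isn't there, so that may be all it is). Descriptions is the collection above, and AddDescToTable is a stand-in for the INSERT that stores a new description and returns its ID.

' Returns the ID stored under Desc, adding a new entry (to both the
' collection and the lookup table) when it isn't there yet.
Private Function GetDescID(ByVal Desc As String) As Long
    Dim blnMissing As Boolean

    ' Try the in-memory lookup first; a missing key raises error 5.
    On Error Resume Next
    GetDescID = CLng(Descriptions(Desc))
    blnMissing = (Err.Number <> 0)
    On Error GoTo 0

    If blnMissing Then
        ' Not seen before: store it in the table and cache the new ID.
        GetDescID = AddDescToTable(Desc)   ' stand-in for the INSERT that returns the new ID
        Descriptions.Add CStr(GetDescID), Desc
    End If
End Function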
Aug 29 '07 #30

ADezii
Expert 5K+
P: 8,669
News flash!

It wasn't adding the new records which was taking the time. It was a Find. Who'd have thunk it.

I have a 20 character "description" field which I long ago pulled out to a lookup table. It contains a lot of duplicates, so this saves me quite a bit of space in the main data table(s).

Well, as a result, before adding each record I do...
With tbl_Desc
  .Find "Description = " & TempDesc, , adSearchForward, One
  If .AbsolutePosition < Zero Then
    .AddNew "Description", TheJob.Desc
    .Update
  End If
  TheJob.DescID = .Fields("ID").Value
End With
The idea, of course, is to end up with the ID in TheJob.DescID, regardless of whether I found it already there or had to store it.

Well, after commenting out the adding of the new records (the "real" data records, not these lookup ones) I found it was still just about as slow. So I commented out the above Find, and suddenly it was flying.

So, it looks as though I know where to focus my efforts now. Hm... probably I'll load the lookup table into an array (um... collection, maybe) at startup. That should allow me to do the check much quicker, and whenever I come across a new entry, I'll add it to both the array/collection and the table.
Why would you be using the 'Find' Method? The fastest possible search mechanism by far is the 'Seek' Method which is used in conjunction with an Indexed Field on Table Type Recordsets. It would be a good idea to use the Supports Method to make sure the Recordset supports this kind of functionality. The syntax would be:
If rst.Supports(adSeek) And rst.Supports(adIndex) Then
  'OK to continue
End If
I would be willing to bet that the code will execute significantly faster by using the Seek Method. Give it a try.
Aug 29 '07 #31

Expert 5K+
P: 8,434
... I would be willing to bet that the code will execute significantly faster by using the Seek Method. Give it a try.
Seek was what I tried first. It kept telling me Access didn't support it, which came as something of a surprise.

When I have a few minutes, I'll try playing with it again. I may have had the wrong recordset type or something.

I kind of favour the collection though, if I can get it to work. The less database interaction I can manage along the way, the happier I'll be.
Aug 29 '07 #32

Expert 5K+
P: 8,434
Got the collection fixed (just silly coding errors, as expected). Now running at a much more acceptable speed.

Running overnight, the load seemed to be taking about 15-20 minutes per daily log. Now it's looking more like under three minutes. Let's see, that comes to less than 3½ days to extract the entire 4½ years' worth of data from source files and load it (compared to at least a couple of weeks). I think I can live with that. Memory usage is still OK, too. The only hard part now is finding the disk space to hold the source files to do the load all in one go.

Note, the full load should happen quite rarely. I normally load new data daily.

Think I'll quit fiddling with this one for now.
Aug 29 '07 #33

ADezii
Expert 5K+
P: 8,669
Seek was what I tried first. It kept telling me Access didn't support it, which came as something of a surprise.

When I have a few minutes, I'll try playing with it again. I may have had the wrong recordset type or something.

I kind of favour the collection though, if I can get it to work. The less database interaction I can manage along the way, the happier I'll be.
Sorry Killer, but I still think you will never get any faster than the Seek Method. Here is sample code which is fully functional. You must, however:
  1. Use a Server Side Cursor.
  2. The Field to search on must be Indexed.
  3. The Recordset must be opened with the adCmdTableDirect Argument to the Open Method.
Dim rst As ADODB.Recordset

Set rst = New ADODB.Recordset

With rst
  Set .ActiveConnection = CurrentProject.Connection
  .CursorLocation = adUseServer
  .LockType = adLockOptimistic
  .CursorType = adOpenKeyset
  .Open "tblEmployee", Options:=adCmdTableDirect
  .Index = "PrimaryKey"

  .Seek 999999, adSeekAfterEQ

  If .EOF Then
    MsgBox "Unable to find Match!"
  Else
    MsgBox "Match found for " & ![FirstName] & " " & ![LastName]
  End If
End With

rst.Close
Set rst = Nothing
Aug 30 '07 #34

Expert 5K+
P: 8,434
Thanks for that.

However, I do find it hard to believe it's faster to do a database lookup than to retrieve a member of a collection from memory. Even allowing for caching.

Incidentally, does Access support using server-side cursor? It's all I have here to work with.

In any case, I'm happy enough with the performance now, using the collection. I can reload my entire four years' worth of data in something like 3 days, and the daily update looks as though it'll take 2-3 minutes. That's less than it took before, running the separate extract and import processes. So I've more than made up for the performance I originally lost when I combined them.
Aug 30 '07 #35

ADezii
Expert 5K+
P: 8,669
Thanks for that.

However, I do find it hard to believe it's faster to do a database lookup than to retrieve a member of a collection from memory. Even allowing for caching.

Incidentally, does Access support using server-side cursor? It's all I have here to work with.

In any case, I'm happy enough with the performance now, using the collection. I can reload my entire four years' worth of data in something like 3 days, and the daily update looks as though it'll take 2-3 minutes. That's less than it took before, running the separate extract and import processes. So I've more than made up for the performance I originally lost when I combined them.
Any chance of losing the double CStrs within the Collection's Add Method, or are they significant? You may get some performance gain without the double conversions.
Aug 30 '07 #36

NeoPa
Expert Mod 15k+
P: 31,660
Thanks for that.

However, I do find it hard to believe it's faster to do a database lookup than to retrieve a member of a collection from memory. Even allowing for caching.

Incidentally, does Access support using server-side cursor? It's all I have here to work with.

In any case, I'm happy enough with the performance now, using the collection. I can reload my entire four years' worth of data in something like 3 days, and the daily update looks as though it'll take 2-3 minutes. That's less than it took before, running the separate extract and import processes. So I've more than made up for the performance I originally lost when I combined them.
Killer,

Congratulations on such a significant improvement. It is possible that an "in memory" structure like a collection can cause a performance hit though on balance it's most likely to be a lot faster. Where it can hit problems is if the physical memory is limited relative to the size of the Collection structure and paging takes place. A database is tailored for searching efficiently through large amounts of data whereas paging randomly (by the OS) could, in fairly extreme cases, cause more disk hits and therefore slower progress. I doubt this is the case for you (particularly as you now report serious improvement), but not impossible bearing in mind the size of your data. In truth I only raised it as a point of interest ;)

ADezii,

I'm glad you got involved as, whether Killer needs all your ideas or not in this instance, I found the Seek() point interesting, and happened to come across a search needed in my code today. So, this thread could certainly prove useful in future.
Aug 30 '07 #37

MMcCarthy
Expert Mod 10K+
P: 14,534

ADezii,

I'm glad you got involved as, whether Killer needs all your ideas or not in this instance, I found the Seek() point interesting, and happened to come across a search needed in my code today. So, this thread could certainly prove useful in future.
I smell a tip of the week :)
Aug 30 '07 #38

ADezii
Expert 5K+
P: 8,669
Killer,

Congratulations on such a significant improvement. It is possible that an "in memory" structure like a collection can cause a performance hit though on balance it's most likely to be a lot faster. Where it can hit problems is if the physical memory is limited relative to the size of the Collection structure and paging takes place. A database is tailored for searching efficiently through large amounts of data whereas paging randomly (by the OS) could, in fairly extreme cases, cause more disk hits and therefore slower progress. I doubt this is the case for you (particularly as you now report serious improvement), but not impossible bearing in mind the size of your data. In truth I only raised it as a point of interest ;)

ADezii,

I'm glad you got involved as, whether Killer needs all your ideas or not in this instance, I found the Seek() point interesting, and happened to come across a search needed in my code today. So, this thread could certainly prove useful in future.
Hopefully, it will be a learning experience for us all!
Aug 30 '07 #39

Expert 5K+
P: 8,434
Congratulations on such a significant improvement. It is possible that an "in memory" structure like a collection ... raised it as a point of interest ;)
Anything relevant is well worth including here, huh. Playing "devil's advocate" not only has the potential to bring out things I've overlooked, but ensures we've covered the topic for others who may hit the thread later.

As it turns out, yes, I was considering the memory and so on. But so far I have something like 300,000 to 400,000 items in the collection. I didn't bother trying to calculate size because I don't know how collections are stored. And while the data is a Long value (yes, I did drop both the CStr functions, thanks) the key is a variable-length string. In any case, I've kept an eye on the RAM usage and it's looking perfectly acceptable.

[Thinking out loud mode]Even allowing for future expansion, I don't think it'll ever take up more than a few MB. If we assume an average key length of 10 characters (max is 20) and 8 bytes for the Long value, a million items comes out (probably oversimplified) to around 18MB. With 2GB of RAM, who cares? :)[/Thinking out loud mode]

ADezii,
I'm glad you got involved as, whether Killer needs all your ideas or not in this instance, I found the Seek() point interesting, and happened to come across a search needed in my code today. So, this thread could certainly prove useful in future.
I completely agree. It's good to tease out all these tidbits of info that might otherwise go unnoticed. For instance, while I did actually think Seek would be faster than Find (in fact, I think the doco says so), it might never have occurred to me to use Execute to do my additions instead of opening up recordsets. Not only did that save me something like 1.2GB in the short term, but before long I would have gone way past the available RAM.

What I'm finding now is that while my new record additions are humming along (looks like about 250-300 records per second), the update of some statistics at the end is another story entirely. It has to update a few hundred thousand records. This is part of the process I just threw together early in the development and haven't yet had a chance to address its performance. I'm hoping there might be some way (possibly through using Execute?) to speed this up a bit. I'll have a play with it myself and let you know what happens.
Aug 30 '07 #40

Expert 5K+
P: 8,434
Argh!

The update of the statistics is even worse than I thought. It has been going now for almost 2 hours. Unfortunately, though I have included a progress indicator, it isn't working properly yet. So I can't tell how long it has to go.
Aug 31 '07 #41

MMcCarthy
Expert Mod 10K+
P: 14,534
Argh!

The update of the statistics is even worse than I thought. It has been going now for almost 2 hours. Unfortunately, though I have included a progress indicator, it isn't working properly yet. So I can't tell how long it has to go.
You can use the execute method on an UPDATE statement just as easily as an INSERT statement.
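For example (the table, field and criteria names are just placeholders, and cn is your open ADO connection):

Dim strSQL As String
Dim lngAffected As Long

' One set-based UPDATE replaces thousands of individual edits.
strSQL = "UPDATE tblStats SET RunCount = RunCount + 1, LastRun = #08/31/2007# " & _
         "WHERE StatDate BETWEEN #08/01/2007# AND #08/31/2007#"

cn.Execute strSQL, lngAffected, adCmdText + adExecuteNoRecords
Debug.Print lngAffected & " stats rows updated"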
Aug 31 '07 #42

Expert 5K+
P: 8,434
You can use the execute method on an UPDATE statement just as easily as an INSERT statement.
Thanks. I'd guessed that, but what remains to be seen is whether there is any impact on performance.

There are other options. For instance, updating the stats as I go. It may slow down the processing, but I might just have to live with that. When I come along and tell the program to quit loading and finish up, and it says "sure, just a tick while I update some stats", I don't want to sit and wait for three hours. That's getting even worse than Access in the "almost finished, honest!" stakes.
Aug 31 '07 #43

Expert 5K+
P: 8,434
Phew! Came back from lunch, and the update has finally finished. So let's see, it took a bit under 18 hours to add maybe 16,000,000 records to the database. Then 5 hours to update a few hundred thousand stats records.

That update is definitely worth a look. I'll let you all know how it goes.
Aug 31 '07 #44

NeoPa
Expert Mod 15k+
P: 31,660
Looks like that's the next bottle-neck Killer ;)
Aug 31 '07 #45

Expert 5K+
P: 8,434
Looks like that's the next bottle-neck Killer ;)
Yep. I hope to find time to look into it over lunch today.
Sep 2 '07 #46

Expert 5K+
P: 8,434
A day late perhaps, but I'm now having a go at converting the stats update to work via connection.Execute rather than opening it as a recordset and inserting/editing there.

This will save me quite a few MB of RAM again, too. I probably won't know until tomorrow morning how well it worked (or didn't). Fingers crossed...
Sep 4 '07 #47

Expert 5K+
P: 8,434
Argh.

I think to save time on the bulk load, I'm just going to skip the updating of the stats. The stats are derived from the loaded data, so I can do them in one big burst after it's all loaded. I think that's what I did in the previous version. Can't remember for sure (it was part VB, part VBA, part Access query, part Access macro and part manual intervention).

So, I think I'll put in a switch. For a big bulk load I'll toggle off the updating of the stats. For the daily load of new stuff I'll either have it update as it goes, or do them in a batch at the end as it's doing now. (In fact I'll probably go back to the drawing board and copy the process I used before to throw out and recreate the stats, since I don't remember it ever taking so long.)
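(If I do go back to rebuilding the stats in one burst at the end, it'll probably boil down to something like the following - table and field names made up, cn being the relevant connection.)

Dim strSQL As String

' Throw the old stats away and rebuild them in one pass from the detail data.
cn.Execute "DELETE FROM tblStats", , adCmdText + adExecuteNoRecords

strSQL = "INSERT INTO tblStats (StatDate, DescID, JobCount, TotalAmount) " & _
         "SELECT JobDate, DescID, Count(*), Sum(Amount) " & _
         "FROM tblJobs GROUP BY JobDate, DescID"
cn.Execute strSQL, , adCmdText + adExecuteNoRecords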
Sep 4 '07 #48
