By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
462,137 Members | 858 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 462,137 IT Pros & Developers. It's quick & easy.

Comparing data

P: 1
Need some tips to help me approach this problem. I would consider myself advanced in Access and VBA, so just looking for conceptual pointers.

In essence, I have a number of text files, each of which contains data that is parsed into various database tables. The data thus stored in the tables contains 4 fields of information. I have this part of the system working 100%.

The 4 fields are: ParseID, lngTime (times are derived in thousands of a second, so I store all times in a long format), strDataSource and strMsg

Some other relevant info:
-Each text file will contain approx 70,000 number of records and takes about 10 seconds to parse. So from 2 datasources, I will get about 140,000 records
-Each record in the text file contains a time stamp (lngTime); however, the clocks between each recordsource are not syncronised. These time differences vary from recordsource to recordsource, but are always less than 30 seconds.

What I need to do:
I am looking for records where the strMsg field from one datasource is exactly the same as from the other datasource, but the difference between the lngTime field is less than 30 seconds. i.e. I want to identify "identical" records from different datasources (taking into account the time lag).

I was originally hoping that I could do a SELECT DISTINCT on the records, but the one major flaw is that the time offset from the different datasources varies every so slightly. For example, it may be 22.789 seconds in one case, and then 23.015 in the next. [Question: is there an opportunity to use rounding to make things faster?]

What I have got to so far is to create an array and loop through it, but as expected this take a bit of time. (note that the following code doesnt delete any records yet, but it does identify where the matching records are in the Array)
Expand|Select|Wrap|Line Numbers
  1. Sub CombineRecordsets()
  3. Dim i As Long
  4. Dim j As Long
  6. Dim cnn As ADODB.Connection
  7. Dim rs1 As ADODB.Recordset
  8. Dim strSql1 As String
  9. Dim lngRecordCount1 As Long
  10. Dim lngParseID As Long
  11. Dim strMsg As String
  12. Dim lngTime1 As Long
  13. Dim strDataSource1 As String
  14. Dim strDataSource2 As String
  15. Dim lngCurrRecord As Long
  17. Dim varArray
  19. 'Recordset to retrieve the combat log
  20.     Set cnn = CurrentProject.Connection
  21.     Set rs1 = New ADODB.Recordset
  23.     strSql1 = "Select ParseID, lngTime, strDataSource, strMsg, lngParseID from tblParsedData ORDER BY lngTime ASC"
  25.     rs1.Open strSql1, cnn, adOpenKeyset, adLockOptimistic
  28.     varArray = rs1.GetRows
  29.     lngRecordCount1 = rs1.RecordCount
  31.     lngRecordCount1 = rs1.RecordCount - 1
  33.     strDataSource1 = "Mike"
  34.     strDataSource2 = "Steve"
  36.     For i = 0 To lngRecordCount1
  37.             If i = 10000 Then
  38.             Stop
  39.             End If
  41.         If Not IsNull(varArray(4, i)) Then 'Looks to see if we have matched it yet
  42.             'Do nothing as its aleady been matched, so move onto next i
  43.         Else
  44.             If varArray(2, i) = strDataSource1 Then
  45.             strMsg = varArray(3, i)
  46.             lngParseID = varArray(0, i)
  47.             lngTime1 = varArray(1, i)
  48.                 For j = lngCurrRecord To lngRecordCount1
  49.                     If (varArray(1, j) - lngTime1) > 30000 Then '>30 seconds, stop looking for a match
  50.                         Exit For
  51.                     Else
  52.                         If IsNull(varArray(4, j)) Then 'Looks to see if we have matched it yet
  53.                             If varArray(2, j) = strDataSource2 Then
  54.                                 If varArray(3, j) = strMsg Then
  55.                                     varArray(4, j) = lngParseID
  56.                                     varArray(4, i) = varArray(0, j)
  57.                                     lngCurrRecord = j
  58.                                     Exit For
  59.                                 End If
  60.                             End If
  61.                         End If
  62.                     End If
  63.                 Next
  64.             End If
  66.         End If
  69.     Next i
  72. End Sub
Sooooo, I would appreciate it if you could let me know if there are any faster methods or if I should be approaching this from a completely different angle.

PS - Sorry for the formatting, but this forum doesnt seem to recognise [tab] spaces.
Apr 24 '07 #1
Share this Question
Share on Google+
2 Replies

Expert Mod 10K+
P: 14,534
Try this ...
Expand|Select|Wrap|Line Numbers
  1. SELECT Table1.ParseID, Table2.ParseID
  2. FROM Table1, Table2
  3. WHERE Table1.strMsg = Table2.strMsg
  4. AND ((Table1.lngTime - Table2.lngTime) < 30000
  5.    OR (Table2.lngTime - Table1.lngTime) < 30000)
Apr 25 '07 #2

Expert Mod 10K+
P: 14,534
BTW, I added code tags to your original post. That's how you get the code to display correctly.
Apr 25 '07 #3

Post your reply

Sign in to post your reply or Sign up for a free account.