In essence, I have a number of text files, each of which contains data that is parsed into various database tables. The data thus stored in the tables contains 4 fields of information. I have this part of the system working 100%.
The 4 fields are: ParseID, lngTime (times are derived in thousands of a second, so I store all times in a long format), strDataSource and strMsg
Some other relevant info:
-Each text file will contain approx 70,000 number of records and takes about 10 seconds to parse. So from 2 datasources, I will get about 140,000 records
-Each record in the text file contains a time stamp (lngTime); however, the clocks between each recordsource are not syncronised. These time differences vary from recordsource to recordsource, but are always less than 30 seconds.
What I need to do:
I am looking for records where the strMsg field from one datasource is exactly the same as from the other datasource, but the difference between the lngTime field is less than 30 seconds. i.e. I want to identify "identical" records from different datasources (taking into account the time lag).
I was originally hoping that I could do a SELECT DISTINCT on the records, but the one major flaw is that the time offset from the different datasources varies every so slightly. For example, it may be 22.789 seconds in one case, and then 23.015 in the next. [Question: is there an opportunity to use rounding to make things faster?]
What I have got to so far is to create an array and loop through it, but as expected this take a bit of time. (note that the following code doesnt delete any records yet, but it does identify where the matching records are in the Array)
Expand|Select|Wrap|Line Numbers
- Sub CombineRecordsets()
- Dim i As Long
- Dim j As Long
- Dim cnn As ADODB.Connection
- Dim rs1 As ADODB.Recordset
- Dim strSql1 As String
- Dim lngRecordCount1 As Long
- Dim lngParseID As Long
- Dim strMsg As String
- Dim lngTime1 As Long
- Dim strDataSource1 As String
- Dim strDataSource2 As String
- Dim lngCurrRecord As Long
- Dim varArray
- 'Recordset to retrieve the combat log
- Set cnn = CurrentProject.Connection
- Set rs1 = New ADODB.Recordset
- strSql1 = "Select ParseID, lngTime, strDataSource, strMsg, lngParseID from tblParsedData ORDER BY lngTime ASC"
- rs1.Open strSql1, cnn, adOpenKeyset, adLockOptimistic
- varArray = rs1.GetRows
- lngRecordCount1 = rs1.RecordCount
- lngRecordCount1 = rs1.RecordCount - 1
- strDataSource1 = "Mike"
- strDataSource2 = "Steve"
- For i = 0 To lngRecordCount1
- If i = 10000 Then
- Stop
- End If
- If Not IsNull(varArray(4, i)) Then 'Looks to see if we have matched it yet
- 'Do nothing as its aleady been matched, so move onto next i
- Else
- If varArray(2, i) = strDataSource1 Then
- strMsg = varArray(3, i)
- lngParseID = varArray(0, i)
- lngTime1 = varArray(1, i)
- For j = lngCurrRecord To lngRecordCount1
- If (varArray(1, j) - lngTime1) > 30000 Then '>30 seconds, stop looking for a match
- Exit For
- Else
- If IsNull(varArray(4, j)) Then 'Looks to see if we have matched it yet
- If varArray(2, j) = strDataSource2 Then
- If varArray(3, j) = strMsg Then
- varArray(4, j) = lngParseID
- varArray(4, i) = varArray(0, j)
- lngCurrRecord = j
- Exit For
- End If
- End If
- End If
- End If
- Next
- End If
- End If
- Next i
- End Sub
PS - Sorry for the formatting, but this forum doesnt seem to recognise [tab] spaces.