By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
426,247 Members | 1,989 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 426,247 IT Pros & Developers. It's quick & easy.

Consolidate & Merge Several Rows of Dates

P: 6
Given the following table:
Expand|Select|Wrap|Line Numbers
  1. EMPL_ID | CO_ID | START     | STOP
  2. --------------------------------------------
  3. 12          U    01/06/1976  07/09/1980
  4. 12          U    03/18/1982  11/18/1988
  5. 12          U    03/23/1988  05/01/1992
  6. 12          U    01/25/1991  02/29/1996
  7. 12          U    03/16/2004  05/05/2004
  8. 12          U    05/06/2004  12/31/9999
  9. 12          M    11/24/1975  07/09/1980
  10. 12          M    03/18/1982  11/18/1988
  11. 12          M    03/23/1988  05/01/1992
  12. 12          M    01/25/1991  02/29/1996
  13. 12          M    03/16/2004  05/05/2004
  14. 12          M    05/06/2004  12/31/9999
  15. 16          U    11/26/1962  07/01/1963
  16. 16          U    07/01/1963  08/29/1963
  17. 16          U    10/17/1963  01/15/1965
  18. 16          U    04/09/1965  06/05/1972
  19.  
Final Version:
Expand|Select|Wrap|Line Numbers
  1. EMPL_ID | CO_ID | START     | STOP
  2. --------------------------------------------
  3. 12          U    01/06/1976  07/09/1980
  4. 12          U    03/18/1982  02/29/1996
  5. 12          U    03/16/2004  12/31/9999
  6. 12          M    11/24/1975  07/09/1980
  7. 12          M    03/18/1982  02/29/1996
  8. 12          M    03/16/2004  05/05/2004
  9. 12          M    05/06/2004  12/31/9999
  10. 16          U    11/26/1962  06/05/1972
  11.  

I am trying to determine total service time by employee and company. If the employee leaves and returns within a 6 month window, then we can act if the employee has never left. (Employee 16, company U above).

Sometimes the stop date for the row will overlap with the start date on the next row, these rows should be merged and the greater stop date should be seen.

I've got this data in an MS Access table and/or a DB2 Z/OS table, but I am not able to find any functions or develop queries that will give me the result that I want.

I'm thinking this will have to be written in a VBA module in my MS Access database that will massage and condense the dates.

Anyone good with loops and date checks?

Thanks!
Feb 18 '09 #1
Share this Question
Share on Google+
11 Replies


Expert 100+
P: 374
@MahaCoder

Hey MahaCoder,

From what you are describing, it sounds like you're trying to normalize data here. There isn't going to be any query that is going to be able to do this out of the box. You are correct in the fact that you are going to have to write VBA and create a new table with the processed data. If you're using the DB2 datasource as your never changing source, you're going to have a great deal of latency issues with the processes, if you're converting on the fly all the time.

If this is a process to correct errors, it would be more than worth it to normalize the data. But as for a working model to continue with the current dataset isn't adviseable without taking a massive performance hit.

If you need help in writing the VBA and SQL statements, please let me know. Also please provide any work that you've already tested and is or isn't work currently.

Thanks,

Joe P.
Feb 19 '09 #2

P: 6
Thanks Joe P for your interest in this project.

The data is actually entered with an additional key, [State], which I have removed and am trying to condense the dates up to to get a nationwide start and stop date by employee, by company. The data is maintained at the State level, so each one of those CO_ID's per EMPL_ID could be for different states. That's why the data does not look normalized, and this is not an attempt to correct data. This would be a monthly execution so that the data can be evaluated at a nationwide level.

Here is the VBA that I have started with, but with 600,000 rows to evaluate this is clearly going to be processing intensive. My idea was to load all start/stop dates by employee/company into a dictionary and then load the dictionary into an array and loop through and evaluate the rows to keep and the new start/stop dates. The corrected array could then be loaded back into the table. This is where I'm missing the logic. I think I'm making it much more difficult then I need to.

I mentioned my different platforms because I know there are functions in DB2 UDB or OLAP that can evaluate the previous and next rows, and was hoping there was something similar that I could do in a query in DB2 Z/OS.

Expand|Select|Wrap|Line Numbers
  1. Public Function CondenseDates()
  2.  
  3.     Dim strSQL As String
  4.     Dim strKey1 As String
  5.     Dim strKey2 As String
  6.     Dim conConnection As New ADODB.Connection
  7.     Dim rstRecordset As New ADODB.Recordset
  8.     Dim dbsDatabase As Database
  9.     Dim dtDateBeg As Date
  10.     Dim dtDateEnd As Date
  11.     Dim dtDate1 As Date
  12.     Dim dtDate2 As Date
  13.     Dim varRetVal As Variant
  14.     Dim dblCounter As Double
  15.     Dim objDict As New Scripting.Dictionary
  16.     Dim arrKeys() As Variant
  17.     Dim arrKeysDelete() As Variant
  18.     Dim intIndex As Integer
  19.     Dim intIndex2 As Integer
  20.     Dim objDictDelete As New Scripting.Dictionary
  21.     Dim blnDropRows As Boolean
  22.  
  23.     strSQL = ""
  24.     strSQL = strSQL & "SELECT * "
  25.     strSQL = strSQL & "FROM @CTG287.TDMR039 "
  26.     strSQL = strSQL & "WITH UR"
  27.  
  28.     'Initialize the progress meter.
  29.     varRetVal = SysCmd(acSysCmdInitMeter, "Reading Data...", GetRecordCount())
  30.  
  31.     conConnection.Open strConnectionString
  32.     rstRecordset.Open strSQL, conConnection, adOpenForwardOnly, adLockReadOnly
  33.  
  34.     strKey1 = rstRecordset(0)
  35.     strKey2 = rstRecordset(1)
  36.     dtDateBeg = rstRecordset(2)
  37.     dtDateEnd = rstRecordset(3)
  38.  
  39.     rstRecordset.MoveNext
  40.     dblCounter = dblCounter + 1
  41.  
  42.     Do While Not rstRecordset.EOF
  43.         If strKey1 = rstRecordset(0) And strKey2 = rstRecordset(1) Then
  44.             objDict.Add dtDateBeg & ":" & dtDateEnd, ""
  45.  
  46.             Do While strKey1 = rstRecordset(0) And strKey2 = rstRecordset(1)
  47.                 objDict.Add rstRecordset(2) & ":" & rstRecordset(3), ""
  48.                 rstRecordset.MoveNext
  49.             Loop
  50.         End If
  51.  
  52.         If objDict.Count = 0 Then
  53.             strSQL = ""
  54.             strSQL = strSQL & "INSERT INTO @CTG287.TDMR038 "
  55.             strSQL = strSQL & "VALUES ('" & strKey1 & "','" & strKey2 & "','" & dtDateBeg & "','" & dtDateEnd & "')"
  56.             conConnection.Execute strSQL
  57.         Else
  58.             arrKeys = objDict.Keys
  59.  
  60.             'Loop through Keys array
  61.             For intIndex = 0 To UBound(arrKeys)
  62.                 dtDate1 = Split(arrKeys(intIndex), ":")(0)
  63.                 dtDate2 = Split(arrKeys(intIndex), ":")(1)
  64.                 For intIndex2 = intIndex + 1 To UBound(arrKeys)
  65.                     If dtDate1 <= Split(arrKeys(intIndex2), ":")(0) And dtDate2 >= Split(arrKeys(intIndex2), ":")(1) Then
  66.                         Call ArrayRemoveItem(arrKeys, arrKeysDelete(intIndex2))
  67.                         Exit For
  68.                     End If
  69.                 Next
  70.             Next
  71.  
  72.             strSQL = ""
  73.             strSQL = strSQL & "INSERT INTO @CTG287.TDMR038 "
  74.             strSQL = strSQL & "VALUES ('" & strKey1 & "','" & strKey2 & "','" & dtDateBeg & "','" & dtDateEnd & "')"
  75.             conConnection.Execute strSQL
  76.  
  77.         End If
  78.  
  79.         objDict.RemoveAll
  80.  
  81.         strKey1 = rstRecordset(0)
  82.         strKey2 = rstRecordset(1)
  83.         dtDateBeg = rstRecordset(2)
  84.         dtDateEnd = rstRecordset(3)
  85.  
  86.         rstRecordset.MoveNext
  87.  
  88.         If rstRecordset.EOF Then
  89.             strSQL = ""
  90.             strSQL = strSQL & "INSERT INTO @CTG287.TDMR039 "
  91.             strSQL = strSQL & "VALUES ('" & strKey1 & "','" & strKey2 & "','" & dtDateBeg & "','" & dtDateEnd & "')"
  92.             conConnection.Execute strSQL
  93.         End If
  94.  
  95.         dblCounter = dblCounter + 1
  96.  
  97.         varRetVal = SysCmd(acSysCmdUpdateMeter, dblCounter)
  98.  
  99.     Loop
  100.  
  101.     rstRecordset.Close
  102.     conConnection.Close
  103.  
  104.     Set rstRecordset = Nothing
  105.     Set conConnection = Nothing
  106.  
  107.     'Remove the progress meter.
  108.     varRetVal = SysCmd(acSysCmdRemoveMeter)
  109.  
  110. End Function
  111.  
  112. Private Function GetRecordCount() As Double
  113.  
  114.     Dim strSQL As String
  115.     Dim rstRecordset As New ADODB.Recordset
  116.     Dim conConnection As New ADODB.Connection
  117.  
  118.     strSQL = ""
  119.     strSQL = strSQL & "SELECT COUNT(*) "
  120.     strSQL = strSQL & "FROM @CTG287.TDMR038 "
  121.     strSQL = strSQL & "WITH UR"
  122.  
  123.     conConnection.Open strConnectionString
  124.     rstRecordset.Open strSQL, conConnection, adOpenForwardOnly, adLockReadOnly
  125.  
  126.     GetRecordCount = rstRecordset(0)
  127.  
  128.     rstRecordset.Close
  129.     conConnection.Close
  130.  
  131. End Function
  132.  
  133. Private Sub ArrayRemoveItem(ItemArray As Variant, ByVal ItemElement As Long)
  134.  
  135. 'PURPOSE:       Remove an item from an array, then
  136. '               resize the array
  137.  
  138. 'PARAMETERS:    ItemArray: Array, passed by reference, with
  139. '               item to be removed.  Array must not be fixed
  140.  
  141. '               ItemElement: Element to Remove
  142.  
  143. 'EXAMPLE:
  144. '           dim iCtr as integer
  145. '           Dim sTest() As String
  146. '           ReDim sTest(2) As String
  147. '           sTest(0) = "Hello"
  148. '           sTest(1) = "World"
  149. '           sTest(2) = "!"
  150. '           ArrayRemoveItem sTest, 1
  151. '           for iCtr = 0 to ubound(sTest)
  152. '               Debug.print sTest(ictr)
  153. '           next
  154. '
  155. '           Prints
  156. '
  157. '           "Hello"
  158. '           "!"
  159. '           To the Debug Window
  160.  
  161.     Dim lCtr As Long
  162.     Dim lTop As Long
  163.     Dim lBottom As Long
  164.  
  165.     If Not IsArray(ItemArray) Then
  166.         Err.Raise 13, , "Type Mismatch"
  167.         Exit Sub
  168.     End If
  169.  
  170.     lTop = UBound(ItemArray)
  171.     lBottom = LBound(ItemArray)
  172.  
  173.     If ItemElement < lBottom Or ItemElement > lTop Then
  174.         Err.Raise 9, , "Subscript out of Range"
  175.         Exit Sub
  176.     End If
  177.  
  178.     For lCtr = ItemElement To lTop - 1
  179.         ItemArray(lCtr) = ItemArray(lCtr + 1)
  180.     Next
  181.  
  182. On Error GoTo ErrorHandler:
  183.  
  184. ReDim Preserve ItemArray(lBottom To lTop - 1)
  185.  
  186. Exit Sub
  187. ErrorHandler:
  188.   'An error will occur if array is fixed
  189.     Err.Raise Err.Number, , _
  190.        "You must pass a resizable array to this function"
  191. End Sub
Feb 19 '09 #3

P: 6
Ok I've been working on the code today and discovered several issues with my current code and believe I have a working version. Below is my updated code. I needed to add my date check for 6 months, and instead of removing elements from the array I just tracked the min and max dates on each run, and once I hit the end of a run I loaded that row into the table and continued onto the next run, resetting my min and max start and stop dates.

However, this takes way too long to run... I'm still set on trying to accomplish this with a series of DB2 statements so that processing is run on the server, or even a DB2 stored procedure might be an idea?
Feb 19 '09 #4

P: 6
Expand|Select|Wrap|Line Numbers
  1.     Do While Not rstRecordset.EOF
  2.         If strKey1 = rstRecordset(0) And strKey2 = rstRecordset(1) Then
  3.             objDict.Add dtDateBeg & ":" & dtDateEnd, ""
  4.  
  5.             Do While (strKey1 = rstRecordset(0)) And (strKey2 = rstRecordset(1))
  6.                 objDict.Add rstRecordset(2) & ":" & rstRecordset(3), ""
  7.                 rstRecordset.MoveNext
  8.                 If rstRecordset.EOF Then
  9.                     Exit Do
  10.                 End If
  11.             Loop
  12.         End If
  13.  
  14.         'If there is only one date range for this key, load it into the table as is
  15.         If objDict.Count = 0 Then
  16.             strSQL = ""
  17.             strSQL = strSQL & "INSERT INTO @CTG287.TDMR039 "
  18.             strSQL = strSQL & "VALUES ('" & strKey1 & "','" & strKey2 & "','" & dtDateBeg & "','" & dtDateEnd & "')"
  19.             conConnection.Execute strSQL
  20.         Else
  21.             arrKeys = objDict.Keys
  22.  
  23.             'initialize date1 and date2 with the lowest record for the key
  24.             dtDate1 = Split(arrKeys(0), ":")(0)
  25.             dtDate2 = Split(arrKeys(0), ":")(1)
  26.  
  27.             'initialize the min and max run using the first record for the key
  28.             dtDateBegMin = dtDate1
  29.             dtDateEndMax = dtDate2
  30.  
  31.             'Loop through Keys array
  32.             For intIndex = 0 To UBound(arrKeys)
  33.  
  34.                 'if the earliest end date (plus six months) is greater than the next start date, then condense
  35.                 If DateAdd("m", 6, dtDateEndMax) >= CDate(Split(arrKeys(intIndex + 1), ":")(0)) Then
  36.                     dtDateBegMin = dtDateBegMin
  37.                     dtDateEndMax = Split(arrKeys(intIndex + 1), ":")(1)
  38.                 'if the run is over, load the run into the table
  39.                 Else
  40.                     strSQL = ""
  41.                     strSQL = strSQL & "INSERT INTO @CTG287.TDMR039 "
  42.                     strSQL = strSQL & "VALUES ('" & strKey1 & "','" & strKey2 & "','" & dtDateBegMin & "','" & dtDateEndMax & "')"
  43.                     conConnection.Execute strSQL
  44.  
  45.                     'Reset the begin min and end max dates with the new run
  46.                     dtDateBegMin = Split(arrKeys(intIndex + 1), ":")(0)
  47.                     dtDateEndMax = Split(arrKeys(intIndex + 1), ":")(1)
  48.  
  49.                     'Decrement the index counter by 1 to start over with the next run
  50.                     intIndex = intIndex - 1
  51.                 End If
  52.  
  53.                 'if the last row has been evaluated, load the last run into the table
  54.                 If intIndex + 1 = UBound(arrKeys) Then
  55.                     strSQL = ""
  56.                     strSQL = strSQL & "INSERT INTO @CTG287.TDMR039 "
  57.                     strSQL = strSQL & "VALUES ('" & strKey1 & "','" & strKey2 & "','" & dtDateBegMin & "','" & dtDateEndMax & "')"
  58.                     conConnection.Execute strSQL
  59.  
  60.                     Exit For
  61.                 End If
  62.  
  63.             Next
  64.  
  65.         End If
  66.  
  67.         objDict.RemoveAll
  68.  
  69.         If Not rstRecordset.EOF Then
  70.             strKey1 = rstRecordset(0)
  71.             strKey2 = rstRecordset(1)
  72.             dtDateBeg = rstRecordset(2)
  73.             dtDateEnd = rstRecordset(3)
  74.  
  75.             rstRecordset.MoveNext
  76.         End If
  77.  
  78.         dblCounter = dblCounter + 1
  79.  
  80.         varRetVal = SysCmd(acSysCmdUpdateMeter, dblCounter)
  81.  
  82.     Loop
Feb 19 '09 #5

NeoPa
Expert Mod 15k+
P: 31,419
I won't jump in now as I expect Joe will come back to you when he comes online and I don't want to confuse matters. You've already gone to some trouble to respond fully, so I'm sure he'll be happy to progress it further.

If you've had no joy by the weekend though, bump the thread and I'll take it up.
Feb 19 '09 #6

NeoPa
Expert Mod 15k+
P: 31,419
I would use VBA Recordeset code (DAO for preference) to go through the table and where any subsequent matching records are found, update the original to reflect the extended finish date and delete the (no longer required) subsequent record.

Does this sound sensible?

Is it within your scope?
Feb 22 '09 #7

P: 6
NeoPa- this would definately fall in the scope. I might try this approach to see if I gain performance improvements. I've never used the Recordset properties of MovePrevious or Delete, so I would be concerned about what kind of problems I would encounter in my loop in regards to MoveNext when the record has been deleted.
Feb 23 '09 #8

NeoPa
Expert Mod 15k+
P: 31,419
I expect you'll need to use the Bookmark facility too. You will need to read-ahead until you know the current record is not matching (or EOF of course) and then (with all the matching information noted) return to the main record (bookmarked) and update it. The intervening (matching) records can be deleted when processed originally, and the MoveNext should skip over them happily.

Having said that, the next record at this point will be one that's already been read on the first pass, so that can be bookmarked too.

Does all this still make sense?
Feb 23 '09 #9

P: 6
Yes NeoPa- that makes sense. Do you think there will be much performance gain? I think I would like to consider writing a stored procedure that executes on the mainframe rather than relying on my CPU to process the records through any type of Recordset.
Feb 24 '09 #10

Expert 100+
P: 1,287
I believe you could improve the speed of this process by taking recordsets for one employee at a time. You could get a recordset of distinct EMPL_IDs, then loop through those and for each open a recordset containing only records for that employee on which to use NeoPa's strategy.
Feb 24 '09 #11

NeoPa
Expert Mod 15k+
P: 31,419
@MahaCoder
I'm not sure what you would compare with what to determine any such gain o great coder.
@MahaCoder
I'm not aware of another language that would allow you to manipulate it this way so easily. SQL is specifically designed for RDBMSs, and this is a process which doesn't fit into that idea well.

@Chip:
I doubt there would be any benefit in that proposal. RDBMSs specifically optimise the provision of the data. By trying to do this yourself manually, I would expect that to be stopping the RDBMS from doing it for you. You'd need to be quite clever to do it better than it can, so best leave it to the expert (It's what it was designed for after all).
Feb 24 '09 #12

Post your reply

Sign in to post your reply or Sign up for a free account.