Now we're cooking with gas. I think that regex is overkill for this
'problem'. Sure, you can use it if you wish but I think you will be making a
rod for your own back.
Here is a solution that works for your sample data. Create a Windows Forms
project, plonk a button on the form and paste the following into the form:
Private m_source1 As String = "Date Parts ID Parts Description location
Quantitiy Unit Cost Total Cost" & Environment.New Line & _
"11 Dec A1234987 Sample Parts description W1I1R4S1 2 10.00 20.00" &
Environment.New Line & _
"15 Dec A1234988 Sample Parts description 1 10.00 20" &
Environment.New Line & _
"18 Dec A1234988 Sample Parts description 1 10.00 20" &
Environment.New Line & _
"19 Dec A1234988 Sample Parts description 1 10.00 20" &
Environment.New Line & _
"12 Dec A1234988 Sample Parts description 1 10.00 20"
Private m_source2 As String = "Date Parts ID Parts Description location
Quantitiy Unit Cost Total Cost" & Environment.New Line & _
"11 12 06 A1234987 Sample Parts description W1I1R4S1 2 10.00 20.00" &
Environment.New Line & _
"15 12 06 A1234988 Sample Parts description 1 10.00 20" &
Environment.New Line & _
"18 12 06 A1234988 Sample Parts description 1 10.00 20" &
Environment.New Line & _
"19 12 06 A1234988 Sample Parts description 1 10.00 20" &
Environment.New Line & _
"12 12 06 A1234988 Sample Parts description 1 10.00 20"
Private m_source3 As String = "Parts Parts ID Description location
Quantitiy Unit Cost Total Cost" & Environment.New Line & _
"11/12/06 A1234987 Sample Parts description W1I1R4S1 2 10.00 20.00" &
Environment.New Line & _
"15/12/06 A1234988 Sample Parts description 1 10.00 20" &
Environment.New Line & _
"18/12/06 A1234988 Sample Parts description 1 10.00 20" &
Environment.New Line & _
"19/12/06 A1234988 Sample Parts description 1 10.00 20" &
Environment.New Line & _
"12/12/06 A1234988 Sample Parts description 1 10.00 20"
Private m_source4 As String = "Date Parts ID Parts Description location
Quantitiy Unit Cost Total Cost" & Environment.New Line & _
"11/dec/06 A1234987 Sample Parts description W1I1R4S1 2 10.00 20.00" &
Environment.New Line & _
"15/dec/06 A1234988 Sample Parts description 1 10.00 20" &
Environment.New Line & _
"18/dec/06 A1234988 Sample Parts description 1 10.00 20" &
Environment.New Line & _
"19/dec/06 A1234988 Sample Parts description 1 10.00 20" &
Environment.New Line & _
"12/dec/06 A1234988 Sample Parts description 1 10.00 20"
Private m_source5 As String = "Date Parts ID Parts Description location
Quantitiy Unit Cost Total Cost" & Environment.New Line & _
"12 13 06 A1234987 Sample Parts description W1I1R4S1 2 10.00 20.00" &
Environment.New Line & _
"12 15 06 A1234988 Sample Parts description 1 10.00 20" &
Environment.New Line & _
"12 18 06 A1234988 Sample Parts description 1 10.00 20" &
Environment.New Line & _
"12 19 06 A1234988 Sample Parts description 1 10.00 20" &
Environment.New Line & _
"12 12 06 A1234988 Sample Parts description 1 10.00 20"
Private Sub Button1_Click(B yVal sender As System.Object, ByVal e As
System.EventArg s) Handles Button1.Click
Console.WriteLi ne()
Console.WriteLi ne("Sample 1")
ProcessData(m_s ource1)
Console.WriteLi ne()
Console.WriteLi ne("Sample 2")
ProcessData(m_s ource2)
Console.WriteLi ne()
Console.WriteLi ne("Sample 3")
ProcessData(m_s ource3)
Console.WriteLi ne()
Console.WriteLi ne("Sample 4")
ProcessData(m_s ource4)
Console.WriteLi ne()
Console.WriteLi ne("Sample 5")
ProcessData(m_s ource5)
Console.WriteLi ne()
End Sub
Private Sub ProcessData(ByV al source As String)
' Assumption: Lines of data are seperated by a carriage return/line feed
pair
Dim _lines As String() = source.Split(Ne w String()
{Environment.Ne wLine}, StringSplitOpti ons.RemoveEmpty Entries)
' Determined by eyeballing data: All 'fields' are delimited by a pair of
spaces
Dim _ss As String() = _lines(0).Split (New String() {" "},
StringSplitOpti ons.None)
' Determine which line is the first line of actual data
' If the first line is a heading line then all characters of the first
field will be letters
Dim _lettercount As Integer = 0
For Each _c As Char In _ss(0)
If Char.IsLetter(_ c) Then _lettercount += 1
Next
Dim _firstline As Integer = 0
If _lettercount = _ss(0).Length Then _firstline = 1
'Split the first actual line on the field delimiter
_ss = _lines(_firstli ne).Split(New String() {" "},
StringSplitOpti ons.None)
' Determined by eyeballing data: The date field is always the first
field in the line
' Determine the delimiter to be used for the date format
Dim _delimiter As String = ""
If _ss(0).IndexOf( " ") 0 Then
_delimiter = " "
ElseIf _ss(0).IndexOf( "/") 0 Then
_delimiter = "/"
ElseIf _ss(0).IndexOf( "-") 0 Then
_delimiter = "-"
Else
Console.WriteLi ne("Unable to determine delimiter out of " & _ss(0))
Return
End If
Console.WriteLi ne("Determined delimiter as '" & _delimiter & "'")
' Construct the date format to be used
Dim _format As String = String.Empty
' Split the first field on the date format delimiter
Dim _parts As String() = _ss(0).Split(Ne w String() {_delimiter},
StringSplitOpti ons.None)
If _parts.Length = 2 Then
' If there are 2 parts then we only have day and month components
If Char.IsDigit(_p arts(0).Chars(0 )) AndAlso
Char.IsLetter(_ parts(1).Chars( 0)) Then
' The 1st part starts with a digit and the 2nd part starts with a
letter
' so we can assume that the 1st part is the day and the 2nd part is
the month
_format = New String("d"c, _parts(0).Lengt h) & _delimiter & "MMM"
If _parts(1).Lengt h 3 Then _format &= "M"
ElseIf Char.IsDigit(_p arts(0).Chars(0 )) AndAlso
Char.IsDigit(_p arts(1).Chars(0 )) Then
' Both parts start with a digit
' Start with the assumption that the 1st part is the day and the 2nd
part is the month
_format = New String("d"c, _parts(0).Lengt h) & _delimiter & New
String("M"c, _parts(0).Lengt h))
If Integer.Parse(_ parts(1)) 12 Then
' The 1st part must be the month and the 2nd part must be the day
_format = New String("M"c, _parts(0).Lengt h) & _delimiter & New
String("d"c, _parts(0).Lengt h) & _delimiter & New String("y"c,
_parts(2).Lengt h)
End If
' There is big gotcha here if both parts are < 12 and are different
' E.G. We don't really if 01 02 (01/02, 01-02, etc.) should be 1
February or January 2
End If
ElseIf _parts.Length = 3 Then
' If there 3 parts then we have day, month and year components
' Assume that the year is always th 3rd part
If Char.IsDigit(_p arts(0).Chars(0 )) AndAlso
Char.IsLetter(_ parts(1).Chars( 0)) Then
' The 1st part starts with a digit and the 2nd part starts with a
letter
' so we can assume that the 1st part is the day and the 2nd part is
the month
_format = New String("d"c, _parts(0).Lengt h) & _delimiter & "MMM"
If _parts(1).Lengt h 3 Then _format &= "M"
_format &= _delimiter & New String("y"c, _parts(2).Lengt h)
ElseIf Char.IsDigit(_p arts(0).Chars(0 )) AndAlso
Char.IsDigit(_p arts(1).Chars(0 )) Then
' Both parts start with a digit
' Start with the assumption that the 1st part is the day and the 2nd
part is the month
_format = New String("d"c, _parts(0).Lengt h) & _delimiter & New
String("M"c, _parts(0).Lengt h) & _delimiter & New String("y"c,
_parts(2).Lengt h)
If Integer.Parse(_ parts(1)) 12 Then
' The 1st part must be the month and the 2nd part must be the day
_format = New String("M"c, _parts(0).Lengt h) & _delimiter & New
String("d"c, _parts(0).Lengt h) & _delimiter & New String("y"c,
_parts(2).Lengt h)
End If
' There is big gotcha here if the forst two parts are < 12 and are
different
' E.G. We don't really if 01 02 (01/02, 01-02, etc.) should be 1
February or January 2
End If
End If
If _format.Length = 0 Then
' We were unable to determine the date format from the available
information
Console.WriteLi ne("Unable to determine format from " & _ss(0))
Return
End If
' We were able to determine the date format so we can continue and parse
the dates
Console.WriteLi ne("Determined format as " & _format)
' Start from our actual first line of data
For _i As Integer = _firstline To _lines.Length - 1
_ss = _lines(_i).Spli t(New String() {" "}, StringSplitOpti ons.None)
Dim _date As DateTime = DateTime.ParseE xact(_ss(0), _format, Nothing)
Console.WriteLi ne("Read from input: " & _ss(0) & " - Interpreted date:
" & _date.ToString( "yyyy-MM-dd"))
Next
End Sub
Note, from the results, that if there is no year part then
DateTime.ParseE xact will interpret tahe date being in the current year as
determined from the system date at the time the code is executed.
"GS" <gs************ **********@msne ws.Nomail.comwr ote in message
news:uc******** ******@TK2MSFTN GP06.phx.gbl...
look like I am not expressing myself clearly. although the application
does
not know which format is used but does know for a given Set which date
format I deals with and can expect the same format for a given Set of
input.
I should not have used the term batch but a set of record. The only
possible variations are some records in certain sets may be split into 2
lines but that is not critical as the conditions can be described before
hand and normalized by the another parse component
sample date
Set1: date format mask is "dd MMM"
Date Parts ID Parts Description location Quantitiy Unit Cost Total
Cost
11 Dec A1234987 Sample Parts description W1I1R4S1 2 10.00 20.00
15 Dec A1234988 Sample Parts description 1 10.00 20
18 Dec A1234988 Sample Parts description 1 10.00 20
19 Dec A1234988 Sample Parts description 1 10.00 20
12 Dec A1234988 Sample Parts description 1 10.00 20
Set 2 date format Mask is "dd MM yy"
Date Parts ID Parts Description location Quantitiy Unit Cost Total
Cost
11 12 06 A1234987 Sample Parts description W1I1R4S1 2 10.00 20.00
15 12 06 A1234988 Sample Parts description 1 10.00 20
18 12 06 A1234988 Sample Parts description 1 10.00 20
19 12 06 A1234988 Sample Parts description 1 10.00 20
12 12 06 A1234988 Sample Parts description 1 10.00 20
Set 3 date format mask "dd/MMM/06"
Parts Description location Quantitiy Unit Cost Total Cost
11/12/06 A1234987 Sample Parts description W1I1R4S1 2 10.00 20.00
15/12/06 A1234988 Sample Parts description 1 10.00
2018/12/06 A1234988 Sample Parts description 1 10.00
2019/12/06 A1234988 Sample Parts description 1 10.00
2012/12/06 A1234988 Sample Parts description 1 10.00 20
Set 4 date format mask ""
Date Parts ID Parts Description location Quantitiy Unit Cost Total
Cost
11/dec/06 A1234987 Sample Parts description W1I1R4S1 2 10.00 20.00
15/dec/06 A1234988 Sample Parts description 1 10.00 20
18/dec/06 A1234988 Sample Parts description 1 10.00 20
19/dec/06 A1234988 Sample Parts description 1 10.00 20
12/dec/06 A1234988 Sample Parts description 1 10.00 20
how do I deal with format without year, I do have cluse for other parts of
teh originatin website and optional default set by user
the sample data show variation of date format from set to set but the date
format that I need to deal within a given set are consistant and user has
influence to date format mask used.
Like Cor suggestion. don't let user enter the format but let the user pick
from a list. that will like be case at least n the version 0
"GS" <gs************ **********@msne ws.Nomail.comwr ote in message
news:eF******** ******@TK2MSFTN GP03.phx.gbl...
>You are sort of on the same track as mine.
I must first apologize I did not tell you the complete story.
Although the application does not exactly know before hand what format
the
data may come in, however part of the application allow user to define
and
record favourite for a website
- to extract by text or html
- header content and format
- record format and date format ( that is where the date format mask
come in)
- optionally ordinal number for each column or re-ordering
- trailer content and format
For a given batch, at least for the body, date format are uniform
furthermore, the need to make the extract process generic and adaptable
to
the front end that takes the user definitions, I believe it would be
easier
>to "normalize" date string to "yyyy-mm-dd".
Also the end target for of may not necessarily be SQL database but may be
text, pasted to word report. or excel by user
Therefore, I can transform the date format mask to regex in the
appropriate
>format and identifier I can use regex,replace to normalize the date. As
a
matter of fact the date separator does not have to / but can be space as
long as there are identifiable delimiter around the date string.
I already have code for dealing with regex for dates from prior project.
all I have to do is adapt to the present need
who knows, maybe I taken on a totally offbeat tract
"GS" <gs************ **********@msne ws.Nomail.comwr ote in message
news:%2******* *********@TK2MS FTNGP04.phx.gbl ...
thanks for all pitched in so far.
let give it another shot.
looks like an easier way out would be
1.copy the date format string regex string holder and then derive the
relevant regex expression to be used for date normalization later in
part
>2:
replace the regex string the yyyy to regex year expression with
year
identifier
look for yy and replace with 20yy and repeat the step above
replace mmm with the month regex expression associated with month
identifier
replace mm with the 2 digit month regex expression associated with
month
identifier
replace dd with the 2 digit day regix expression assoc. with day
identifier
2. use the resulting regex in regex replace to normalize to yyyy--mm-dd
any problem with the above approach?
"Cor Ligthert [MVP]" <no************ @planet.nlwrote in message
news:%2******** ********@TK2MSF TNGP06.phx.gbl. ..
GS,
Maybe can you avoid this in 2007 and all things like that as
DateTime.parseE xact, but have a look to the nicely by Microsoft
inbuild
globalization and than the to that related ToString option.
Cor
"gs" <gs@dontMail.te lusschreef in bericht
news:Ot******** ******@TK2MSFTN GP03.phx.gbl...
let say I have to deal with various date format and I am give
format
string from one of the following
dd/mm/yyyy
mm/dd/yyyy
dd/mmm/yyyy
mmm/dd/yyyy
dd/mm/yy
mm/dd/yy
dd/mmm/yy
mmm/dd/yy
dd/mm
what is the best way to come up a relevant regex for the incoming
format
string
a) use two array and statically match
b) use regex to find the order