
best design for parse

gs
Let's say I have to deal with various date formats, and I am given a format
string from one of the following:
dd/mm/yyyy
mm/dd/yyyy
dd/mmm/yyyy
mmm/dd/yyyy
dd/mm/yy
mm/dd/yy
dd/mmm/yy
mmm/dd/yy
dd/mm
What is the best way to come up with a relevant regex for the incoming
format string?
a) use two arrays and match statically
b) use a regex to find the order
Jan 6 '07
I think that you are missing the whole point.

Regular Expressions (Regex) are about pattern matching, not format matching.

It does not matter whether the source data comes from an HTML page, a Windows
Forms TextBox or a disk file. The source data is the source data and that is
all there is to it.

If the source data contained only one instance of a 'date' in dd/MM/yyyy
format, then to find it by your methodology you would need to test for up to
3,719,628 permutations from 01/01/0001 all the way up to 31/12/9999, i.e.,
31 (days) * 12 (months) * 9999 (years). Couple this with the other 8
'formats' and you can see how such a task will quickly become unmanageable.

But ... what you really are looking for is a sequence of 2 digits followed
by a slash followed by 2 digits followed by a slash followed by 4 digits.
That immediately takes care of 2 of your 'formats'. Off the top of my head
the regex for that is "[0-9]{2}/[0-9]{2}/[0-9]{4}".

The next pattern you are looking for is 2 digits followed by a slash
followed by 3 alphas followed by a slash followed by 4 digits:
"[0-9]{2}/[A-Za-z]{3}/[0-9]{4}".

The next pattern you are looking for is 3 alphas followed by a slash
followed by 2 digits followed by a slash followed by 4 digits:
"[A-Za-z]{3}/[0-9]{2}/[0-9]{4}".

The next 4 formats are taken care of by varying the above:
"[0-9]{2}/[0-9]{2}/[0-9]{2}" covers both dd/mm/yy and mm/dd/yy, and
"[0-9]{2}/[A-Za-z]{3}/[0-9]{2}" and "[A-Za-z]{3}/[0-9]{2}/[0-9]{2}" cover
dd/mmm/yy and mmm/dd/yy respectively.

The last format is simply the pattern "[0-9]{2}/[0-9]{2}".

Now, the real secret is what directly precedes and follows your 'dates'. For
instance, are your 'dates' ALWAYS 'wrapped' in a tag? E.g.,
<td>07/01/2007</td>. It might be that there is always a space character
directly before the 'date' and another directly after it. Any such
information will allow you to 'tune' your pattern so that it doesn't pick up
false positives. The pattern [0-9]{2}/[0-9]{2} would pick up the 01/02 out
of 01MyQuote01/02YourQuote02.

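All the patterns need to be put together in one regular expression with
ORs (|) so that you can find all the candidate dates in one operation. The
order matters: .NET tries the alternatives from left to right and takes the
first one that fits, so the 4-digit-year patterns must come before the
2-digit-year ones, and \d{2}/\d{2} must come last.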

"\d{2}/\d{2}/\d{4}|\d{2}/[A-Za-z]{3}/\d{4}|[A-Za-z]{3}/\d{2}/\d{4}|\d{2}/\d{2}/\d{2}|\d{2}/[A-Za-z]{3}/\d{2}|[A-Za-z]{3}/\d{2}/\d{2}|\d{2}/\d{2}"

Please feel free to jump in here if I've got that wrong because I'm by no
means a regex expert.
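A minimal sketch of the 'tuning' idea, assuming the only thing known about the surroundings is that a date never starts or ends in the middle of a word: the combined alternation above (longest alternatives first) is wrapped in \b word-boundary anchors. The module name and `FindCandidates` are made up for illustration. A letters-and-digits code in the form AAA/99/99 would still come out as a candidate; rejecting such values is left to the ParseExact step.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module DateCandidates
    ' The combined pattern from the post, wrapped in \b so that a candidate can
    ' neither start nor end in the middle of a run of letters or digits.
    Private ReadOnly CandidateRegex As New Regex( _
        "\b(?:\d{2}/\d{2}/\d{4}|\d{2}/[A-Za-z]{3}/\d{4}|[A-Za-z]{3}/\d{2}/\d{4}" & _
        "|\d{2}/\d{2}/\d{2}|\d{2}/[A-Za-z]{3}/\d{2}|[A-Za-z]{3}/\d{2}/\d{2}|\d{2}/\d{2})\b")

    ' Returns every candidate 'date' in the text, in order of appearance.
    Public Function FindCandidates(ByVal text As String) As List(Of String)
        Dim result As New List(Of String)()
        For Each m As Match In CandidateRegex.Matches(text)
            result.Add(m.Value)
        Next
        Return result
    End Function

    Sub Main()
        ' Prints 15/01 and 07/01/2007; the 01/02 inside the quote text is skipped.
        For Each candidate As String In FindCandidates("On 15/01 and 07/01/2007, but not 01MyQuote01/02YourQuote02.")
            Console.WriteLine(candidate)
        Next
    End Sub
End Module
```

If the dates are known to sit inside a tag such as <td>, a tighter anchor than \b would cut false positives further.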
Once you have your candidate dates (matches) you need to deal with each one
in turn.

As Herfried said earlier, you need to use DateTime.ParseExact.

For that you need an array of strings to hold all your formats.

Dim _formats As String() = new String() {"dd/MM/yyyy", "MM/dd/yyyy",
"dd/MMM/yyyy", "MMM/dd/yyyy", "dd/MM/yy", "MM/dd/yy", "dd/MMM/yy",
"MMM/dd/yy", "dd/MM"}

For each candidate, call DateTime.ParseExact, trapping the exception if one
occurs:

Dim _d As DateTime

Try
    ' Nothing as the provider means the current culture is used (e.g. for month names)
    _d = DateTime.ParseExact(_candidate, _formats, Nothing, DateTimeStyles.None)
    ' DateTime.ParseExact succeeded so we can deal with it
    ...
Catch _ex As FormatException
    ' Because we know that _candidate is not an empty string and none of the
    ' elements of _formats is an empty string, _candidate does not contain a
    ' date and time that corresponds to any element of _formats
    ...
End Try
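If throwing an exception for every non-date candidate is a concern, DateTime.TryParseExact takes the same format array and simply returns False. This sketch also passes CultureInfo.InvariantCulture instead of Nothing, which assumes the month abbreviations in the source are English ("Jan", "Dec") whatever the machine's culture. As far as I know, when several formats fit, the first one in the array wins, so an ambiguous value such as 01/12/07 is read as dd/MM/yy here. `Normalize` and the module name are hypothetical.

```vbnet
Imports System
Imports System.Globalization

Module CandidateNormalizer
    ' The nine formats from the thread. Month is "M" in .NET; "mm" would be minutes.
    Private ReadOnly Formats As String() = {"dd/MM/yyyy", "MM/dd/yyyy", "dd/MMM/yyyy", "MMM/dd/yyyy", _
                                            "dd/MM/yy", "MM/dd/yy", "dd/MMM/yy", "MMM/dd/yy", "dd/MM"}

    ' Returns the candidate as yyyy-MM-dd, or Nothing when no format fits.
    Public Function Normalize(ByVal candidate As String) As String
        Dim parsed As DateTime
        If DateTime.TryParseExact(candidate, Formats, CultureInfo.InvariantCulture, _
                                  DateTimeStyles.None, parsed) Then
            Return parsed.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture)
        End If
        Return Nothing
    End Function

    Sub Main()
        For Each candidate As String In New String() {"07/01/2007", "Jan/09/2007", "14/Jan/07", "XYZ/72/84"}
            Dim result As String = Normalize(candidate)
            If result Is Nothing Then result = "(not a date)"
            Console.WriteLine(candidate & " -> " & result)
        Next
    End Sub
End Module
```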

"GS" <gs**********************@msnews.Nomail.com> wrote in message
news:%2****************@TK2MSFTNGP04.phx.gbl...
Thanks to all who have pitched in so far.

Let me give it another shot.

It looks like an easier way out would be:
1. Copy the date format string into a regex string holder, then derive from
it the regex to be used for date normalization in step 2:
replace yyyy with the year regex expression, tagged with a year identifier
look for yy, replace it with 20yy and repeat the step above
replace mmm with the month-name regex expression, tagged with a month
identifier
replace mm with the 2-digit month regex expression, tagged with a month
identifier
replace dd with the 2-digit day regex expression, tagged with a day
identifier

2. Use the resulting regex in Regex.Replace to normalize to yyyy-mm-dd.

Any problem with the above approach?

"Cor Ligthert [MVP]" <no************@planet.nl> wrote in message
news:%2****************@TK2MSFTNGP06.phx.gbl...
GS,

Maybe you can avoid all of this in 2007 with things like
DateTime.ParseExact; also have a look at the nice globalization support
Microsoft has built in, and the related ToString options.

Cor

Jan 7 '07 #11
GS
You are sort of on the same track as I am.
I must first apologize: I did not tell you the complete story.

Although the application does not know beforehand what format the data may
come in, part of the application allows the user to define and record
favourites for a website:
- to extract by text or HTML
- header content and format
- record format and date format (that is where the date format mask comes
in)
- optionally, an ordinal number for each column, or re-ordering
- trailer content and format

For a given batch, at least for the body, the date format is uniform.

Furthermore, because the extract process needs to be generic and adaptable
to the front end that takes the user definitions, I believe it would be
easier to "normalize" date strings to "yyyy-mm-dd".

Also, the end target may not necessarily be a SQL database; it may be text,
or something pasted into a Word report or Excel by the user. Therefore, if I
transform the date format mask into a regex with the appropriate format and
identifiers, I can use Regex.Replace to normalize the date. As a matter of
fact, the date separator does not have to be /; it can be a space, as long
as there are identifiable delimiters around the date string.

I already have code for dealing with date regexes from a prior project; all
I have to do is adapt it to the present need.

Who knows, maybe I have taken a totally offbeat track.
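GS's mask-to-regex idea can be sketched roughly as follows, under these assumptions: masks use only the tokens from the first post (dd, mm, mmm, yy, yyyy) plus literal separators; the "identifiers" are .NET named groups; yy means 20yy, as GS proposes; and a caller-supplied default year fills in masks that have no year at all. `BuildPattern`, `NormalizeAll` and `ToIso` are hypothetical names. Tokens are replaced longest first so that yy does not eat half of yyyy, and a match that is not a real calendar date (31/02, say) is left untouched rather than throwing.

```vbnet
Imports System
Imports System.Globalization
Imports System.Text.RegularExpressions

Module MaskNormalizer
    ' Turns a mask such as "dd/mmm/yy" into a pattern with named groups.
    ' Longer tokens go first so that "yy" does not match half of "yyyy".
    Public Function BuildPattern(ByVal mask As String) As String
        Dim p As String = Regex.Escape(mask.ToLowerInvariant())
        p = p.Replace("yyyy", "(?<year>\d{4})")
        p = p.Replace("yy", "(?<yy>\d{2})")
        p = p.Replace("mmm", "(?<mon>[A-Za-z]{3})")
        p = p.Replace("mm", "(?<month>\d{2})")
        p = p.Replace("dd", "(?<day>\d{2})")
        Return "\b" & p & "\b"
    End Function

    ' Rewrites every date in the text that fits the mask as yyyy-MM-dd.
    Public Function NormalizeAll(ByVal text As String, ByVal mask As String, ByVal defaultYear As Integer) As String
        Dim rx As New Regex(BuildPattern(mask))
        Return rx.Replace(text, Function(m As Match) ToIso(m, defaultYear))
    End Function

    Private Function ToIso(ByVal m As Match, ByVal defaultYear As Integer) As String
        Dim yearValue As Integer = defaultYear
        If m.Groups("year").Success Then yearValue = Integer.Parse(m.Groups("year").Value)
        If m.Groups("yy").Success Then yearValue = 2000 + Integer.Parse(m.Groups("yy").Value) ' the "20yy" rule

        Dim monthValue As Integer
        If m.Groups("mon").Success Then
            Dim monthOnly As DateTime
            If Not DateTime.TryParseExact(m.Groups("mon").Value, "MMM", CultureInfo.InvariantCulture, DateTimeStyles.None, monthOnly) Then
                Return m.Value ' not an English month abbreviation
            End If
            monthValue = monthOnly.Month
        Else
            monthValue = Integer.Parse(m.Groups("month").Value)
        End If
        Dim dayValue As Integer = Integer.Parse(m.Groups("day").Value)

        ' Leave anything that is not a real calendar date untouched.
        If monthValue < 1 OrElse monthValue > 12 OrElse dayValue < 1 OrElse _
           dayValue > DateTime.DaysInMonth(yearValue, monthValue) Then
            Return m.Value
        End If
        Return New DateTime(yearValue, monthValue, dayValue).ToString("yyyy-MM-dd", CultureInfo.InvariantCulture)
    End Function

    Sub Main()
        Console.WriteLine(NormalizeAll("11 Dec A1234987 Sample Parts description", "dd mmm", 2006))
        Console.WriteLine(NormalizeAll("15/12/06 A1234988 Sample Parts description", "dd/mm/yy", 2000))
    End Sub
End Module
```

A MatchEvaluator is used rather than a plain replacement string because month names have to be turned into numbers and the date validated, which a "${year}-${month}-${day}" substitution cannot do.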

Jan 8 '07 #12
Again you're missing the point.

I think the best thing you can do is post a relatively small sample of the
text you are attempting to parse.

While you're doing that, execute the following and observe the results. It
demonstrates what I am talking about:

' Requires Imports System.Globalization and Imports System.Text.RegularExpressions at file level
Dim _source As String = "On 07/01/2007 the quick brown fox jumps over the lazy dog." & Environment.NewLine & _
    "On 08/01/2007 the quick brown fox again jumps over the lazy dog." & Environment.NewLine & _
    "On Jan/09/2007 the quick brown fox again jumps over the lazy dog." & Environment.NewLine & _
    "On 10/Jan/2007 the quick brown fox again jumps over the lazy dog." & Environment.NewLine & _
    "On 11/01/07 the quick brown fox again jumps over the lazy dog." & Environment.NewLine & _
    "On 01/12/07 the quick brown fox again jumps over the lazy dog." & Environment.NewLine & _
    "On Jan/13/07 the quick brown fox again jumps over the lazy dog." & Environment.NewLine & _
    "On 14/Jan/07 the quick brown fox again jumps over the lazy dog." & Environment.NewLine & _
    "On 15/01 the quick brown fox again jumps over the lazy dog." & Environment.NewLine & _
    "The part number XYZ/72/84 is now discontinued."

Dim _regex As New Regex("\d{2}/\d{2}/\d{4}|[A-Za-z]{3}/\d{2}/\d{4}|\d{2}/[A-Za-z]{3}/\d{4}|\d{2}/\d{2}/\d{2}|[A-Za-z]{3}/\d{2}/\d{2}|\d{2}/[A-Za-z]{3}/\d{2}|\d{2}/\d{2}")

Dim _candidates As Integer = 0
Dim _matches As Integer = 0

Dim _match As Match = _regex.Match(_source)

While _match.Success
    _candidates += 1
    Console.WriteLine("{0} found at index {1}", _match.Value, _match.Index)
    Try
        Console.WriteLine("Converted value = {0:yyyy-MM-dd}", _
            DateTime.ParseExact(_match.Value, New String() {"dd/MM/yyyy", "MM/dd/yyyy", _
                "MMM/dd/yyyy", "dd/MMM/yyyy", "dd/MM/yy", "MM/dd/yy", "dd/MMM/yy", _
                "MMM/dd/yy", "dd/MM"}, Nothing, DateTimeStyles.None))
        _matches += 1
    Catch _ex As Exception
        Console.WriteLine(_ex.Message)
    End Try
    _match = _match.NextMatch()
End While

Console.WriteLine("{0} candidates found", _candidates)
Console.WriteLine("{0} matches found", _matches)

Jan 8 '07 #13
GS,

As long as you don't know the date format, there is probably nothing you can
do. As soon as you know the date format, you can try DateTime.ParseExact
with the given pattern.
(Don't forget to put the MM in uppercase, since lowercase mm means minutes,
and don't leave that to the user.)

Cor
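Cor's advice can be sketched under the assumption that masks only use the d, m and y letters from the first post: turn every m into M before handing the mask to ParseExact, because in .NET custom format strings mm is minutes and MM is month. `MaskToFormat` and `ParseWithMask` are illustrative names, and InvariantCulture is an assumption about the language of the month abbreviations.

```vbnet
Imports System
Imports System.Globalization

Module MaskToFormatDemo
    ' Assumes the mask contains only d, m, y and separators, e.g. "dd/mmm/yy".
    ' In .NET format strings "m" is minutes, so every month letter becomes "M".
    Public Function MaskToFormat(ByVal mask As String) As String
        Return mask.Replace("m"c, "M"c)
    End Function

    Public Function ParseWithMask(ByVal value As String, ByVal mask As String) As DateTime
        Return DateTime.ParseExact(value, MaskToFormat(mask), CultureInfo.InvariantCulture, DateTimeStyles.None)
    End Function

    Sub Main()
        Dim d As DateTime = ParseWithMask("10/Jan/2007", "dd/mmm/yyyy")
        Console.WriteLine(d.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture))
    End Sub
End Module
```

If the user picks the mask from a fixed list, as suggested, the list entries can simply be stored in .NET form and this conversion disappears.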

Jan 8 '07 #14
GS
Looks like I am not expressing myself clearly. Although the application does
not know in advance which format is used, it does know, for a given set,
which date format it is dealing with, and it can expect the same format
throughout a given set of input. I should not have used the term batch; I
mean a set of records. The only possible variation is that some records in
certain sets may be split across 2 lines, but that is not critical, as the
conditions can be described beforehand and normalized by another parse
component.

Sample data:

Set 1: date format mask is "dd MMM"
Date Parts ID Parts Description location Quantity Unit Cost Total Cost
11 Dec A1234987 Sample Parts description W1I1R4S1 2 10.00 20.00
15 Dec A1234988 Sample Parts description 1 10.00 20
18 Dec A1234988 Sample Parts description 1 10.00 20
19 Dec A1234988 Sample Parts description 1 10.00 20
12 Dec A1234988 Sample Parts description 1 10.00 20

Set 2: date format mask is "dd MM yy"
Date Parts ID Parts Description location Quantity Unit Cost Total Cost
11 12 06 A1234987 Sample Parts description W1I1R4S1 2 10.00 20.00
15 12 06 A1234988 Sample Parts description 1 10.00 20
18 12 06 A1234988 Sample Parts description 1 10.00 20
19 12 06 A1234988 Sample Parts description 1 10.00 20
12 12 06 A1234988 Sample Parts description 1 10.00 20

Set 3: date format mask is "dd/MM/yy"
Parts Description location Quantity Unit Cost Total Cost
11/12/06 A1234987 Sample Parts description W1I1R4S1 2 10.00 20.00
15/12/06 A1234988 Sample Parts description 1 10.00 20
18/12/06 A1234988 Sample Parts description 1 10.00 20
19/12/06 A1234988 Sample Parts description 1 10.00 20
12/12/06 A1234988 Sample Parts description 1 10.00 20

Set 4: date format mask is ""
Date Parts ID Parts Description location Quantity Unit Cost Total Cost
11/dec/06 A1234987 Sample Parts description W1I1R4S1 2 10.00 20.00
15/dec/06 A1234988 Sample Parts description 1 10.00 20
18/dec/06 A1234988 Sample Parts description 1 10.00 20
19/dec/06 A1234988 Sample Parts description 1 10.00 20
12/dec/06 A1234988 Sample Parts description 1 10.00 20

How do I deal with a format without a year? I do have clues from other parts
of the originating website, and an optional default set by the user.
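For a mask without a year, such as Set 1's "dd MMM", one possible approach, assuming the year comes from a clue on the page or from the user's default, is to append that year to both the value and the format before calling ParseExact, so the result does not depend on whatever year the parser would otherwise assume. `ParseWithoutYear` is a hypothetical helper, and the format passed in is expected to already be a .NET format (MMM, not mmm).

```vbnet
Imports System
Imports System.Globalization

Module YearlessDates
    ' Parses a day/month value such as "11 Dec" using the given .NET format
    ' (e.g. "dd MMM") and attaches a year supplied by the caller.
    Public Function ParseWithoutYear(ByVal value As String, ByVal dateFormat As String, ByVal yearValue As Integer) As DateTime
        Dim withYear As String = value & " " & yearValue.ToString("0000", CultureInfo.InvariantCulture)
        Return DateTime.ParseExact(withYear, dateFormat & " yyyy", CultureInfo.InvariantCulture, DateTimeStyles.None)
    End Function

    Sub Main()
        Dim d As DateTime = ParseWithoutYear("11 Dec", "dd MMM", 2006)
        Console.WriteLine(d.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture))
    End Sub
End Module
```

Because the year is part of the parsed string, a value like "29 Feb" is only accepted when the supplied year is a leap year.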

The sample data shows the date format varying from set to set, but within a
given set the date format I need to deal with is consistent, and the user
has influence over the date format mask used.

As Cor suggested, don't let the user enter the format; let the user pick it
from a list. That will likely be the case, at least in version 0.
Jan 8 '07 #15
GS
Thank you.

You do have a point, but the application I have in mind is meant to take
most of the easy-to-do but boring and repetitive tasks off the users
quickly, to get their buy-in for the next phase. The application is not
going to be perfect in version 0, but it must be flexible enough to adapt to
changing needs.

Furthermore, I chose to normalize dates to yyyy-mm-dd because, for the users
I deal with, that is the standard string date format accepted by almost all
standard Windows applications, regardless of locale and default display
format.

As a side note, right now this application at version zero is not meant to
automate everything, but to help users do their jobs and help us gain an
understanding of what they do, while at the same time validating the
transform process that will be used later for automation. Version 1 will
automate a lot more and may actually drive some Excel and Word processes.
You could say version zero is closer to a Mickey Mouse utility, if you wish.

Jan 8 '07 #16
Now we're cooking with gas. I think that regex is overkill for this
'problem'. Sure, you can use it if you wish but I think you will be making a
rod for your own back.
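Since GS says the mask is known for each set, an even smaller sketch in the same no-regex spirit is possible. It assumes, as the code below does, that fields are separated by a pair of spaces and the date is the first field, and that the mask has already been converted to a .NET format string; `NormalizeLines` is an illustrative name. Lines whose first field does not parse, such as the heading, pass through unchanged.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Globalization

Module KnownMaskNormalizer
    ' Rewrites the first field of each line as yyyy-MM-dd when it parses with
    ' the set's known format; other lines (e.g. headings) are left as they are.
    Public Function NormalizeLines(ByVal lines As IEnumerable(Of String), ByVal dateFormat As String) As List(Of String)
        Dim result As New List(Of String)()
        For Each line As String In lines
            ' Assumption: fields are delimited by a pair of spaces.
            Dim fields As String() = line.Split(New String() {"  "}, StringSplitOptions.None)
            Dim parsed As DateTime
            If DateTime.TryParseExact(fields(0), dateFormat, CultureInfo.InvariantCulture, _
                                      DateTimeStyles.None, parsed) Then
                fields(0) = parsed.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture)
            End If
            result.Add(String.Join("  ", fields))
        Next
        Return result
    End Function

    Sub Main()
        Dim lines As String() = {"Date  Parts ID  Parts Description", "15/12/06  A1234988  Sample Parts description"}
        For Each line As String In NormalizeLines(lines, "dd/MM/yy")
            Console.WriteLine(line)
        Next
    End Sub
End Module
```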

Here is a solution that works for your sample data. Create a Windows Forms
project, plonk a button on the form and paste the following into the form:

' Fields within each line are separated by a pair of spaces
Private m_source1 As String = "Date  Parts ID  Parts Description  location  Quantity  Unit Cost  Total Cost" & Environment.NewLine & _
    "11 Dec  A1234987  Sample Parts description  W1I1R4S1  2  10.00  20.00" & Environment.NewLine & _
    "15 Dec  A1234988  Sample Parts description  1  10.00  20" & Environment.NewLine & _
    "18 Dec  A1234988  Sample Parts description  1  10.00  20" & Environment.NewLine & _
    "19 Dec  A1234988  Sample Parts description  1  10.00  20" & Environment.NewLine & _
    "12 Dec  A1234988  Sample Parts description  1  10.00  20"

Private m_source2 As String = "Date  Parts ID  Parts Description  location  Quantity  Unit Cost  Total Cost" & Environment.NewLine & _
    "11 12 06  A1234987  Sample Parts description  W1I1R4S1  2  10.00  20.00" & Environment.NewLine & _
    "15 12 06  A1234988  Sample Parts description  1  10.00  20" & Environment.NewLine & _
    "18 12 06  A1234988  Sample Parts description  1  10.00  20" & Environment.NewLine & _
    "19 12 06  A1234988  Sample Parts description  1  10.00  20" & Environment.NewLine & _
    "12 12 06  A1234988  Sample Parts description  1  10.00  20"

Private m_source3 As String = "Parts  Parts ID  Description  location  Quantity  Unit Cost  Total Cost" & Environment.NewLine & _
    "11/12/06  A1234987  Sample Parts description  W1I1R4S1  2  10.00  20.00" & Environment.NewLine & _
    "15/12/06  A1234988  Sample Parts description  1  10.00  20" & Environment.NewLine & _
    "18/12/06  A1234988  Sample Parts description  1  10.00  20" & Environment.NewLine & _
    "19/12/06  A1234988  Sample Parts description  1  10.00  20" & Environment.NewLine & _
    "12/12/06  A1234988  Sample Parts description  1  10.00  20"

Private m_source4 As String = "Date  Parts ID  Parts Description  location  Quantity  Unit Cost  Total Cost" & Environment.NewLine & _
    "11/dec/06  A1234987  Sample Parts description  W1I1R4S1  2  10.00  20.00" & Environment.NewLine & _
    "15/dec/06  A1234988  Sample Parts description  1  10.00  20" & Environment.NewLine & _
    "18/dec/06  A1234988  Sample Parts description  1  10.00  20" & Environment.NewLine & _
    "19/dec/06  A1234988  Sample Parts description  1  10.00  20" & Environment.NewLine & _
    "12/dec/06  A1234988  Sample Parts description  1  10.00  20"

Private m_source5 As String = "Date  Parts ID  Parts Description  location  Quantity  Unit Cost  Total Cost" & Environment.NewLine & _
    "12 13 06  A1234987  Sample Parts description  W1I1R4S1  2  10.00  20.00" & Environment.NewLine & _
    "12 15 06  A1234988  Sample Parts description  1  10.00  20" & Environment.NewLine & _
    "12 18 06  A1234988  Sample Parts description  1  10.00  20" & Environment.NewLine & _
    "12 19 06  A1234988  Sample Parts description  1  10.00  20" & Environment.NewLine & _
    "12 12 06  A1234988  Sample Parts description  1  10.00  20"

Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click

    Console.WriteLine()
    Console.WriteLine("Sample 1")
    ProcessData(m_source1)

    Console.WriteLine()
    Console.WriteLine("Sample 2")
    ProcessData(m_source2)

    Console.WriteLine()
    Console.WriteLine("Sample 3")
    ProcessData(m_source3)

    Console.WriteLine()
    Console.WriteLine("Sample 4")
    ProcessData(m_source4)

    Console.WriteLine()
    Console.WriteLine("Sample 5")
    ProcessData(m_source5)

    Console.WriteLine()

End Sub

Private Sub ProcessData(ByVal source As String)

    ' Assumption: lines of data are separated by a carriage return/line feed pair
    Dim _lines As String() = source.Split(New String() {Environment.NewLine}, StringSplitOptions.RemoveEmptyEntries)

    ' Determined by eyeballing the data: all 'fields' are delimited by a pair of spaces
    Dim _ss As String() = _lines(0).Split(New String() {"  "}, StringSplitOptions.None)

    ' Determine which line is the first line of actual data.
    ' If the first line is a heading line then all characters of the first field will be letters
    Dim _lettercount As Integer = 0
    For Each _c As Char In _ss(0)
        If Char.IsLetter(_c) Then _lettercount += 1
    Next
    Dim _firstline As Integer = 0
    If _lettercount = _ss(0).Length Then _firstline = 1

    ' Split the first actual line on the field delimiter
    _ss = _lines(_firstline).Split(New String() {"  "}, StringSplitOptions.None)

    ' Determined by eyeballing the data: the date field is always the first field in the line

    ' Determine the delimiter used between the parts of the date
    Dim _delimiter As String = ""
    If _ss(0).IndexOf(" ") > 0 Then
        _delimiter = " "
    ElseIf _ss(0).IndexOf("/") > 0 Then
        _delimiter = "/"
    ElseIf _ss(0).IndexOf("-") > 0 Then
        _delimiter = "-"
    Else
        Console.WriteLine("Unable to determine delimiter out of " & _ss(0))
        Return
    End If
    Console.WriteLine("Determined delimiter as '" & _delimiter & "'")

    ' Construct the date format to be used
    Dim _format As String = String.Empty
    ' Split the first field on the date delimiter
    Dim _parts As String() = _ss(0).Split(New String() {_delimiter}, StringSplitOptions.None)
    If _parts.Length = 2 Then
        ' If there are 2 parts then we only have day and month components
        If Char.IsDigit(_parts(0).Chars(0)) AndAlso Char.IsLetter(_parts(1).Chars(0)) Then
            ' The 1st part starts with a digit and the 2nd part starts with a letter,
            ' so we can assume that the 1st part is the day and the 2nd part is the month
            _format = New String("d"c, _parts(0).Length) & _delimiter & "MMM"
            If _parts(1).Length > 3 Then _format &= "M"
        ElseIf Char.IsDigit(_parts(0).Chars(0)) AndAlso Char.IsDigit(_parts(1).Chars(0)) Then
            ' Both parts start with a digit.
            ' Start with the assumption that the 1st part is the day and the 2nd part is the month
            _format = New String("d"c, _parts(0).Length) & _delimiter & New String("M"c, _parts(1).Length)
            If Integer.Parse(_parts(1)) > 12 Then
                ' The 2nd part cannot be a month, so the 1st part must be the month and the 2nd part the day
                _format = New String("M"c, _parts(0).Length) & _delimiter & New String("d"c, _parts(1).Length)
            End If
            ' There is a big gotcha here if both parts are 12 or less and are different,
            ' e.g. we don't really know if 01 02 (01/02, 01-02, etc.) should be 1 February or 2 January
        End If
    ElseIf _parts.Length = 3 Then
        ' If there are 3 parts then we have day, month and year components.
        ' Assume that the year is always the 3rd part
        If Char.IsDigit(_parts(0).Chars(0)) AndAlso Char.IsLetter(_parts(1).Chars(0)) Then
            ' The 1st part starts with a digit and the 2nd part starts with a letter,
            ' so we can assume that the 1st part is the day and the 2nd part is the month
            _format = New String("d"c, _parts(0).Length) & _delimiter & "MMM"
            If _parts(1).Length > 3 Then _format &= "M"
            _format &= _delimiter & New String("y"c, _parts(2).Length)
        ElseIf Char.IsDigit(_parts(0).Chars(0)) AndAlso Char.IsDigit(_parts(1).Chars(0)) Then
            ' Both parts start with a digit.
            ' Start with the assumption that the 1st part is the day and the 2nd part is the month
            _format = New String("d"c, _parts(0).Length) & _delimiter & New String("M"c, _parts(1).Length) & _delimiter & New String("y"c, _parts(2).Length)
            If Integer.Parse(_parts(1)) > 12 Then
                ' The 2nd part cannot be a month, so the 1st part must be the month and the 2nd part the day
                _format = New String("M"c, _parts(0).Length) & _delimiter & New String("d"c, _parts(1).Length) & _delimiter & New String("y"c, _parts(2).Length)
            End If
            ' The same gotcha applies if the first two parts are both 12 or less and are different
        End If
    End If
    If _format.Length = 0 Then
        ' We were unable to determine the date format from the available information
        Console.WriteLine("Unable to determine format from " & _ss(0))
        Return
    End If

    ' We were able to determine the date format, so we can continue and parse the dates
    Console.WriteLine("Determined format as " & _format)

    ' Start from our actual first line of data
    For _i As Integer = _firstline To _lines.Length - 1
        _ss = _lines(_i).Split(New String() {"  "}, StringSplitOptions.None)
        ' InvariantCulture keeps English month abbreviations and "/" as the date separator,
        ' whatever the regional settings of the machine running the code
        Dim _date As DateTime = DateTime.ParseExact(_ss(0), _format, System.Globalization.CultureInfo.InvariantCulture)
        Console.WriteLine("Read from input: " & _ss(0) & " - Interpreted date: " & _date.ToString("yyyy-MM-dd"))
    Next

End Sub

Note, from the results, that if there is no year part then DateTime.ParseExact will interpret the date as being in the current year, as determined from the system date at the time the code is executed.
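If the year is known from somewhere else (for example a user default, or a clue found elsewhere on the source page), one way to avoid the current-year fallback is to append that year to both the value and the format before calling ParseExact. This matters around New Year: December data parsed in January 2007 with "dd MMM" alone comes out as December 2007, and a yearless 29 Feb is rejected in a non-leap year. A minimal sketch, assuming the known year arrives as an Integer; the module and function names are hypothetical:

```vb
Imports System
Imports System.Globalization

Module YearlessDates
    ' Appends an explicit year to a yearless date string (e.g. "11 Dec") so that
    ' ParseExact no longer falls back to the current system year.
    ' defaultYear is a hypothetical parameter: it might come from a user setting
    ' or from a clue found elsewhere on the source page.
    Public Function ParseWithYear(text As String, format As String, defaultYear As Integer) As DateTime
        Dim yearText As String = defaultYear.ToString("0000", CultureInfo.InvariantCulture)
        Return DateTime.ParseExact(text & " " & yearText, format & " yyyy", CultureInfo.InvariantCulture)
    End Function
End Module
```

For example, ParseWithYear("11 Dec", "dd MMM", 2006) gives 11 December 2006 no matter when the code runs.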
"GS" <gs**********************@msnews.Nomail.com> wrote in message
news:uc**************@TK2MSFTNGP06.phx.gbl...
Looks like I am not expressing myself clearly. Although the application does not know in advance which format is used, it does know, for a given set, which date format it deals with, and it can expect the same format throughout a given set of input.
I should not have used the term batch, but rather a set of records. The only possible variation is that some records in certain sets may be split across 2 lines, but that is not critical, as the conditions can be described beforehand and normalized by another parse component.

Sample data:

Set 1: date format mask is "dd MMM"
Date Parts ID Parts Description location Quantity Unit Cost Total Cost
11 Dec A1234987 Sample Parts description W1I1R4S1 2 10.00 20.00
15 Dec A1234988 Sample Parts description 1 10.00 20
18 Dec A1234988 Sample Parts description 1 10.00 20
19 Dec A1234988 Sample Parts description 1 10.00 20
12 Dec A1234988 Sample Parts description 1 10.00 20

Set 2: date format mask is "dd MM yy"
Date Parts ID Parts Description location Quantity Unit Cost Total Cost
11 12 06 A1234987 Sample Parts description W1I1R4S1 2 10.00 20.00
15 12 06 A1234988 Sample Parts description 1 10.00 20
18 12 06 A1234988 Sample Parts description 1 10.00 20
19 12 06 A1234988 Sample Parts description 1 10.00 20
12 12 06 A1234988 Sample Parts description 1 10.00 20

Set 3: date format mask is "dd/MM/yy"
Parts Description location Quantity Unit Cost Total Cost
11/12/06 A1234987 Sample Parts description W1I1R4S1 2 10.00 20.00
15/12/06 A1234988 Sample Parts description 1 10.00 20
18/12/06 A1234988 Sample Parts description 1 10.00 20
19/12/06 A1234988 Sample Parts description 1 10.00 20
12/12/06 A1234988 Sample Parts description 1 10.00 20

Set 4: date format mask is "dd/MMM/yy"
Date Parts ID Parts Description location Quantity Unit Cost Total Cost
11/dec/06 A1234987 Sample Parts description W1I1R4S1 2 10.00 20.00
15/dec/06 A1234988 Sample Parts description 1 10.00 20
18/dec/06 A1234988 Sample Parts description 1 10.00 20
19/dec/06 A1234988 Sample Parts description 1 10.00 20
12/dec/06 A1234988 Sample Parts description 1 10.00 20

How do I deal with a format without a year? I do have clues from other parts of the originating website, and an optional default set by the user.

The sample data shows the date format varying from set to set, but within a given set the date format is consistent, and the user has influence over the date format mask used.

Like Cor suggested: don't let the user enter the format, but let the user pick it from a list. That will likely be the case, at least in version 0.
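Picking from a list also sidesteps a trap in the masks as originally written: in .NET custom date format strings, lower-case "mm" means minutes and upper-case "MM" means month, so a mask like "dd/mm/yyyy" cannot be handed to DateTime.ParseExact as-is. A minimal sketch, assuming the pick list holds the nine masks from the original question and that upper-casing every "m" is the only translation needed (MaskChoices and ToDotNetFormat are hypothetical names):

```vb
Imports System

Module MaskChoices
    ' The masks listed in the original question, offered in a pick list
    ' instead of being typed in by the user.
    Public ReadOnly AllowedMasks As String() = {
        "dd/mm/yyyy", "mm/dd/yyyy", "dd/mmm/yyyy", "mmm/dd/yyyy",
        "dd/mm/yy", "mm/dd/yy", "dd/mmm/yy", "mmm/dd/yy", "dd/mm"}

    ' Converts a chosen mask into a .NET custom format string. In .NET custom
    ' formats "mm" means minutes and "MM" means month, so every "m" is
    ' upper-cased; "d" (day) and "y" (year) already mean the right thing.
    Public Function ToDotNetFormat(mask As String) As String
        If Array.IndexOf(AllowedMasks, mask) < 0 Then
            Throw New ArgumentException("Unsupported date mask: " & mask)
        End If
        Return mask.Replace("m"c, "M"c)
    End Function
End Module
```

The converted string can then go straight to DateTime.ParseExact(value, ToDotNetFormat(mask), CultureInfo.InvariantCulture).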
"GS" <gs**********************@msnews.Nomail.com> wrote in message
news:eF**************@TK2MSFTNGP03.phx.gbl...
You are sort of on the same track as me.
I must first apologize: I did not tell you the complete story.

Although the application does not know beforehand exactly what format the data may come in, part of the application allows the user to define and record favourites for a website:
- to extract by text or HTML
- header content and format
- record format and date format (that is where the date format mask comes in)
- optionally, an ordinal number for each column, or re-ordering
- trailer content and format

For a given batch, at least for the body, date formats are uniform.

Furthermore, given the need to make the extract process generic and adaptable to the front end that takes the user definitions, I believe it would be easier to "normalize" date strings to "yyyy-mm-dd".

Also, the end target may not necessarily be an SQL database; it may be text, or pasted into a Word report or Excel by the user. Therefore, I can transform the date format mask into a regex in the appropriate format, with identifiers, and use Regex.Replace to normalize the date. As a matter of fact, the date separator does not have to be / but can be a space, as long as there are identifiable delimiters around the date string.

I already have code for dealing with regexes for dates from a prior project; all I have to do is adapt it to the present need.

Who knows, maybe I have taken a totally offbeat track.

"GS" <gs**********************@msnews.Nomail.com> wrote in message
news:%2****************@TK2MSFTNGP04.phx.gbl...
Thanks to all who have pitched in so far.

Let me give it another shot.

It looks like an easier way out would be:
1. Copy the date format string into a regex string holder, and then derive the relevant regex expression to be used for date normalization later in part 2:
   - replace yyyy in the regex string with a year regex expression with a year identifier
   - look for yy, replace it with 20yy, and repeat the step above
   - replace mmm with the month regex expression associated with a month identifier
   - replace mm with the 2-digit month regex expression associated with a month identifier
   - replace dd with the 2-digit day regex expression associated with a day identifier
2. Use the resulting regex in Regex.Replace to normalize to yyyy-mm-dd.

Any problem with the above approach?
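One problem with step 2 as described: a plain replacement string such as "${year}-${month}-${day}" cannot turn "Dec" into "12", and two-digit years need arithmetic, so Regex.Replace needs a MatchEvaluator rather than a replacement pattern. The following is a minimal sketch of both steps under these assumptions: masks use the lower-case tokens from the original question, two-digit years are read as 20yy (as proposed above), month names are English abbreviations, a caller-supplied default year covers masks without a year, and matches that are not real calendar dates are left untouched. The group, module and function names are illustrative only.

```vb
Imports System
Imports System.Globalization
Imports System.Text
Imports System.Text.RegularExpressions

Module MaskToRegex
    ' Step 1: turn a mask such as "dd/mmm/yy" into a regex with named groups.
    Public Function BuildPattern(mask As String) As String
        Dim sb As New StringBuilder("\b")
        Dim i As Integer = 0
        While i < mask.Length
            Dim rest As String = mask.Substring(i)
            ' Longer tokens are tested first so "yyyy" is not read as "yy" + "yy"
            If rest.StartsWith("yyyy", StringComparison.Ordinal) Then
                sb.Append("(?<year>\d{4})") : i += 4
            ElseIf rest.StartsWith("yy", StringComparison.Ordinal) Then
                sb.Append("(?<yy>\d{2})") : i += 2
            ElseIf rest.StartsWith("mmm", StringComparison.Ordinal) Then
                sb.Append("(?<mon>[A-Za-z]{3})") : i += 3
            ElseIf rest.StartsWith("mm", StringComparison.Ordinal) Then
                sb.Append("(?<month>\d{1,2})") : i += 2
            ElseIf rest.StartsWith("dd", StringComparison.Ordinal) Then
                sb.Append("(?<day>\d{1,2})") : i += 2
            Else
                ' Separators such as "/", "-" or " " are matched literally
                sb.Append(Regex.Escape(mask(i).ToString())) : i += 1
            End If
        End While
        Return sb.Append("\b").ToString()
    End Function

    ' Step 2: replace every date matching the mask with its yyyy-MM-dd form.
    Public Function Normalize(input As String, mask As String, defaultYear As Integer) As String
        Dim rx As New Regex(BuildPattern(mask))
        Return rx.Replace(input, Function(m As Match) ToIso(m, defaultYear))
    End Function

    Private Function ToIso(m As Match, defaultYear As Integer) As String
        Dim year As Integer = defaultYear
        If m.Groups("year").Success Then
            year = Integer.Parse(m.Groups("year").Value)
        ElseIf m.Groups("yy").Success Then
            year = 2000 + Integer.Parse(m.Groups("yy").Value) ' assumption: yy means 20yy
        End If

        Dim month As Integer
        If m.Groups("mon").Success Then
            ' Case-insensitive lookup of the English month abbreviation (Jan = 1)
            Dim mon As String = m.Groups("mon").Value
            Dim names As String() = CultureInfo.InvariantCulture.DateTimeFormat.AbbreviatedMonthNames
            month = Array.FindIndex(names, Function(n As String) String.Equals(n, mon, StringComparison.OrdinalIgnoreCase)) + 1
        Else
            month = Integer.Parse(m.Groups("month").Value)
        End If
        Dim day As Integer = Integer.Parse(m.Groups("day").Value)

        ' Leave anything that is not a real calendar date exactly as it was
        If year < 1 OrElse year > 9999 OrElse month < 1 OrElse month > 12 OrElse
           day < 1 OrElse day > DateTime.DaysInMonth(year, month) Then
            Return m.Value
        End If
        Return New DateTime(year, month, day).ToString("yyyy-MM-dd", CultureInfo.InvariantCulture)
    End Function
End Module
```

Because the pattern is wrapped in \b on both sides, a date embedded in a longer record line is replaced in place and the rest of the line is left alone; for example, Normalize("11/dec/06  A1234987", "dd/mmm/yy", 2006) yields "2006-12-11  A1234987".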

"Cor Ligthert [MVP]" <no************@planet.nl> wrote in message
news:%2****************@TK2MSFTNGP06.phx.gbl...
GS,

Maybe you can avoid this in 2007, and all things like that, by using DateTime.ParseExact; have a look at the nice globalization support Microsoft has built in, and the ToString options related to it.

Cor
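Cor's suggestion can be sketched with the DateTime.TryParseExact overload that takes an array of formats: it returns False instead of throwing, and CultureInfo.InvariantCulture pins the month abbreviations to English and makes the "/" placeholder in a format match a slash, regardless of the machine's regional settings. NormalizeDate is a hypothetical helper name. The formats are tried in order and the first one that succeeds wins, so listing both "dd/MM/yy" and "MM/dd/yy" silently resolves "01/02/06" as 1 February; the array is not a substitute for knowing the set's format.

```vb
Imports System
Imports System.Globalization

Module ExactParsing
    ' Tries each .NET format in order and returns the date as yyyy-MM-dd,
    ' or Nothing when no format matches.
    Public Function NormalizeDate(text As String, formats As String()) As String
        Dim result As DateTime
        If DateTime.TryParseExact(text, formats, CultureInfo.InvariantCulture,
                                  DateTimeStyles.None, result) Then
            Return result.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture)
        End If
        Return Nothing
    End Function
End Module
```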

Jan 9 '07 #17
Stephany,

I am curious: what does this phrase mean? I don't know it.
"Now we're cooking with gas."
(I live in Holland, which sits above what was one of the biggest gas fields in Europe.)

Cor
Jan 9 '07 #18
It's an idiom for:

    Efficiently performing a task after a long period of inefficient
    performance, or possibly failed attempts at the entire task or at
    certain steps in the process.

Before we saw the sample data we were 'shooting in the dark'. As soon as the sample data was posted it all became clear.

Jan 9 '07 #19
GS
I see the code you worked hard on does work in a lot of cases, but isn't it better to get assistance from the user, who knows what date format is being used? That is the rationale for letting the user pick the date format mask. Guessing the date format is tough to master for all cases: not only months but also days can be indeterminate at times, and it is worse when a 2-digit year is used. I have seen some sample data that is way out of the ordinary date formats commonly seen in the US.

Relying on the first 1 or 2 parts being numeric would miss quite a few cases. Nonetheless, the code can serve as a default in the absence of a user specification. Thank you very much for that.
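Since every set uses one date format, one way to reduce the day/month guessing described above is to test each candidate format against every date value in the set rather than only the first row: a format that fails on any row is eliminated, a single survivor is a reasonable default, and more than one survivor means the set is genuinely ambiguous and the user has to choose. A minimal sketch, assuming the candidate list comes from the user's pick list (CandidatesForSet is a hypothetical name):

```vb
Imports System
Imports System.Collections.Generic
Imports System.Globalization

Module FormatInference
    ' Returns the candidate .NET formats that parse EVERY value in the set.
    ' One survivor: safe default. Two or more: ask the user.
    Public Function CandidatesForSet(values As IEnumerable(Of String), candidates As String()) As List(Of String)
        Dim survivors As New List(Of String)
        For Each fmt As String In candidates
            Dim allParse As Boolean = True
            For Each v As String In values
                Dim ignored As DateTime
                If Not DateTime.TryParseExact(v, fmt, CultureInfo.InvariantCulture, DateTimeStyles.None, ignored) Then
                    allParse = False
                    Exit For
                End If
            Next
            If allParse Then survivors.Add(fmt)
        Next
        Return survivors
    End Function
End Module
```

A set such as 11/12/06, 15/12/06 leaves only "dd/MM/yy", whereas 01/02/06, 03/04/06 leaves both formats, which is exactly the case where asking the user is unavoidable.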

Sorry for misleading you with incomplete data samples.
There are sample data sets where the first column is not a date. On the other hand, sometimes the first 2 columns can both be dates, and, rarely, another column elsewhere can be a date too. This sounds incredible, but that's what users have to contend with.
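When the date columns are not known in advance, a similar test can be run per column: with the format already chosen by the user, a column is treated as a date column only if its value parses on every data row, which also picks up a second date column if there is one. This sketch assumes fields are separated by a pair of spaces, as in the code earlier in the thread, and that the heading line has already been removed; FindDateColumns is a hypothetical name. A column that only coincidentally parses would also be reported, so the result is a hint rather than proof.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Globalization

Module DateColumns
    ' Returns the indexes of columns whose value parses with the given
    ' .NET format on every data row.
    Public Function FindDateColumns(rows As IList(Of String), format As String) As List(Of Integer)
        Dim table As New List(Of String())
        For Each row As String In rows
            ' Assumption: fields are delimited by a pair of spaces
            table.Add(row.Split(New String() {"  "}, StringSplitOptions.None))
        Next

        Dim result As New List(Of Integer)
        If table.Count = 0 Then Return result
        For col As Integer = 0 To table(0).Length - 1
            Dim isDate As Boolean = True
            For Each fields As String() In table
                Dim ignored As DateTime
                ' A short row, or a value that does not parse, rules the column out
                If col >= fields.Length OrElse Not DateTime.TryParseExact(fields(col).Trim(), format, CultureInfo.InvariantCulture, DateTimeStyles.None, ignored) Then
                    isDate = False
                    Exit For
                End If
            Next
            If isDate Then result.Add(col)
        Next
        Return result
    End Function
End Module
```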

Jan 9 '07 #20
