Bytes IT Community

best design for parse

gs
Let's say I have to deal with various date formats, and I am given a format
string from one of the following:
dd/mm/yyyy
mm/dd/yyyy
dd/mmm/yyyy
mmm/dd/yyyy
dd/mm/yy
mm/dd/yy
dd/mmm/yy
mmm/dd/yy
dd/mm
What is the best way to come up with a relevant regex for the incoming format
string?
a) use two arrays and match statically
b) use a regex to find the order
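Option (a) could be sketched as follows, assuming the nine masks above are the only ones offered: one array holds the masks, a parallel array holds a hand-written regex for each, and the lookup is a plain index search. The module and function names are hypothetical, and the patterns are illustrative only (a bare \b boundary, for instance, still lets the dd/mm pattern match the first half of 07/01/2007).

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module StaticLookupSketch
    ' Option (a): two parallel arrays. Masks(i) is a format the user can pick and
    ' Patterns(i) is a hand-written regex for it (illustrative, not exhaustive).
    Private ReadOnly Masks As String() = New String() { _
        "dd/mm/yyyy", "mm/dd/yyyy", "dd/mmm/yyyy", "mmm/dd/yyyy", _
        "dd/mm/yy", "mm/dd/yy", "dd/mmm/yy", "mmm/dd/yy", "dd/mm"}

    Private ReadOnly Patterns As String() = New String() { _
        "\b[0-9]{2}/[0-9]{2}/[0-9]{4}\b", "\b[0-9]{2}/[0-9]{2}/[0-9]{4}\b", _
        "\b[0-9]{2}/[A-Za-z]{3}/[0-9]{4}\b", "\b[A-Za-z]{3}/[0-9]{2}/[0-9]{4}\b", _
        "\b[0-9]{2}/[0-9]{2}/[0-9]{2}\b", "\b[0-9]{2}/[0-9]{2}/[0-9]{2}\b", _
        "\b[0-9]{2}/[A-Za-z]{3}/[0-9]{2}\b", "\b[A-Za-z]{3}/[0-9]{2}/[0-9]{2}\b", _
        "\b[0-9]{2}/[0-9]{2}\b"}

    ' Looks the mask up by position in Masks; an unknown mask is rejected.
    Public Function RegexForMask(ByVal mask As String) As Regex
        Dim i As Integer = Array.IndexOf(Masks, mask.ToLowerInvariant())
        If i < 0 Then Throw New ArgumentException("Unsupported mask: " & mask)
        Return New Regex(Patterns(i))
    End Function

    Sub Main()
        Dim re As Regex = RegexForMask("dd/mmm/yy")
        Console.WriteLine(re.Match("11/dec/06 A1234987").Value) ' 11/dec/06
    End Sub
End Module
```

The static table is easy to read and to extend by hand, at the cost of keeping the two arrays in step.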
Jan 6 '07 #1


Maybe you are looking for 'DateTime.ParseExact'.
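A minimal sketch of that suggestion for one known format, assuming the mask has already been written as a .NET custom format string (uppercase MM is the month; lowercase mm means minutes). The invariant culture is used so that "/" and the month abbreviations do not depend on the PC's regional settings; ToIsoDate is a hypothetical helper name.

```vbnet
Imports System
Imports System.Globalization

Module ParseExactSketch
    ' Parses one date string with a single known .NET format and returns it
    ' as yyyy-MM-dd. Throws FormatException if the text does not fit the format.
    Public Function ToIsoDate(ByVal value As String, ByVal format As String) As String
        ' InvariantCulture: "/" in the format means a literal "/", and MMM means Jan..Dec.
        Dim d As DateTime = DateTime.ParseExact(value, format, CultureInfo.InvariantCulture)
        Return d.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture)
    End Function

    Sub Main()
        Console.WriteLine(ToIsoDate("11/Dec/2006", "dd/MMM/yyyy")) ' 2006-12-11
        Console.WriteLine(ToIsoDate("12/11/06", "MM/dd/yy"))       ' 2006-12-11
    End Sub
End Module
```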

--
M S Herfried K. Wagner
M V P <URL:http://dotnet.mvps.org/>
V B <URL:http://dotnet.mvps.org/dotnet/faqs/>

Jan 6 '07 #2

GS
Thank you, I'll give that a shot. Hopefully it will take care of most of what
I need, or at least make the rest easier, except for one thing:

I am dealing with lines of string data (up to 300 lines), and the position of
the date fields may not be known beforehand, although for a given set of lines
they stay in the same place 99.999% of the time, apart from the odd comment,
which is not that critical.

Jan 6 '07 #3

GS,

Maybe in 2007 you can avoid this, and things like DateTime.ParseExact
altogether; have a look at the globalization support Microsoft has nicely
built in, and at the ToString options related to it.

Cor

Jan 6 '07 #4

GS
Thank you, Cor.

However, I must be thick: I don't quite get the drift with regard to 2007.
Are we talking about a new release of Visual Studio, of the .NET Framework,
or just a release or patch to come out in 2007?

How would that handle date strings mixed with other data?

Actually, the original source of the data is a displayed HTML table placed on
the clipboard. The objective is to standardize the date string to yyyy-mm-dd
and then pass it on to other components for processing and storage.

Jan 7 '07 #5

GS,

I was thinking about writing that this was not the case with web pages.
However, Windows Forms is the default in this newsgroup, so please mention it
next time.

Cor

Jan 7 '07 #6

It is?

Newsgroup microsoft.public.dotnet.languages.vb provides a forum for
questions and general discussion of Visual Basic .NET.

Source:
http://msdn.microsoft.com/library/en...rogrammers.asp

Jan 7 '07 #7

GS
The target is actually part of a Windows .NET application, with a WinForm
that embeds a WebBrowser control.

Although the clipboard source may well be an HTML table, I can get the text.
The resulting text will have columns delimited by a couple of space-like
characters.

I am just in the design stage, trying to find an easy-to-maintain approach
that will yield adequate performance on the target PCs.
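A minimal sketch of the column split, assuming (as described) that columns are separated by runs of two or more whitespace characters while single spaces occur inside a column. In .NET regexes \s also matches the non-breaking space that text copied from HTML often contains. SplitColumns is a hypothetical name, and the sample line is made up.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module SplitColumnsSketch
    ' Splits one text line into columns on runs of two or more whitespace
    ' characters; single spaces inside a column are kept.
    Public Function SplitColumns(ByVal line As String) As String()
        Return Regex.Split(line.Trim(), "\s{2,}")
    End Function

    Sub Main()
        Dim cols As String() = SplitColumns("11/12/06  A1234987  Sample Parts description  W1I1R4S1  2  10.00  20.00")
        For Each c As String In cols
            Console.WriteLine("[" & c & "]")
        Next
    End Sub
End Module
```

Once the line is split, the date column can be picked by its ordinal position, which fits the observation that the date stays in the same place within a set.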

Jan 7 '07 #8

GS
Thanks to everyone who has pitched in so far.

Let me give it another shot.

It looks like an easier way out would be:
1. Copy the date format string into a regex string holder, and from it derive the regex to be used for date normalization in step 2:
- replace yyyy with a year regex expression carrying a year identifier
- look for yy, replace it with 20yy, and repeat the step above
- replace mmm with the month-name regex expression associated with a month identifier
- replace mm with the two-digit month regex expression associated with a month identifier
- replace dd with the two-digit day regex expression associated with a day identifier

2. Use the resulting regex in Regex.Replace to normalize to yyyy-mm-dd.
Any problems with the above approach?
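The plan above could be sketched like this. It is only one way to do it, under these assumptions: masks use just dd, mm, mmm, yy and yyyy plus literal separators; the "identifiers" become regex named groups; two-digit years get the 20yy treatment; and because Regex.Replace cannot turn a month name into a number with a plain replacement string, a MatchEvaluator rebuilds each match. Masks without a year (such as "dd mmm") are left unchanged for a separate rule. All helper names are hypothetical.

```vbnet
Imports System
Imports System.Globalization
Imports System.Text.RegularExpressions

Module MaskToRegexSketch
    ' Turns a mask such as "dd/mmm/yy" into a regex with named groups. Longer
    ' tokens are replaced first (yyyy before yy, mmm before mm), and none of the
    ' inserted group text contains a token that is replaced later.
    Public Function MaskToPattern(ByVal mask As String) As String
        Dim p As String = Regex.Escape(mask.ToLowerInvariant())
        p = p.Replace("yyyy", "(?<year>[0-9]{4})")
        p = p.Replace("yy", "(?<yr>[0-9]{2})")
        p = p.Replace("mmm", "(?<mname>[A-Za-z]{3})")
        p = p.Replace("mm", "(?<month>[0-9]{2})")
        p = p.Replace("dd", "(?<day>[0-9]{2})")
        ' Word boundaries cut down, but do not eliminate, false positives.
        Return "\b" & p & "\b"
    End Function

    ' Maps "Jan".."Dec" (any case) to 1..12; returns 0 if the name is unknown.
    Private Function MonthFromName(ByVal name As String) As Integer
        Dim names As String() = CultureInfo.InvariantCulture.DateTimeFormat.AbbreviatedMonthNames
        For i As Integer = 0 To 11
            If String.Compare(names(i), name, True, CultureInfo.InvariantCulture) = 0 Then Return i + 1
        Next
        Return 0
    End Function

    ' MatchEvaluator: rebuilds one match as yyyy-MM-dd, or leaves it unchanged.
    Private Function ToIso(ByVal m As Match) As String
        Dim year As Integer
        If m.Groups("year").Success Then
            year = Integer.Parse(m.Groups("year").Value, CultureInfo.InvariantCulture)
        ElseIf m.Groups("yr").Success Then
            ' The "20yy" assumption from the plan.
            year = 2000 + Integer.Parse(m.Groups("yr").Value, CultureInfo.InvariantCulture)
        Else
            Return m.Value ' no year in the mask: leave it for another rule
        End If

        Dim month As Integer
        If m.Groups("mname").Success Then
            month = MonthFromName(m.Groups("mname").Value)
        Else
            month = Integer.Parse(m.Groups("month").Value, CultureInfo.InvariantCulture)
        End If
        Dim day As Integer = Integer.Parse(m.Groups("day").Value, CultureInfo.InvariantCulture)

        ' Reject impossible dates such as 31/02 instead of emitting them.
        If year < 1 OrElse month < 1 OrElse month > 12 Then Return m.Value
        If day < 1 OrElse day > DateTime.DaysInMonth(year, month) Then Return m.Value
        Return String.Format(CultureInfo.InvariantCulture, "{0:0000}-{1:00}-{2:00}", year, month, day)
    End Function

    ' Replaces every date that fits the mask with its yyyy-MM-dd form.
    Public Function NormalizeDates(ByVal text As String, ByVal mask As String) As String
        Dim re As New Regex(MaskToPattern(mask))
        Return re.Replace(text, New MatchEvaluator(AddressOf ToIso))
    End Function

    Sub Main()
        Console.WriteLine(NormalizeDates("11/dec/06 A1234987 Sample Parts", "dd/mmm/yy"))
        Console.WriteLine(NormalizeDates("15 12 06 A1234988 Sample Parts", "dd mm yy"))
    End Sub
End Module
```

Because the mask is known per set, only one pattern is built and no format guessing is needed; the validation step keeps a false positive from being silently rewritten into a wrong date.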

Jan 7 '07 #9

Stephany,

You would have seen (you are not a newbie) how much time it took, especially
for me, before it was accepted that VB.NET as used in ASP.NET is also part of
the language and not of the framework, and is therefore on-topic in this
newsgroup. Maybe you even saw that I wrote that again last week in the C#
newsgroup.

I only asked the OP to say so if the question is specifically about a web page
(which seems not to be the case). Most of the people answering here take
Windows Forms as the default, and in the case of dates and times I seldom ask
that, because there is, unfortunately, no DateTime value equivalent in HTML.

Cor

Jan 7 '07 #10

I think that you are missing the whole point.

Regular Expressions (Regex) are about pattern matching, not format matching.

It does not matter whether the source data comes from an HTML page, a Windows
Forms TextBox or a disk file. The source data is the source data and that is
all there is to it.

If the source data only contained one instance of a 'date' in dd/MM/yyyy
format, then to find it by your methodology you would need to test for up to
3,719,628 permutations from 01/01/0001 all the way up to 31/12/9999, i.e.,
31 (days) * 12 (months) * 9999 (years). Couple this with the other 8
'formats' and you can see how such a task quickly becomes unmanageable.

But ... what you really are looking for is a sequence of 2 digits followed
by a slash followed by 2 digits followed by a slash followed by 4 digits.
That immediately takes care of 2 of your 'formats'. Off the top of my head
the regex for that is "[0-9]{2}/[0-9]{2}/[0-9]{4}".

The next pattern you are looking for is 2 digits followed by a slash
followed by 3 alphas followed by a slash followed by 4 digits:
"[0-9]{2}/[A-Za-z]{3}/[0-9]{4}".

The next pattern you are looking for is 3 alphas followed by a slash
followed by 2 digits followed by a slash followed by 4 digits:
"[A-Za-z]{3}/[0-9]{2}/[0-9]{4}".

The next 4 formats (the two-digit-year versions) are taken care of by varying
the above: "[0-9]{2}/[0-9]{2}/[0-9]{2}" (which covers both dd/mm/yy and
mm/dd/yy), "[0-9]{2}/[A-Za-z]{3}/[0-9]{2}" and "[A-Za-z]{3}/[0-9]{2}/[0-9]{2}"
respectively.

The last format is simply the pattern "[0-9]{2}/[0-9]{2}".

Now, the real secret is what directly precedes and follows your 'dates'. For
instance, are your 'dates' ALWAYS 'wrapped' in a tag, e.g.
<td>07/01/2007</td>? It might be that there is always a space character
directly before the 'date' and another directly after it. Any such
information will allow you to 'tune' your pattern so that it doesn't pick up
false positives. The pattern [0-9]{2}/[0-9]{2} would pick up the 01/02 out
of 01MyQuote01/02YourQuote02.
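One way to express such tuning is with lookarounds. The sketch below assumes a date may not touch a letter, a digit or another slash on either side; with that rule the 01/02 example above yields no candidate, while a free-standing 15/01 still does. The constant and function names are hypothetical.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module DelimitedPatternSketch
    ' A "dd/mm" candidate that must not touch a letter, digit or slash on either
    ' side. The lookarounds encode an assumption about the data; if every date
    ' sits in a table cell, "(?<=<td>)[0-9]{2}/[0-9]{2}(?=</td>)" would be the
    ' equivalent anchor.
    Public Const DayMonth As String = "(?<![0-9A-Za-z/])[0-9]{2}/[0-9]{2}(?![0-9A-Za-z/])"

    Public Function CountCandidates(ByVal text As String) As Integer
        Return Regex.Matches(text, DayMonth).Count
    End Function

    Sub Main()
        Console.WriteLine(CountCandidates("01MyQuote01/02YourQuote02"))        ' 0
        Console.WriteLine(CountCandidates("On 15/01 the quick brown fox"))      ' 1
        Console.WriteLine(CountCandidates("On 07/01/2007 the quick brown fox")) ' 0
    End Sub
End Module
```

Excluding the slash on both sides also stops the short dd/mm pattern from matching inside a full dd/mm/yyyy date.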

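All the patterns need to be put together into one regular expression with
alternation ("|") so that you can find all the candidate dates in one
operation. Keep the longer patterns first: at any given position the .NET
regex engine takes the first alternative that matches, so "\d{2}/\d{2}"
placed first would match only the 07/01 part of 07/01/2007.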

"\d{2}/\d{2}/\d{4}|\d{2}/[A-Za-z]{3}/\d{4}|[A-Za-z]{3}/\d{2}/\d{4}|\d{2}/\d{2}/\d{2}|\d{2}/[A-Za-z]{3}/\d{2}|[A-Za-z]{3}/\d{2}/\d{2}|\d{2}/\d{2}"

Please feel free to jump in here if I've got that wrong, because I'm by no
means a regex expert.
Once you have your candidate dates (matches), you need to deal with each one
in turn.

As Herfried said earlier, you need to use DateTime.ParseExact.

For that you need an array of strings to hold all your formats (note the
uppercase MM: in .NET format strings, lowercase mm means minutes).

Dim _formats As String() = New String() {"dd/MM/yyyy", "MM/dd/yyyy", _
    "dd/MMM/yyyy", "MMM/dd/yyyy", "dd/MM/yy", "MM/dd/yy", "dd/MMM/yy", _
    "MMM/dd/yy", "dd/MM"}

For each candidate, call DateTime.ParseExact, trapping the exception if one
occurs. This needs Imports System.Globalization; passing
CultureInfo.InvariantCulture instead of Nothing (the current culture) keeps
the "/" separator and the month abbreviations independent of the PC's
regional settings.

Dim _d As DateTime

Try
    _d = DateTime.ParseExact(_candidate, _formats, CultureInfo.InvariantCulture, _
        DateTimeStyles.None)
    ' DateTime.ParseExact succeeded, so we can deal with _d here
    ' ...
Catch _ex As FormatException
    ' Because we know that _candidate is not an empty string and none of the
    ' elements of _formats is an empty string, _candidate does not contain a
    ' date and time that corresponds to any element of _formats
    ' ...
End Try
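The same step can avoid exceptions with DateTime.TryParseExact, which takes the same format array plus an output argument. A sketch, again using the invariant culture as an assumption. Note that the formats are tried in array order, so an ambiguous value such as 01/12/07 comes out day-first; that is another reason to use only the one mask known for a given data set. Normalize is a hypothetical name.

```vbnet
Imports System
Imports System.Globalization

Module TryParseExactSketch
    ' Same formats as above. TryParseExact tries them in array order, so an
    ' ambiguous value such as "01/12/07" is read as dd/MM/yy.
    Private ReadOnly Formats As String() = New String() { _
        "dd/MM/yyyy", "MM/dd/yyyy", "dd/MMM/yyyy", "MMM/dd/yyyy", _
        "dd/MM/yy", "MM/dd/yy", "dd/MMM/yy", "MMM/dd/yy", "dd/MM"}

    ' Returns the candidate as yyyy-MM-dd, or Nothing if no format fits.
    Public Function Normalize(ByVal candidate As String) As String
        Dim d As DateTime
        If DateTime.TryParseExact(candidate, Formats, CultureInfo.InvariantCulture, _
                                  DateTimeStyles.None, d) Then
            Return d.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture)
        End If
        Return Nothing
    End Function

    Sub Main()
        For Each c As String In New String() {"10/Jan/2007", "01/12/07", "12/13/07", "XYZ/72/84"}
            Dim iso As String = Normalize(c)
            If iso Is Nothing Then
                Console.WriteLine("{0}: no format matched", c)
            Else
                Console.WriteLine("{0} -> {1}", c, iso)
            End If
        Next
    End Sub
End Module
```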

Jan 7 '07 #11

GS
You are sort of on the same track as me.
First, I must apologize: I did not tell you the complete story.

Although the application does not know beforehand what format the data may
come in, part of the application allows the user to define and record a
favourite for a website:
- whether to extract by text or HTML
- header content and format
- record format and date format (that is where the date format mask comes in)
- optionally, an ordinal number for each column, or re-ordering
- trailer content and format

For a given batch, at least for the body, the date format is uniform.

Furthermore, because the extract process needs to be generic and adaptable to
the front end that takes the user definitions, I believe it would be easier to
"normalize" date strings to "yyyy-mm-dd".

Also, the end target may not necessarily be a SQL database; it may be text,
pasted by the user into a Word report or into Excel.
Therefore, if I transform the date format mask into a regex in the appropriate
format, with identifiers, I can use Regex.Replace to normalize the date. As a
matter of fact, the date separator does not have to be /; it can be a space,
as long as there are identifiable delimiters around the date string.

I already have code for dealing with regexes for dates from a prior project;
all I have to do is adapt it to the present need.

Who knows, maybe I have taken a totally offbeat track.


Jan 8 '07 #12

Again you're missing the point.

I think the best thing you can do is post a relatively small sample of the
text you are attempting to parse.

While you're doing that, execute the following and observe the results. It
demonstrates what I am talking about:

Imports System
Imports System.Globalization
Imports System.Text.RegularExpressions

Module Demo
    Sub Main()
        Dim _source As String = _
            "On 07/01/2007 the quick brown fox jumps over the lazy dog." & Environment.NewLine & _
            "On 08/01/2007 the quick brown fox again jumps over the lazy dog." & Environment.NewLine & _
            "On Jan/09/2007 the quick brown fox again jumps over the lazy dog." & Environment.NewLine & _
            "On 10/Jan/2007 the quick brown fox again jumps over the lazy dog." & Environment.NewLine & _
            "On 11/01/07 the quick brown fox again jumps over the lazy dog." & Environment.NewLine & _
            "On 01/12/07 the quick brown fox again jumps over the lazy dog." & Environment.NewLine & _
            "On Jan/13/07 the quick brown fox again jumps over the lazy dog." & Environment.NewLine & _
            "On 14/Jan/07 the quick brown fox again jumps over the lazy dog." & Environment.NewLine & _
            "On 15/01 the quick brown fox again jumps over the lazy dog." & Environment.NewLine & _
            "The part number XYZ/72/84 is now discontinued."

        ' Longer alternatives come first so that 07/01/2007 is not cut short to 07/01.
        Dim _regex As New Regex("\d{2}/\d{2}/\d{4}|[A-Za-z]{3}/\d{2}/\d{4}|\d{2}/[A-Za-z]{3}/\d{4}|\d{2}/\d{2}/\d{2}|[A-Za-z]{3}/\d{2}/\d{2}|\d{2}/[A-Za-z]{3}/\d{2}|\d{2}/\d{2}")

        Dim _formats As String() = New String() {"dd/MM/yyyy", "MM/dd/yyyy", _
            "MMM/dd/yyyy", "dd/MMM/yyyy", "dd/MM/yy", "MM/dd/yy", "dd/MMM/yy", _
            "MMM/dd/yy", "dd/MM"}

        Dim _candidates As Integer = 0
        Dim _matches As Integer = 0

        Dim _match As Match = _regex.Match(_source)

        While _match.Success
            _candidates += 1
            Console.WriteLine("{0} found at index {1}", _match.Value, _match.Index)
            Try
                ' InvariantCulture keeps "/" and "Jan" independent of regional settings.
                Console.WriteLine("Converted value = {0:yyyy-MM-dd}", _
                    DateTime.ParseExact(_match.Value, _formats, CultureInfo.InvariantCulture, DateTimeStyles.None))
                _matches += 1
            Catch _ex As FormatException
                ' XYZ/72/84 ends up here: it looks like a date but is not one.
                Console.WriteLine(_ex.Message)
            End Try
            _match = _match.NextMatch()
        End While

        Console.WriteLine("{0} candidates found", _candidates)
        Console.WriteLine("{0} matches found", _matches)
    End Sub
End Module

Jan 8 '07 #13

GS,

As long as you don't know the date format, there is probably nothing you can
do. As soon as you know the date format, you can use DateTime.ParseExact with
the given pattern.
(Don't forget to put the mm in uppercase, MM, and do that in code rather than
leaving it to the user.)
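A minimal sketch of doing that conversion in code, assuming the user's mask contains only d, m and y placeholders plus literal separators (so every m can safely become M). MaskToNetFormat is a hypothetical name.

```vbnet
Imports System
Imports System.Globalization

Module MaskConversionSketch
    ' Assumes the mask holds only d, m and y placeholders plus separators,
    ' so every "m" can become "M" ("mm" would otherwise mean minutes).
    Public Function MaskToNetFormat(ByVal mask As String) As String
        Return mask.ToLowerInvariant().Replace("m", "M")
    End Function

    Sub Main()
        Dim fmt As String = MaskToNetFormat("dd/mmm/yy") ' "dd/MMM/yy"
        ' Month names are matched case-insensitively, so "dec" is accepted.
        Dim d As DateTime = DateTime.ParseExact("11/dec/06", fmt, CultureInfo.InvariantCulture)
        Console.WriteLine(d.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture)) ' 2006-12-11
    End Sub
End Module
```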

Cor

Jan 8 '07 #14

GS
Looks like I am not expressing myself clearly. Although the application does
not know in general which format is used, it does know, for a given set, which
date format it deals with, and it can expect the same format throughout that
set of input. I should not have used the term batch, but rather a set of
records. The only possible variation is that some records in certain sets may
be split over two lines, but that is not critical, as the conditions can be
described beforehand and normalized by another parse component.

Sample data:

Set 1: date format mask is "dd MMM"
Date Parts ID Parts Description location Quantity Unit Cost Total Cost
11 Dec A1234987 Sample Parts description W1I1R4S1 2 10.00 20.00
15 Dec A1234988 Sample Parts description 1 10.00 20
18 Dec A1234988 Sample Parts description 1 10.00 20
19 Dec A1234988 Sample Parts description 1 10.00 20
12 Dec A1234988 Sample Parts description 1 10.00 20

Set 2: date format mask is "dd MM yy"
Date Parts ID Parts Description location Quantity Unit Cost Total Cost
11 12 06 A1234987 Sample Parts description W1I1R4S1 2 10.00 20.00
15 12 06 A1234988 Sample Parts description 1 10.00 20
18 12 06 A1234988 Sample Parts description 1 10.00 20
19 12 06 A1234988 Sample Parts description 1 10.00 20
12 12 06 A1234988 Sample Parts description 1 10.00 20

Set 3: date format mask is "dd/MM/yy"
Date Parts ID Parts Description location Quantity Unit Cost Total Cost
11/12/06 A1234987 Sample Parts description W1I1R4S1 2 10.00 20.00
15/12/06 A1234988 Sample Parts description 1 10.00 20
18/12/06 A1234988 Sample Parts description 1 10.00 20
19/12/06 A1234988 Sample Parts description 1 10.00 20
12/12/06 A1234988 Sample Parts description 1 10.00 20

Set 4: date format mask is "dd/MMM/yy"
Date Parts ID Parts Description location Quantity Unit Cost Total Cost
11/dec/06 A1234987 Sample Parts description W1I1R4S1 2 10.00 20.00
15/dec/06 A1234988 Sample Parts description 1 10.00 20
18/dec/06 A1234988 Sample Parts description 1 10.00 20
19/dec/06 A1234988 Sample Parts description 1 10.00 20
12/dec/06 A1234988 Sample Parts description 1 10.00 20

How do I deal with formats without a year? I do have clues from other parts of
the originating website, and an optional default set by the user.

The sample data show the date format varying from set to set, but within a given
set the date format is consistent, and the user has influence over the date
format mask used.

I like Cor's suggestion: don't let the user type the format, let the user pick it
from a list. That will likely be the case, at least in version 0.
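Picking the mask from a fixed list also gives one place to translate the user-facing masks into .NET format strings. That translation matters because in a .NET custom format string a lowercase "mm" means minutes, not months. A minimal sketch, assuming the nine masks from the original question are the list offered to the user; the module and function names (UserMasks, ParseWithUserMask) are hypothetical.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Globalization

Module UserMasks
    ' Hypothetical pick-list: the masks from the original question, each mapped
    ' to the equivalent .NET custom format string. Month parts become "MM"
    ' (number) or "MMM" (abbreviated name), because "mm" in .NET means minutes.
    Public ReadOnly Masks As New Dictionary(Of String, String) From {
        {"dd/mm/yyyy", "dd/MM/yyyy"},
        {"mm/dd/yyyy", "MM/dd/yyyy"},
        {"dd/mmm/yyyy", "dd/MMM/yyyy"},
        {"mmm/dd/yyyy", "MMM/dd/yyyy"},
        {"dd/mm/yy", "dd/MM/yy"},
        {"mm/dd/yy", "MM/dd/yy"},
        {"dd/mmm/yy", "dd/MMM/yy"},
        {"mmm/dd/yy", "MMM/dd/yy"},
        {"dd/mm", "dd/MM"}
    }

    ' Parses a date string using the mask the user picked from the list.
    ' InvariantCulture makes "/" match a slash and "MMM" match English
    ' abbreviations (Jan, Feb, ...) whatever the machine's regional settings.
    Public Function ParseWithUserMask(ByVal value As String, ByVal userMask As String) As DateTime
        Dim netFormat As String = Nothing
        If Not Masks.TryGetValue(userMask, netFormat) Then
            Throw New ArgumentException("Unsupported mask: " & userMask)
        End If
        Return DateTime.ParseExact(value, netFormat, CultureInfo.InvariantCulture)
    End Function
End Module
```

For example, ParseWithUserMask("10/Jan/2007", "dd/mmm/yyyy") gives 10 January 2007. Because the mask comes from a known list, an unsupported mask fails immediately with an ArgumentException instead of silently misparsing dates.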
"GS" <gs**********************@msnews.Nomail.comwrote in message
news:eF**************@TK2MSFTNGP03.phx.gbl...
You are sort of on the same track as me.
I must first apologize: I did not tell you the complete story.

Although the application does not know beforehand exactly what format the
data may come in, part of the application allows the user to define and
record a favourite for a website:
- extract by text or HTML
- header content and format
- record format and date format (that is where the date format mask
comes in)
- optionally, an ordinal number for each column, or re-ordering
- trailer content and format

For a given batch, at least in the body, the date format is uniform.

Furthermore, because the extract process needs to be generic and adaptable to
the front end that takes the user definitions, I believe it would be easier
to "normalize" date strings to "yyyy-mm-dd".

Also, the end target may not necessarily be a SQL database; it may be
text, pasted into a Word report, or into Excel by the user.
Therefore, I can transform the date format mask into a regex with the
appropriate format and group identifiers, and use Regex.Replace to normalize
the date. As a matter of fact, the date separator does not have to be "/"; it
can be a space, as long as there are identifiable delimiters around the date string.

I already have code for dealing with regexes for dates from a prior project;
all I have to do is adapt it to the present need.

Who knows, maybe I have taken a totally offbeat track.

"GS" <gs**********************@msnews.Nomail.com> wrote in message
news:%2****************@TK2MSFTNGP04.phx.gbl...
Thanks to all who have pitched in so far.

Let me give it another shot.

It looks like an easier way out would be:
1. Copy the date format string into a regex string holder, and then derive the
relevant regex expression to be used for date normalization later in part 2:
- replace yyyy in the regex string with the year regex expression, with a year identifier
- look for yy, replace it with 20yy, and repeat the step above
- replace mmm with the month-name regex expression associated with the month identifier
- replace mm with the 2-digit month regex expression associated with the month identifier
- replace dd with the 2-digit day regex expression associated with the day identifier

2. Use the resulting regex in Regex.Replace to normalize to yyyy-mm-dd.
Any problem with the above approach?
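A sketch of that two-step plan, assuming masks written with the lowercase dd, mm, mmm, yy and yyyy tokens used in the original question, with every other character treated as a literal separator. BuildPattern, NormalizeDates and MatchToIso are hypothetical names. Instead of literally prefixing "20" to the text, the sketch adds 2000 to a two-digit year, which has the same effect as the "20yy" step; and a match that is not a real date (a part number such as XYZ/72/84, say) is left unchanged rather than rewritten.

```vbnet
Imports System
Imports System.Globalization
Imports System.Text
Imports System.Text.RegularExpressions

Module MaskToRegex
    ' Step 1: turn a mask such as "dd/mmm/yy" into a regex with named groups.
    ' Longer tokens are tested first so "yyyy" wins over "yy" and "mmm" over "mm".
    Public Function BuildPattern(ByVal mask As String) As String
        Dim sb As New StringBuilder("\b")
        Dim i As Integer = 0
        While i < mask.Length
            Dim rest As String = mask.Substring(i)
            If rest.StartsWith("yyyy", StringComparison.Ordinal) Then
                sb.Append("(?<year>\d{4})")
                i += 4
            ElseIf rest.StartsWith("yy", StringComparison.Ordinal) Then
                sb.Append("(?<yy>\d{2})")
                i += 2
            ElseIf rest.StartsWith("mmm", StringComparison.Ordinal) Then
                sb.Append("(?<mon>[A-Za-z]{3})")
                i += 3
            ElseIf rest.StartsWith("mm", StringComparison.Ordinal) Then
                sb.Append("(?<month>\d{2})")
                i += 2
            ElseIf rest.StartsWith("dd", StringComparison.Ordinal) Then
                sb.Append("(?<day>\d{2})")
                i += 2
            Else
                ' A separator such as "/", "-" or a space: match it literally
                sb.Append(Regex.Escape(mask(i).ToString()))
                i += 1
            End If
        End While
        sb.Append("\b")
        Return sb.ToString()
    End Function

    ' Step 2: rewrite every match in the text as yyyy-MM-dd.
    Public Function NormalizeDates(ByVal text As String, ByVal mask As String) As String
        Dim re As New Regex(BuildPattern(mask))
        Return re.Replace(text, AddressOf MatchToIso)
    End Function

    Private Function MatchToIso(ByVal m As Match) As String
        Dim yearValue, monthValue, dayValue As Integer
        If m.Groups("year").Success Then
            yearValue = Integer.Parse(m.Groups("year").Value, CultureInfo.InvariantCulture)
        ElseIf m.Groups("yy").Success Then
            yearValue = 2000 + Integer.Parse(m.Groups("yy").Value, CultureInfo.InvariantCulture)
        Else
            Return m.Value ' the mask has no year part: leave the text unchanged
        End If

        If m.Groups("month").Success Then
            monthValue = Integer.Parse(m.Groups("month").Value, CultureInfo.InvariantCulture)
        Else
            Dim parsed As DateTime
            If Not DateTime.TryParseExact(m.Groups("mon").Value, "MMM", CultureInfo.InvariantCulture, DateTimeStyles.None, parsed) Then
                Return m.Value ' not a month abbreviation, e.g. "XYZ"
            End If
            monthValue = parsed.Month
        End If

        If Not m.Groups("day").Success Then Return m.Value
        dayValue = Integer.Parse(m.Groups("day").Value, CultureInfo.InvariantCulture)

        ' Reject impossible dates such as 31/02 instead of throwing
        If yearValue < 1 OrElse monthValue < 1 OrElse monthValue > 12 OrElse dayValue < 1 OrElse dayValue > DateTime.DaysInMonth(yearValue, monthValue) Then
            Return m.Value
        End If
        Return String.Format(CultureInfo.InvariantCulture, "{0:0000}-{1:00}-{2:00}", yearValue, monthValue, dayValue)
    End Function
End Module
```

For example, NormalizeDates("On 10/Jan/2007 the fox", "dd/mmm/yyyy") returns "On 2007-01-10 the fox". One consequence of the "20yy" rule worth keeping in mind: any two-digit year from the 1900s would come out as 20xx.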

"Cor Ligthert [MVP]" <no************@planet.nl> wrote in message
news:%2****************@TK2MSFTNGP06.phx.gbl...
GS,

Maybe you can avoid this in 2007, along with things like
DateTime.ParseExact, by having a look at the nice globalization support
Microsoft has built in, and then at the ToString options related to it.

Cor
Jan 8 '07 #15

GS
Thank you.

You do have a point, but the application I have in mind is meant to take most of
the easy but boring and repetitive tasks off the users quickly, to get their
buy-in for the next phase. The application is not going to be perfect in
version 0, but it must be flexible enough to adapt to changing needs.

Furthermore, I chose to normalize dates to yyyy-mm-dd because that is
the standard string date format accepted by almost all standard
Windows applications used by the people I deal with, regardless of locale
and regardless of the default display format.
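On the .NET side, the normalized string is best produced with an explicit format and InvariantCulture. Two details matter: the month must be written "MM", since lowercase "mm" in a .NET format string is minutes; and the invariant culture keeps the output on the Gregorian calendar and independent of the user's regional settings. A minimal sketch; IsoOutput and FormatIso are hypothetical names.

```vbnet
Imports System
Imports System.Globalization

Module IsoOutput
    ' Formats a date as yyyy-MM-dd regardless of the current culture.
    ' Capital "MM" is the month; lowercase "mm" would insert the minutes.
    Public Function FormatIso(ByVal value As DateTime) As String
        Return value.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture)
    End Function
End Module
```

For example, FormatIso(New DateTime(2006, 12, 11, 0, 5, 0)) returns "2006-12-11", whereas formatting the same value with "yyyy-mm-dd" gives "2006-05-11".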

As a side note, this application at version zero is not meant to automate
everything, but to help users do their jobs and help us gain an understanding
of what they do, while at the same time validating the transform process that
will be used later for automation. Version 1 will automate a lot more and may
actually drive some Excel and Word application processes.
You could say version zero is closer to a Mickey Mouse utility, if
you wish.

"Stephany Young" <noone@localhost> wrote in message
news:uk**************@TK2MSFTNGP04.phx.gbl...
Again you're missing the point.

I think the best thing you can do is post a relatively small sample of the
text you are attempting to parse.

While you're doing that, execute the following and observe the results. It
demonstrates what I am talking about:

' Requires Imports System.Globalization and Imports System.Text.RegularExpressions at the top of the file
Dim _source As String = "On 07/01/2007 the quick brown fox jumps over the lazy dog." & Environment.NewLine & _
    "On 08/01/2007 the quick brown fox again jumps over the lazy dog." & Environment.NewLine & _
    "On Jan/09/2007 the quick brown fox again jumps over the lazy dog." & Environment.NewLine & _
    "On 10/Jan/2007 the quick brown fox again jumps over the lazy dog." & Environment.NewLine & _
    "On 11/01/07 the quick brown fox again jumps over the lazy dog." & Environment.NewLine & _
    "On 01/12/07 the quick brown fox again jumps over the lazy dog." & Environment.NewLine & _
    "On Jan/13/07 the quick brown fox again jumps over the lazy dog." & Environment.NewLine & _
    "On 14/Jan/07 the quick brown fox again jumps over the lazy dog." & Environment.NewLine & _
    "On 15/01 the quick brown fox again jumps over the lazy dog." & Environment.NewLine & _
    "The part number XYZ/72/84 is now discontinued."

' Candidate dates: one alternative per shape in the list of masks, 4-digit years first
Dim _regex As New Regex("\d{2}/\d{2}/\d{4}|[A-Za-z]{3}/\d{2}/\d{4}|\d{2}/[A-Za-z]{3}/\d{4}|\d{2}/\d{2}/\d{2}|[A-Za-z]{3}/\d{2}/\d{2}|\d{2}/[A-Za-z]{3}/\d{2}|\d{2}/\d{2}")

Dim _candidates As Integer = 0
Dim _matches As Integer = 0

Dim _match As Match = _regex.Match(_source)

While _match.Success
    _candidates += 1
    Console.WriteLine("{0} found at index {1}", _match.Value, _match.Index)
    Try
        ' A candidate only counts as a date if one of the formats actually parses it
        Console.WriteLine("Converted value = {0:yyyy-MM-dd}", DateTime.ParseExact(_match.Value, _
            New String() {"dd/MM/yyyy", "MM/dd/yyyy", "MMM/dd/yyyy", "dd/MMM/yyyy", "dd/MM/yy", _
                          "MM/dd/yy", "dd/MMM/yy", "MMM/dd/yy", "dd/MM"}, Nothing, DateTimeStyles.None))
        _matches += 1
    Catch _ex As Exception
        Console.WriteLine(_ex.Message)
    End Try
    _match = _match.NextMatch()
End While

Console.WriteLine("{0} candidates found", _candidates)

Console.WriteLine("{0} matches found", _matches)

Jan 8 '07 #16

Now we're cooking with gas. I think that regex is overkill for this
'problem'. Sure, you can use it if you wish but I think you will be making a
rod for your own back.

Here is a solution that works for your sample data. Create a Windows Forms
project, plonk a button on the form and paste the following into the form:

' In the sample data, fields are separated by a pair of spaces; ProcessData splits on that
Private m_source1 As String = "Date  Parts ID  Parts Description  location  Quantity  Unit Cost  Total Cost" & Environment.NewLine & _
    "11 Dec  A1234987  Sample Parts description  W1I1R4S1  2  10.00  20.00" & Environment.NewLine & _
    "15 Dec  A1234988  Sample Parts description  1  10.00  20" & Environment.NewLine & _
    "18 Dec  A1234988  Sample Parts description  1  10.00  20" & Environment.NewLine & _
    "19 Dec  A1234988  Sample Parts description  1  10.00  20" & Environment.NewLine & _
    "12 Dec  A1234988  Sample Parts description  1  10.00  20"

Private m_source2 As String = "Date  Parts ID  Parts Description  location  Quantity  Unit Cost  Total Cost" & Environment.NewLine & _
    "11 12 06  A1234987  Sample Parts description  W1I1R4S1  2  10.00  20.00" & Environment.NewLine & _
    "15 12 06  A1234988  Sample Parts description  1  10.00  20" & Environment.NewLine & _
    "18 12 06  A1234988  Sample Parts description  1  10.00  20" & Environment.NewLine & _
    "19 12 06  A1234988  Sample Parts description  1  10.00  20" & Environment.NewLine & _
    "12 12 06  A1234988  Sample Parts description  1  10.00  20"

Private m_source3 As String = "Parts  Parts ID  Description  location  Quantity  Unit Cost  Total Cost" & Environment.NewLine & _
    "11/12/06  A1234987  Sample Parts description  W1I1R4S1  2  10.00  20.00" & Environment.NewLine & _
    "15/12/06  A1234988  Sample Parts description  1  10.00  20" & Environment.NewLine & _
    "18/12/06  A1234988  Sample Parts description  1  10.00  20" & Environment.NewLine & _
    "19/12/06  A1234988  Sample Parts description  1  10.00  20" & Environment.NewLine & _
    "12/12/06  A1234988  Sample Parts description  1  10.00  20"

Private m_source4 As String = "Date  Parts ID  Parts Description  location  Quantity  Unit Cost  Total Cost" & Environment.NewLine & _
    "11/dec/06  A1234987  Sample Parts description  W1I1R4S1  2  10.00  20.00" & Environment.NewLine & _
    "15/dec/06  A1234988  Sample Parts description  1  10.00  20" & Environment.NewLine & _
    "18/dec/06  A1234988  Sample Parts description  1  10.00  20" & Environment.NewLine & _
    "19/dec/06  A1234988  Sample Parts description  1  10.00  20" & Environment.NewLine & _
    "12/dec/06  A1234988  Sample Parts description  1  10.00  20"

Private m_source5 As String = "Date  Parts ID  Parts Description  location  Quantity  Unit Cost  Total Cost" & Environment.NewLine & _
    "12 13 06  A1234987  Sample Parts description  W1I1R4S1  2  10.00  20.00" & Environment.NewLine & _
    "12 15 06  A1234988  Sample Parts description  1  10.00  20" & Environment.NewLine & _
    "12 18 06  A1234988  Sample Parts description  1  10.00  20" & Environment.NewLine & _
    "12 19 06  A1234988  Sample Parts description  1  10.00  20" & Environment.NewLine & _
    "12 12 06  A1234988  Sample Parts description  1  10.00  20"

Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
    Console.WriteLine()
    Console.WriteLine("Sample 1")
    ProcessData(m_source1)
    Console.WriteLine()
    Console.WriteLine("Sample 2")
    ProcessData(m_source2)
    Console.WriteLine()
    Console.WriteLine("Sample 3")
    ProcessData(m_source3)
    Console.WriteLine()
    Console.WriteLine("Sample 4")
    ProcessData(m_source4)
    Console.WriteLine()
    Console.WriteLine("Sample 5")
    ProcessData(m_source5)
    Console.WriteLine()
End Sub

Private Sub ProcessData(ByVal source As String)

    ' Assumption: lines of data are separated by a carriage return/line feed pair
    Dim _lines As String() = source.Split(New String() {Environment.NewLine}, StringSplitOptions.RemoveEmptyEntries)

    ' Determined by eyeballing the data: all 'fields' are delimited by a pair of spaces
    Dim _ss As String() = _lines(0).Split(New String() {"  "}, StringSplitOptions.None)

    ' Determine which line is the first line of actual data.
    ' If the first line is a heading line then all characters of its first field will be letters
    Dim _lettercount As Integer = 0
    For Each _c As Char In _ss(0)
        If Char.IsLetter(_c) Then _lettercount += 1
    Next
    Dim _firstline As Integer = 0
    If _lettercount = _ss(0).Length Then _firstline = 1

    ' Split the first actual data line on the field delimiter
    _ss = _lines(_firstline).Split(New String() {"  "}, StringSplitOptions.None)

    ' Determined by eyeballing the data: the date field is always the first field in the line

    ' Determine the delimiter used inside the date
    Dim _delimiter As String = ""
    If _ss(0).IndexOf(" ") > 0 Then
        _delimiter = " "
    ElseIf _ss(0).IndexOf("/") > 0 Then
        _delimiter = "/"
    ElseIf _ss(0).IndexOf("-") > 0 Then
        _delimiter = "-"
    Else
        Console.WriteLine("Unable to determine delimiter out of " & _ss(0))
        Return
    End If
    Console.WriteLine("Determined delimiter as '" & _delimiter & "'")

    ' Construct the date format to be used
    Dim _format As String = String.Empty
    ' Split the first field on the date delimiter
    Dim _parts As String() = _ss(0).Split(New String() {_delimiter}, StringSplitOptions.None)
    If _parts.Length = 2 Then
        ' If there are 2 parts then we only have day and month components
        If Char.IsDigit(_parts(0).Chars(0)) AndAlso Char.IsLetter(_parts(1).Chars(0)) Then
            ' The 1st part starts with a digit and the 2nd part starts with a letter,
            ' so we can assume that the 1st part is the day and the 2nd part is the month
            _format = New String("d"c, _parts(0).Length) & _delimiter & "MMM"
            If _parts(1).Length > 3 Then _format &= "M"
        ElseIf Char.IsDigit(_parts(0).Chars(0)) AndAlso Char.IsDigit(_parts(1).Chars(0)) Then
            ' Both parts start with a digit.
            ' Start with the assumption that the 1st part is the day and the 2nd part is the month
            _format = New String("d"c, _parts(0).Length) & _delimiter & New String("M"c, _parts(1).Length)
            If Integer.Parse(_parts(1)) > 12 Then
                ' The 1st part must be the month and the 2nd part must be the day
                _format = New String("M"c, _parts(0).Length) & _delimiter & New String("d"c, _parts(1).Length)
            End If
            ' There is a big gotcha here if both parts are 12 or less and are different.
            ' E.g. we don't really know if 01 02 (01/02, 01-02, etc.) should be 1 February or January 2
        End If
    ElseIf _parts.Length = 3 Then
        ' If there are 3 parts then we have day, month and year components.
        ' Assume that the year is always the 3rd part
        If Char.IsDigit(_parts(0).Chars(0)) AndAlso Char.IsLetter(_parts(1).Chars(0)) Then
            ' The 1st part starts with a digit and the 2nd part starts with a letter,
            ' so we can assume that the 1st part is the day and the 2nd part is the month
            _format = New String("d"c, _parts(0).Length) & _delimiter & "MMM"
            If _parts(1).Length > 3 Then _format &= "M"
            _format &= _delimiter & New String("y"c, _parts(2).Length)
        ElseIf Char.IsDigit(_parts(0).Chars(0)) AndAlso Char.IsDigit(_parts(1).Chars(0)) Then
            ' Both parts start with a digit.
            ' Start with the assumption that the 1st part is the day and the 2nd part is the month
            _format = New String("d"c, _parts(0).Length) & _delimiter & New String("M"c, _parts(1).Length) & _
                _delimiter & New String("y"c, _parts(2).Length)
            If Integer.Parse(_parts(1)) > 12 Then
                ' The 1st part must be the month and the 2nd part must be the day
                _format = New String("M"c, _parts(0).Length) & _delimiter & New String("d"c, _parts(1).Length) & _
                    _delimiter & New String("y"c, _parts(2).Length)
            End If
            ' There is a big gotcha here if the first two parts are 12 or less and are different.
            ' E.g. we don't really know if 01 02 (01/02, 01-02, etc.) should be 1 February or January 2
        End If
    End If
    If _format.Length = 0 Then
        ' We were unable to determine the date format from the available information
        Console.WriteLine("Unable to determine format from " & _ss(0))
        Return
    End If

    ' We were able to determine the date format, so we can continue and parse the dates
    Console.WriteLine("Determined format as " & _format)

    ' Start from our actual first line of data
    For _i As Integer = _firstline To _lines.Length - 1
        _ss = _lines(_i).Split(New String() {"  "}, StringSplitOptions.None)
        ' InvariantCulture makes "/" match a slash and "MMM" match English month
        ' abbreviations whatever the machine's regional settings
        Dim _date As DateTime = DateTime.ParseExact(_ss(0), _format, System.Globalization.CultureInfo.InvariantCulture)
        Console.WriteLine("Read from input: " & _ss(0) & " - Interpreted date: " & _
            _date.ToString("yyyy-MM-dd", System.Globalization.CultureInfo.InvariantCulture))
    Next

End Sub

Note from the results that if there is no year part, DateTime.ParseExact
interprets the date as being in the current year, as determined from the
system date at the time the code is executed.

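Falling back to the current year is not always what is wanted: a December statement processed in January would come out a year too late. GS mentioned having clues from the originating website and an optional user default, so one alternative is to supply the year explicitly. A minimal sketch, assuming the caller already knows the default year; ParseWithDefaultYear is a hypothetical name. Appending the year to the text before parsing, rather than changing the year afterwards, means a value such as "29 Feb" is only accepted when the default year is a leap year.

```vbnet
Imports System
Imports System.Globalization

Module YearlessDates
    ' Parses value with the given .NET format. If the format has no year part
    ' (e.g. "dd MMM"), defaultYear is appended to both the text and the format
    ' so ParseExact does not fall back to the current year.
    Public Function ParseWithDefaultYear(ByVal value As String, ByVal dateFormat As String, ByVal defaultYear As Integer) As DateTime
        If dateFormat.IndexOf("y"c) >= 0 Then
            ' The format already carries a year: parse as-is
            Return DateTime.ParseExact(value, dateFormat, CultureInfo.InvariantCulture)
        End If
        Dim withYear As String = value & " " & defaultYear.ToString("0000", CultureInfo.InvariantCulture)
        Return DateTime.ParseExact(withYear, dateFormat & " yyyy", CultureInfo.InvariantCulture)
    End Function
End Module
```

How the default year is chosen is left to the caller; one possible heuristic is to take the statement's year and step back one year if the resulting date would fall after the statement date.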
Jan 9 '07 #17

Stephany,

I am curious: what does this phrase mean? I don't know it.
>Now we're cooking with gas.
(I live in Holland, which sits above what was once one of the biggest gas fields
in Europe.)

Cor
Jan 9 '07 #18

It's an idiom for:

Efficiently performing a task after a long period
of inefficient performance or possibly failed
attempts at the entire task or certain steps in the process.

Before we saw the sample data we were 'shooting in the dark'. As soon as the
sample data was posted it all became clear.

Jan 9 '07 #19

GS
I see the code works hard and does work in a lot of cases, but isn't it better
to get assistance from the user, who knows what date format is being used? That
is the rationale for letting the user pick the date format mask. Guessing the
date format is tough to get right in all cases. Not only months but days can be
indeterminate at times, and it is worse when a 2-digit year is used. I have seen
some sample data that is way out of the ordinary date formats commonly seen in the US.
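Stephany's code decides the day/month order from the first data line only, and its comments flag the gotcha when both numbers are 12 or less. Since each set keeps one format, scanning every line of the set narrows the ambiguity: a first number above 12 anywhere means day-first, a second number above 12 means month-first, and if neither ever happens the set is genuinely ambiguous and the user has to be asked. A minimal sketch; GuessOrder and its "dmy"/"mdy" return values are hypothetical.

```vbnet
Imports System
Imports System.Collections.Generic

Module DayMonthOrder
    ' Looks at the first two numeric parts of every date in a set and returns
    ' "dmy" (day first), "mdy" (month first), or Nothing when the set never
    ' shows a value above 12 in exactly one of the two positions.
    Public Function GuessOrder(ByVal dates As IEnumerable(Of String), ByVal delimiter As Char) As String
        Dim firstOver12 As Boolean = False
        Dim secondOver12 As Boolean = False
        For Each d As String In dates
            Dim parts As String() = d.Split(delimiter)
            Dim a, b As Integer
            If parts.Length < 2 OrElse Not Integer.TryParse(parts(0), a) OrElse Not Integer.TryParse(parts(1), b) Then
                Continue For ' e.g. "11 Dec": not an all-numeric date
            End If
            If a > 12 Then firstOver12 = True
            If b > 12 Then secondOver12 = True
        Next
        If firstOver12 AndAlso Not secondOver12 Then Return "dmy"
        If secondOver12 AndAlso Not firstOver12 Then Return "mdy"
        Return Nothing ' still ambiguous (or contradictory): ask the user
    End Function
End Module
```

With the sample sets, Set 2 ("11 12 06", "15 12 06", ...) comes out as "dmy", while Stephany's fifth sample ("12 13 06", ...) comes out as "mdy".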

Relying on the first 1 or 2 parts being numeric would miss quite a few cases.
Nonetheless, the code can serve as a default in the absence of a user spec. Thank
you very much for that.

Sorry for misleading you with incomplete data samples.
There are sample data sets where the first column is not a date; on the other
hand, sometimes the first 2 columns can both be dates, and, rarely, another column
elsewhere can be a date as well. This sounds incredible, but that's what users
have to contend with.
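With the mask supplied by the user, the position of the date column(s) does not have to be guessed either: each field position can be tested against the chosen format on every line of the set, and the positions that always parse are the date columns. A minimal sketch, assuming headings and comment lines have already been removed and that fields are separated by a known delimiter (two spaces, as in Stephany's sample code); FindDateColumns is a hypothetical name.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Globalization

Module DateColumns
    ' Returns the zero-based positions of the fields that parse as dates,
    ' with the given .NET format, on every one of the supplied lines.
    Public Function FindDateColumns(ByVal lines As IList(Of String), ByVal dateFormat As String, ByVal fieldDelimiter As String) As List(Of Integer)
        Dim result As New List(Of Integer)
        Dim rows As New List(Of String())
        Dim maxColumns As Integer = 0
        For Each line As String In lines
            Dim cells As String() = line.Split(New String() {fieldDelimiter}, StringSplitOptions.None)
            rows.Add(cells)
            maxColumns = Math.Max(maxColumns, cells.Length)
        Next

        For col As Integer = 0 To maxColumns - 1
            Dim allDates As Boolean = True
            For Each fields As String() In rows
                Dim parsed As DateTime
                ' A missing field or one that does not parse rules the column out
                If col >= fields.Length OrElse Not DateTime.TryParseExact(fields(col).Trim(), dateFormat, CultureInfo.InvariantCulture, DateTimeStyles.None, parsed) Then
                    allDates = False
                    Exit For
                End If
            Next
            If allDates Then result.Add(col)
        Next
        Return result
    End Function
End Module
```

Because a single odd line rules a column out, a real implementation might tolerate a small number of misses to cope with the occasional comment line GS mentions.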

"Stephany Young" <noone@localhostwrote in message
news:ed**************@TK2MSFTNGP02.phx.gbl...
Now we're cooking with gas. I think that regex is overkill for this
'problem'. Sure, you can use it if you wish but I think you will be making
a
rod for your own back.

Here is a solution that works for your sample data. Create a Windows Forms
project, plonk a button on the form and paste the following into the form:

Private m_source1 As String = "Date Parts ID Parts Description location
Quantitiy Unit Cost Total Cost" & Environment.NewLine & _
"11 Dec A1234987 Sample Parts description W1I1R4S1 2 10.00 20.00" &
Environment.NewLine & _
"15 Dec A1234988 Sample Parts description 1 10.00 20" &
Environment.NewLine & _
"18 Dec A1234988 Sample Parts description 1 10.00 20" &
Environment.NewLine & _
"19 Dec A1234988 Sample Parts description 1 10.00 20" &
Environment.NewLine & _
"12 Dec A1234988 Sample Parts description 1 10.00 20"

Private m_source2 As String = "Date Parts ID Parts Description location
Quantitiy Unit Cost Total Cost" & Environment.NewLine & _
"11 12 06 A1234987 Sample Parts description W1I1R4S1 2 10.00 20.00" &
Environment.NewLine & _
"15 12 06 A1234988 Sample Parts description 1 10.00 20" &
Environment.NewLine & _
"18 12 06 A1234988 Sample Parts description 1 10.00 20" &
Environment.NewLine & _
"19 12 06 A1234988 Sample Parts description 1 10.00 20" &
Environment.NewLine & _
"12 12 06 A1234988 Sample Parts description 1 10.00 20"

Private m_source3 As String = "Parts Parts ID Description location
Quantitiy Unit Cost Total Cost" & Environment.NewLine & _
"11/12/06 A1234987 Sample Parts description W1I1R4S1 2 10.00 20.00" &
Environment.NewLine & _
"15/12/06 A1234988 Sample Parts description 1 10.00 20" &
Environment.NewLine & _
"18/12/06 A1234988 Sample Parts description 1 10.00 20" &
Environment.NewLine & _
"19/12/06 A1234988 Sample Parts description 1 10.00 20" &
Environment.NewLine & _
"12/12/06 A1234988 Sample Parts description 1 10.00 20"

Private m_source4 As String = "Date Parts ID Parts Description location
Quantitiy Unit Cost Total Cost" & Environment.NewLine & _
"11/dec/06 A1234987 Sample Parts description W1I1R4S1 2 10.00 20.00" &
Environment.NewLine & _
"15/dec/06 A1234988 Sample Parts description 1 10.00 20" &
Environment.NewLine & _
"18/dec/06 A1234988 Sample Parts description 1 10.00 20" &
Environment.NewLine & _
"19/dec/06 A1234988 Sample Parts description 1 10.00 20" &
Environment.NewLine & _
"12/dec/06 A1234988 Sample Parts description 1 10.00 20"

Private m_source5 As String = "Date Parts ID Parts Description location
Quantitiy Unit Cost Total Cost" & Environment.NewLine & _
"12 13 06 A1234987 Sample Parts description W1I1R4S1 2 10.00 20.00" &
Environment.NewLine & _
"12 15 06 A1234988 Sample Parts description 1 10.00 20" &
Environment.NewLine & _
"12 18 06 A1234988 Sample Parts description 1 10.00 20" &
Environment.NewLine & _
"12 19 06 A1234988 Sample Parts description 1 10.00 20" &
Environment.NewLine & _
"12 12 06 A1234988 Sample Parts description 1 10.00 20"

Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As
System.EventArgs) Handles Button1.Click

Console.WriteLine()

Console.WriteLine("Sample 1")

ProcessData(m_source1)

Console.WriteLine()

Console.WriteLine("Sample 2")

ProcessData(m_source2)

Console.WriteLine()

Console.WriteLine("Sample 3")

ProcessData(m_source3)

Console.WriteLine()

Console.WriteLine("Sample 4")

ProcessData(m_source4)

Console.WriteLine()

Console.WriteLine("Sample 5")

ProcessData(m_source5)

Console.WriteLine()

End Sub

Private Sub ProcessData(ByVal source As String)

' Assumption: Lines of data are seperated by a carriage return/line
feed
pair
Dim _lines As String() = source.Split(New String()
{Environment.NewLine}, StringSplitOptions.RemoveEmptyEntries)

' Determined by eyeballing data: All 'fields' are delimited by a pair
of
spaces
Dim _ss As String() = _lines(0).Split(New String() {" "},
StringSplitOptions.None)

' Determine which line is the first line of actual data
' If the first line is a heading line then all characters of the first
field will be letters
Dim _lettercount As Integer = 0
For Each _c As Char In _ss(0)
If Char.IsLetter(_c) Then _lettercount += 1
Next
Dim _firstline As Integer = 0
If _lettercount = _ss(0).Length Then _firstline = 1

'Split the first actual line on the field delimiter
_ss = _lines(_firstline).Split(New String() {" "},
StringSplitOptions.None)

' Determined by eyeballing data: The date field is always the first
field in the line

' Determine the delimiter to be used for the date format
Dim _delimiter As String = ""
If _ss(0).IndexOf(" ") > 0 Then
    _delimiter = " "
ElseIf _ss(0).IndexOf("/") > 0 Then
    _delimiter = "/"
ElseIf _ss(0).IndexOf("-") > 0 Then
    _delimiter = "-"
Else
    Console.WriteLine("Unable to determine delimiter out of " & _ss(0))
    Return
End If
Console.WriteLine("Determined delimiter as '" & _delimiter & "'")

' Construct the date format to be used
Dim _format As String = String.Empty
' Split the first field on the date format delimiter
Dim _parts As String() = _ss(0).Split(New String() {_delimiter}, StringSplitOptions.None)
If _parts.Length = 2 Then
    ' If there are 2 parts then we only have day and month components
    If Char.IsDigit(_parts(0).Chars(0)) AndAlso Char.IsLetter(_parts(1).Chars(0)) Then
        ' The 1st part starts with a digit and the 2nd part starts with a letter,
        ' so we can assume that the 1st part is the day and the 2nd part is the month
        _format = New String("d"c, _parts(0).Length) & _delimiter & "MMM"
        If _parts(1).Length > 3 Then _format &= "M"
    ElseIf Char.IsDigit(_parts(0).Chars(0)) AndAlso Char.IsDigit(_parts(1).Chars(0)) Then
        ' Both parts start with a digit.
        ' Start with the assumption that the 1st part is the day and the 2nd part is the month
        _format = New String("d"c, _parts(0).Length) & _delimiter & New String("M"c, _parts(1).Length)
        If Integer.Parse(_parts(1)) > 12 Then
            ' The 2nd part cannot be a month, so the 1st part must be the month and the 2nd part the day
            _format = New String("M"c, _parts(0).Length) & _delimiter & New String("d"c, _parts(1).Length)
        End If
        ' There is a big gotcha here if both parts are 12 or less and are different,
        ' e.g. we don't really know if 01 02 (01/02, 01-02, etc.) should be 1 February or January 2
    End If
ElseIf _parts.Length = 3 Then
    ' If there are 3 parts then we have day, month and year components.
    ' Assume that the year is always the 3rd part
    If Char.IsDigit(_parts(0).Chars(0)) AndAlso Char.IsLetter(_parts(1).Chars(0)) Then
        ' The 1st part starts with a digit and the 2nd part starts with a letter,
        ' so we can assume that the 1st part is the day and the 2nd part is the month
        _format = New String("d"c, _parts(0).Length) & _delimiter & "MMM"
        If _parts(1).Length > 3 Then _format &= "M"
        _format &= _delimiter & New String("y"c, _parts(2).Length)
    ElseIf Char.IsDigit(_parts(0).Chars(0)) AndAlso Char.IsDigit(_parts(1).Chars(0)) Then
        ' Both parts start with a digit.
        ' Start with the assumption that the 1st part is the day and the 2nd part is the month
        _format = New String("d"c, _parts(0).Length) & _delimiter & New String("M"c, _parts(1).Length) & _delimiter & New String("y"c, _parts(2).Length)
        If Integer.Parse(_parts(1)) > 12 Then
            ' The 2nd part cannot be a month, so the 1st part must be the month and the 2nd part the day
            _format = New String("M"c, _parts(0).Length) & _delimiter & New String("d"c, _parts(1).Length) & _delimiter & New String("y"c, _parts(2).Length)
        End If
        ' There is a big gotcha here if the first two parts are 12 or less and are different,
        ' e.g. we don't really know if 01 02 (01/02, 01-02, etc.) should be 1 February or January 2
    End If
End If
If _format.Length = 0 Then
    ' We were unable to determine the date format from the available information
    Console.WriteLine("Unable to determine format from " & _ss(0))
    Return
End If

' We were able to determine the date format, so we can continue and parse the dates
Console.WriteLine("Determined format as " & _format)

' Start from our actual first line of data.
' Fields are separated by a pair of spaces; InvariantCulture makes "/" and
' English month names such as "Dec" parse the same way on every machine.
For _i As Integer = _firstline To _lines.Length - 1
    _ss = _lines(_i).Split(New String() {"  "}, StringSplitOptions.None)
    Dim _date As DateTime = DateTime.ParseExact(_ss(0), _format, System.Globalization.CultureInfo.InvariantCulture)
    Console.WriteLine("Read from input: " & _ss(0) & " - Interpreted date: " & _date.ToString("yyyy-MM-dd"))
Next

End Sub
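The "big gotcha" comments in the code above come from looking at only the first data line. Since GS says every line in a set shares one format, one possible mitigation, sketched here under that assumption, is to scan the numeric date fields of the whole set: a value above 12 can only be a day, so if it ever shows up in the first position the set is day-first, and if it shows up in the second position the set is month-first. If neither happens, the set stays ambiguous and the user should be asked. The module and function names are hypothetical.

```vb
Imports System
Imports System.Collections.Generic

' Hypothetical helper: decides day/month order for a set of numeric dates
' such as "11 12 06" or "12/13/06", assuming every row in the set uses the
' same order (as GS describes for a given set).
Public Module DayMonthOrder

    ' Returns "DM" for day-first, "MD" for month-first, or "" if the rows
    ' never disambiguate the order (or contradict each other).
    Public Function Detect(ByVal dates As IEnumerable(Of String), ByVal delimiter As Char) As String
        Dim firstOver12 As Boolean = False
        Dim secondOver12 As Boolean = False

        For Each d As String In dates
            Dim parts As String() = d.Split(delimiter)
            If parts.Length < 2 Then Continue For
            Dim firstPart, secondPart As Integer
            If Not Integer.TryParse(parts(0), firstPart) OrElse Not Integer.TryParse(parts(1), secondPart) Then Continue For
            ' A day can exceed 12 but a month cannot
            If firstPart > 12 Then firstOver12 = True
            If secondPart > 12 Then secondOver12 = True
        Next

        If firstOver12 AndAlso Not secondOver12 Then Return "DM"
        If secondOver12 AndAlso Not firstOver12 Then Return "MD"
        Return "" ' still ambiguous: ask the user for the mask
    End Function

End Module
```

With Stephany's fifth sample ("12 13 06", "12 15 06", ...) this returns "MD"; with Set 2 ("11 12 06", "15 12 06", ...) it returns "DM". A set whose days are all 12 or less cannot be resolved this way, which is GS's argument for letting the user pick the mask.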

Note, from the results, that if there is no year part, DateTime.ParseExact will interpret the date as being in the current year, as determined from the system date at the time the code is executed.
"GS" <gs**********************@msnews.Nomail.com> wrote in message
news:uc**************@TK2MSFTNGP06.phx.gbl...
It looks like I am not expressing myself clearly. Although the application does not know in advance which format is used, it does know which date format it is dealing with for a given set, and it can expect the same format throughout that set of input.
I should not have used the term "batch"; I meant a set of records. The only possible variation is that some records in certain sets may be split over 2 lines, but that is not critical, as the conditions can be described beforehand and normalized by another parse component.

Sample data

Set 1: date format mask is "dd MMM"
Date Parts ID Parts Description location Quantity Unit Cost Total Cost
11 Dec A1234987 Sample Parts description W1I1R4S1 2 10.00 20.00
15 Dec A1234988 Sample Parts description 1 10.00 20
18 Dec A1234988 Sample Parts description 1 10.00 20
19 Dec A1234988 Sample Parts description 1 10.00 20
12 Dec A1234988 Sample Parts description 1 10.00 20

Set 2: date format mask is "dd MM yy"
Date Parts ID Parts Description location Quantity Unit Cost Total Cost
11 12 06 A1234987 Sample Parts description W1I1R4S1 2 10.00 20.00
15 12 06 A1234988 Sample Parts description 1 10.00 20
18 12 06 A1234988 Sample Parts description 1 10.00 20
19 12 06 A1234988 Sample Parts description 1 10.00 20
12 12 06 A1234988 Sample Parts description 1 10.00 20

Set 3: date format mask is "dd/MM/yy"
Parts Description location Quantity Unit Cost Total Cost
11/12/06 A1234987 Sample Parts description W1I1R4S1 2 10.00 20.00
15/12/06 A1234988 Sample Parts description 1 10.00 20
18/12/06 A1234988 Sample Parts description 1 10.00 20
19/12/06 A1234988 Sample Parts description 1 10.00 20
12/12/06 A1234988 Sample Parts description 1 10.00 20

Set 4: date format mask is ""
Date Parts ID Parts Description location Quantity Unit Cost Total Cost
11/dec/06 A1234987 Sample Parts description W1I1R4S1 2 10.00 20.00
15/dec/06 A1234988 Sample Parts description 1 10.00 20
18/dec/06 A1234988 Sample Parts description 1 10.00 20
19/dec/06 A1234988 Sample Parts description 1 10.00 20
12/dec/06 A1234988 Sample Parts description 1 10.00 20
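Because each set arrives with its own mask, normalizing a set reduces to one ParseExact call per row with that set's mask. The sketch below assumes .NET custom format strings (uppercase MM is the month) and InvariantCulture, so that "Dec" or "dec" and the "/" separator are read the same way on every machine. The masks for Set 3 (dd/MM/yy, which is what rows like 11/12/06 contain) and Set 4 (dd/MMM/yy, since no mask was given) are assumptions inferred from the rows; the module name is hypothetical.

```vb
Imports System
Imports System.Globalization

' Sketch: one .NET format string per sample set, applied to that set's date field.
Public Module SetNormalizer

    ' Index 0..3 = Set 1..4. Set 3 and Set 4 masks are inferred from the sample rows.
    Public ReadOnly SampleMasks As String() = New String() {"dd MMM", "dd MM yy", "dd/MM/yy", "dd/MMM/yy"}

    ' Parses one date field with the set's mask and returns it as yyyy-MM-dd.
    ' A yearless mask (Set 1) picks up the current system year.
    Public Function Normalize(ByVal dateField As String, ByVal mask As String) As String
        Dim d As DateTime = DateTime.ParseExact(dateField, mask, CultureInfo.InvariantCulture)
        Return d.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture)
    End Function

End Module
```

For example, `SetNormalizer.Normalize("11/dec/06", SetNormalizer.SampleMasks(3))` gives "2006-12-11", the same result as "11 12 06" with the Set 2 mask.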

How do I deal with a format without a year? I do have clues from other parts of the originating website, and an optional default set by the user.
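One way to handle a yearless mask, assuming the year comes from one of the clues GS mentions (the site or a user default), is to append that year to both the text and the mask before parsing, rather than letting ParseExact fill in the current system year and patching it afterwards. This also lets the parser validate the day against the intended year, so "29 Feb" is accepted for a leap year even when the current year is not one. The module and parameter names are hypothetical.

```vb
Imports System
Imports System.Globalization

' Sketch: parse a yearless date such as "11 Dec" with a caller-supplied year
' (for example a hypothetical per-site or user default).
Public Module YearlessDates

    Public Function ParseWithYear(ByVal text As String, ByVal mask As String, ByVal year As Integer) As DateTime
        ' Append the year to both the text and the mask so the day is
        ' validated against that year rather than the current one.
        Dim fullText As String = text & " " & year.ToString("0000", CultureInfo.InvariantCulture)
        Dim fullMask As String = mask & " yyyy"
        Return DateTime.ParseExact(fullText, fullMask, CultureInfo.InvariantCulture)
    End Function

End Module
```

A mask that already contains a year should not be passed to this helper; checking for "y" in the mask before choosing the path is one simple way to route it.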

The sample data show the date format varying from set to set, but within a given set the date format I need to deal with is consistent, and the user has influence over the date format mask used.

I like Cor's suggestion: don't let the user type in the format, but let the user pick it from a list. That will likely be the case, at least in version 0.
"GS" <gs**********************@msnews.Nomail.com> wrote in message
news:eF**************@TK2MSFTNGP03.phx.gbl...
You are sort of on the same track as I am.
I must first apologize: I did not tell you the complete story.

Although the application does not know beforehand exactly what format the data may come in, part of the application allows the user to define and record a favourite for a website:
 - to extract by text or HTML
 - header content and format
 - record format and date format (that is where the date format mask comes in)
 - optionally, an ordinal number for each column, or re-ordering
 - trailer content and format

For a given batch, at least for the body, the date format is uniform.

Furthermore, because the extract process needs to be generic and adaptable to the front end that takes the user definitions, I believe it would be easier to "normalize" date strings to "yyyy-mm-dd".

Also, the end target may not necessarily be a SQL database; it may be text, pasted into a Word report, or Excel, by the user.
Therefore, I can transform the date format mask into a regex in the appropriate format, with identifiers, and use Regex.Replace to normalize the date. As a matter of fact, the date separator does not have to be /; it can be a space, as long as there are identifiable delimiters around the date string.

I already have code for dealing with regexes for dates from a prior project; all I have to do is adapt it to the present need.

Who knows, maybe I have taken a totally offbeat tack.

"GS" <gs**********************@msnews.Nomail.com> wrote in message
news:%2****************@TK2MSFTNGP04.phx.gbl...
Thanks to all who have pitched in so far.

Let me give it another shot.

It looks like an easier way out would be:
1. Copy the date format string into a regex string holder, and then derive the relevant regex expression to be used for date normalization later, in part 2:
 - replace yyyy in the regex string with the regex year expression, with a year identifier
 - look for yy, replace it with 20yy, and repeat the step above
 - replace mmm with the month regex expression associated with a month identifier
 - replace mm with the 2-digit month regex expression associated with a month identifier
 - replace dd with the 2-digit day regex expression associated with a day identifier
2. Use the resulting regex in Regex.Replace to normalize to yyyy-mm-dd.

Any problem with the above approach?
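The two steps above can be sketched roughly as follows. This is one possible reading of the plan, not GS's existing code: instead of sequential string replaces, it tokenizes the mask into runs of the same letter, which avoids "yy" being found inside "yyyy" or "mm" inside "mmm". Group names, the module name and the defaultYear parameter are hypothetical; yy is expanded to 20yy exactly as the plan says, and the month abbreviation is turned into a number by DateTime.ParseExact with InvariantCulture.

```vb
Imports System
Imports System.Globalization
Imports System.Text
Imports System.Text.RegularExpressions

' Sketch of the plan: mask (lowercase mm = month, as in the original question)
' -> regex with named groups -> Regex.Replace producing yyyy-MM-dd.
Public Module MaskToRegex

    Public Function Build(ByVal mask As String) As String
        Dim sb As New StringBuilder("\b")
        Dim i As Integer = 0
        While i < mask.Length
            ' Measure the run of identical letters starting at position i
            Dim c As Char = Char.ToLowerInvariant(mask(i))
            Dim run As Integer = 1
            While i + run < mask.Length AndAlso Char.ToLowerInvariant(mask(i + run)) = c
                run += 1
            End While
            Select Case New String(c, run)
                Case "yyyy" : sb.Append("(?<year>\d{4})")
                Case "yy" : sb.Append("(?<yy>\d{2})")
                Case "mmm" : sb.Append("(?<mon>[A-Za-z]{3})")
                Case "mm" : sb.Append("(?<month>\d{2})")
                Case "dd" : sb.Append("(?<day>\d{2})")
                Case Else : sb.Append(Regex.Escape(mask.Substring(i, run))) ' delimiter or other literal
            End Select
            i += run
        End While
        Return sb.Append("\b").ToString()
    End Function

    ' Rewrites every date in the line that matches the mask.
    ' defaultYear is used when the mask has no year part.
    Public Function Normalize(ByVal line As String, ByVal mask As String, ByVal defaultYear As Integer) As String
        Return Regex.Replace(line, Build(mask), New MatchEvaluator(Function(m As Match) FormatMatch(m, defaultYear)))
    End Function

    Private Function FormatMatch(ByVal m As Match, ByVal defaultYear As Integer) As String
        Dim yearValue As Integer = defaultYear
        If m.Groups("year").Success Then
            yearValue = Integer.Parse(m.Groups("year").Value)
        ElseIf m.Groups("yy").Success Then
            yearValue = 2000 + Integer.Parse(m.Groups("yy").Value) ' "replace yy with 20yy"
        End If
        Dim monthValue As Integer
        If m.Groups("mon").Success Then
            ' Map an English abbreviation such as "Dec" or "dec" to 12
            monthValue = DateTime.ParseExact(m.Groups("mon").Value, "MMM", CultureInfo.InvariantCulture).Month
        Else
            monthValue = Integer.Parse(m.Groups("month").Value)
        End If
        Dim dayValue As Integer = Integer.Parse(m.Groups("day").Value)
        ' New DateTime throws if the combination is not a real date (e.g. month 13)
        Return New DateTime(yearValue, monthValue, dayValue).ToString("yyyy-MM-dd", CultureInfo.InvariantCulture)
    End Function

End Module
```

For example, `MaskToRegex.Normalize("11/dec/06 A1234987 Sample", "dd/mmm/yy", 2006)` returns "2006-12-11 A1234987 Sample". Because Regex.Replace scans the whole line, a loose numeric mask such as "dd mm yy" could also match digits in other columns; applying it only to the date field avoids that.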

"Cor Ligthert [MVP]" <no************@planet.nl> wrote in message
news:%2****************@TK2MSFTNGP06.phx.gbl...
GS,

Maybe in 2007 you can avoid this, and everything like it, such as DateTime.ParseExact; instead, have a look at the nice globalization support Microsoft has built in, and the related ToString options.

Cor
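One reading of Cor's suggestion is to let a culture decide the field order: parse with the CultureInfo that matches where the data comes from, then write the result out with a culture-neutral ToString format. The culture names below are only examples; which culture fits a given site is an assumption the user would have to supply, and the sketch needs the machine to have culture data available (it will not work in .NET's invariant-globalization mode).

```vb
Imports System
Imports System.Globalization

' Sketch of a culture-based approach: the same text is read day-first or
' month-first depending on the culture, then emitted as yyyy-MM-dd.
Public Module CultureParse

    Public Function Normalize(ByVal text As String, ByVal cultureName As String) As String
        Dim culture As New CultureInfo(cultureName)
        Dim d As DateTime = DateTime.Parse(text, culture)
        Return d.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture)
    End Function

End Module
```

With this, "11/12/06" becomes "2006-12-11" under en-GB and "2006-11-12" under en-US, which shows that the culture (or mask) still has to come from somewhere outside the data.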

"gs" <gs@dontMail.telus> wrote in message
news:Ot**************@TK2MSFTNGP03.phx.gbl...
Let's say I have to deal with various date formats, and I am given a format string from one of the following:
dd/mm/yyyy
mm/dd/yyyy
dd/mmm/yyyy
mmm/dd/yyyy
dd/mm/yy
mm/dd/yy
dd/mmm/yy
mmm/dd/yy
dd/mm
What is the best way to come up with a relevant regex for the incoming format string?
a) use two arrays and statically match
b) use a regex to find the order
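Option (a) can be sketched as a fixed lookup table from the masks listed in the question to .NET custom format strings. The mapping matters because in .NET format strings lowercase "mm" means minutes and uppercase "MM" means month, so passing a mask such as "dd/mm/yyyy" straight to DateTime.ParseExact would misread it. The module and function names are hypothetical; a Dictionary stands in for the "two arrays".

```vb
Imports System
Imports System.Collections.Generic

' Sketch of option (a): a fixed table from the question's masks
' (lowercase mm = month) to .NET custom format strings (MM = month).
Public Module MaskTable

    ' Keys are compared case-insensitively, so "DD/MM/YYYY" is also found.
    Private ReadOnly Table As Dictionary(Of String, String) = BuildTable()

    Private Function BuildTable() As Dictionary(Of String, String)
        Dim t As New Dictionary(Of String, String)(StringComparer.OrdinalIgnoreCase)
        t.Add("dd/mm/yyyy", "dd/MM/yyyy")
        t.Add("mm/dd/yyyy", "MM/dd/yyyy")
        t.Add("dd/mmm/yyyy", "dd/MMM/yyyy")
        t.Add("mmm/dd/yyyy", "MMM/dd/yyyy")
        t.Add("dd/mm/yy", "dd/MM/yy")
        t.Add("mm/dd/yy", "MM/dd/yy")
        t.Add("dd/mmm/yy", "dd/MMM/yy")
        t.Add("mmm/dd/yy", "MMM/dd/yy")
        t.Add("dd/mm", "dd/MM")
        Return t
    End Function

    ' Maps a mask from the list to the format string for DateTime.ParseExact.
    Public Function ToDotNetFormat(ByVal mask As String) As String
        Dim result As String = Nothing
        If Table.TryGetValue(mask.Trim(), result) Then Return result
        Throw New ArgumentException("Unsupported date format mask: " & mask)
    End Function

End Module
```

A pick-list in the UI can be filled from the same keys, so the user can only choose masks the table knows about.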
Jan 9 '07 #20

P: n/a
GS
Oops. Please pardon my bad typing and proofreading.
"GS" <gs**********************@msnews.Nomail.com> wrote in message
news:Oe**************@TK2MSFTNGP06.phx.gbl...
I see the code (you put a lot of effort into it) does work on a lot of cases. Much appreciated.

However, would it not be better to get assistance from the user, who knows what date format is being used?

That is the rationale for letting the user pick the date format mask. Guessing the date format is tough to master for all cases. Not only months but also days can be indeterminate at times, and it is worse when a 2-digit year is used. I have seen some sample data that is way outside the date formats commonly seen in the US.

Relying on the first 1 or 2 parts being numeric would miss quite a few cases. Nonetheless, the code can be a default process in the absence of a user specification. Thank you very much for that.

Sorry for misleading you with incomplete data samples. There are sample data sets where the first column is not a date; on the other hand, sometimes the first 2 columns can both be dates, and, rarely, another column elsewhere can be a date as well. This sounds incredible, but that's what users have to contend with.

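Since the date is not always in the first column, and a line may hold more than one date, a user-picked mask can also be used to find the date columns instead of assuming position 0. The sketch below assumes fields are separated by two spaces, as in Stephany's sample code, and that the mask is already a .NET format string; the module name is hypothetical.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Globalization

' Sketch: with a user-picked .NET mask, report which fields of a line are dates.
Public Module DateColumns

    Public Function Find(ByVal line As String, ByVal mask As String) As List(Of Integer)
        Dim hits As New List(Of Integer)()
        ' Assumption: fields are delimited by a pair of spaces
        Dim fields As String() = line.Split(New String() {"  "}, StringSplitOptions.None)
        For i As Integer = 0 To fields.Length - 1
            Dim d As DateTime
            If DateTime.TryParseExact(fields(i).Trim(), mask, CultureInfo.InvariantCulture, DateTimeStyles.None, d) Then
                hits.Add(i)
            End If
        Next
        Return hits
    End Function

End Module
```

Running it over several lines of a set and keeping only the columns that parse on every line would filter out the occasional comment or stray value.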
Jan 9 '07 #21

P: n/a
Now I'm confused.

You have given the impression that you don't have any control over how the data is 'gathered'.

Now you seem to be saying that you do have control.

If that is the case, simply validate the data at the time the user inputs it.

If that is not the case, then I think it's time you explained the big picture.
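"Validate at input time" could look something like this sketch: before accepting the mask a user picked for a set, check that every date field in the set parses with it, and report the first one that does not. It assumes the mask is a .NET format string and uses InvariantCulture; the module and function names are hypothetical.

```vb
Imports System
Imports System.Globalization

' Sketch: check a user-picked mask against the date fields of a set.
Public Module MaskValidator

    ' Returns -1 if all values parse, otherwise the index of the first failure.
    Public Function FirstFailure(ByVal mask As String, ByVal values As String()) As Integer
        For i As Integer = 0 To values.Length - 1
            Dim d As DateTime
            If Not DateTime.TryParseExact(values(i), mask, CultureInfo.InvariantCulture, DateTimeStyles.None, d) Then
                Return i
            End If
        Next
        Return -1
    End Function

End Module
```

Passing this check does not prove the mask is right: if every day in the set is 12 or less, both "dd MM yy" and "MM dd yy" pass, so the user's choice still has to settle the order.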
"GS" <gs**********************@msnews.Nomail.comwrote in message
news:eS**************@TK2MSFTNGP02.phx.gbl...
oops. Please pardon my bad typo and proof reading
"GS" <gs**********************@msnews.Nomail.comwrote in message
news:Oe**************@TK2MSFTNGP06.phx.gbl...
I see the code (you put a lot effort in) does work on a lot of cases.
much
appreciated.

However would it not be better we get assistance from user who knows what
date format being used?

That is the rationale I let user somehow pick the date format mask.
Guessing date format is tough to master for all cases. Not only months,
days can be
indeterminate at times; worse when 2 digit year is used. I have seen
some
sample data that is way out of ordinary date format commonly seen in US.

Relying the first 1 or 2 being numeric would miss out quite a few cases.
Nonetheless. the code can be a default process in absence of user spec. .
thank you
very much for that

Sorry for misleading you with incomplete data samples.
There are sample data set where the first column is not date. on the
other
sometimes first 2 columns can also be dates as well as rarely another
column
else where can to date. this sound incredulous but that's what users have
to contend with.
>"Stephany Young" <noone@localhostwrote in message
news:ed**************@TK2MSFTNGP02.phx.gbl...
Now we're cooking with gas. I think that regex is overkill for this
'problem'. Sure, you can use it if you wish but I think you will be
making
>a
rod for your own back.

Here is a solution that works for your sample data. Create a Windows
Forms
project, plonk a button on the form and paste the following into the
form:
>
Private m_source1 As String = "Date Parts ID Parts Description
location
Quantitiy Unit Cost Total Cost" & Environment.NewLine & _
"11 Dec A1234987 Sample Parts description W1I1R4S1 2 10.00 20.00" &
Environment.NewLine & _
"15 Dec A1234988 Sample Parts description 1 10.00 20" &
Environment.NewLine & _
"18 Dec A1234988 Sample Parts description 1 10.00 20" &
Environment.NewLine & _
"19 Dec A1234988 Sample Parts description 1 10.00 20" &
Environment.NewLine & _
"12 Dec A1234988 Sample Parts description 1 10.00 20"

Private m_source2 As String = "Date Parts ID Parts Description
location
Quantitiy Unit Cost Total Cost" & Environment.NewLine & _
"11 12 06 A1234987 Sample Parts description W1I1R4S1 2 10.00 20.00"
&
Environment.NewLine & _
"15 12 06 A1234988 Sample Parts description 1 10.00 20" &
Environment.NewLine & _
"18 12 06 A1234988 Sample Parts description 1 10.00 20" &
Environment.NewLine & _
"19 12 06 A1234988 Sample Parts description 1 10.00 20" &
Environment.NewLine & _
"12 12 06 A1234988 Sample Parts description 1 10.00 20"

Private m_source3 As String = "Parts Parts ID Description location
Quantitiy Unit Cost Total Cost" & Environment.NewLine & _
"11/12/06 A1234987 Sample Parts description W1I1R4S1 2 10.00 20.00"
&
Environment.NewLine & _
"15/12/06 A1234988 Sample Parts description 1 10.00 20" &
Environment.NewLine & _
"18/12/06 A1234988 Sample Parts description 1 10.00 20" &
Environment.NewLine & _
"19/12/06 A1234988 Sample Parts description 1 10.00 20" &
Environment.NewLine & _
"12/12/06 A1234988 Sample Parts description 1 10.00 20"

Private m_source4 As String = "Date Parts ID Parts Description
location
Quantitiy Unit Cost Total Cost" & Environment.NewLine & _
"11/dec/06 A1234987 Sample Parts description W1I1R4S1 2 10.00
20.00"
&
Environment.NewLine & _
"15/dec/06 A1234988 Sample Parts description 1 10.00 20" &
Environment.NewLine & _
"18/dec/06 A1234988 Sample Parts description 1 10.00 20" &
Environment.NewLine & _
"19/dec/06 A1234988 Sample Parts description 1 10.00 20" &
Environment.NewLine & _
"12/dec/06 A1234988 Sample Parts description 1 10.00 20"

Private m_source5 As String = "Date Parts ID Parts Description
location
Quantitiy Unit Cost Total Cost" & Environment.NewLine & _
"12 13 06 A1234987 Sample Parts description W1I1R4S1 2 10.00 20.00"
&
Environment.NewLine & _
"12 15 06 A1234988 Sample Parts description 1 10.00 20" &
Environment.NewLine & _
"12 18 06 A1234988 Sample Parts description 1 10.00 20" &
Environment.NewLine & _
"12 19 06 A1234988 Sample Parts description 1 10.00 20" &
Environment.NewLine & _
"12 12 06 A1234988 Sample Parts description 1 10.00 20"

Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As
System.EventArgs) Handles Button1.Click

Console.WriteLine()

Console.WriteLine("Sample 1")

ProcessData(m_source1)

Console.WriteLine()

Console.WriteLine("Sample 2")

ProcessData(m_source2)

Console.WriteLine()

Console.WriteLine("Sample 3")

ProcessData(m_source3)

Console.WriteLine()

Console.WriteLine("Sample 4")

ProcessData(m_source4)

Console.WriteLine()

Console.WriteLine("Sample 5")

ProcessData(m_source5)

Console.WriteLine()

End Sub

Private Sub ProcessData(ByVal source As String)

' Assumption: Lines of data are seperated by a carriage return/line
feed
pair
Dim _lines As String() = source.Split(New String()
{Environment.NewLine}, StringSplitOptions.RemoveEmptyEntries)

' Determined by eyeballing data: All 'fields' are delimited by a
pair
>of
spaces
Dim _ss As String() = _lines(0).Split(New String() {" "},
StringSplitOptions.None)

' Determine which line is the first line of actual data
' If the first line is a heading line then all characters of the
first
field will be letters
Dim _lettercount As Integer = 0
For Each _c As Char In _ss(0)
If Char.IsLetter(_c) Then _lettercount += 1
Next
Dim _firstline As Integer = 0
If _lettercount = _ss(0).Length Then _firstline = 1

'Split the first actual line on the field delimiter
_ss = _lines(_firstline).Split(New String() {" "},
StringSplitOptions.None)

' Determined by eyeballing data: The date field is always the first
field in the line

' Determine the delimiter to be used for the date format
Dim _delimiter As String = ""
If _ss(0).IndexOf(" ") 0 Then
_delimiter = " "
ElseIf _ss(0).IndexOf("/") 0 Then
_delimiter = "/"
ElseIf _ss(0).IndexOf("-") 0 Then
_delimiter = "-"
Else
Console.WriteLine("Unable to determine delimiter out of " &
_ss(0))
Return
End If
Console.WriteLine("Determined delimiter as '" & _delimiter & "'")

' Construct the date format to be used
Dim _format As String = String.Empty
' Split the first field on the date format delimiter
Dim _parts As String() = _ss(0).Split(New String() {_delimiter},
StringSplitOptions.None)
If _parts.Length = 2 Then
' If there are 2 parts then we only have day and month components
If Char.IsDigit(_parts(0).Chars(0)) AndAlso
Char.IsLetter(_parts(1).Chars(0)) Then
' The 1st part starts with a digit and the 2nd part starts with
a
letter
' so we can assume that the 1st part is the day and the 2nd
part
is
the month
_format = New String("d"c, _parts(0).Length) & _delimiter & "MMM"
' A month name longer than 3 letters is a full name, so extend "MMM" to "MMMM"
If _parts(1).Length > 3 Then _format &= "M"
ElseIf Char.IsDigit(_parts(0).Chars(0)) AndAlso Char.IsDigit(_parts(1).Chars(0)) Then
' Both parts start with a digit
' Start with the assumption that the 1st part is the day and the 2nd part is the month
_format = New String("d"c, _parts(0).Length) & _delimiter & New String("M"c, _parts(1).Length)
If Integer.Parse(_parts(1)) > 12 Then
' The 2nd part cannot be a month, so the 1st part must be the month and the 2nd part must be the day
' (there is no year part in this branch)
_format = New String("M"c, _parts(0).Length) & _delimiter & New String("d"c, _parts(1).Length)
End If
' There is a big gotcha here if both parts are 12 or less and are different,
' e.g. we don't really know if 01 02 (01/02, 01-02, etc.) should be 1 February or January 2
End If
ElseIf _parts.Length = 3 Then
' If there are 3 parts then we have day, month and year components
' Assume that the year is always the 3rd part
If Char.IsDigit(_parts(0).Chars(0)) AndAlso Char.IsLetter(_parts(1).Chars(0)) Then
' The 1st part starts with a digit and the 2nd part starts with a letter,
' so we can assume that the 1st part is the day and the 2nd part is the month
_format = New String("d"c, _parts(0).Length) & _delimiter & "MMM"
If _parts(1).Length > 3 Then _format &= "M"
_format &= _delimiter & New String("y"c, _parts(2).Length)
ElseIf Char.IsDigit(_parts(0).Chars(0)) AndAlso Char.IsDigit(_parts(1).Chars(0)) Then
' Both parts start with a digit
' Start with the assumption that the 1st part is the day and the 2nd part is the month
_format = New String("d"c, _parts(0).Length) & _delimiter & New String("M"c, _parts(1).Length) &
    _delimiter & New String("y"c, _parts(2).Length)
If Integer.Parse(_parts(1)) > 12 Then
' The 2nd part cannot be a month, so the 1st part must be the month and the 2nd part must be the day
_format = New String("M"c, _parts(0).Length) & _delimiter & New String("d"c, _parts(1).Length) &
    _delimiter & New String("y"c, _parts(2).Length)
End If
' There is a big gotcha here if the first two parts are 12 or less and are different,
' e.g. we don't really know if 01 02 (01/02, 01-02, etc.) should be 1 February or January 2
End If
End If
If _format.Length = 0 Then
' We were unable to determine the date format from the available information
Console.WriteLine("Unable to determine format from " & _ss(0))
Return
End If

' We were able to determine the date format, so we can continue and parse the dates
Console.WriteLine("Determined format as " & _format)

' Start from our actual first line of data
For _i As Integer = _firstline To _lines.Length - 1
_ss = _lines(_i).Split(New String() {" "}, StringSplitOptions.None)
' Nothing as the provider means the current culture is used, so month names
' and the "/" date separator follow the regional settings
Dim _date As DateTime = DateTime.ParseExact(_ss(0), _format, Nothing)
Console.WriteLine("Read from input: " & _ss(0) & " - Interpreted date: " & _date.ToString("yyyy-MM-dd"))
Next

End Sub

Note from the results that if there is no year part, DateTime.ParseExact will interpret the date as being in the current year, as determined from the system date at the time the code is executed.
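If the current-year default is not wanted (for example, when the year should come from a user default or from clues on the originating website), one option is to append a known year to both the text and the mask before parsing. This is only a sketch: ParseWithDefaultYear is a hypothetical helper, not a framework method, and the "|" separator is assumed never to appear in the data. Supplying the year up front also means a date like 29 Feb is validated against the intended year rather than against whatever year the machine clock says.

```vbnet
Imports System
Imports System.Globalization

Module YearlessDates
    ' Hypothetical helper: parses text matching a year-less mask (e.g. "dd MMM")
    ' by appending an explicit year to both the text and the mask, so the
    ' result does not depend on the current system date.
    Function ParseWithDefaultYear(text As String, mask As String, defaultYear As Integer) As DateTime
        ' '|' is quoted in the mask so it is treated as a literal separator;
        ' it is assumed never to occur in the data itself.
        Return DateTime.ParseExact(text & "|" & defaultYear.ToString("0000"),
                                   mask & "'|'yyyy",
                                   CultureInfo.InvariantCulture)
    End Function

    Sub Main()
        Dim d As DateTime = ParseWithDefaultYear("11 Dec", "dd MMM", 2006)
        Console.WriteLine(d.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture)) ' 2006-12-11

        ' Because the year is known before parsing, 29 Feb is accepted for a leap year.
        Dim leap As DateTime = ParseWithDefaultYear("29 Feb", "dd MMM", 2008)
        Console.WriteLine(leap.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture)) ' 2008-02-29
    End Sub
End Module
```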
"GS" <gs**********************@msnews.Nomail.com> wrote in message
news:uc**************@TK2MSFTNGP06.phx.gbl...
Looks like I am not expressing myself clearly. Although the application does not know in general which format is used, it does know which date format it deals with for a given set, and can expect the same format throughout that set of input.
I should not have used the term batch, but rather set of records. The only possible variation is that some records in certain sets may be split into 2 lines, but that is not critical, as the conditions can be described beforehand and normalized by another parse component.

Sample data:

Set 1: date format mask is "dd MMM"
Date Parts ID Parts Description Location Quantity Unit Cost Total Cost
11 Dec A1234987 Sample Parts description W1I1R4S1 2 10.00 20.00
15 Dec A1234988 Sample Parts description 1 10.00 20
18 Dec A1234988 Sample Parts description 1 10.00 20
19 Dec A1234988 Sample Parts description 1 10.00 20
12 Dec A1234988 Sample Parts description 1 10.00 20
Set 2: date format mask is "dd MM yy"
Date Parts ID Parts Description Location Quantity Unit Cost Total Cost
11 12 06 A1234987 Sample Parts description W1I1R4S1 2 10.00 20.00
15 12 06 A1234988 Sample Parts description 1 10.00 20
18 12 06 A1234988 Sample Parts description 1 10.00 20
19 12 06 A1234988 Sample Parts description 1 10.00 20
12 12 06 A1234988 Sample Parts description 1 10.00 20

Set 3: date format mask is "dd/MMM/06"
Parts Description Location Quantity Unit Cost Total Cost
11/12/06 A1234987 Sample Parts description W1I1R4S1 2 10.00 20.00
15/12/06 A1234988 Sample Parts description 1 10.00 20
18/12/06 A1234988 Sample Parts description 1 10.00 20
19/12/06 A1234988 Sample Parts description 1 10.00 20
12/12/06 A1234988 Sample Parts description 1 10.00 20

Set 4: date format mask is ""
Date Parts ID Parts Description Location Quantity Unit Cost Total Cost
11/dec/06 A1234987 Sample Parts description W1I1R4S1 2 10.00 20.00
15/dec/06 A1234988 Sample Parts description 1 10.00 20
18/dec/06 A1234988 Sample Parts description 1 10.00 20
19/dec/06 A1234988 Sample Parts description 1 10.00 20
12/dec/06 A1234988 Sample Parts description 1 10.00 20

How do I deal with a format without a year? I do have clues from other parts of the originating website, and an optional default set by the user.

The sample data show the date format varying from set to set, but within a given set the date format is consistent, and the user has influence over the date format mask used.

I like Cor's suggestion: don't let the user type in the format, but let the user pick it from a list. That will likely be the case, at least in version 0.
"GS" <gs**********************@msnews.Nomail.com> wrote in message
news:eF**************@TK2MSFTNGP03.phx.gbl...
You are sort of on the same track as me.
I must first apologize: I did not tell you the complete story.

Although the application does not know beforehand exactly what format the data may come in, part of the application allows the user to define and record favourites for a website:
- to extract by text or HTML
- header content and format
- record format and date format (that is where the date format mask comes in)
- optionally, an ordinal number for each column, or re-ordering
- trailer content and format

For a given batch, at least for the body, date formats are uniform.

Furthermore, given the need to make the extract process generic and adaptable to the front end that takes the user definitions, I believe it would be easier to "normalize" date strings to "yyyy-mm-dd".

Also, the end target may not necessarily be a SQL database; it may be text, pasted into a Word report or into Excel by the user. Therefore, I can transform the date format mask into a regex with the appropriate format and group identifiers, and use Regex.Replace to normalize the date. As a matter of fact, the date separator does not have to be / but can be a space, as long as there are identifiable delimiters around the date string.

I already have code for dealing with regexes for dates from a prior project; all I have to do is adapt it to the present need.

Who knows, maybe I have taken a totally offbeat track.

"GS" <gs**********************@msnews.Nomail.com> wrote in message
news:%2****************@TK2MSFTNGP04.phx.gbl...
Thanks to all who have pitched in so far.

Let me give it another shot.

It looks like an easier way out would be:
1. Copy the date format string into a regex string holder, then derive the relevant regex to be used for date normalization in part 2:
   - replace yyyy in the regex string with a year expression carrying a year identifier
   - look for yy, replace it with 20yy, and repeat the step above
   - replace mmm with the month-name expression carrying a month identifier
   - replace mm with the 2-digit month expression carrying a month identifier
   - replace dd with the 2-digit day expression carrying a day identifier
2. Use the resulting regex in Regex.Replace to normalize to yyyy-mm-dd.

Any problem with the above approach?
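The replace steps above could look roughly like this in VB.NET. This is a sketch under assumptions, not a finished component: MaskToPattern, Normalize and MonthNames are hypothetical names, month names are assumed to be English (full names are matched by their first 3 letters), a two-digit year is read as 20yy as in step 1, and a mask with no year falls back to a caller-supplied default year. Longer tokens are replaced before shorter ones so that "yyyy" is not consumed as two "yy" tokens and "mmm" is not consumed as "mm"; the \b anchors assume the date is delimited by spaces or line ends.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module MaskNormalizer
    ' Assumed English month abbreviations (lower case for lookup).
    ReadOnly MonthNames As String() = {"jan", "feb", "mar", "apr", "may", "jun",
                                       "jul", "aug", "sep", "oct", "nov", "dec"}

    ' Turns a mask such as "dd/mmm/yy" into a regex with named groups.
    Function MaskToPattern(mask As String) As String
        ' Escape separators such as spaces or dots before inserting tokens.
        Dim p As String = Regex.Escape(mask.ToLowerInvariant())
        p = p.Replace("yyyy", "(?<year>\d{4})")
        p = p.Replace("yy", "(?<yr2>\d{2})")
        p = p.Replace("mmm", "(?<mon>[a-z]{3,9})")
        p = p.Replace("mm", "(?<month>\d{2})")
        p = p.Replace("dd", "(?<day>\d{2})")
        Return "\b" & p & "\b"
    End Function

    ' Rewrites every date that matches the mask as yyyy-MM-dd.
    Function Normalize(text As String, mask As String, defaultYear As Integer) As String
        Dim rx As New Regex(MaskToPattern(mask), RegexOptions.IgnoreCase)
        Return rx.Replace(text,
            Function(m As Match) As String
                ' Groups missing from the pattern simply report Success = False.
                Dim yr As Integer = defaultYear
                If m.Groups("year").Success Then yr = Integer.Parse(m.Groups("year").Value)
                If m.Groups("yr2").Success Then yr = 2000 + Integer.Parse(m.Groups("yr2").Value)
                Dim mo As Integer
                If m.Groups("mon").Success Then
                    mo = Array.IndexOf(MonthNames, m.Groups("mon").Value.Substring(0, 3).ToLowerInvariant()) + 1
                    ' Leave the text untouched if the name is not a known month.
                    If mo = 0 Then Return m.Value
                Else
                    mo = Integer.Parse(m.Groups("month").Value)
                End If
                Dim dy As Integer = Integer.Parse(m.Groups("day").Value)
                Return String.Format("{0:0000}-{1:00}-{2:00}", yr, mo, dy)
            End Function)
    End Function

    Sub Main()
        Console.WriteLine(Normalize("11/dec/06 A1234987 Sample Parts description", "dd/mmm/yy", 2006))
        ' 2006-12-11 A1234987 Sample Parts description
        Console.WriteLine(Normalize("15 12 06 A1234988", "dd mm yy", 2006))
        ' 2006-12-15 A1234988
    End Sub
End Module
```

Note that this only rewrites text; it does not check that the result is a real date (31/02 would come out as 2006-02-31), so a validation pass is still needed on top of it.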

"Cor Ligthert [MVP]" <no************@planet.nl> wrote in message
news:%2****************@TK2MSFTNGP06.phx.gbl...
GS,

Maybe you can avoid this in 2007, and all things like DateTime.ParseExact, by having a look at the nice globalization support Microsoft has built in, and then at the related ToString options.

Cor

Jan 9 '07 #22

Cor Ligthert [MVP] wrote:
Stephany,

I am curious: what does this phrase mean? I don't know it.
Now we're cooking with gas.

(Living in Holland, which sits above one of the former biggest gas fields in
Europe)

Cor
It's somewhat, in some senses but not perfectly, an opposite of "gezellig".

:)

Jan 9 '07 #23

GS,

You are assuming that something is standard. For today's date I can write, in my country:

9-1-07
09-1-07
9-01-7
09-1-2007
9-jan-2007
9-januari-2007
etc. No law tells me how to do it, and it is not rare to write it as
2007-01-09, as in ISO.

And then every culture has its own style.
Cor
Jan 9 '07 #24

If my understanding of the meaning of 'gezellig' is correct, then 'cooking with gas' is nothing like any sort of opposite of 'gezellig'.

My understanding of 'gezellig' is that, although it is not directly translatable, it means something like 'feeling good amongst family and/or friends', but with much more subtle meanings than that.

'Cooking with gas' means to be working fast/proceeding rapidly. For example:

After working with those old hand tools, power tools will
make you feel like you are really cooking with gas.

Metaphorically, it compares a gas cooker, where you get instant heat when you light it, with other cookers (electric, wood, coal, etc.), which take a while to warm up.

Jan 9 '07 #25

GS
You're right, every culture has its own style.

However, the current project charter covers only data from professional/commercial sites with a reputation for accurate date formats and content, and from other applications that have a consistent date format. Of course, if we can cover other unusual date format variations without exceeding the budget, that will be welcome, but I certainly don't want to get involved until everything in the charter is completed.

The dates in the data gathered by the user (they don't key them in directly, thank goodness) do fall into 2-digit day and month, standard English 3-letter month, or full month names, with no spelling errors. The users don't really enter the data; the user controls which sites the application visits. One way or another, the user specifies a date format mask for the data to be processed.

No, the component and application are not expected to handle spelling errors, but they are expected to deal with the common date formats in the US and Canada (English). There may be more later on, but that is not my worry for this project's scope.

Thanks to the aborted metrication (and so-called freedom of speech), there are a few more variants of date format from the US.
As for the yyyy-mm-dd format, it is a safe common format to use in North America for software published by Microsoft. I have yet to see any Microsoft application among my user base fail to convert a yyyy-mm-dd string to a date properly, unless the user has arbitrarily set the Windows date format to yyyy-dd-mm.

I suppose using yyyy-MMM-dd as the intermediate date string will avoid that issue altogether.
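A minimal illustration of that point, assuming the invariant culture is acceptable for the intermediate string: formatting and parsing with an explicit pattern and CultureInfo.InvariantCulture keeps the result independent of the user's Windows regional settings. ToIso and FromIso are hypothetical helper names. The last line shows the trade-off of yyyy-MMM-dd: the month is unambiguous, but the name comes from whichever culture is passed in.

```vbnet
Imports System
Imports System.Globalization

Module IsoDates
    ' Formats with a fixed pattern and the invariant culture, so a user's
    ' regional short-date setting (e.g. yyyy-dd-MM) cannot affect the output.
    Function ToIso(d As DateTime) As String
        Return d.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture)
    End Function

    ' Parses the intermediate string back with the same exact pattern.
    Function FromIso(s As String) As DateTime
        Return DateTime.ParseExact(s, "yyyy-MM-dd", CultureInfo.InvariantCulture)
    End Function

    Sub Main()
        Dim d As New DateTime(2006, 12, 11)
        Console.WriteLine(ToIso(d))              ' 2006-12-11
        Console.WriteLine(FromIso(ToIso(d)) = d) ' True
        ' "MMM" emits a month name taken from the culture that is passed in.
        Console.WriteLine(d.ToString("yyyy-MMM-dd", CultureInfo.InvariantCulture)) ' 2006-Dec-11
    End Sub
End Module
```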

The really tricky part is validating the users' date format mask against the actual data found. That is why Regex.Replace was tempting to me. You are right, though: Regex.Replace will still not be able to handle all mistakes.

Jan 10 '07 #26

GS,

I think you did not see my first advice: just checking against DateTime with TryParse will directly give you an idea of whether the date can be valid.
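Applied to checking a user's mask against the lines of a set, that advice might look like the sketch below. It uses DateTime.TryParseExact, the exact-format counterpart of TryParse, which returns False instead of throwing. MaskFitsSamples is a hypothetical helper, and the invariant culture is assumed so that the "/" in a mask maps to a slash rather than to the regional date separator. A single line such as 11/12/06 fits both dd/MM/yy and MM/dd/yy, so the check only becomes useful across several lines of a set.

```vbnet
Imports System
Imports System.Globalization

Module MaskCheck
    ' Hypothetical helper: True only if every sample parses with the mask.
    Function MaskFitsSamples(mask As String, samples As String()) As Boolean
        For Each s As String In samples
            Dim parsed As DateTime
            If Not DateTime.TryParseExact(s, mask, CultureInfo.InvariantCulture,
                                          DateTimeStyles.None, parsed) Then
                Return False
            End If
        Next
        Return True
    End Function

    Sub Main()
        Dim samples As String() = {"11/12/06", "15/12/06", "18/12/06"}
        Console.WriteLine(MaskFitsSamples("dd/MM/yy", samples)) ' True
        Console.WriteLine(MaskFitsSamples("MM/dd/yy", samples)) ' False: 15 is not a month
    End Sub
End Module
```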

Another addition: AFAIK, Canada (English) has the same date/time pattern as all former and current Commonwealth members, in the parts where English is the spoken language.

Cor

Jan 10 '07 #27

Stephany,

Living in a country where gas is the most ordinary basic commodity, your sentence does not suggest anything to me other than "not gezellig".

Sitting at an open fire in an open wood and talking with each other is more what we would call gezellig, for which, by the way, we have to leave our country unless we do it illegally or are really rich.

If I am well informed, you do not live in a country with as many people per square kilometre as here, so your idea about that could be completely opposite.
> After working with those old hand tools, power tools will
> make you feel like you are really cooking with gas.
Some people may use that as a metaphor for simple, task-fulfilling drag-and-drop tools; I can assure you that I find those far from gezellig.

:-)

Cor

Jan 10 '07 #28

Let me put it this way:

You are working on a project and you are unable
to make any progress because you are waiting on
some vital information. At this point you are
'bogged down'.

The information that you are waiting on arrives,
and, as a result, you are now able to make rapid
progress toward completion of the project. Now
you are 'cooking with gas'.

'Cooking with gas' is to do with the 'rush' of activity.

Jan 10 '07 #29

Stephany,

I had already understood this sentence the moment you posted it.

However, it really was an expression I had never seen before (I know it exists, before you start correcting me). I could not resist writing as I did.

Cor

Jan 11 '07 #30
