Which RegEx Testing Tool Do You Prefer?

I'm using an .aspx tool I found at [1], but as nice as the interface is, I
think I need to consider others. Some can generate C#, I understand.
Your preferences, please...

<%= Clinton Gallagher

[1] http://forta.com/books/0672325667/
Nov 19 '05 #1
17 Replies


Regex Buddy is very good. It costs around $30.00 and has quite a few nice
features, including the ability to copy regular expressions in various
language string syntaxes (C# among them). It can create libraries of
regular expressions, and it has a nice visual builder, color-coding, and
quite a bit more. It's a good testing environment, and it includes some
nice reference material.

--
HTH,

Kevin Spencer
Microsoft MVP
.Net Developer
Ambiguity has a certain quality to it.

Nov 19 '05 #2

I was looking at PowerGrep from the same dev group, but as with Regex Buddy,
I don't like the buy-before-you-try business model, so that choice has to
stay on the shelf for the moment. Thanks for bringing it up, though. I
assume you've used Regex Buddy?

<%= Clinton Gallagher

Nov 19 '05 #3

Hi Clinton,

Yes, I have it. I previously used the freeware Regex Coach Utility, but it
is nowhere near as complete in its support for various newer Regular
Expression syntax and programming languages in general. It did have one nice
feature about it. You could split a Regular Expression across multiple
lines, which often made it easier to analyze. However, Regex Buddy has the
graphical tree view, and it is synchronized with the Regular Expression
itself, which more than makes up for the omission of breaking a Regular
Expression across multiple lines.

BTW, it also has a GREP utility built in.

In short, it is well worth the 30 bucks.

--
HTH,

Kevin Spencer
Microsoft MVP
.Net Developer
Ambiguity has a certain quality to it.

Nov 19 '05 #4

I saw a response to this question in the CSharp group, regarding a product
named "Expresso"

http://www.ultrapico.com/Expresso.htm

Expresso is .Net freeware, and after downloading, installing, and playing
with it, I'd give it a try! So far I have found it to be excellent, having
capabilities that Regex Buddy does not have, and a much more intuitive GUI.

--
HTH,

Kevin Spencer
Microsoft MVP
.Net Developer
Ambiguity has a certain quality to it.

Nov 19 '05 #5

Thanks Kevin. I saw that post too and am going to download Expresso in a few
minutes. I know you don't need to be psychic to figure out what I'm likely
to be asking next :-)

<%= Clinton Gallagher

Nov 19 '05 #6

The kind of RegEx tool I'd like is one which can take a string
I write, and create a RegEx expression which matches it.

*That* will be the RegEx tool that will corner the market.


Juan T. Llibre, ASP.NET MVP
ASP.NET FAQ : http://asp.net.do/faq/
Foros de ASP.NET en Español : http://asp.net.do/foros/

Nov 19 '05 #7

Hi Juan,

> The kind of RegEx tool I'd like is one which can take a string
> I write, and create a RegEx expression which matches it.

The problem with that is that you can write a Regular Expression that
matches a literal string quite easily. For example:

literal string

The above is a regular expression which will match the substring "literal
string" in my first sentence. Of course, the real power of regular
expressions is the ability to match *patterns* in a string, perform grouping,
etc. So, like any programming language (which it is, in a sense), Regular
Expressions have a shorthand syntax that allows one to create patterns of a
large variety of types. A simple example of this would be:

(literal) (string)

This captures the same match as the first, but puts the string "literal"
into a group, and the string "string" into a second group. But of course, we
have already exceeded your desired requirement. On the other hand, we have
made a regular expression that is perhaps more useful (in some situations)
than the first.
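
To make the difference between the two patterns concrete, here is a minimal C# sketch. The class name GroupDemo, the helper CapturedWords, and the sample sentence are made up for illustration; Regex.Match and Match.Groups are the standard System.Text.RegularExpressions members.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GroupDemo
{
    // Returns the text captured by groups 1 and 2 of "(literal) (string)",
    // or an empty array when the pattern does not match.
    public static string[] CapturedWords(string input)
    {
        Match m = Regex.Match(input, "(literal) (string)");
        if (!m.Success) return new string[0];
        return new[] { m.Groups[1].Value, m.Groups[2].Value };
    }

    public static void Main()
    {
        string input = "you can match a literal string quite easily";

        // The plain pattern finds the same text but has only group 0.
        Match plain = Regex.Match(input, "literal string");
        Console.WriteLine(plain.Value);        // literal string
        Console.WriteLine(plain.Groups.Count); // 1

        // The grouped pattern exposes each word separately.
        string[] words = CapturedWords(input);
        Console.WriteLine(words[0]);           // literal
        Console.WriteLine(words[1]);           // string
    }
}
```

Both patterns match the same span of text; the only difference is what you can pull out of the match afterwards.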

And of course, the possible types and combinations of patterns are almost
endless, including wildcard patterns, special characters, boolean rules, and
so on.

Yeah, it's like reading some kind of incredibly concise shorthand code,
without even line breaks or brackets to help. That's why I was so pleased to
see that Expresso allows you to break your regular expression across
multiple lines while building it. That helps a good bit!
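
Outside of a tool, .NET itself can take a pattern spread over several lines: with RegexOptions.IgnorePatternWhitespace, unescaped whitespace in the pattern is skipped and # starts a comment. A rough sketch follows; the class name MultiLineDemo and the sample inputs are assumptions for illustration. Note that the space between the two words has to sit inside a character class (or be escaped), since a bare space would be ignored.

```csharp
using System;
using System.Text.RegularExpressions;

public static class MultiLineDemo
{
    // Verbatim string so the pattern can span several lines.
    // With IgnorePatternWhitespace, unescaped whitespace is skipped and
    // "#" starts a comment that runs to the end of the line.
    private const string Pattern = @"
        (literal)   # group 1, the first word
        [ ]         # a literal space, kept because it is inside a character class
        (string)    # group 2, the second word
    ";

    public static bool IsMatch(string input)
    {
        return Regex.IsMatch(input, Pattern, RegexOptions.IgnorePatternWhitespace);
    }

    public static void Main()
    {
        Console.WriteLine(IsMatch("a literal string here")); // True
        Console.WriteLine(IsMatch("literalstring"));         // False
    }
}
```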

--
HTH,

Kevin Spencer
Microsoft MVP
.Net Developer
Ambiguity has a certain quality to it.

Nov 19 '05 #8

re:
> That's why I was so pleased to see that Expresso allows you to break your regular
> expression across multiple lines while building it. That helps a good bit!

I really like its "Analyze" feature. The "Builder" is quite good, too!

Juan T. Llibre, ASP.NET MVP
ASP.NET FAQ : http://asp.net.do/faq/
Foros de ASP.NET en Español : http://asp.net.do/foros/

Nov 19 '05 #9

Kevin, have you ever heard the expression "preaching to the choir?" :-)

I've got the basic pattern matching theory understood, but it's the use of
expressions to disallow or replace certain characters and/or strings that
I'm trying to understand thoroughly. The following example illustrates...

// Example
Lawn Mowers, Repairs & Services - lawnmowers.com

This is a typical page title. When it is entered into a TextBox meant to
capture string data for an RSS 2.0 title element, the ampersand should be
written as &amp; instead of &. I've got an expression that works well for
the example, but I can't figure out (yet) how to use it to match the & and
replace it with &amp;, or how to use it to make the ASP.NET 2.0
RegularExpressionValidator fail when a & is present in the string.

// Expression
[a-z]+([a-z0-9-]*[a-z0-9]+)?(\.([a-z]+([a-z0-9-]*[a-z0-9]+)?)+)*

I also really appreciate Expresso's Analyzer. It is outstanding that
Expresso seems to make it easy for us to pick expressions apart piece by
piece and explain them in English.

<%= Clinton Gallagher

Nov 19 '05 #10

Hi Clinton,

The following Regular Expression lets you do a Regex.Replace on a string
containing both bare "&" characters and "&amp;" strings. Each "&amp;" string
becomes its own match, each bare "&" becomes its own match and is also put
into Group 1, and the text in between is matched unchanged. It is
case-insensitive:

(?i)[^&]+|&amp;|(&(?!amp;))

Here's some sample code for replacing the bare "&" characters with
"&amp;":

using System.Text.RegularExpressions;

/// <summary>
/// Replaces Ampersand in a Match with "&amp;"
/// </summary>
/// <param name="m">Match</param>
/// <returns>Replaced Match value</returns>
public static string ampReplacer(Match m)
{
    // Group 1 only takes part when the match is a bare "&";
    // every other match is returned unchanged.
    if (!m.Groups[1].Success) return m.Value;
    return m.Value.Replace("&", "&amp;");
}

/// <summary>
/// Replaces all single Ampersand characters in a string with "&amp;"
/// </summary>
/// <param name="s">String to process</param>
/// <returns>Processed String</returns>
public static string ReplaceAmpersand(string s)
{
    return Regex.Replace(s, @"(?i)[^&]+|&amp;|(&(?!amp;))",
        new MatchEvaluator(ampReplacer));
}

The "ampReplacer" function is passed as the MatchEvaluator delegate to the
Regex.Replace() call in the "ReplaceAmpersand" method. The "ReplaceAmpersand"
method takes a string as an argument, and uses Regex.Replace to replace every
match that has a value in Groups[1] (that is, every bare "&") with "&amp;".
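
For comparison, here is a minimal sketch of the same replacement done without a MatchEvaluator. It assumes "&amp;" is the only entity that has to be left alone. The names AmpersandSketch and EscapeBareAmpersands are made up for illustration, and the pattern is a simplification rather than the expression posted above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AmpersandSketch
{
    // Hypothetical helper: escapes every "&" that does not already begin "&amp;".
    // Assumption: "&amp;" is the only entity in the input. Something like "&lt;"
    // would come out as "&amp;lt;".
    public static string EscapeBareAmpersands(string s)
    {
        // (?!amp;) is a negative lookahead: it inspects the following text
        // without consuming it, so only the "&" itself is replaced.
        return Regex.Replace(s, "&(?!amp;)", "&amp;", RegexOptions.IgnoreCase);
    }

    public static void Main()
    {
        Console.WriteLine(EscapeBareAmpersands("Lawn Mowers, Repairs & Services - lawnmowers.com"));
        Console.WriteLine(EscapeBareAmpersands("Repairs &amp; Services"));
    }
}
```

The lookahead does the work that the Group 1 test does in the MatchEvaluator version, so no callback is needed. The callback form is still the better fit if different matches ever need different treatment.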

As a side note, I used both Expresso and Regex Buddy to come up with this.
It was indeed a challenge, as I'm not quite a master of Regular Expressions.
But I enjoy learning, so it was a good exercise for me! :)

--
HTH,

Kevin Spencer
Microsoft MVP
.Net Developer
Ambiguity has a certain quality to it.



Nov 19 '05 #11

P: n/a
Hello Kevin,

Well I'm bright eyed but not so bushy-tailed this morning. Thanks for
working this out. It's one of those 'must know' issues one needs to be
concerned with when generating valid XML from an application. I'll be
working with it later today. I'm also starting to get a feel for Expresso,
which I have a question about. I'm at the point where I've almost come to
understand how expressions are actually processed, which (for me) means I
will understand how I need to think to put them together. You've been a real
help again, and your source is an inspiration which shows how elegant
self-documenting code can be.

As for the Expresso question, what is 1:? supposed to indicate? (The ? is
the closest I could come at the moment to replicating the rectangular
'non-printable' character Expresso uses to indicate some 'thing' it has
matched.) In the following simple example it seems to match a white space,
although in a manner that is confusing, as I will point out. In other
examples with many more characters and white space in the string to be
matched, I have counted to the position where the ? is said to be matched,
and the position reported does not fall on a white space at all.

// Expression
\w*@\w*\.\w*((\.\w*)*)?

// String to match
An example so*****@somewhere.com of an email address.

Expresso reports 1:? at Position 32 Length 0, which in the simple example as
given implies a white space. There were white space characters before the
matched characters, though, so why would Expresso ignore those? It then
reports 2:? at Position 0 Length 0, which suggests the parser returned to
the beginning of the string to be matched and found what?

Is this clear as mud or what :-)

<%= Clinton Gallagher



Nov 19 '05 #12

P: n/a
Hi Clinton,

Regular Expressions are a bear to learn, even if you have good tools to
work with them. I've spent hours working out a relatively "simple" one (at
least it seemed simple at first), learning a bit more with each hour.
Still, I'm a long way from an expert. I can read most of it fairly well by
now, but certain concepts are still a bit difficult to deal with. I still
struggle some with Lookarounds in particular. One thing to keep in mind is
that Regular Expressions consume a string as they move through it, with a
few exceptions (like Lookarounds). They are basically sequential in nature.

You may find the "Analyze" tool helpful with this sort of thing.
Fortunately, I have not 2 but THREE Regular Expression tools to work with (2
of them are Freeware), which enables me to use the one(s) that are best for
the particular type of work I need regarding any individual Regular
Expression and/or problem with one.

The expression you posted,

\w*@\w*\.\w*((\.\w*)*)?

Can be analyzed in so many words as follows (with the part of the email
address consumed at each step, starting where the match begins):

Step                                                     Pattern     Consumes
Match any word character, zero or more times             \w*         someone
Next, match the '@' character once                       @           @
Next, match any word character, zero or more times       \w*         somewhere
Next, match the '.' character once                       \.          .
Next, match any word character, zero or more times       \w*         com
Next, put the following into Group 1, zero or 1 time     (......)?
  Match the following into Group 2, zero or more times   (......)*
    Match the '.' character once                         \.
    Match any word character, zero or more times         \w*

Result of Group 1:   (\.\w*)*   (Nothing)
Result of Group 2:   \.\w*      (Nothing)

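Basically, neither Group 1 nor Group 2 captures any text: after "com" the
next character is a space, not a second '.', so \.\w* has nothing to match.
However, as both Groups specify a minimum of zero times, they don't
disqualify the Match.

Why does Expresso report Group 1 at position 32? As far as I can tell,
position 32 is simply where the overall match ends (just after "com"), not a
white space that was matched. Group 1 is tried there and succeeds with an
empty, zero-length capture. Why does Expresso report Group 2 at position 0?
I believe that is just how .NET reports a group that never took part in the
match (index 0, length 0), rather than the parser going back to the start of
the string. I'm not that good with it, though!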
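
One way to check what the regex engine itself reports, independent of any tool's display, is to print each group's Success, Index and Length. The sketch below assumes the masked address in the sample is "someone@somewhere.com", as in the walk-through above. The class and method names (GroupReport, Find) are invented for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GroupReport
{
    // Hypothetical helper: runs the posted expression against some input.
    public static Match Find(string input)
    {
        return Regex.Match(input, @"\w*@\w*\.\w*((\.\w*)*)?");
    }

    public static void Main()
    {
        // Assumption: the masked address is "someone@somewhere.com".
        Match m = Find("An example someone@somewhere.com of an email address.");

        Console.WriteLine("Match: '{0}' at {1}, length {2}", m.Value, m.Index, m.Length);

        // Groups[0] is the whole match, so the numbered groups start at 1.
        for (int i = 1; i < m.Groups.Count; i++)
        {
            Group g = m.Groups[i];
            Console.WriteLine("Group {0}: Success={1}, Index={2}, Length={3}",
                i, g.Success, g.Index, g.Length);
        }
    }
}
```

The Success flag is the thing to watch: a zero-length capture from a group that took part in the match and a group that never took part can both show up as "Length 0" in a tool, and Success is what should tell them apart.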

Still, your regular expression is a bit lax in terms of standards. We worked
one up for valid email addresses the other day, and you may want to borrow
it:

(?i)([-.\w]+)\@(?:((?:\d{1,3}\.){3}\d{1,3})|([-a-z0-9]+(?:\.[-a-z0-9]+)*)\.((?:com|edu|gov|int|mil|net|org|biz|info|name|museum|coop|aero|[a-z]{2})))

It is case-insensitive, and matches email addresses whose domain part is
either a domain name or an IP address. It puts the results into 4 possible
groups:

1-User Name, 2-Domain IP Address, 3-Domain Name, 4-Root Domain.

Note that groups 2 and (3,4) are exclusive of one another. The domain part
can be either an IP address or a named domain, but not both. It supports
2-letter country suffixes and multiple-dot domain addresses.

I'm not sure we covered all the possible permutations, but it's pretty
strong.
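
A short sketch of how the four groups might be read back in C#. The class and method names (EmailParts, Describe) and the sample addresses are assumptions for illustration. The pattern is the one above, with "name" written without the stray space.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EmailParts
{
    private static readonly Regex EmailPattern = new Regex(
        @"(?i)([-.\w]+)\@(?:((?:\d{1,3}\.){3}\d{1,3})|([-a-z0-9]+(?:\.[-a-z0-9]+)*)\.((?:com|edu|gov|int|mil|net|org|biz|info|name|museum|coop|aero|[a-z]{2})))");

    // Hypothetical helper: summarises which groups took part in the match.
    public static string Describe(string address)
    {
        Match m = EmailPattern.Match(address);
        if (!m.Success) return "no match";

        // Group 2 (IP address) and groups 3 and 4 (named domain) are exclusive.
        if (m.Groups[2].Success)
            return "user=" + m.Groups[1].Value + " ip=" + m.Groups[2].Value;

        return "user=" + m.Groups[1].Value
             + " domain=" + m.Groups[3].Value
             + " root=" + m.Groups[4].Value;
    }

    public static void Main()
    {
        Console.WriteLine(Describe("someone@somewhere.com"));
        Console.WriteLine(Describe("user@192.168.0.1"));
    }
}
```

As written, the pattern is not anchored, so it finds an address anywhere inside a longer string. For validating a whole input field, wrapping it in ^ and $ would be the usual adjustment.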

--
HTH,

Kevin Spencer
Microsoft MVP
.Net Developer
Ambiguity has a certain quality to it.



Nov 19 '05 #13

P: n/a
Hello Kevin,

With the exception of bass ackward Lookarounds, when you say expressions
are sequential in nature, that is exactly what I am trying to nail down.

What confuses me is that they appear to be constructed using set theory,
like algebraic expressions whose terms are resolved from the innermost set
working outwards, yet when a tool processes them they appear to be parsed
linearly, as if serialized. Does that make sense to you?

I sure wish Microsoft would publish a document explaining how the
expressions are parsed the same way they have explained how the life cycle
of a page is processed. The control tree is easy to understand
theoretically. I can only wonder why it can not be similarly mapped and
explained for regular expressions.
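
On the parsing question above, a small C# sketch may help. It is only an illustration under my own assumptions (the class name ScanOrderDemo, its method names and the sample patterns are made up for this example and do not come from any Microsoft document). The idea it tries to show: nesting decides how capture groups are numbered (by opening parenthesis, counted left to right), while the engine walks the input from left to right, and a lookahead checks text without consuming it.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ScanOrderDemo
{
    // Groups are numbered by the position of their opening parenthesis,
    // so an outer group gets a lower number than the groups nested in it.
    public static string[] NestedGroups(string input)
    {
        Match m = Regex.Match(input, @"((a)(b))(c)");
        var values = new List<string>();
        for (int i = 1; i < m.Groups.Count; i++)
        {
            values.Add(m.Groups[i].Value);
        }
        return values.ToArray();
    }

    // Matches are reported in the order they occur in the input.
    public static int[] DigitRunPositions(string input)
    {
        var positions = new List<int>();
        foreach (Match m in Regex.Matches(input, @"\d+"))
        {
            positions.Add(m.Index);
        }
        return positions.ToArray();
    }

    // The lookahead (?=@) requires an '@' next but leaves it unconsumed.
    public static string NameBeforeAt(string input)
    {
        return Regex.Match(input, @"\w+(?=@)").Value;
    }
}
```

For "abc", NestedGroups should return "ab", "a", "b", "c": the outer group comes first even though it encloses groups 2 and 3. DigitRunPositions("a1b22c333") should return 1, 3, 6, and NameBeforeAt("someone@somewhere.com") should return "someone" with the '@' left out of the match.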

I've tried the following:

* RegexDesigner.Net
* Rad Regex Designer
* Regular Expression Tester
* Regulator RegEx
* Expresso

Downloading this afternoon...

* Regex Coach [1] (donationware), which is also said to have an English
language 'analyzer' like Expresso. I consider Expresso to have the best
interface and functionality, but it still can't make a genius from a
dummy :-).

I'm going to delve into some lists and forums [2] for the next week to see
what I can learn.
This book [3] sounds interesting. I'll take a trip to Barnes and Noble to
see if it's on the shelves.
I also have this book [4], which is very good despite the "10 Minutes"
implications, and of course I'll be looking for [5] from O'Reilly.

Let's see what that does for now.

<%= Clinton Gallagher
[1] http://www.weitz.de/regex-coach/
[2] http://regexadvice.com/
[3] http://books.slashdot.org/article.pl...&tid=192&tid=6
[4] http://forta.com/books/0672325667/
[5] http://www.oreilly.com/catalog/regex...L-4GV796923290

I appreciate your interest in this context. I've discovered a regular
expression mailing list emanates from

"Kevin Spencer" <ke***@DIESPAMMERSDIEtakempis.com> wrote in message
news:uH**************@TK2MSFTNGP10.phx.gbl...
Hi Clinton,

Regular Expressions are a bear to learn, even if you have good tools to
work with them. I've spent hours working out a relatively "simple" one (at
least it seemed simple at first), but I learned a bit more with each hour.
Still, I'm a long way from an expert. I can read most of it fairly well by
now, but certain concepts are still a bit difficult to deal with. I still
struggle some with Lookarounds in particular. One thing to keep in mind is
that Regular Expressions consume a string as they move through it, with a
few exceptions (like Lookarounds). They are basically sequential in
nature.

You may find the "Analyze" tool helpful with this sort of thing.
Fortunately, I have not 2 but THREE Regular Expression tools to work with
(2 of them are Freeware), which enables me to use the one(s) that are best
for the particular type of work I need regarding any individual Regular
Expression and/or problem with one.

The expression you posted,

\w*@\w*\.\w*((\.\w*)*)?

Can be analyzed in so many words as (with the parsing of the email address
where the match begins):

Step                                                    Pattern   Matches
Match any word character, zero or more times            \w*       someone
Next, match the '@' character once                      @         @
Next, match any word character, zero or more times      \w*       somewhere
Next, match the '.' character once                      \.        .
Next, match any word character, zero or more times      \w*       com
Next, put the following into Group 1, zero or 1 time    (......)?
  Match the following into Group 2, zero or more times  (......)*
    Match the '.' character once                        \.
    Match any word character, zero or more times        \w*
Result of Group 1                                       (\.\w*)*  (nothing)
Result of Group 2                                       \.\w*     (nothing)

Basically, neither Group 1 nor Group 2 captures any text: after "com" the
next character is a space rather than a '.', so there is nothing left for
\.\w* to match. However, as both Groups specify a minimum of Zero times,
they don't disqualify the Match.

Why does Expresso report Group 1 at position 32? That is where the overall
match ends (just after "com"), so that's where the zero-length match for
Group 1 sits. Why does Expresso report Group 2 at position 0? Well, I'm
not that good with it!
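
For anyone who wants to check those positions outside Expresso, here is a minimal C# sketch. The class name GroupReport and its Describe method are hypothetical, and I am assuming the masked address in the sample string was someone@somewhere.com, as in the analysis above. It simply lists what the .NET Regex class reports for each group.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GroupReport
{
    // Returns one line per group: number, Success flag, Index and Length.
    public static string[] Describe(string pattern, string input)
    {
        Match m = Regex.Match(input, pattern);
        string[] lines = new string[m.Groups.Count];
        for (int i = 0; i < m.Groups.Count; i++)
        {
            Group g = m.Groups[i];
            lines[i] = string.Format(
                "{0}: success={1} index={2} length={3}",
                i, g.Success, g.Index, g.Length);
        }
        return lines;
    }
}
```

Called as GroupReport.Describe(@"\w*@\w*\.\w*((\.\w*)*)?", "An example someone@somewhere.com of an email address."), group 0 (the whole match) should come back at index 11 with length 21. A group that never takes part in a match is reported by .NET as unsuccessful with index 0 and length 0, which would explain the "Position 0 Length 0" entry for Group 2. The Group 1 line is the one to compare against Expresso's "Position 32 Length 0".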

Still, your regular expression is a bit lax in terms of standards. We
worked one up for valid email addresses the other day, and you may want to
borrow it:

(?i)([-.\w]+)\@(?:((?:\d{1,3}\.){3}\d{1,3})|([-a-z0-9]+(?:\.[-a-z0-9]+)*)\.((?:com|edu|gov|int|mil|net|org|biz|info|name|museum|coop|aero|[a-z]{2})))

It is case-insensitive, and matches both domain name and IP domain email
addresses. It puts the results into 4 possible groups:

1-User Name, 2-Domain IP Address, 3-Domain Name, 4-Root Domain.

Note that groups 2 and (3,4) are exclusive of one another. The domain part
can be either an IP address or a named domain, but not both. It supports
2-letter country suffixes, and multiple-dot domain addresses.

I'm not sure we covered all the possible permutations, but it's pretty
strong.
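
To see the four groups in action, here is a rough C# sketch. The class name EmailParts and its Split method are hypothetical, and I am assuming the space inside "nam e" in the posted expression is a posting artifact, so the sketch uses "name".

```csharp
using System;
using System.Text.RegularExpressions;

public static class EmailParts
{
    // The expression from the post, with the stray space in "name" removed.
    private static readonly Regex Email = new Regex(
        @"(?i)([-.\w]+)\@(?:((?:\d{1,3}\.){3}\d{1,3})|([-a-z0-9]+(?:\.[-a-z0-9]+)*)\.((?:com|edu|gov|int|mil|net|org|biz|info|name|museum|coop|aero|[a-z]{2})))");

    // Returns { user, ip, domain, root }, or null when nothing matches.
    // Groups that did not take part in the match come back as "".
    public static string[] Split(string address)
    {
        Match m = Email.Match(address);
        if (!m.Success) return null;
        return new[]
        {
            m.Groups[1].Value, m.Groups[2].Value,
            m.Groups[3].Value, m.Groups[4].Value
        };
    }
}
```

EmailParts.Split("user@example.com") should fill groups 1, 3 and 4 ("user", "example", "com") and leave group 2 empty, while EmailParts.Split("admin@192.168.0.1") should fill groups 1 and 2 and leave 3 and 4 empty. Note that the expression is not anchored, so it finds an address inside a longer string rather than validating a whole string.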

--
HTH,

Kevin Spencer
Microsoft MVP
.Net Developer
Ambiguity has a certain quality to it.
Basically, the whole string has been consumed by the
"clintonG" <cs*********@REMOVETHISTEXTmetromilwaukee.com> wrote in message
news:%2****************@TK2MSFTNGP14.phx.gbl...
Hello Kevin,

Well I'm bright eyed but not so bushy-tailed this morning. Thanks for
working this out. It's one of those 'must know' issues one needs to be
concerned with when generating valid XML from an application. I'll be
working with it later today and I'm starting to get a feel for Expresso
which I have a question about. I'm at the point where I've almost come to
understand how expressions are actually processed which -- for me --
means I will understand how I need to think to put them together. You've
been a real help again and your source is an inspiration which shows how
elegant self-documenting code can be.

As for the Expresso question, what is 1:? supposed to indicate? (Note that
"?" is the closest I could come at the moment to replicating the
rectangular 'non-printable' character Expresso uses to indicate some
'thing' it has matched.) In the following simple example it seems to match
a white space, although in a manner that is confusing, as I will point out.
But in other examples with many more characters and white space in the
string to be matched, I have counted the position where the ? is said to
be matched, and the position reported does not fall on a white space at
all.

// Expression
\w*@\w*\.\w*((\.\w*)*)?

// String to match
An example so*****@somewhere.com of an email address.

Expresso reports 1:? at Position 32 Length 0, which implies white space in
the simple example as given. But there were white space characters before
the matched characters, which makes one ask why Expresso would ignore
those earlier white space characters. It then reports 2:? at Position 0
Length 0, which suggests the parser returned to the beginning of the
string to be matched and found what?

Is this clear as mud or what :-)

<%= Clinton Gallagher
"Kevin Spencer" <ke***@DIESPAMMERSDIEtakempis.com> wrote in message
news:%2****************@TK2MSFTNGP09.phx.gbl...
Hi Clinton,

The following Regular Expression will give you the ability to do a
Regex.Replace on a string containing both single "&" characters and
"&amp;" strings. It captures the "&amp;" strings into their own separate
matches, and the "&" characters into their own matches, putting the "&"
characters into a Group. It is also case-insensitive:

(?i)[^&amp;][^&]*|&amp;|(&(?!amp;))

Here's some sample code for replacing the single "&" characters with
&amp; -

using System.Text.RegularExpressions;

public static class AmpersandFixer
{
    /// <summary>
    /// Replaces Ampersand in a Match with "&amp;"
    /// </summary>
    /// <param name="m">Match</param>
    /// <returns>Replaced Match value</returns>
    public static string ampReplacer(Match m)
    {
        // Group 1 only captures when a bare "&" was matched.
        if (m.Groups[1].Captures.Count == 0) return m.Value;
        return m.Value.Replace("&", "&amp;");
    }

    /// <summary>
    /// Replaces all single Ampersand characters in a string with "&amp;"
    /// </summary>
    /// <param name="s">String to process</param>
    /// <returns>Processed String</returns>
    public static string ReplaceAmpersand(string s)
    {
        return Regex.Replace(s, @"(?i)[^&amp;][^&]*|&amp;|(&(?!amp;))",
            new MatchEvaluator(ampReplacer));
    }
}

The "ampReplacer" function is the function passed as the MatchEvaluator
delegate in the Regex.Replace() method used in the "ReplaceAmpersand"
method. The "ReplaceAmpersand" method takes a string as an argument, and
uses Regex.Replace to replace all matches in the string that contain a
value in Groups[1] with "&amp;".

As a side note, I used both Expresso and Regex Buddy to come up with
this. It was indeed a challenge, as I'm not quite a master of Regular
Expressions. But I enjoy learning, so it was a good exercise for me! :)
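
A shorter variant may be worth comparing against the above. This is only a sketch under my own assumptions, not the method from the post: the class name AmpersandEscaper and its Escape method are hypothetical, and the pattern relies on a negative lookahead alone, so no MatchEvaluator is needed.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AmpersandEscaper
{
    // A bare "&" is one that is not already the start of "&amp;".
    private static readonly Regex BareAmpersand =
        new Regex(@"&(?!amp;)", RegexOptions.IgnoreCase);

    // Replaces every bare "&" with "&amp;" and leaves existing "&amp;" alone.
    public static string Escape(string s)
    {
        return BareAmpersand.Replace(s, "&amp;");
    }
}
```

With this, "Repairs & Services" should become "Repairs &amp; Services" and a second pass over the result should change nothing. One caveat with either version: other entities such as "&lt;" are not recognized, so their "&" would be escaped again.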

--
HTH,

Kevin Spencer
Microsoft MVP
.Net Developer
Ambiguity has a certain quality to it.

"clintonG" <cs*********@REMOVETHISTEXTmetromilwaukee.com> wrote in
message news:%2***************@tk2msftngp13.phx.gbl...
Kevin, have you ever heard the expression "preaching to the choir?" :-)

I've got the basic pattern matching theory understood, but it's the use
of expressions to disallow or replace certain characters and/or strings
that I'm trying to really understand thoroughly. The following example
illustrates...

// Example
Lawn Mowers, Repairs & Services - lawnmowers.com

A typical page title that when entered into a TextBox meant to capture
string data for an RSS 2.0 title element should use &amp; instead of
the & to represent the ampersand. I've got an expression that works
well for the example but can't figure out (with the expression I have)
how to match the & and replace it with &amp; (yet) -- or -- how to use
the expression I have to force the 2.0 Regular Expression Validator to
fail when the & is present in the string.

// Expression
[a-z]+([a-z0-9-]*[a-z0-9]+)?(\.([a-z]+([a-z0-9-]*[a-z0-9]+)?)+)*
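
On the second half of that question (making validation fail when a bare & is present), one possible approach is a separate whole-string pattern. The sketch below is an assumption of mine, not derived from the expression above: the class name TitleRule and the pattern are hypothetical, and as far as I know the validator control only accepts input when the expression matches the entire value, which is why the pattern is anchored at both ends.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TitleRule
{
    // Whole-string pattern: every character is either not "&"
    // or belongs to a complete "&amp;".
    public const string Pattern = @"^(?:[^&]|&amp;)*$";

    // True when the title contains no bare "&".
    public static bool IsValid(string title)
    {
        return Regex.IsMatch(title, Pattern);
    }
}
```

TitleRule.IsValid("Lawn Mowers, Repairs & Services - lawnmowers.com") should be false, and the same title written with &amp; should be true. The same Pattern string could be tried as the validator's expression, though I have not tested it in that control.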

I also really appreciate Expresso's Analyzer. It is outstanding that
Expresso seems to make it easy for us to pick expressions apart piece
by piece and explain them in English.
<%= Clinton Gallagher


"Kevin Spencer" <ke***@DIESPAMMERSDIEtakempis.com> wrote in message
news:Ow****************@tk2msftngp13.phx.gbl...
> Hi Juan,
>
>> The kind of RegEx tool I'd like is one which can take a string
>> I write, and create a RegEx expression which matches it.
>
> The problem with that is that you can write a Regular Expression that
> matches a literal string quite easily. For example:
>
> literal string
>
> The above is a regular expression which will match the substring
> "literal string" in my first sentence. Of course, the real power of
> regular expressions is the ability to match *patterns* in a string,
> perform grouping, etc. So, like any programming language (which it is,
> in a sense), Regular Expressions have a shorthand syntax that allows
> one to create patterns of a large variety of types. A simple example
> of this would be:
>
> (literal) (string)
>
> This captures the same match as the first, but puts the string
> "literal" into a group, and the string "string" into a second group.
> But of course, we have already exceeded your desired requirement. On
> the other hand, we have made a regular expression that is perhaps more
> useful (in some situations) than the first.
>
> And of course, the possible types and combinations of patterns are
> almost endless, including wildcard patterns, special characters,
> boolean rules, and so on.
>
> Yeah, it's like reading some kind of incredibly concise shorthand
> code, without even line breaks or brackets to help. That's why I was
> so pleased to see that Expresso allows you to break your regular
> expression across multiple lines while building it. That helps a good
> bit!
>
> --
> HTH,
>
> Kevin Spencer
> Microsoft MVP
> .Net Developer
> Ambiguity has a certain quality to it.
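
On the string-to-expression idea discussed in the quoted message above, the literal case can be sketched in C#. The class name LiteralPatterns and its methods are hypothetical; Regex.Escape is the standard .NET call for turning arbitrary text into a pattern that matches that text literally, and the second method uses the grouped form from the post.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LiteralPatterns
{
    // Regex.Escape produces a pattern that matches the given text literally,
    // so characters such as '.' and '(' lose their special meaning.
    public static bool ContainsLiteral(string input, string literal)
    {
        return Regex.IsMatch(input, Regex.Escape(literal));
    }

    // The grouped form from the post: each word lands in its own group.
    public static string[] TwoWords(string input)
    {
        Match m = Regex.Match(input, @"(literal) (string)");
        if (!m.Success) return null;
        return new[] { m.Groups[1].Value, m.Groups[2].Value };
    }
}
```

ContainsLiteral("price: 3.14 (approx)", "3.14 (approx)") should be true, while the same literal should not match "price: 3x14 (approx)" because the escaped '.' no longer stands for any character. This only covers literal text, of course; inferring a *pattern* from examples is the hard part the post describes.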
>
> "Juan T. Llibre" <no***********@nowhere.com> wrote in message
> news:ei**************@TK2MSFTNGP12.phx.gbl...
>> The kind of RegEx tool I'd like is one which can take a string
>> I write, and create a RegEx expression which matches it.
>>
>> *That* will be the RegEx tool that will corner the market.
>>
>>
>>
>>
>> Juan T. Llibre, ASP.NET MVP
>> ASP.NET FAQ : http://asp.net.do/faq/
>> Foros de ASP.NET en Español : http://asp.net.do/foros/
>> ======================================
>> "clintonG" <cs*********@REMOVETHISTEXTmetromilwaukee.com> wrote in
>> message news:Of**************@TK2MSFTNGP10.phx.gbl...
>>> Thanks Kevin. I saw that post too and am going to download Expresso
>>> in a few minutes. I know you don't need to be psychic to figure out
>>> what I'm likely to be asking next :-)
>>>
>>> <%= Clinton Gallagher
>>>
>>>
>>> "Kevin Spencer" <ke***@DIESPAMMERSDIEtakempis.com> wrote in message
>>> news:O0**************@tk2msftngp13.phx.gbl...
>>>>I saw a response to this question in the CSharp group, regarding a
>>>>product named "Expresso"
>>>>
>>>> http://www.ultrapico.com/Expresso.htm
>>>>
>>>> Expresso is .Net freeware, and after downloading, installing, and
>>>> playing with it, I'd give it a try! So far I have found it to be
>>>> excellent, having capabilities that Regex Buddy does not have, and
>>>> a much more intuitive GUI.
>>>>
>>>> --
>>>> HTH,
>>>>
>>>> Kevin Spencer
>>>> Microsoft MVP
>>>> .Net Developer
>>>> Ambiguity has a certain quality to it.
>>>>
>>>> "Kevin Spencer" <ke***@DIESPAMMERSDIEtakempis.com> wrote in message
>>>> news:%2****************@TK2MSFTNGP12.phx.gbl...
>>>>> Hi Clinton,
>>>>>
>>>>> Yes, I have it. I previously used the freeware Regex Coach
>>>>> Utility, but it is nowhere near as complete in its support for
>>>>> various newer Regular Expression syntax and programming languages
>>>>> in general. It did have one nice feature about it. You could split
>>>>> a Regular Expression across multiple lines, which often made it
>>>>> easier to analyze. However, Regex Buddy has the graphical tree
>>>>> view, and it is synchronized with the Regular Expression itself,
>>>>> which more than makes up for the omission of breaking a Regular
>>>>> Expression across multiple lines.
>>>>>
>>>>> BTW, it also has a GREP utility built in.
>>>>>
>>>>> In short, it is well worth the 30 bucks.
>>>>>
>>>>> --
>>>>> HTH,
>>>>>
>>>>> Kevin Spencer
>>>>> Microsoft MVP
>>>>> .Net Developer
>>>>> Ambiguity has a certain quality to it.

Nov 19 '05 #14

P: n/a
Regex Coach is very interesting. It has a unique tree that graphically
represents each part of the expression as well as an English 'Analyzer.'

<%= Clnton Gallagher
"clintonG" <cs*********@REMOVETHISTEXTmetromilwaukee.com> wrote in message
news:ej*************@tk2msftngp13.phx.gbl...
Hello Kevin,

With exception of bass ackward Lookarounds, when you say expressions are
sequential in nature that is exactly what I am trying to nail down.

What confuses me is they appear to be constructed using set theory like
algabraic expressions whose terms are resolved from the inner most set
working outwards yet when a tool processes them they appear to be parsed
linearly as if serialized. Does that make sense to you?

I sure wish Microsoft would publish a document explaining how the
expressions are parsed the same way they have explained how the life cycle
of a page is processed. The control tree is easy to understand
theoretically. I can only wonder why it can not be similarly mapped and
explained for regular expressions.

I've tried the following:

* RegexDesigner.Net
* Rad Regex Designer
* Regular Expression Tester
* Regulator RegEx
* Expresso

Downloading this afternoon...

* Regex Coach [1] donationware
Which is also said to have an English language 'analyzer' like Expresso
which I consider to have the best interface and functionality but still
can't make a genius from a dummy :-).

I'm going to delve into some lists and forums [2] for the next week to
see what I can learn.
This book [3] sounds interesting. I'll take a trip to Barnes and Noble to
see if its on the shelves.
I also have this book [4] which is very good despite the "10 Minutes"
inferences and of course I'll be looking for [5] from O'Rielly.

Let's see what that does for now.

<%= Clinton Gallagher
[1] http://www.weitz.de/regex-coach/
[2] http://regexadvice.com/
[3]
http://books.slashdot.org/article.pl...&tid=192&tid=6
[4] http://forta.com/books/0672325667/
[5] http://www.oreilly.com/catalog/regex...L-4GV796923290

I appreciate your interest in this context. I've discovered a regular
expression mailint list eminates from

"Kevin Spencer" <ke***@DIESPAMMERSDIEtakempis.com> wrote in message
news:uH**************@TK2MSFTNGP10.phx.gbl...
Hi Clinton,

Regular Expressions are a bear to learn, ieven if you have good tools to
work with them. I've spent hours working out a relatively "simple" one
(at least it seemed simple at first), but learning a bit more with each
hour. Still, I'm a long way from an expert. I can read most of it fairly
well by now, but certain concepts are still a bit difficult to deal with.
I still struggle some with Lookarounds in particular. One thing to keep
in mind is that Regular Expressions consume a string as they move through
it, with a few exceptions (like Lookarounds). They are basically
sequential in nature.

You may find the "Analyze" tool helpful with this sort of thing.
Fortunately, I have not 2 but THREE Regular Expression tools to work with
(2 of them are Freeware), which enables me to use the one(s) that are
best for the particular type of work I need regarding any individual
Regular Expression and/or problem with one.

The expression you posted,

\w*@\w*\.\w*((\.\w*)*)?

Can be analyzed in so many words as (with the parsing of the email
address where the match begins):

Match any word character, zero or more times. \w* someone
Next, Match the '@' character once. @ @
Next match any word character zero or more times \w* somewhere
Next, Match the '.' character once \.
.
Next, Match any word character zero or more times \w* com
Next, put the following into Group 1 zero or 1 time: (......)?
Match the following into Group 2 zero or more times: (......)*
Match the '.' character once \.
Match any word character zero or more times \w*
Result of Group 1: (\.\w*)* Group 2 (Nothing)
Result of Group 2 \.\w* Nothing

Basically, there is no match for either Group 1 or Group 2, as the '.'
has been consumed by the previous Match. However, as both Groups specify
a minimum of Zero times, they don't disqualify the Match, as they appear
zero times each.

Why does Expresso report Group 1 at position 32 (end of string)? Well, no
match has been returned prior to the end of the string. So, that's where
the null match begins. Why does Expressio begin at position 0? Well, I'm
not that good with it!

Still, your regular expression is a bit lax in terms of standards. We
worked one up for valid email addresses the other day, and you may want
to borrow it:

(?i)([-.\w]+)\@(?:((?:\d{1,3}\.){3}\d{1,3})|([-a-z0-9]+(?:\.[-a-z0-9]+)*)\.((?:com|edu|gov|int|mil|net|org|biz|info|nam e|museum|coop|aero|[a-z]{2})))

It is case-insensitive, and matches both domain name and IP domain email
addresses. It puts the results into 4 possible groups:

1-User Name, 2-Domain IP Address, 3-Domain Name, 4-Root Domain.

Note that groups 2 and (3,4) are exclusive of one another. The email
address can either be an IP address, or a named domain, but not both. It
supports 2-letter country suffixes, and multiple-dot domain addresses.
And it's case-sensitive.

I'm not sure we covered all the possible permutations, but it's pretty
strong.

--
HTH,

Kevin Spencer
Microsoft MVP
.Net Developer
Ambiguity has a certain quality to it.
Basically, the whole string has been consumed by the
"clintonG" <cs*********@REMOVETHISTEXTmetromilwaukee.com> wrote in
message news:%2****************@TK2MSFTNGP14.phx.gbl...
Hello Kevin,

Well I'm bright eyed but not so bushy-tailed this morning. Thanks for
working this out. Its one of those 'must know' issues one needs to be
concerned with when generating valid XML from an application. I'll be
working with it later today and I'm starting to get a feel for Expresso
which I have a question about. I'm at the point where I've almost come
to understand how expressions are actually processed which -- for me --
means I will understand how I need to think to put them together. You've
been a real help again and your source is an inspiration which shows how
elegant self-documenting code can be.

As for the Expresso question, what is 1:? supposed to indicate? (noting
that's the closest I could come at the moment to replicate the
rectangular 'non-printable' character Expresso uses to indicate some
'thing' it has matched) In the following simple example it seems to
match a white space although in a manner that is confusing as I will
point out but in other examples with many more characters and white
space in the string to be matched I have counted the position where the
? is said to be matched and the position reported does not fall on a
white space at all.

// Expression
\w*@\w*\.\w*((\.\w*)*)?

// String to match
An example so*****@somewhere.com of an email address.

Expresso reports 1:? at Postion 32 Length 0 which infers white space in
the simple example as given noting there was white space characters
before the matched characters and motivating one to ask why Expresso
would ignore those previous white space characters and then report 2:?
at Position 0 Length 0 which suggests the parser returned to the
beginning of the string to be matched and found what?

Is this clear as mud or what :-)

<%= Clinton Gallagher
"Kevin Spencer" <ke***@DIESPAMMERSDIEtakempis.com> wrote in message
news:%2****************@TK2MSFTNGP09.phx.gbl...
Hi Clinton,

The following Regular Expression will give you the ability to do a
Regex.Replace on a string containing both single "&" characters and
"&amp;" strings. It captures the "&amp;" strings into their own
separate matches, and the "&" characters into their own matches,
putting the "&" characters into a Group. It is also case-insensitive:

(?i)[^&amp;][^&]*|&amp;|(&(?!=amp))

Here's some sample code for reeplacing the single "&" characters with
&amp; -

/// <summary>
/// Replaces Ampersand in a Match with "&amp;"
/// </summary>
/// <param name="m">Match</param>
/// <returns>Replaced Match value</returns>
public static string ampReplacer(Match m)
{
if (m.Groups[1].Captures.Count == 0) return m.Value;
return m.Value.Replace("&", "&amp;");
}

/// <summary>
/// Replaces all single Ampersand characters in a string with "&amp;"
/// </summary>
/// <param name="s">String to process</param>
/// <returns>Processed String</returns>
public static string ReplaceAmpersand(string s)
{
return Regex.Replace(s, @"(?i)[^&amp;][^&]*|&amp;|(&(?!=amp))",
new MatchEvaluator(ampReplacer));
}

The "ampReplacer function is the function passed as the MatchEvaluator
delegate in the Regex.Replace() method used in the "ReplaceAmpersand"
method. The "ReplaceAmpersand" method takes a string as an argument,
and uses Regex.Replace to replace all matches in the string that
contain a value in Groups[1] with "&amp;".

As a side note, I used both Expresso and Regex Buddy to come up with
this. It was indeed a challenge, as I'm not quite a master of Regular
Expressions. But I enjoy learning, so it was a good exercise for me! :)

--
HTH,

Kevin Spencer
Microsoft MVP
.Net Developer
Ambiguity has a certain quality to it.

"clintonG" <cs*********@REMOVETHISTEXTmetromilwaukee.com> wrote in
message news:%2***************@tk2msftngp13.phx.gbl...
> Kevin, have you ever heard the expression "preaching to the choir?"
> :-)
>
> I've got the basic pattern matching theory understood but its the use
> of expressions to disallow or replace certain characters and/or
> strings that I'm trying to really understand thoroughly. The following
> example illustrates...
>
> // Example
> Lawn Mowers, Repairs & Services - lawnmowers.com
>
> A typical page title that when entered into a TextBox meant to capture
> string data for an RSS 2.0 title element should use &amp; instead of
> the & to represent the ampersand. I've got an expression that works
> well for the example but can't figure out (with the expression I have)
> how to match the & and replace it with &amp; (yet) -- or -- how to use
> the expression I have to force the 2.0 Regular Expression Validator to
> fail when the & is present in the string.
>
> // Expression
> [a-z]+([a-z0-9-]*[a-z0-9]+)?(\.([a-z]+([a-z0-9-]*[a-z0-9]+)?)+)*
>
> I also really appreciate Expresso's Analyzer. It is outstanding that
> Expresso seems to make it easy for us to pick expressions apart piece
> by piece and explain them in English.
>
>
> <%= Clinton Gallagher
>
>
>
>
>
>
> "Kevin Spencer" <ke***@DIESPAMMERSDIEtakempis.com> wrote in message
> news:Ow****************@tk2msftngp13.phx.gbl...
>> Hi Juan,
>>
>>> The kind of RegEx tool I'd like is one which can take a string
>>> I write, and create a RegEx expression which matches it.
>>
>> The problem with that is that you can write a Regular Expression that
>> matches a literal string quite easily. For example:
>>
>> literal string
>>
>> The above is a regular expression which will match the substring
>> "literal string" in my first sentence. Of course, the real power of
>> regular expressions is the abilty to match *patterns* in a string,
>> perform grouping, etc. So, like any programming language (which it
>> is, in a sense), Regular Expressions have a shorthand syntax that
>> allows one to create patterns of a large variety of types. A simple
>> example of this would be:
>>
>> (literal) (string)
>>
>> This captures the same match as the first, but puts the string
>> "literal" into a group, and the string "string" into a second group.
>> But of course, we have already exceeded your desired requirement. On
>> the other hand, we have made a regular expression that is perhaps
>> more useful (in some situations) than the first.
>>
>> And of course, the possible types and combinations of patterns are
>> almost endless, including wildcard patterns, special characters,
>> boolean rules, and so on.
>>
>> Yeah, it's like reading some kind of incredibly concise shorthand
>> code, without even line breaks or brackets to help. That's why I was
>> so pleased to see that Expresso allows you to break your regular
>> expression across multiple lines while building it. That helps a good
>> bit!
>>
>> --
>> HTH,
>>
>> Kevin Spencer
>> Microsoft MVP
>> .Net Developer
>> Ambiguity has a certain quality to it.
>>
>> "Juan T. Llibre" <no***********@nowhere.com> wrote in message
>> news:ei**************@TK2MSFTNGP12.phx.gbl...
>>> The kind of RegEx tool I'd like is one which can take a string
>>> I write, and create a RegEx expression which matches it.
>>>
>>> *That* will be the RegEx tool that will corner the market.
>>>
>>>
>>>
>>>
>>> Juan T. Llibre, ASP.NET MVP
>>> ASP.NET FAQ : http://asp.net.do/faq/
>>> Foros de ASP.NET en Español : http://asp.net.do/foros/
>>> ======================================
>>> "clintonG" <cs*********@REMOVETHISTEXTmetromilwaukee.com> wrote in
>>> message news:Of**************@TK2MSFTNGP10.phx.gbl...
>>>> Thanks Kevin. I saw that post too and am going to download Expresso
>>>> in a few minutes. I know you don't need to be psychic to figure out
>>>> what I'm likely to be asking next :-)
>>>>
>>>> <%= Clinton Gallagher
>>>>
>>>>
>>>> "Kevin Spencer" <ke***@DIESPAMMERSDIEtakempis.com> wrote in message
>>>> news:O0**************@tk2msftngp13.phx.gbl...
>>>>>I saw a response to this question in the CSharp group, regarding a
>>>>>product named "Expresso"
>>>>>
>>>>> http://www.ultrapico.com/Expresso.htm
>>>>>
>>>>> Expresso is .Net freeware, and after downloading, installing, and
>>>>> playing with it, I'd give it a try! So far I have found it to be
>>>>> excellent, having capabilities that Regex Buddy does not have, and
>>>>> a much more intuitive GUI.
>>>>>
>>>>> --
>>>>> HTH,
>>>>>
>>>>> Kevin Spencer
>>>>> Microsoft MVP
>>>>> .Net Developer
>>>>> Ambiguity has a certain quality to it.
>>>>>
>>>>> "Kevin Spencer" <ke***@DIESPAMMERSDIEtakempis.com> wrote in
>>>>> message news:%2****************@TK2MSFTNGP12.phx.gbl...
>>>>>> Hi Clinton,
>>>>>>
>>>>>> Yes, I have it. I previously used the freeware Regex Coach
>>>>>> Utility, but it is nowhere near as complete in its support for
>>>>>> various newer Regular Expression syntax and programming languages
>>>>>> in general. It did have one nice feature about it. You could
>>>>>> split a Regular Expression across multiple lines, which often
>>>>>> made it easier to analyze. However, Regex Buddy has the graphical
>>>>>> tree view, and it is synchronized with the Regular Expression
>>>>>> itself, which more than makes up for the omission of breaking a
>>>>>> Regular Expression across multiple lines.
>>>>>>
>>>>>> BTW, it also has a GREP utility built in.
>>>>>>
>>>>>> In short, it is well worth the 30 bucks.
>>>>>>
>>>>>> --
>>>>>> HTH,
>>>>>>
>>>>>> Kevin Spencer
>>>>>> Microsoft MVP
>>>>>> .Net Developer
>>>>>> Ambiguity has a certain quality to it.
>>>>>>
>>>>>> "clintonG" <cs*********@REMOVETHISTEXTmetromilwaukee.com> wrote
>>>>>> in message news:%2****************@TK2MSFTNGP15.phx.gbl...
>>>>>>>I was looking at PowerGrep from the same dev group but like Regex
>>>>>>>Buddy I don't like the buy before you try business model so that
>>>>>>>choice has to be on the shelf for the moment but thanks for
>>>>>>>bringing it up. I assume you've used Regex Buddy?
>>>>>>>
>>>>>>> <%= Clinton Gallagher
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> "Kevin Spencer" <ke***@DIESPAMMERSDIEtakempis.com> wrote in
>>>>>>> message news:%2***************@tk2msftngp13.phx.gbl...
>>>>>>>> Regex Buddy is very good. It costs around $30.00, includes
>>>>>>>> quite a few nice features, including the ability to copy
>>>>>>>> regular expressions in various language string syntaxes,
>>>>>>>> including C#. It has the ability to create libraries of regular
>>>>>>>> expressions, a nice visual builder, color-coding, and quite a
>>>>>>>> bit more. Good testing environment. And it has some nice
>>>>>>>> reference material included.
>>>>>>>>
>>>>>>>> --
>>>>>>>> HTH,
>>>>>>>>
>>>>>>>> Kevin Spencer
>>>>>>>> Microsoft MVP
>>>>>>>> .Net Developer
>>>>>>>> Ambiguity has a certain quality to it.
>>>>>>>>
>>>>>>>> "clintonG" <cs*********@REMOVETHISTEXTmetromilwaukee.com> wrote
>>>>>>>> in message news:%2******************@tk2msftngp13.phx.gbl...
>>>>>>>>> I'm using an .aspx tool I found at [1] but as nice as the
>>>>>>>>> interface is I think I need to consider using others. Some can
>>>>>>>>> generate C# I understand. Your preferences please...
>>>>>>>>>
>>>>>>>>> <%= Clinton Gallagher
>>>>>>>>>
>>>>>>>>> [1] http://forta.com/books/0672325667/
>>>>>>>>>



Nov 19 '05 #15

P: n/a
Yes, actually that is my third Regex Software package, along with Regex
Buddy and Expresso. I find it helpful to use them concurrently.

--
HTH,

Kevin Spencer
Microsoft MVP
.Net Developer
Ambiguity has a certain quality to it.

"clintonG" <cs*********@REMOVETHISTEXTmetromilwaukee.com> wrote in message
news:Of**************@TK2MSFTNGP12.phx.gbl...
Regex Coach is very interesting. It has a unique tree that graphically
represents each part of the expression as well as an English 'Analyzer.'

<%= Clinton Gallagher
"clintonG" <cs*********@REMOVETHISTEXTmetromilwaukee.com> wrote in message
news:ej*************@tk2msftngp13.phx.gbl...
Hello Kevin,

With the exception of bass-ackward Lookarounds, when you say expressions
are sequential in nature, that is exactly what I am trying to nail down.

What confuses me is that they appear to be constructed using set theory,
like algebraic expressions whose terms are resolved from the innermost set
working outwards, yet when a tool processes them they appear to be parsed
linearly, as if serialized. Does that make sense to you?

I sure wish Microsoft would publish a document explaining how the
expressions are parsed the same way they have explained how the life
cycle of a page is processed. The control tree is easy to understand
theoretically. I can only wonder why it can not be similarly mapped and
explained for regular expressions.

I've tried the following:

* RegexDesigner.Net
* Rad Regex Designer
* Regular Expression Tester
* Regulator RegEx
* Expresso

Downloading this afternoon...

* Regex Coach [1] donationware
Which is also said to have an English language 'analyzer' like Expresso
which I consider to have the best interface and functionality but still
can't make a genius from a dummy :-).

I'm going to delve into some lists and forums [2] for the next week to
see what I can learn.
This book [3] sounds interesting. I'll take a trip to Barnes and Noble to
see if it's on the shelves.
I also have this book [4], which is very good despite the "10 Minutes"
implication, and of course I'll be looking for [5] from O'Reilly.

Let's see what that does for now.

<%= Clinton Gallagher
[1] http://www.weitz.de/regex-coach/
[2] http://regexadvice.com/
[3] http://books.slashdot.org/article.pl...&tid=192&tid=6
[4] http://forta.com/books/0672325667/
[5] http://www.oreilly.com/catalog/regex...L-4GV796923290

I appreciate your interest in this context. I've discovered a regular
expression mailing list emanates from

"Kevin Spencer" <ke***@DIESPAMMERSDIEtakempis.com> wrote in message
news:uH**************@TK2MSFTNGP10.phx.gbl...
Hi Clinton,

Regular Expressions are a bear to learn, even if you have good tools to
work with them. I've spent hours working out a relatively "simple" one
(at least it seemed simple at first), but learning a bit more with each
hour. Still, I'm a long way from an expert. I can read most of it fairly
well by now, but certain concepts are still a bit difficult to deal
with. I still struggle some with Lookarounds in particular. One thing to
keep in mind is that Regular Expressions consume a string as they move
through it, with a few exceptions (like Lookarounds). They are basically
sequential in nature.
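
To make the "consume" idea concrete, here is a minimal C# sketch. The class
name ConsumeDemo, the Describe helper and the sample string are made up for
illustration; they do not come from any of the tools discussed. It runs the
same search twice, once with a plain pattern and once with a lookahead, and
prints what was matched, where, and how long the match was.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ConsumeDemo
{
    // Returns "value@index+length" for the first match, or "no match".
    public static string Describe(string input, string pattern)
    {
        Match m = Regex.Match(input, pattern);
        return m.Success ? m.Value + "@" + m.Index + "+" + m.Length : "no match";
    }

    public static void Main()
    {
        string input = "price: 100USD";

        // The digits and "USD" are both consumed and become part of the match.
        Console.WriteLine(Describe(input, @"\d+USD"));      // 100USD@7+6

        // The lookahead only checks that "USD" follows; it is not consumed.
        Console.WriteLine(Describe(input, @"\d+(?=USD)"));  // 100@7+3
    }
}
```

Both matches start at index 7, but the second is only three characters
long: the lookahead verifies that "USD" follows without adding it to the
match, so a following match attempt would resume right after the digits.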

You may find the "Analyze" tool helpful with this sort of thing.
Fortunately, I have not 2 but THREE Regular Expression tools to work
with (2 of them are Freeware), which enables me to use the one(s) that
are best for the particular type of work I need regarding any individual
Regular Expression and/or problem with one.

The expression you posted,

\w*@\w*\.\w*((\.\w*)*)?

Can be analyzed in so many words as (with the parsing of the email
address where the match begins):

Step                                                     Pattern    Matched text
Match any word character, zero or more times             \w*        someone
Next, match the '@' character once                       @          @
Next, match any word character, zero or more times       \w*        somewhere
Next, match the '.' character once                       \.         .
Next, match any word character, zero or more times       \w*        com
Next, put the following into Group 1, zero or 1 time     (......)?
  Match the following into Group 2, zero or more times   (......)*
    Match the '.' character once                         \.
    Match any word character, zero or more times         \w*
Result of Group 1                                        (\.\w*)*   (nothing)
Result of Group 2                                        \.\w*      (nothing)

Basically, neither Group 1 nor Group 2 captures any text: the only '.'
has been consumed by the earlier part of the Match, and nothing after
"com" begins with a '.'. However, as both Groups specify a minimum of
zero times, they don't disqualify the Match.

Why does Expresso report Group 1 at position 32? That is where the
matched address ends (not the end of the whole string), so that's where
the null match for Group 1 sits. Why does Expresso report Group 2 at
position 0?
Well, I'm not that good with it!
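
One way to see what Expresso is reporting is to ask the .NET Regex classes
directly. The following is a rough sketch, assuming the obfuscated address
in the sample string was someone@somewhere.com; the class name
GroupPositions and the Describe helper are hypothetical. It prints whether
each group took part in the match, plus the Index and Length that .NET
reports for it.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GroupPositions
{
    // Formats one group as "success index length".
    public static string Describe(Group g)
    {
        return g.Success + " " + g.Index + " " + g.Length;
    }

    public static Match Run(string input)
    {
        return Regex.Match(input, @"\w*@\w*\.\w*((\.\w*)*)?");
    }

    public static void Main()
    {
        // Assumed sample text: the address stands in for the obfuscated one.
        Match m = Run("An example someone@somewhere.com of an email address.");

        Console.WriteLine(m.Value);                // someone@somewhere.com
        Console.WriteLine(Describe(m.Groups[0]));  // the whole match
        Console.WriteLine(Describe(m.Groups[1]));  // Group 1: empty match after "com"
        Console.WriteLine(Describe(m.Groups[2]));  // Group 2: never took part
    }
}
```

As far as I can tell, a group that never took part in the match (Success is
false) is reported by .NET with Index 0 and Length 0. That would explain the
"Position 0 Length 0" entry for Group 2: nothing was found at the start of
the string, the group simply has no position. Group 1 did take part, but it
matched the empty string right after "com", which is position 32.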

Still, your regular expression is a bit lax in terms of standards. We
worked one up for valid email addresses the other day, and you may want
to borrow it:

(?i)([-.\w]+)\@(?:((?:\d{1,3}\.){3}\d{1,3})|([-a-z0-9]+(?:\.[-a-z0-9]+)*)\.((?:com|edu|gov|int|mil|net|org|biz|info|name|museum|coop|aero|[a-z]{2})))

It is case-insensitive, and matches both domain name and IP domain email
addresses. It puts the results into 4 possible groups:

1-User Name, 2-Domain IP Address, 3-Domain Name, 4-Root Domain.

Note that groups 2 and (3,4) are exclusive of one another. The email
address can either be an IP address, or a named domain, but not both. It
supports 2-letter country suffixes, and multiple-dot domain addresses.
And, as noted, it's case-insensitive.

I'm not sure we covered all the possible permutations, but it's pretty
strong.
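
For anyone who wants to see the four groups in action, here is a small C#
sketch built around the expression above (with the "name" suffix written as
one word). The EmailParts class, the Split helper and the two sample
addresses are illustrative assumptions, not part of the original code.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EmailParts
{
    // The expression from the post, unchanged apart from being wrapped in a Regex.
    private static readonly Regex Email = new Regex(
        @"(?i)([-.\w]+)\@(?:((?:\d{1,3}\.){3}\d{1,3})|([-a-z0-9]+(?:\.[-a-z0-9]+)*)\.((?:com|edu|gov|int|mil|net|org|biz|info|name|museum|coop|aero|[a-z]{2})))");

    // Returns "user|ip|domain|root"; groups that took no part come back empty.
    public static string Split(string address)
    {
        Match m = Email.Match(address);
        if (!m.Success) return "no match";
        return m.Groups[1].Value + "|" + m.Groups[2].Value + "|"
             + m.Groups[3].Value + "|" + m.Groups[4].Value;
    }

    public static void Main()
    {
        Console.WriteLine(Split("someone@mail.example.com"));  // someone||mail.example|com
        Console.WriteLine(Split("someone@192.168.0.1"));       // someone|192.168.0.1||
    }
}
```

The output shows the exclusivity mentioned above: a named domain fills
groups 3 and 4 and leaves group 2 empty, while an IP address fills group 2
only.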

--
HTH,

Kevin Spencer
Microsoft MVP
.Net Developer
Ambiguity has a certain quality to it.
Basically, the whole string has been consumed by the
"clintonG" <cs*********@REMOVETHISTEXTmetromilwaukee.com> wrote in
message news:%2****************@TK2MSFTNGP14.phx.gbl...
Hello Kevin,

Well I'm bright eyed but not so bushy-tailed this morning. Thanks for
working this out. It's one of those 'must know' issues one needs to be
concerned with when generating valid XML from an application. I'll be
working with it later today and I'm starting to get a feel for Expresso
which I have a question about. I'm at the point where I've almost come
to understand how expressions are actually processed which -- for me --
means I will understand how I need to think to put them together.
You've been a real help again and your source is an inspiration which
shows how elegant self-documenting code can be.

As for the Expresso question, what is 1:? supposed to indicate? (noting
that's the closest I could come at the moment to replicate the
rectangular 'non-printable' character Expresso uses to indicate some
'thing' it has matched) In the following simple example it seems to
match a white space although in a manner that is confusing as I will
point out but in other examples with many more characters and white
space in the string to be matched I have counted the position where the
? is said to be matched and the position reported does not fall on a
white space at all.

// Expression
\w*@\w*\.\w*((\.\w*)*)?

// String to match
An example so*****@somewhere.com of an email address.

Expresso reports 1:? at Position 32 Length 0, which implies white space in
the simple example as given. But there were white space characters
before the matched characters too, which motivates one to ask why Expresso
would ignore those previous white space characters, and then report 2:?
at Position 0 Length 0, which suggests the parser returned to the
beginning of the string to be matched and found what?

Is this clear as mud or what :-)

<%= Clinton Gallagher
"Kevin Spencer" <ke***@DIESPAMMERSDIEtakempis.com> wrote in message
news:%2****************@TK2MSFTNGP09.phx.gbl...
> Hi Clinton,
>
> The following Regular Expression will give you the ability to do a
> Regex.Replace on a string containing both single "&" characters and
> "&amp;" strings. It captures the "&amp;" strings into their own
> separate matches, and the "&" characters into their own matches,
> putting the "&" characters into a Group. It is also case-insensitive:
>
> (?i)[^&amp;][^&]*|&amp;|(&(?!amp;))
>
> Here's some sample code for replacing the single "&" characters with
> "&amp;" (it needs "using System.Text.RegularExpressions;"):
>
> /// <summary>
> /// Replaces Ampersand in a Match with "&amp;"
> /// </summary>
> /// <param name="m">Match</param>
> /// <returns>Replaced Match value</returns>
> public static string ampReplacer(Match m)
> {
>     // Group 1 is only captured for a single "&"; other matches pass through.
>     if (m.Groups[1].Captures.Count == 0) return m.Value;
>     return m.Value.Replace("&", "&amp;");
> }
>
> /// <summary>
> /// Replaces all single Ampersand characters in a string with "&amp;"
> /// </summary>
> /// <param name="s">String to process</param>
> /// <returns>Processed String</returns>
> public static string ReplaceAmpersand(string s)
> {
>     return Regex.Replace(s, @"(?i)[^&amp;][^&]*|&amp;|(&(?!amp;))",
>         new MatchEvaluator(ampReplacer));
> }
>
> The "ampReplacer" function is the function passed as the MatchEvaluator
> delegate in the Regex.Replace() method used in the "ReplaceAmpersand"
> method. The "ReplaceAmpersand" method takes a string as an argument,
> and uses Regex.Replace to replace all matches in the string that
> contain a value in Groups[1] with "&amp;".
>
> As a side note, I used both Expresso and Regex Buddy to come up with
> this. It was indeed a challenge, as I'm not quite a master of Regular
> Expressions. But I enjoy learning, so it was a good exercise for me!
> :)
>
> --
> HTH,
>
> Kevin Spencer
> Microsoft MVP
> .Net Developer
> Ambiguity has a certain quality to it.
>
> "clintonG" <cs*********@REMOVETHISTEXTmetromilwaukee.com> wrote in
> message news:%2***************@tk2msftngp13.phx.gbl...
>> Kevin, have you ever heard the expression "preaching to the choir?"
>> :-)
>>
>> I've got the basic pattern matching theory understood but its the use
>> of expressions to disallow or replace certain characters and/or
>> strings that I'm trying to really understand thoroughly. The
>> following example illustrates...
>>
>> // Example
>> Lawn Mowers, Repairs & Services - lawnmowers.com
>>
>> A typical page title that when entered into a TextBox meant to
>> capture string data for an RSS 2.0 title element should use &amp;
>> instead of the & to represent the ampersand. I've got an expression
>> that works well for the example but can't figure out (with the
>> expression I have) how to match the & and replace it with &amp;
>> (yet) -- or -- how to use the expression I have to force the 2.0
>> Regular Expression Validator to fail when the & is present in the
>> string.
>>
>> // Expression
>> [a-z]+([a-z0-9-]*[a-z0-9]+)?(\.([a-z]+([a-z0-9-]*[a-z0-9]+)?)+)*
>>
>> I also really appreciate Expresso's Analyzer. It is outstanding that
>> Expresso seems to make it easy for us to pick expressions apart piece
>> by piece and explain them in English.
>>
>>
>> <%= Clinton Gallagher
>>
>>
>>
>>
>>
>>
>> "Kevin Spencer" <ke***@DIESPAMMERSDIEtakempis.com> wrote in message
>> news:Ow****************@tk2msftngp13.phx.gbl...
>>> Hi Juan,
>>>
>>>> The kind of RegEx tool I'd like is one which can take a string
>>>> I write, and create a RegEx expression which matches it.
>>>
>>> The problem with that is that you can write a Regular Expression
>>> that matches a literal string quite easily. For example:
>>>
>>> literal string
>>>
>>> The above is a regular expression which will match the substring
>>> "literal string" in my first sentence. Of course, the real power of
>>> regular expressions is the ability to match *patterns* in a string,
>>> perform grouping, etc. So, like any programming language (which it
>>> is, in a sense), Regular Expressions have a shorthand syntax that
>>> allows one to create patterns of a large variety of types. A simple
>>> example of this would be:
>>>
>>> (literal) (string)
>>>
>>> This captures the same match as the first, but puts the string
>>> "literal" into a group, and the string "string" into a second group.
>>> But of course, we have already exceeded your desired requirement. On
>>> the other hand, we have made a regular expression that is perhaps
>>> more useful (in some situations) than the first.
>>>
>>> And of course, the possible types and combinations of patterns are
>>> almost endless, including wildcard patterns, special characters,
>>> boolean rules, and so on.
>>>
>>> Yeah, it's like reading some kind of incredibly concise shorthand
>>> code, without even line breaks or brackets to help. That's why I was
>>> so pleased to see that Expresso allows you to break your regular
>>> expression across multiple lines while building it. That helps a
>>> good bit!
>>>
>>> --
>>> HTH,
>>>
>>> Kevin Spencer
>>> Microsoft MVP
>>> .Net Developer
>>> Ambiguity has a certain quality to it.
>>>
>>> "Juan T. Llibre" <no***********@nowhere.com> wrote in message
>>> news:ei**************@TK2MSFTNGP12.phx.gbl...
>>>> The kind of RegEx tool I'd like is one which can take a string
>>>> I write, and create a RegEx expression which matches it.
>>>>
>>>> *That* will be the RegEx tool that will corner the market.
>>>>
>>>>
>>>>
>>>>
>>>> Juan T. Llibre, ASP.NET MVP
>>>> ASP.NET FAQ : http://asp.net.do/faq/
>>>> Foros de ASP.NET en Español : http://asp.net.do/foros/
>>>> ======================================
>>>> "clintonG" <cs*********@REMOVETHISTEXTmetromilwaukee.com> wrote in
>>>> message news:Of**************@TK2MSFTNGP10.phx.gbl...
>>>>> Thanks Kevin. I saw that post too and am going to download
>>>>> Expresso in a few minutes. I know you don't need to be psychic to
>>>>> figure out what I'm likely to be asking next :-)
>>>>>
>>>>> <%= Clinton Gallagher
>>>>>
>>>>>
>>>>> "Kevin Spencer" <ke***@DIESPAMMERSDIEtakempis.com> wrote in
>>>>> message news:O0**************@tk2msftngp13.phx.gbl...
>>>>>>I saw a response to this question in the CSharp group, regarding a
>>>>>>product named "Expresso"
>>>>>>
>>>>>> http://www.ultrapico.com/Expresso.htm
>>>>>>
>>>>>> Expresso is .Net freeware, and after downloading, installing, and
>>>>>> playing with it, I'd give it a try! So far I have found it to be
>>>>>> excellent, having capabilities that Regex Buddy does not have,
>>>>>> and a much more intuitive GUI.
>>>>>>
>>>>>> --
>>>>>> HTH,
>>>>>>
>>>>>> Kevin Spencer
>>>>>> Microsoft MVP
>>>>>> .Net Developer
>>>>>> Ambiguity has a certain quality to it.
>>>>>>
>>>>>> "Kevin Spencer" <ke***@DIESPAMMERSDIEtakempis.com> wrote in
>>>>>> message news:%2****************@TK2MSFTNGP12.phx.gbl...
>>>>>>> Hi Clinton,
>>>>>>>
>>>>>>> Yes, I have it. I previously used the freeware Regex Coach
>>>>>>> Utility, but it is nowhere near as complete in its support for
>>>>>>> various newer Regular Expression syntax and programming
>>>>>>> languages in general. It did have one nice feature about it. You
>>>>>>> could split a Regular Expression across multiple lines, which
>>>>>>> often made it easier to analyze. However, Regex Buddy has the
>>>>>>> graphical tree view, and it is synchronized with the Regular
>>>>>>> Expression itself, which more than makes up for the omission of
>>>>>>> breaking a Regular Expression across multiple lines.
>>>>>>>
>>>>>>> BTW, it also has a GREP utility built in.
>>>>>>>
>>>>>>> In short, it is well worth the 30 bucks.
>>>>>>>
>>>>>>> --
>>>>>>> HTH,
>>>>>>>
>>>>>>> Kevin Spencer
>>>>>>> Microsoft MVP
>>>>>>> .Net Developer
>>>>>>> Ambiguity has a certain quality to it.
>>>>>>>
>>>>>>> "clintonG" <cs*********@REMOVETHISTEXTmetromilwaukee.com> wrote
>>>>>>> in message news:%2****************@TK2MSFTNGP15.phx.gbl...
>>>>>>>>I was looking at PowerGrep from the same dev group but like
>>>>>>>>Regex Buddy I don't like the buy before you try business model
>>>>>>>>so that choice has to be on the shelf for the moment but thanks
>>>>>>>>for bringing it up. I assume you've used Regex Buddy?
>>>>>>>>
>>>>>>>> <%= Clinton Gallagher
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> "Kevin Spencer" <ke***@DIESPAMMERSDIEtakempis.com> wrote in
>>>>>>>> message news:%2***************@tk2msftngp13.phx.gbl...
>>>>>>>>> Regex Buddy is very good. It costs around $30.00, includes
>>>>>>>>> quite a few nice features, including the ability to copy
>>>>>>>>> regular expressions in various language string syntaxes,
>>>>>>>>> including C#. It has the ability to create libraries of
>>>>>>>>> regular expressions, a nice visual builder, color-coding, and
>>>>>>>>> quite a bit more. Good testing environment. And it has some
>>>>>>>>> nice reference material included.
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> HTH,
>>>>>>>>>
>>>>>>>>> Kevin Spencer
>>>>>>>>> Microsoft MVP
>>>>>>>>> .Net Developer
>>>>>>>>> Ambiguity has a certain quality to it.
>>>>>>>>>
>>>>>>>>> "clintonG" <cs*********@REMOVETHISTEXTmetromilwaukee.com>
>>>>>>>>> wrote in message
>>>>>>>>> news:%2******************@tk2msftngp13.phx.gbl...
>>>>>>>>>> I'm using an .aspx tool I found at [1] but as nice as the
>>>>>>>>>> interface is I think I need to consider using others. Some
>>>>>>>>>> can generate C# I understand. Your preferences please...
>>>>>>>>>>
>>>>>>>>>> <%= Clinton Gallagher
>>>>>>>>>>
>>>>>>>>>> [1] http://forta.com/books/0672325667/
>>>>>>>>>>



Nov 19 '05 #16

P: n/a
Hi Clinton,

Your remarks piqued my curiosity. I have found a few technical articles on
the inner working of regular expressions:

http://www.cs.rochester.edu/u/leblanc/csc173/fa/
http://perldoc.perl.org/perlre.html#...ar-Expressions
http://research.microsoft.com/projects/greta/
http://en.wikipedia.org/wiki/Regular...anguage_theory

--
HTH,

Kevin Spencer
Microsoft MVP
.Net Developer
Ambiguity has a certain quality to it.

"clintonG" <cs*********@REMOVETHISTEXTmetromilwaukee.com> wrote in message
news:ej*************@tk2msftngp13.phx.gbl...
Hello Kevin,

With the exception of bass-ackward Lookarounds, when you say expressions
are sequential in nature, that is exactly what I am trying to nail down.

What confuses me is that they appear to be constructed using set theory,
like algebraic expressions whose terms are resolved from the innermost set
working outwards, yet when a tool processes them they appear to be parsed
linearly, as if serialized. Does that make sense to you?

I sure wish Microsoft would publish a document explaining how the
expressions are parsed the same way they have explained how the life cycle
of a page is processed. The control tree is easy to understand
theoretically. I can only wonder why it can not be similarly mapped and
explained for regular expressions.

I've tried the following:

* RegexDesigner.Net
* Rad Regex Designer
* Regular Expression Tester
* Regulator RegEx
* Expresso

Downloading this afternoon...

* Regex Coach [1] donationware
Which is also said to have an English language 'analyzer' like Expresso
which I consider to have the best interface and functionality but still
can't make a genius from a dummy :-).

I'm going to delve into some lists and forums [2] for the next week to
see what I can learn.
This book [3] sounds interesting. I'll take a trip to Barnes and Noble to
see if it's on the shelves.
I also have this book [4], which is very good despite the "10 Minutes"
implication, and of course I'll be looking for [5] from O'Reilly.

Let's see what that does for now.

<%= Clinton Gallagher
[1] http://www.weitz.de/regex-coach/
[2] http://regexadvice.com/
[3] http://books.slashdot.org/article.pl...&tid=192&tid=6
[4] http://forta.com/books/0672325667/
[5] http://www.oreilly.com/catalog/regex...L-4GV796923290

I appreciate your interest in this context. I've discovered a regular
expression mailing list emanates from

"Kevin Spencer" <ke***@DIESPAMMERSDIEtakempis.com> wrote in message
news:uH**************@TK2MSFTNGP10.phx.gbl...
Hi Clinton,

Regular Expressions are a bear to learn, even if you have good tools to
work with them. I've spent hours working out a relatively "simple" one
(at least it seemed simple at first), but learning a bit more with each
hour. Still, I'm a long way from an expert. I can read most of it fairly
well by now, but certain concepts are still a bit difficult to deal with.
I still struggle some with Lookarounds in particular. One thing to keep
in mind is that Regular Expressions consume a string as they move through
it, with a few exceptions (like Lookarounds). They are basically
sequential in nature.

You may find the "Analyze" tool helpful with this sort of thing.
Fortunately, I have not 2 but THREE Regular Expression tools to work with
(2 of them are Freeware), which enables me to use the one(s) that are
best for the particular type of work I need regarding any individual
Regular Expression and/or problem with one.

The expression you posted,

\w*@\w*\.\w*((\.\w*)*)?

Can be analyzed in so many words as (with the parsing of the email
address where the match begins):

Step                                                     Pattern    Matched text
Match any word character, zero or more times             \w*        someone
Next, match the '@' character once                       @          @
Next, match any word character, zero or more times       \w*        somewhere
Next, match the '.' character once                       \.         .
Next, match any word character, zero or more times       \w*        com
Next, put the following into Group 1, zero or 1 time     (......)?
  Match the following into Group 2, zero or more times   (......)*
    Match the '.' character once                         \.
    Match any word character, zero or more times         \w*
Result of Group 1                                        (\.\w*)*   (nothing)
Result of Group 2                                        \.\w*      (nothing)

Basically, neither Group 1 nor Group 2 captures any text: the only '.'
has been consumed by the earlier part of the Match, and nothing after
"com" begins with a '.'. However, as both Groups specify a minimum of
zero times, they don't disqualify the Match.

Why does Expresso report Group 1 at position 32? That is where the
matched address ends (not the end of the whole string), so that's where
the null match for Group 1 sits. Why does Expresso report Group 2 at
position 0? Well, I'm not that good with it!

Still, your regular expression is a bit lax in terms of standards. We
worked one up for valid email addresses the other day, and you may want
to borrow it:

(?i)([-.\w]+)\@(?:((?:\d{1,3}\.){3}\d{1,3})|([-a-z0-9]+(?:\.[-a-z0-9]+)*)\.((?:com|edu|gov|int|mil|net|org|biz|info|name|museum|coop|aero|[a-z]{2})))

It is case-insensitive, and matches both domain name and IP domain email
addresses. It puts the results into 4 possible groups:

1-User Name, 2-Domain IP Address, 3-Domain Name, 4-Root Domain.

Note that groups 2 and (3,4) are exclusive of one another. The email
address can either be an IP address, or a named domain, but not both. It
supports 2-letter country suffixes, and multiple-dot domain addresses.
And, as noted, it's case-insensitive.

I'm not sure we covered all the possible permutations, but it's pretty
strong.

--
HTH,

Kevin Spencer
Microsoft MVP
.Net Developer
Ambiguity has a certain quality to it.
Basically, the whole string has been consumed by the
"clintonG" <cs*********@REMOVETHISTEXTmetromilwaukee.com> wrote in
message news:%2****************@TK2MSFTNGP14.phx.gbl...
Hello Kevin,

Well I'm bright eyed but not so bushy-tailed this morning. Thanks for
working this out. It's one of those 'must know' issues one needs to be
concerned with when generating valid XML from an application. I'll be
working with it later today and I'm starting to get a feel for Expresso
which I have a question about. I'm at the point where I've almost come
to understand how expressions are actually processed which -- for me --
means I will understand how I need to think to put them together. You've
been a real help again and your source is an inspiration which shows how
elegant self-documenting code can be.

As for the Expresso question, what is 1:? supposed to indicate? (noting
that's the closest I could come at the moment to replicate the
rectangular 'non-printable' character Expresso uses to indicate some
'thing' it has matched) In the following simple example it seems to
match a white space although in a manner that is confusing as I will
point out but in other examples with many more characters and white
space in the string to be matched I have counted the position where the
? is said to be matched and the position reported does not fall on a
white space at all.

// Expression
\w*@\w*\.\w*((\.\w*)*)?

// String to match
An example so*****@somewhere.com of an email address.

Expresso reports 1:? at Position 32 Length 0, which implies white space in
the simple example as given. But there were white space characters
before the matched characters too, which motivates one to ask why Expresso
would ignore those previous white space characters, and then report 2:?
at Position 0 Length 0, which suggests the parser returned to the
beginning of the string to be matched and found what?

Is this clear as mud or what :-)

<%= Clinton Gallagher
"Kevin Spencer" <ke***@DIESPAMMERSDIEtakempis.com> wrote in message
news:%2****************@TK2MSFTNGP09.phx.gbl...
Hi Clinton,

The following Regular Expression will give you the ability to do a
Regex.Replace on a string containing both single "&" characters and
"&amp;" strings. It captures the "&amp;" strings into their own
separate matches, and the "&" characters into their own matches,
putting the "&" characters into a Group. It is also case-insensitive:

(?i)[^&amp;][^&]*|&amp;|(&(?!amp;))

Here's some sample code for replacing the single "&" characters with
"&amp;" (it needs "using System.Text.RegularExpressions;"):

/// <summary>
/// Replaces Ampersand in a Match with "&amp;"
/// </summary>
/// <param name="m">Match</param>
/// <returns>Replaced Match value</returns>
public static string ampReplacer(Match m)
{
    // Group 1 is only captured for a single "&"; other matches pass through.
    if (m.Groups[1].Captures.Count == 0) return m.Value;
    return m.Value.Replace("&", "&amp;");
}

/// <summary>
/// Replaces all single Ampersand characters in a string with "&amp;"
/// </summary>
/// <param name="s">String to process</param>
/// <returns>Processed String</returns>
public static string ReplaceAmpersand(string s)
{
    return Regex.Replace(s, @"(?i)[^&amp;][^&]*|&amp;|(&(?!amp;))",
        new MatchEvaluator(ampReplacer));
}

The "ampReplacer" function is the function passed as the MatchEvaluator
delegate in the Regex.Replace() method used in the "ReplaceAmpersand"
method. The "ReplaceAmpersand" method takes a string as an argument,
and uses Regex.Replace to replace all matches in the string that
contain a value in Groups[1] with "&amp;".

As a side note, I used both Expresso and Regex Buddy to come up with
this. It was indeed a challenge, as I'm not quite a master of Regular
Expressions. But I enjoy learning, so it was a good exercise for me! :)
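
For comparison, a shorter variant of the same idea is sketched below. It is
not the approach from the post: it swaps the MatchEvaluator for a single
negative lookahead, and it assumes "&amp;" is the only entity that must be
left alone (something like "&lt;" would be escaped a second time). The
AmpersandEscaper class and its Escape method are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AmpersandEscaper
{
    // Matches "&" only when it is not already the start of "&amp;".
    private static readonly Regex BareAmpersand =
        new Regex(@"&(?!amp;)", RegexOptions.IgnoreCase);

    // Replaces every bare "&" with "&amp;" and leaves existing "&amp;" untouched.
    public static string Escape(string s)
    {
        return BareAmpersand.Replace(s, "&amp;");
    }

    public static void Main()
    {
        Console.WriteLine(Escape("Lawn Mowers, Repairs & Services - lawnmowers.com"));
        // Lawn Mowers, Repairs &amp; Services - lawnmowers.com

        Console.WriteLine(Escape("Fish &amp; Chips"));
        // Fish &amp; Chips
    }
}
```

Because the lookahead consumes nothing, only the "&" itself is replaced, and
running Escape a second time on its own output changes nothing.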

--
HTH,

Kevin Spencer
Microsoft MVP
.Net Developer
Ambiguity has a certain quality to it.

"clintonG" <cs*********@REMOVETHISTEXTmetromilwaukee.com> wrote in
message news:%2***************@tk2msftngp13.phx.gbl...
> Kevin, have you ever heard the expression "preaching to the choir?"
> :-)
>
> I've got the basic pattern matching theory understood but its the use
> of expressions to disallow or replace certain characters and/or
> strings that I'm trying to really understand thoroughly. The following
> example illustrates...
>
> // Example
> Lawn Mowers, Repairs & Services - lawnmowers.com
>
> A typical page title that when entered into a TextBox meant to capture
> string data for an RSS 2.0 title element should use &amp; instead of
> the & to represent the ampersand. I've got an expression that works
> well for the example but can't figure out (with the expression I have)
> how to match the & and replace it with &amp; (yet) -- or -- how to use
> the expression I have to force the 2.0 Regular Expression Validator to
> fail when the & is present in the string.
>
> // Expression
> [a-z]+([a-z0-9-]*[a-z0-9]+)?(\.([a-z]+([a-z0-9-]*[a-z0-9]+)?)+)*
>
> I also really appreciate Expresso's Analyzer. It is outstanding that
> Expresso seems to make it easy for us to pick expressions apart piece
> by piece and explain them in English.
>
>
> <%= Clinton Gallagher
>
>
>
>
>
>
> "Kevin Spencer" <ke***@DIESPAMMERSDIEtakempis.com> wrote in message
> news:Ow****************@tk2msftngp13.phx.gbl...
>> Hi Juan,
>>
>>> The kind of RegEx tool I'd like is one which can take a string
>>> I write, and create a RegEx expression which matches it.
>>
>> The problem with that is that you can write a Regular Expression that
>> matches a literal string quite easily. For example:
>>
>> literal string
>>
>> The above is a regular expression which will match the substring
>> "literal string" in my first sentence. Of course, the real power of
>> regular expressions is the ability to match *patterns* in a string,
>> perform grouping, etc. So, like any programming language (which it
>> is, in a sense), Regular Expressions have a shorthand syntax that
>> allows one to create patterns of a large variety of types. A simple
>> example of this would be:
>>
>> (literal) (string)
>>
>> This captures the same match as the first, but puts the string
>> "literal" into a group, and the string "string" into a second group.
>> But of course, we have already exceeded your desired requirement. On
>> the other hand, we have made a regular expression that is perhaps
>> more useful (in some situations) than the first.
>>
>> And of course, the possible types and combinations of patterns are
>> almost endless, including wildcard patterns, special characters,
>> boolean rules, and so on.
>>
>> Yeah, it's like reading some kind of incredibly concise shorthand
>> code, without even line breaks or brackets to help. That's why I was
>> so pleased to see that Expresso allows you to break your regular
>> expression across multiple lines while building it. That helps a good
>> bit!
>>
>> --
>> HTH,
>>
>> Kevin Spencer
>> Microsoft MVP
>> .Net Developer
>> Ambiguity has a certain quality to it.
>>
>> "Juan T. Llibre" <no***********@nowhere.com> wrote in message
>> news:ei**************@TK2MSFTNGP12.phx.gbl...
>>> The kind of RegEx tool I'd like is one which can take a string
>>> I write, and create a RegEx expression which matches it.
>>>
>>> *That* will be the RegEx tool that will corner the market.
>>>
>>>
>>>
>>>
>>> Juan T. Llibre, ASP.NET MVP
>>> ASP.NET FAQ : http://asp.net.do/faq/
>>> Foros de ASP.NET en Español : http://asp.net.do/foros/
>>> ======================================
>>> "clintonG" <cs*********@REMOVETHISTEXTmetromilwaukee.com> wrote in
>>> message news:Of**************@TK2MSFTNGP10.phx.gbl...
>>>> Thanks Kevin. I saw that post too and am going to download Expresso
>>>> in a few minutes. I know you don't need to be psychic to figure out
>>>> what I'm likely to be asking next :-)
>>>>
>>>> <%= Clinton Gallagher
>>>>
>>>>
>>>> "Kevin Spencer" <ke***@DIESPAMMERSDIEtakempis.com> wrote in message
>>>> news:O0**************@tk2msftngp13.phx.gbl...
>>>>>I saw a response to this question in the CSharp group, regarding a
>>>>>product named "Expresso"
>>>>>
>>>>> http://www.ultrapico.com/Expresso.htm
>>>>>
>>>>> Expresso is .Net freeware, and after downloading, installing, and
>>>>> playing with it, I'd give it a try! So far I have found it to be
>>>>> excellent, having capabilities that Regex Buddy does not have, and
>>>>> a much more intuitive GUI.
>>>>>
>>>>> --
>>>>> HTH,
>>>>>
>>>>> Kevin Spencer
>>>>> Microsoft MVP
>>>>> .Net Developer
>>>>> Ambiguity has a certain quality to it.
>>>>>
>>>>> "Kevin Spencer" <ke***@DIESPAMMERSDIEtakempis.com> wrote in
>>>>> message news:%2****************@TK2MSFTNGP12.phx.gbl...
>>>>>> Hi Clinton,
>>>>>>
>>>>>> Yes, I have it. I previously used the freeware Regex Coach
>>>>>> Utility, but it is nowhere near as complete in its support for
>>>>>> various newer Regular Expression syntax and programming languages
>>>>>> in general. It did have one nice feature about it. You could
>>>>>> split a Regular Expression across multiple lines, which often
>>>>>> made it easier to analyze. However, Regex Buddy has the graphical
>>>>>> tree view, and it is synchronized with the Regular Expression
>>>>>> itself, which more than makes up for the omission of breaking a
>>>>>> Regular Expression across multiple lines.
>>>>>>
>>>>>> BTW, it also has a GREP utility built in.
>>>>>>
>>>>>> In short, it is well worth the 30 bucks.
>>>>>>
>>>>>> --
>>>>>> HTH,
>>>>>>
>>>>>> Kevin Spencer
>>>>>> Microsoft MVP
>>>>>> .Net Developer
>>>>>> Ambiguity has a certain quality to it.
>>>>>>
>>>>>> "clintonG" <cs*********@REMOVETHISTEXTmetromilwaukee.com> wrote
>>>>>> in message news:%2****************@TK2MSFTNGP15.phx.gbl...
>>>>>>>I was looking at PowerGrep from the same dev group but like Regex
>>>>>>>Buddy I don't like the buy before you try business model so that
>>>>>>>choice has to be on the shelf for the moment but thanks for
>>>>>>>bringing it up. I assume you've used Regex Buddy?
>>>>>>>
>>>>>>> <%= Clinton Gallagher
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> "Kevin Spencer" <ke***@DIESPAMMERSDIEtakempis.com> wrote in
>>>>>>> message news:%2***************@tk2msftngp13.phx.gbl...
>>>>>>>> Regex Buddy is very good. It costs around $30.00, includes
>>>>>>>> quite a few nice features, including the ability to copy
>>>>>>>> regular expressions in various language string syntaxes,
>>>>>>>> including C#. It has the ability to create libraries of regular
>>>>>>>> expressions, a nice visual builder, color-coding, and quite a
>>>>>>>> bit more. Good testing environment. And it has some nice
>>>>>>>> reference material included.
>>>>>>>>
>>>>>>>> --
>>>>>>>> HTH,
>>>>>>>>
>>>>>>>> Kevin Spencer
>>>>>>>> Microsoft MVP
>>>>>>>> .Net Developer
>>>>>>>> Ambiguity has a certain quality to it.
>>>>>>>>
>>>>>>>> "clintonG" <cs*********@REMOVETHISTEXTmetromilwaukee.com> wrote
>>>>>>>> in message news:%2******************@tk2msftngp13.phx.gbl...
>>>>>>>>> I'm using an .aspx tool I found at [1] but as nice as the
>>>>>>>>> interface is I think I need to consider using others. Some can
>>>>>>>>> generate C# I understand. Your preferences please...
>>>>>>>>>
>>>>>>>>> <%= Clinton Gallagher
>>>>>>>>>
>>>>>>>>> [1] http://forta.com/books/0672325667/
>>>>>>>>>



Nov 19 '05 #17

P: n/a
The RegEx Coach [1] treeview is interesting and insightful. Worth the
download, but note that resizing, insertion points and related Windows I/O
events are clumsy. This software appears to be developed by an academic who
is a pure Perl advocate. I'll review the resources you provided. Thanks.

<%= Clinton Gallagher

[1] http://www.weitz.de/regex-coach/
"Kevin Spencer" <ke***@DIESPAMMERSDIEtakempis.com> wrote in message
news:uJ**************@TK2MSFTNGP14.phx.gbl...
Hi Clinton,

Your remarks piqued my curiosity. I have found a few technical articles on
the inner workings of regular expressions:

http://www.cs.rochester.edu/u/leblanc/csc173/fa/
http://perldoc.perl.org/perlre.html#...ar-Expressions
http://research.microsoft.com/projects/greta/
http://en.wikipedia.org/wiki/Regular...anguage_theory

--
HTH,

Kevin Spencer
Microsoft MVP
.Net Developer
Ambiguity has a certain quality to it.

"clintonG" <cs*********@REMOVETHISTEXTmetromilwaukee.com> wrote in message
news:ej*************@tk2msftngp13.phx.gbl...
Hello Kevin,

With the exception of bass-ackward Lookarounds, when you say expressions are
sequential in nature, that is exactly what I am trying to nail down.

What confuses me is that they appear to be constructed using set theory, like
algebraic expressions whose terms are resolved from the innermost set
working outwards, yet when a tool processes them they appear to be parsed
linearly, as if serialized. Does that make sense to you?

I sure wish Microsoft would publish a document explaining how the
expressions are parsed the same way they have explained how the life
cycle of a page is processed. The control tree is easy to understand
theoretically. I can only wonder why it can not be similarly mapped and
explained for regular expressions.

I've tried the following:

* RegexDesigner.Net
* Rad Regex Designer
* Regular Expression Tester
* Regulator RegEx
* Expresso

Downloading this afternoon...

* Regex Coach [1] donationware
which is also said to have an English-language 'analyzer' like Expresso.
I consider Expresso to have the best interface and functionality, but it
still can't make a genius from a dummy :-).

I'm going to delve into some lists and forums [2] for the next week to
see what I can learn.
This book [3] sounds interesting. I'll take a trip to Barnes and Noble to
see if it's on the shelves.
I also have this book [4], which is very good despite the "10 Minutes"
inferences, and of course I'll be looking for [5] from O'Reilly.

Let's see what that does for now.

<%= Clinton Gallagher
[1] http://www.weitz.de/regex-coach/
[2] http://regexadvice.com/
[3] http://books.slashdot.org/article.pl...&tid=192&tid=6
[4] http://forta.com/books/0672325667/
[5] http://www.oreilly.com/catalog/regex...L-4GV796923290

I appreciate your interest in this context. I've discovered a regular
expression mailing list emanates from

"Kevin Spencer" <ke***@DIESPAMMERSDIEtakempis.com> wrote in message
news:uH**************@TK2MSFTNGP10.phx.gbl...
Hi Clinton,

Regular Expressions are a bear to learn, even if you have good tools to
work with them. I've spent hours working out a relatively "simple" one
(at least it seemed simple at first), but learning a bit more with each
hour. Still, I'm a long way from an expert. I can read most of it fairly
well by now, but certain concepts are still a bit difficult to deal
with. I still struggle some with Lookarounds in particular. One thing to
keep in mind is that Regular Expressions consume a string as they move
through it, with a few exceptions (like Lookarounds). They are basically
sequential in nature.

You may find the "Analyze" tool helpful with this sort of thing.
Fortunately, I have not 2 but THREE Regular Expression tools to work
with (2 of them are Freeware), which enables me to use the one(s) that
are best for the particular type of work I need regarding any individual
Regular Expression and/or problem with one.

The expression you posted,

\w*@\w*\.\w*((\.\w*)*)?

Can be analyzed in so many words as follows (with the part of the email
address that each step matches):

Step                                                     Pattern    Matches
Match any word character, zero or more times             \w*        someone
Next, match the '@' character once                       @          @
Next, match any word character, zero or more times       \w*        somewhere
Next, match the '.' character once                       \.         .
Next, match any word character, zero or more times       \w*        com
Next, put the following into Group 1, zero or 1 time:    (......)?
  Match the following into Group 2, zero or more times:  (......)*
    Match the '.' character once                         \.
    Match any word character, zero or more times         \w*
Result of Group 1:                                       (\.\w*)*   Group 2 (Nothing)
Result of Group 2:                                       \.\w*      Nothing

Basically, there is no match for either Group 1 or Group 2, as the '.'
has been consumed by the previous Match. However, as both Groups specify
a minimum of Zero times, they don't disqualify the Match, as they appear
zero times each.

Why does Expresso report Group 1 at position 32 (end of string)? Well,
no match has been returned prior to the end of the string. So, that's
where the null match begins. Why does Expresso begin at position 0?
Well, I'm not that good with it!
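
A note on the two positions, offered as a minimal sketch rather than a statement about Expresso's internals. It assumes Expresso simply shows what .NET's Regex class reports, and that the obfuscated address is someone@somewhere.com, as the analysis above suggests. In .NET a group can succeed with an empty capture (Success is true, Length is 0), or it can take no part in the match at all (Success is false, with Index and Length both 0). Printing Success next to Index and Length tells the two cases apart. The code is written as C# top-level statements (C# 9 or later):

```csharp
using System;
using System.Text.RegularExpressions;

// Assumption: the masked address in the thread is "someone@somewhere.com".
string input = "An example someone@somewhere.com of an email address.";
Match m = Regex.Match(input, @"\w*@\w*\.\w*((\.\w*)*)?");

Console.WriteLine($"Match: '{m.Value}' at index {m.Index}, length {m.Length}");

for (int i = 1; i < m.Groups.Count; i++)
{
    Group g = m.Groups[i];
    // Success separates "captured an empty string" from "never participated".
    Console.WriteLine($"Group {i}: Success={g.Success}, Index={g.Index}, Length={g.Length}");
}
```

If the assumptions hold, Group 1 should come back as a successful but empty capture at index 32, directly after "com", and Group 2 as unsuccessful with index 0. That would account for both numbers Expresso displayed without the parser ever returning to the start of the string.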

Still, your regular expression is a bit lax in terms of standards. We
worked one up for valid email addresses the other day, and you may want
to borrow it:

(?i)([-.\w]+)\@(?:((?:\d{1,3}\.){3}\d{1,3})|([-a-z0-9]+(?:\.[-a-z0-9]+)*)\.((?:com|edu|gov|int|mil|net|org|biz|info|name|museum|coop|aero|[a-z]{2})))

It is case-insensitive, and matches both domain name and IP domain email
addresses. It puts the results into 4 possible groups:

1-User Name, 2-Domain IP Address, 3-Domain Name, 4-Root Domain.

Note that groups 2 and (3,4) are exclusive of one another. The email
address can either be an IP address, or a named domain, but not both. It
supports 2-letter country suffixes, and multiple-dot domain addresses.
And, as noted above, it's case-insensitive.

I'm not sure we covered all the possible permutations, but it's pretty
strong.
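
To see how the four groups fill in, here is a small sketch in C# (top-level statements, C# 9 or later). The pattern is the one posted above, written with "name" as a single word; the sample addresses are hypothetical and were picked only to exercise the IP branch and the named-domain branch:

```csharp
using System;
using System.Text.RegularExpressions;

// Pattern from the post above ("name" written without a space).
const string EmailPattern =
    @"(?i)([-.\w]+)\@(?:((?:\d{1,3}\.){3}\d{1,3})|([-a-z0-9]+(?:\.[-a-z0-9]+)*)\.((?:com|edu|gov|int|mil|net|org|biz|info|name|museum|coop|aero|[a-z]{2})))";

// Hypothetical sample addresses, not taken from the thread.
string[] samples =
{
    "someone@somewhere.com",
    "admin@192.168.0.1",
    "Bob@mail.example.co.uk"
};

foreach (string s in samples)
{
    Match m = Regex.Match(s, EmailPattern);
    Console.WriteLine(s);
    // Groups: 1 = user name, 2 = IP address, 3 = domain name, 4 = root domain.
    Console.WriteLine($"  user='{m.Groups[1].Value}' ip='{m.Groups[2].Value}' " +
                      $"domain='{m.Groups[3].Value}' root='{m.Groups[4].Value}'");
}
```

For any one address, either group 2 or groups 3 and 4 should be filled, never both. Note that the pattern is not anchored, so Regex.Match will also find an address embedded in longer text; add ^ and $ if the whole input has to be an address.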

--
HTH,

Kevin Spencer
Microsoft MVP
.Net Developer
Ambiguity has a certain quality to it.
Basically, the whole string has been consumed by the
"clintonG" <cs*********@REMOVETHISTEXTmetromilwaukee.com> wrote in
message news:%2****************@TK2MSFTNGP14.phx.gbl...
Hello Kevin,

Well I'm bright eyed but not so bushy-tailed this morning. Thanks for
working this out. It's one of those 'must know' issues one needs to be
concerned with when generating valid XML from an application. I'll be
working with it later today and I'm starting to get a feel for Expresso
which I have a question about. I'm at the point where I've almost come
to understand how expressions are actually processed which -- for me --
means I will understand how I need to think to put them together.
You've been a real help again and your source is an inspiration which
shows how elegant self-documenting code can be.

As for the Expresso question, what is 1:? supposed to indicate? (That's
the closest I could come at the moment to replicating the rectangular
'non-printable' character Expresso uses to indicate some 'thing' it has
matched.) In the following simple example it seems to match a white
space, although in a confusing manner, as I will point out. But in other
examples, with many more characters and white space in the string to be
matched, I have counted the position where the ? is said to be matched
and the position reported does not fall on a white space at all.

// Expression
\w*@\w*\.\w*((\.\w*)*)?

// String to match
An example so*****@somewhere.com of an email address.

Expresso reports 1:? at Position 32 Length 0, which implies white space in
the simple example as given. But there were white space characters
before the matched characters, which motivates one to ask why Expresso
would ignore those previous white space characters and then report 2:?
at Position 0 Length 0, which suggests the parser returned to the
beginning of the string to be matched and found what?

Is this clear as mud or what :-)

<%= Clinton Gallagher
"Kevin Spencer" <ke***@DIESPAMMERSDIEtakempis.com> wrote in message
news:%2****************@TK2MSFTNGP09.phx.gbl...
> Hi Clinton,
>
> The following Regular Expression will give you the ability to do a
> Regex.Replace on a string containing both single "&" characters and
> "&amp;" strings. It captures the "&amp;" strings into their own
> separate matches, and the "&" characters into their own matches,
> putting the "&" characters into a Group. It is also case-insensitive:
>
> (?i)[^&]+|&amp;|(&(?!amp;))
>
> Here's some sample code for replacing the single "&" characters with
> &amp; -
>
> // Requires: using System.Text.RegularExpressions;
>
> /// <summary>
> /// Replaces Ampersand in a Match with "&amp;"
> /// </summary>
> /// <param name="m">Match</param>
> /// <returns>Replaced Match value</returns>
> public static string ampReplacer(Match m)
> {
>     // Group 1 is captured only for a bare "&"; other matches pass through.
>     if (m.Groups[1].Captures.Count == 0) return m.Value;
>     return m.Value.Replace("&", "&amp;");
> }
>
> /// <summary>
> /// Replaces all single Ampersand characters in a string with "&amp;"
> /// </summary>
> /// <param name="s">String to process</param>
> /// <returns>Processed String</returns>
> public static string ReplaceAmpersand(string s)
> {
>     return Regex.Replace(s, @"(?i)[^&]+|&amp;|(&(?!amp;))",
>         new MatchEvaluator(ampReplacer));
> }
>
> The "ampReplacer" function is the function passed as the MatchEvaluator
> delegate in the Regex.Replace() method used in the "ReplaceAmpersand"
> method. The "ReplaceAmpersand" method takes a string as an argument,
> and uses Regex.Replace to replace all matches in the string that
> contain a value in Groups[1] with "&amp;".
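
For comparison, a shorter variant that needs no MatchEvaluator. This is a sketch under assumptions, not the approach from the post: it matches only the bare "&" characters by using a negative lookahead for "amp;" and lets Regex.Replace substitute them directly. The helper name EscapeBareAmpersands is hypothetical. Written as C# top-level statements (C# 9 or later):

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper name; not taken from the post above.
string EscapeBareAmpersands(string s)
{
    // "&" not followed by "amp;" counts as a bare ampersand.
    return Regex.Replace(s, @"&(?!amp;)", "&amp;", RegexOptions.IgnoreCase);
}

Console.WriteLine(EscapeBareAmpersands("Lawn Mowers, Repairs & Services &amp; Parts"));
// prints: Lawn Mowers, Repairs &amp; Services &amp; Parts
```

One limitation to keep in mind: other entities such as "&lt;" also start with "&" and would be escaped a second time, so this only suits input where "&amp;" is the only entity expected.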
>
> As a side note, I used both Expresso and Regex Buddy to come up with
> this. It was indeed a challenge, as I'm not quite a master of Regular
> Expressions. But I enjoy learning, so it was a good exercise for me!
> :)
>
> --
> HTH,
>
> Kevin Spencer
> Microsoft MVP
> .Net Developer
> Ambiguity has a certain quality to it.
>
> "clintonG" <cs*********@REMOVETHISTEXTmetromilwaukee.com> wrote in
> message news:%2***************@tk2msftngp13.phx.gbl...
>> Kevin, have you ever heard the expression "preaching to the choir?"
>> :-)
>>
>> I've got the basic pattern matching theory understood but it's the use
>> of expressions to disallow or replace certain characters and/or
>> strings that I'm trying to really understand thoroughly. The
>> following example illustrates...
>>
>> // Example
>> Lawn Mowers, Repairs & Services - lawnmowers.com
>>
>> A typical page title that when entered into a TextBox meant to
>> capture string data for an RSS 2.0 title element should use &amp;
>> instead of the & to represent the ampersand. I've got an expression
>> that works well for the example but can't figure out (with the
>> expression I have) how to match the & and replace it with &amp;
>> (yet) -- or -- how to use the expression I have to force the 2.0
>> Regular Expression Validator to fail when the & is present in the
>> string.
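
On the second half of that question, one possible sketch, assuming the validator only passes when the pattern matches the entire value: a pattern that accepts any run of characters in which every "&" is already part of "&amp;". The names NoBareAmpersand and IsValidTitle are hypothetical, and the check is shown with Regex.IsMatch rather than with the validator control itself. Written as C# top-level statements (C# 9 or later):

```csharp
using System;
using System.Text.RegularExpressions;

// Every character is either not "&", or the start of a complete "&amp;".
// The anchors force the whole string to satisfy the rule.
const string NoBareAmpersand = @"^(?:[^&]|&amp;)*$";

// Hypothetical helper standing in for the validator's check.
bool IsValidTitle(string title) => Regex.IsMatch(title, NoBareAmpersand);

Console.WriteLine(IsValidTitle("Lawn Mowers, Repairs & Services - lawnmowers.com"));     // False
Console.WriteLine(IsValidTitle("Lawn Mowers, Repairs &amp; Services - lawnmowers.com")); // True
```

The same pattern string could then be tried as the validator's ValidationExpression, which is an assumption to verify against the control's documentation.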
>>
>> // Expression
>> [a-z]+([a-z0-9-]*[a-z0-9]+)?(\.([a-z]+([a-z0-9-]*[a-z0-9]+)?)+)*
>>
>> I also really appreciate Expresso's Analyzer. It is outstanding that
>> Expresso seems to make it easy for us to pick expressions apart piece
>> by piece and explain them in English.
>>
>>
>> <%= Clinton Gallagher
>>
>>
>>
>>
>>
>>
>> "Kevin Spencer" <ke***@DIESPAMMERSDIEtakempis.com> wrote in message
>> news:Ow****************@tk2msftngp13.phx.gbl...
>>> Hi Juan,
>>>
>>>> The kind of RegEx tool I'd like is one which can take a string
>>>> I write, and create a RegEx expression which matches it.
>>>
>>> The problem with that is that you can write a Regular Expression
>>> that matches a literal string quite easily. For example:
>>>
>>> literal string
>>>
>>> The above is a regular expression which will match the substring
>>> "literal string" in my first sentence. Of course, the real power of
>>> regular expressions is the ability to match *patterns* in a string,
>>> perform grouping, etc. So, like any programming language (which it
>>> is, in a sense), Regular Expressions have a shorthand syntax that
>>> allows one to create patterns of a large variety of types. A simple
>>> example of this would be:
>>>
>>> (literal) (string)
>>>
>>> This captures the same match as the first, but puts the string
>>> "literal" into a group, and the string "string" into a second group.
>>> But of course, we have already exceeded your desired requirement. On
>>> the other hand, we have made a regular expression that is perhaps
>>> more useful (in some situations) than the first.
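
The difference between the two patterns can be seen in a few lines of C#. This is only an illustration with a made-up input sentence; both patterns find the same text, but the second also exposes each word as a numbered group. Written as C# top-level statements (C# 9 or later):

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical input sentence for the demonstration.
string text = "a pattern can match a literal string quite easily";

Match plain = Regex.Match(text, "literal string");
Match grouped = Regex.Match(text, "(literal) (string)");

Console.WriteLine(plain.Value);              // literal string
Console.WriteLine(grouped.Value);            // literal string
Console.WriteLine(grouped.Groups[1].Value);  // literal
Console.WriteLine(grouped.Groups[2].Value);  // string
```

Groups are numbered by the position of their opening parenthesis, left to right, and Groups[0] always holds the whole match.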
>>>
>>> And of course, the possible types and combinations of patterns are
>>> almost endless, including wildcard patterns, special characters,
>>> boolean rules, and so on.
>>>
>>> Yeah, it's like reading some kind of incredibly concise shorthand
>>> code, without even line breaks or brackets to help. That's why I was
>>> so pleased to see that Expresso allows you to break your regular
>>> expression across multiple lines while building it. That helps a
>>> good bit!
>>>
>>> --
>>> HTH,
>>>
>>> Kevin Spencer
>>> Microsoft MVP
>>> .Net Developer
>>> Ambiguity has a certain quality to it.



Nov 19 '05 #18

This discussion thread is closed