Ioannis Vranos wrote:
Carl Daniel [VC++ MVP] wrote:
1) Isn't the "[a-zA-Z]+|[a-zA-Z]+" part redundant? As far as I can
understand it means exactly the same as "[a-zA-Z]+" alone.
No, because of the alternative - it's
[a-zA-Z]+
-or-
[a-zA-Z]+\\s+[a-zA-Z]+
I did not understand what you mean by the above. Could you explain
in more detail?
The alternation operator has low precedence - lower than concatenation, so
(bob|joe|sue)
parses as 'bob' or 'joe' or 'sue' not as 'bo'+('b' or 'j')+'o'+('e' or
's')+'ue'
similarly,
[a-zA-Z]+|[a-zA-Z]+\\s+[a-zA-Z]+
parses as
'[a-zA-Z]+' or '[a-zA-Z]+\\s+[a-zA-Z]+'
instead of
('[a-zA-Z]+' or '[a-zA-Z]+')\\s+[a-zA-Z]+
does that make sense?
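
If it helps, the two readings can be compared directly. The following is only a sketch in Python's re module (not the engine used in the original code); the names ORIGINAL, MISREAD, SINGLE_WORD and matches_whole are made up for the illustration. Raw strings are used, so \s is written with a single backslash.

```python
import re

# The expression from the thread.
ORIGINAL = r"[a-zA-Z]+|[a-zA-Z]+\s+[a-zA-Z]+"
# The incorrect reading, where | would bind tighter than concatenation.
MISREAD = r"(?:[a-zA-Z]+|[a-zA-Z]+)\s+[a-zA-Z]+"
# The first alternative on its own.
SINGLE_WORD = r"[a-zA-Z]+"


def matches_whole(pattern, text):
    """Return True if the pattern matches the entire text."""
    return re.fullmatch(pattern, text) is not None


for text in ("hello", "hello world"):
    print(
        repr(text),
        "original:", matches_whole(ORIGINAL, text),
        "misread:", matches_whole(MISREAD, text),
        "single word:", matches_whole(SINGLE_WORD, text),
    )
```

The original accepts both inputs, the misread grouping rejects a single word, and the first alternative alone rejects two words, so the part after the | is not redundant.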
The original expression could be factored, since the alternatives have a
common prefix:
[a-zA-Z]+(\\s+[a-zA-Z]+)?
I would expect a DFA-based regex engine might well do that factoring as a
matter of course when computing the DFA.
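
A quick way to convince yourself that the factored form accepts the same strings is to run both over a few inputs. Again this is a sketch in Python's re module, under the assumption that the whole input must match; SAMPLES and same_verdict are hypothetical names chosen for the example.

```python
import re

ORIGINAL = re.compile(r"[a-zA-Z]+|[a-zA-Z]+\s+[a-zA-Z]+")
FACTORED = re.compile(r"[a-zA-Z]+(\s+[a-zA-Z]+)?")

# Hand-picked inputs: one word, two words, and several that should be rejected.
SAMPLES = ["hello", "hello world", "hello   world",
           "hello world again", "", "42", "hello "]


def same_verdict(text):
    """True if both patterns agree on whether the whole text matches."""
    original_ok = ORIGINAL.fullmatch(text) is not None
    factored_ok = FACTORED.fullmatch(text) is not None
    return original_ok == factored_ok


for text in SAMPLES:
    print(repr(text), FACTORED.fullmatch(text) is not None, same_verdict(text))
```

One caveat, at least for backtracking engines such as Python's: when searching inside a longer string rather than matching the whole input, the alternatives are tried left to right, so ORIGINAL.search("hello world") stops at "hello" while FACTORED.search("hello world") returns "hello world". The two forms accept the same set of strings, but they do not always report the same match.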
-cd