I'm trying to put together a parser for HTTP user-agent strings.
According to this document:
http://www.texsoft.it/index.php?c=so...useragent&l=it
the following POSIX regex:
^([^/[:space:]]*)(/([^[:space:]]*))?([[:space:]]*\[[a-zA-Z][a-zA-Z]\])?[[:space:]]*(\\((([^()]|(\\([^()]*\\)))*)\\))?[[:space:]]*
when run on something like this:
Mozilla/5.0 (X11; U; Linux i686 (x86_64); en-US; rv:1.7.10) Gecko/20050716
Firefox/1.0.6
should give the following matches:
Mozilla/5.0 (X11; U; Linux i686 (x86_64); en-US; rv:1.7.10)
Gecko/20050716
Firefox/1.0.6
Unfortunately the pattern doesn't work in .NET. It gives something like
this:
<blank>
Mozilla
/5.0 (X11; U; Linux i686 (x86_64); en-US; rv:1.7.10) Gecko/20050716
Firefox/1.0.6
5.0 (X11; U; Linux i686 (x86_64); en-US; rv:1.7.10) Gecko/20050716
Firefox/1.0.6
<blank>
<blank>
I've tried it in my own C# code as well as in Expresso. Both give the same
matches as each other, but not the ones the document claims I should get (and
that I need).
Now I know only enough about regex to know that it's a black art that I
really don't want to know about. Anyone see anything obviously wrong with
the pattern?
-Chris