
Parsing in between strings using Regex

CJ
Is this the format to parse a string and return the value between the item?

Regex pRE = new Regex("<File_Name>.*>(?<insideText>.*)</File_Name>");

I am trying to parse this string.

<File_Name>Services</File_Name>
Thanks
Mar 3 '08 #1


CJ wrote:
> Is this the format to parse a string and return the value between the item?
>
> Regex pRE = new Regex("<File_Name>.*>(?<insideText>.*)</File_Name>");
>
> I am trying to parse this string.
>
> <File_Name>Services</File_Name>

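using System.Text.RegularExpressions; // for Regex
string s = "<File_Name>Services</File_Name>"; // the input from the question
Regex re = new Regex("<File_Name>(?<insideText>.*)</File_Name>");
string fn = re.Match(s).Groups["insideText"].Value; // fn == "Services"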

seems to work.
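A minimal sketch of why the first pattern fails while Arne's succeeds (FileNameDemo and Extract are hypothetical names, not from the thread). In CJ's version the extra `.*>` after `<File_Name>` needs another `>` before the captured text; the only later `>` is the one that closes `</File_Name>`, so nothing is left for the closing tag and the whole match fails.

```csharp
using System.Text.RegularExpressions;

public static class FileNameDemo
{
    // CJ's pattern: ".*>" demands a second '>' between the opening tag and the text.
    public static readonly Regex Original =
        new Regex("<File_Name>.*>(?<insideText>.*)</File_Name>");

    // Arne's pattern: capture everything between the opening and closing tags.
    public static readonly Regex Fixed =
        new Regex("<File_Name>(?<insideText>.*)</File_Name>");

    // Returns the captured text, or null when the pattern does not match at all.
    public static string Extract(Regex re, string s)
    {
        Match m = re.Match(s);
        return m.Success ? m.Groups["insideText"].Value : null;
    }
}
```

With the question's input, Extract(FileNameDemo.Original, ...) returns null and Extract(FileNameDemo.Fixed, ...) returns "Services".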

Arne
Mar 3 '08 #2

CJ
Thanks Arne,

Seems like the ".*" was messing me up.

This regular expression is so hard at times, I don't know how
you guys have this thing figured out.

CJ

Mar 3 '08 #3

Hello cj,
> Thanks Arne,
>
> Seems like the ".*" was messing me up.
>
> This regular expression is so hard at times, I don't know how you guys
> have this thing figured out.

This looks a lot like XML data. If it is, you really should try to avoid
regex and use XPath to fetch the data you need.
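If the data really is XML, a minimal XPath sketch with System.Xml.XmlDocument could look like this; it assumes well-formed input and the element name File_Name (XPathDemo and GetFileName are hypothetical names). LoadXml throws an XmlException on malformed input.

```csharp
using System.Xml;

public static class XPathDemo
{
    // Parses the input as XML and returns the text of the first File_Name element,
    // or null if the document has none.
    public static string GetFileName(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml); // throws XmlException if the input is not well-formed
        XmlNode node = doc.SelectSingleNode("//File_Name");
        return node?.InnerText;
    }
}
```

One advantage over a regex: InnerText decodes entities, so "<File_Name>A &amp; B</File_Name>" yields "A & B".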

If it isn't well-formed XML, a regex can help you, but the regex you have still
has a few issues in it.

For one, if your input contained "<File_Name>bbbbbbbbb</File_Name><File_Name>aaaaaaaaaaaa</File_Name>",
you would get this as your whole value:
"bbbbbbbbb</File_Name><File_Name>aaaaaaaaaaaa". Obviously not what's required.
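A quick way to see the greedy behaviour for yourself, using Arne's pattern with the tag written as File_Name (.NET regexes are case-sensitive by default); GreedyDemo and FirstValue are hypothetical names:

```csharp
using System.Text.RegularExpressions;

public static class GreedyDemo
{
    // Arne's pattern: ".*" is greedy, so it runs to the LAST "</File_Name>" in the input.
    static readonly Regex Greedy = new Regex("<File_Name>(?<insideText>.*)</File_Name>");

    public static string FirstValue(string s) =>
        Greedy.Match(s).Groups["insideText"].Value;
}
```

FirstValue on the two-element input returns "bbbbbbbbb</File_Name><File_Name>aaaaaaaaaaaa", because .* gives back characters only as far as the last closing tag.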

You can adjust your regex to prevent this from happening in two ways:

1) Use Reluctant Matching
Regex re = new Regex("<File_Name>(?<insideText>.*?)</File_Name>");
string fn = re.Match(s).Groups["insideText"].Value;

2) Use a negative Look Ahead
Regex re = new Regex("<File_Name>(?<insideText>((?!</File_Name>).)*)</File_Name>");
string fn = re.Match(s).Groups["insideText"].Value;
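Both fixes can be checked side by side. This sketch also uses Regex.Matches to collect every File_Name value rather than only the first, which goes slightly beyond what the thread asked; FileNameFixes and AllValues are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class FileNameFixes
{
    // 1) Reluctant (lazy) quantifier: ".*?" stops at the FIRST closing tag.
    static readonly Regex Lazy =
        new Regex("<File_Name>(?<insideText>.*?)</File_Name>");

    // 2) Negative lookahead: consume one character at a time, but never
    //    step onto the start of "</File_Name>".
    static readonly Regex Lookahead =
        new Regex("<File_Name>(?<insideText>((?!</File_Name>).)*)</File_Name>");

    // Collects the captured text of every match of the chosen pattern.
    public static List<string> AllValues(string s, bool useLookahead)
    {
        Regex re = useLookahead ? Lookahead : Lazy;
        var values = new List<string>();
        foreach (Match m in re.Matches(s))
            values.Add(m.Groups["insideText"].Value);
        return values;
    }
}
```

For the two-element input, both variants return bbbbbbbbb and aaaaaaaaaaaa. The lazy form is usually easier to read; the lookahead form spells out the "never cross a closing tag" rule explicitly.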

One thing that might also catch up with you is a file that is formatted like
this (let's hope the newsreader will leave this intact):
<File_Name>
bbbbbbbbb
</File_Name>

This is probably syntactically correct, but because . normally doesn't match
a newline, you will need to pass an extra option to the Regex constructor (with
either pattern) that allows . to match newlines:
Regex re = new Regex("your regex here", RegexOptions.Singleline);
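A small sketch of the effect of RegexOptions.Singleline on the multi-line example, using the lazy pattern (SinglelineDemo and Extract are hypothetical names):

```csharp
using System.Text.RegularExpressions;

public static class SinglelineDemo
{
    const string Pattern = "<File_Name>(?<insideText>.*?)</File_Name>";

    // Returns the captured text, or null when there is no match.
    public static string Extract(string s, RegexOptions options)
    {
        Match m = Regex.Match(s, Pattern, options);
        return m.Success ? m.Groups["insideText"].Value : null;
    }
}
```

Without the option, the input "<File_Name>\nbbbbbbbbb\n</File_Name>" gives no match; with it, the captured value is "\nbbbbbbbbb\n", newlines included.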

Alternatively, you could 'eat up' all whitespace around the file name, but
only if you're very sure the file name itself will never contain a newline
or start or end with whitespace.

1)
// @"..." is a verbatim string, so \s reaches the regex engine as-is
Regex re = new Regex(@"<File_Name>\s*(?<insideText>.*?)\s*</File_Name>");
2)
Regex re = new Regex(@"<File_Name>\s*(?<insideText>((?!</File_Name>).)*?)\s*</File_Name>");
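A sketch of the whitespace-trimming patterns. It uses verbatim strings (@"...") because in an ordinary C# string literal "\s" is an unrecognized escape sequence and will not compile; TrimmingDemo and Extract are hypothetical names.

```csharp
using System.Text.RegularExpressions;

public static class TrimmingDemo
{
    // Leading and trailing \s* absorb whitespace (including newlines) around the name.
    static readonly Regex Lazy =
        new Regex(@"<File_Name>\s*(?<insideText>.*?)\s*</File_Name>");
    static readonly Regex Lookahead =
        new Regex(@"<File_Name>\s*(?<insideText>((?!</File_Name>).)*?)\s*</File_Name>");

    // Returns the trimmed captured text, or "" when there is no match.
    public static string Extract(string s, bool useLookahead) =>
        (useLookahead ? Lookahead : Lazy).Match(s).Groups["insideText"].Value;
}
```

Both patterns return bbbbbbbbb for the multi-line example, and "<File_Name>  a b  </File_Name>" yields "a b": as Jesse warns, any leading or trailing whitespace that was really part of the name is lost.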

Kind Regards,

Jesse Houwing
--
Jesse Houwing
jesse.houwing at sogeti.nl
Mar 3 '08 #4

Hi,

"CJ" <cj******@noemail.com> wrote in message
news:e6*************@TK2MSFTNGP06.phx.gbl...
> Thanks Arne,
>
> Seems like the ".*" was messing me up.
>
> This regular expression is so hard at times, I don't know how
> you guys have this thing figured out.

Practice: keep trying it until you find the correct way.

A book would also help you ;)

Mar 3 '08 #5
