
Parsing C# string using RegEx

P: n/a
Hi there,

I have another question for .NET RegEx experts.

I am reading in a C Sharp file line by line and I am trying to detect
comments that start with either // or ///. What I am particularly
interested in is the comment text itself. I would like some stats on
the amount of comments in the file (comment bytes).

So, I tried several regular expressions, but they don't seem to work in
all the cases.

Here are the cases that I need to cover:

a. /// comments or // comments
b. /// <xml-tag> comments </xml-tag>
c. /// <xml-tag> comments <another xml-tag> comments </another xml-tag>
comments </xml-tag>
d. /// <xml-tag>
e. /// </xml-tag>

I need to be able to capture the comments but not the xml tags.

Here are a few of the regular expressions that I have tried, without
success.

@"^.*?///?\s*((</?.+>)*(?<comments>.*))*$"
@"///?\s*(</?.+>)*(?<comments>.*)"

I am having difficulty capturing multiple comments if they are separated
by xml tags. For some odd reason, if I have more than one set of tags,
the returned result is always the rightmost set of comments.

Thanks so much for any input!
Natalia

Nov 16 '05 #1
3 Replies


P: n/a
Natalia, you need to grab the comments together with their XML and then
post-process what you've grabbed using an XML DOM. You can easily modify the
last regex that I sent to allow for /// documentation comments, then append
all such instances into a string to process as XML.

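Regex regex = new Regex(
    "(?ms)(?# Specify our options )" +
    "^.*?((?<lineComment>///?)|/\\*)" +
    "(?<comments>.*?)" +
    "(?(lineComment)$|\\*/)");

// 'source' is assumed to hold the full text of the C# file.
string xmlString = "";
foreach (Match match in regex.Matches(source)) {
    // Only /// documentation comments carry XML.
    if (match.Groups["lineComment"].Value == "///") {
        xmlString += match.Groups["comments"].Value;
    }
}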

Regular expressions are not a jack of all trades, nor are they the best or
fastest parsing structure for all cases. Use the right tool for the job.
Hope this helps in your endeavor.
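
A minimal sketch of the post-processing step described above, under some assumptions: the /// lines are collected with a simpler per-line pattern than the regex in this post, the collected text is wrapped in a made-up <root> element so it parses as a single document, and the names DocCommentText and ExtractText are hypothetical. XmlDocument.LoadXml throws an XmlException if the collected comments are not well-formed XML, so real code would probably want to handle that.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;
using System.Xml;

static class DocCommentText
{
    // Matches one /// line and captures everything after the marker.
    // Simplified pattern for illustration, not the regex from the post.
    static readonly Regex DocLine =
        new Regex(@"^[ \t]*///(?<comments>.*)$", RegexOptions.Multiline);

    // Returns the comment text with all XML tags removed.
    public static string ExtractText(string source)
    {
        // Wrap the fragments in an assumed <root> element so the
        // result is one well-formed XML document.
        var xml = new StringBuilder("<root>");
        foreach (Match m in DocLine.Matches(source))
        {
            xml.Append(m.Groups["comments"].Value).Append('\n');
        }
        xml.Append("</root>");

        var doc = new XmlDocument();
        doc.LoadXml(xml.ToString()); // throws XmlException on malformed XML
        return doc.DocumentElement.InnerText;
    }

    static void Main()
    {
        string source =
            "/// <summary>\n" +
            "/// Adds <c>a</c> and <c>b</c>.\n" +
            "/// </summary>\n" +
            "int Add(int a, int b) { return a + b; }\n";

        string text = ExtractText(source).Trim();
        // Comment bytes, counted here as UTF-8 (an assumption).
        Console.WriteLine(Encoding.UTF8.GetByteCount(text));
    }
}
```

InnerText concatenates the text nodes and drops the tags, which is the part a single regex struggles with when tags are nested.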
--
Justin Rogers
DigiTec Web Consultants, LLC.
Blog: http://weblogs.asp.net/justin_rogers

Nov 16 '05 #2

P: n/a
Hi, inline

"Natalia DeBow" <na***********@unisys.com> wrote in message
news:c9**********@si05.rsvl.unisys.com...
@"^.*?///?\s*((</?.+>)*(?<comments>.*))*$"
@"///?\s*(</?.+>)*(?<comments>.*)"
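Problems:
1) The '.+' inside "</?.+>" is greedy and matches any character, including
'>', so it runs on to the last '>' in the line.
2) The '.*' inside (?<comments>.*) matches any character, including '<', so
it swallows any tags that follow.

I suggest trying this:

strRex = @"///?\s(?:(?:<[^>]+>)|(?<comments>[^<]+))*";

For cases d and e the match succeeds but the comments group stays empty,
because those lines contain no comment text.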
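
Since the comments group sits inside a repeated group, it can capture more than once per match. In .NET, Group.Value only returns the last capture, which would explain why only the rightmost comment text came back; the Captures collection holds all of them. Here is a small sketch using the pattern above unchanged. The names CommentCaptures and Extract are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class CommentCaptures
{
    // The pattern suggested above, unchanged.
    static readonly Regex Rex =
        new Regex(@"///?\s(?:(?:<[^>]+>)|(?<comments>[^<]+))*");

    // Returns every piece of comment text on the line, tags excluded.
    public static List<string> Extract(string line)
    {
        var parts = new List<string>();
        Match m = Rex.Match(line);
        if (m.Success)
        {
            // Captures holds every capture of the group, in order;
            // Groups["comments"].Value would give only the last one.
            foreach (Capture c in m.Groups["comments"].Captures)
            {
                string text = c.Value.Trim();
                if (text.Length > 0) parts.Add(text);
            }
        }
        return parts;
    }

    static void Main()
    {
        string line = "/// <a> first <b> second </b> third </a>";
        foreach (string part in Extract(line))
        {
            Console.WriteLine(part);
        }
    }
}
```

Note that the single \s requires whitespace right after the slashes, so a comment written as //text with no space would not match this pattern as it stands.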

HTH,
greetings



Nov 16 '05 #3

P: n/a
Have a look at Project Line Counter, a great Visual Studio .NET add-in.
It can be downloaded from www.wndtabs.com

Nov 16 '05 #4
