Hi Goran,
Yep, it is not yet robust.
But the main thing, IMO, is that the overall comment structure is broken down
into its component parts and each part is addressed by a group.
Of the extra examples you gave, only the single-line comment with text
failed.
This can be addressed by (Single Line Comment) | (Multiline Comment),
the multiline comment pattern being the one from my original post.
My Single Line offering is (\\x2F\\x2A\\s*[\\w\\s]*\\*/)
I can't see how you can simplify it down much from this without running into
the problem you mentioned earlier, namely matching code as well as comments.
My expression fails if you have a string assignment in code that imitates a
comment, e.g. string s = "/* Some text */";
So a 'not quotes' check (negative lookahead?) should be put on the front of all
groups. I tried it but couldn't stop the match.
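One common way around the string-literal problem, offered here only as a sketch and not as something posted in this thread, is to let the expression match string literals as well as comments and then keep only the comment matches. Because the string alternative consumes the whole literal first, a comment-like sequence inside quotes never gets a chance to match. The Python below is an assumption for illustration (the thread itself concerns .NET-style regexes); TOKEN and find_comments are hypothetical names, and the pattern deliberately ignores // comments and character literals.

```python
import re

# Either a double-quoted string literal (with backslash escapes) or a
# /* ... */ comment. Only the comment alternative is captured (group 1).
TOKEN = re.compile(
    r'"(?:\\.|[^"\\])*"'    # string literal: skipped, not captured
    r'|(/\*.*?\*/)',         # block comment: captured
    re.DOTALL,               # let '.' cross line breaks inside comments
)

def find_comments(source):
    """Return block comments in source, ignoring look-alikes inside strings."""
    return [m.group(1) for m in TOKEN.finditer(source) if m.group(1) is not None]

sample = "\n".join([
    'string s = "/* this is a failing test string*/";',
    "/* Comment A",
    "* comment B",
    "*/",
    "int r = 2*3;",
    "/**/",
])

comments = find_comments(sample)
for c in comments:
    print(repr(c))
```

The same idea extends to removal: substitute with a callback that returns an empty string for comment matches and the original text for string matches.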
The trouble with this empirical approach is that you find holes and patch them,
but you can't be sure you have found all the holes,
unless you're a regex expert, which I am not.
If you can come up with a simpler robust regex that picks out the comments
and leaves the 'code' I would like to see it.
The new test text now follows.
regards
Bob
/* Comment Single Line space at front abd*/
/*Comment Single Line spaceless abd*/
/* comment two
I am a plain line
So am I
* comment three
*Comment 4
*/
Code Begins
int r = 2*3;
x = 5/3;//Inline Comment fails but do we want to grab it?
y=2^6;
string s = "/* this is a failing test string*/";
/* Comment A
* comment B
* comment C
*/
/* */
/**/
"Göran Andersson" <gu***@guffa.com> wrote in message
news:%2****************@TK2MSFTNGP04.phx.gbl...
Bob wrote:
>Hi Yoni,
Have a play with
(\\x2F\\x2A\\x20[\\w\\s]+\\s)(\\x2A\\x20*[\\w\\s]+\\s)+(\\x2A\\x2F)
Seemed to work in regexBuddy with a test text of:
/* Comment 1
* comment two
* comment three
*Comment 4
*/
Code
2*3=3;
int r = 2*3;
/* Comment A
* comment B
* comment C
*/
The regex captures three different types of line in three different groups:
1) Comment begin
2) Comment body
3) Comment end
However, while RegexBuddy listed all comment lines as matched when I listed
all matches, only the last body line of each comment showed up in the
individual listing for the comment body group.
So there may be a problem.
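What is described here is the usual behaviour of a repeated capturing group: each repetition overwrites the previous capture, so only the last body line is left in group 2 even though the overall match covers every line. A minimal Python sketch of this, assuming the pattern is written without the doubled backslashes it has above (those look like string-literal escaping); PATTERN, text and body_lines are illustrative names. In .NET the earlier repetitions remain available through Group.Captures, whereas Python's re module keeps only the last one, so the sketch recovers the body lines from the span between groups 1 and 3 instead.

```python
import re

# The three-group pattern from the post, in raw-string form.
PATTERN = re.compile(r"(\x2F\x2A\x20[\w\s]+\s)(\x2A\x20*[\w\s]+\s)+(\x2A\x2F)")

text = "\n".join([
    "/* Comment 1",
    "* comment two",
    "* comment three",
    "*Comment 4",
    "*/",
    "Code",
])

m = PATTERN.search(text)

# Group 2 was repeated three times, but only the final repetition is kept.
last_body = m.group(2)

# Everything the repeated group consumed lies between group 1 and group 3.
body_lines = text[m.end(1):m.start(3)].splitlines()

print(repr(last_body))
print(body_lines)
```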
hth
Bob
<yo**@nobhillsoft.com> wrote in message
news:11**********************@p10g2000cwp.googlegroups.com...
>>Hey,
I am trying to find the right regexp to remove everything that's a
multi-line comment, in other words everything between /* and */. My
expression is:
/\*.*\*/
It doesn't work... Does anybody see anything wrong with that? Thanks
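Two things usually go wrong with an expression like this one. By default the dot does not match a line break, so a comment spanning several lines is never matched; and once the dot is allowed to cross lines, the greedy .* runs from the first /* to the very last */, swallowing any code in between. A short Python sketch of both effects and of the lazy .*? variant follows; the sample text and variable names are made up for illustration. In .NET the equivalent of re.DOTALL is RegexOptions.Singleline.

```python
import re

text = "\n".join([
    "/* one",
    " * two */",
    "int r = 2*3;",
    "/* three */",
    "",
])

# 1) Default mode: '.' stops at line breaks, so the two-line comment is missed.
naive = re.findall(r"/\*.*\*/", text)

# 2) Dot matches newlines, but greedy: one match from the first /* to the last */.
greedy = re.findall(r"/\*.*\*/", text, re.DOTALL)

# 3) Dot matches newlines and lazy: each comment ends at its own closing */.
lazy = re.findall(r"/\*.*?\*/", text, re.DOTALL)

print(naive)
print(greedy)
print(lazy)
```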
I think that you are over-complicating it. Also you are assuming things
that are not at all required in a comment. There is no space required
after the start of the comment, and it doesn't have to contain lines that
begin with asterisks.
/*This is a perfectly legal comment*/
/*So
is
this*/
/*And the following too:*/
/**/
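For what it is worth, a pattern that makes no assumption about spaces or leading asterisks can be sketched as below. It is the well-known "unrolled" form for block comments and needs no dot-matches-newline option, because it relies on negated character classes. This is an illustration in Python, not an expression taken from this thread; COMMENT and examples are hypothetical names, and comment-like text inside string literals would still be matched.

```python
import re

# "/*", then any run of non-asterisks and asterisks, ending at the first
# asterisk run that is immediately followed by "/".
COMMENT = re.compile(r"/\*[^*]*\*+(?:[^/*][^*]*\*+)*/")

examples = "\n".join([
    "/*This is a perfectly legal comment*/",
    "/*So",
    "is",
    "this*/",
    "/*And the following too:*/",
    "/**/",
    "x = 5/3; y = 2*3;",
])

found = COMMENT.findall(examples)
for c in found:
    print(repr(c))
```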
--
Göran Andersson
_____
http://www.guffa.com