Bytes | Developer Community

a quick regexp question

Hey,
I am trying to find the right regexp to remove everything that's a
multi-line comment, in other words, everything between /*...*/. My
expression is:

/\*.*\*/

It doesn't work... Does anybody see anything wrong with it? Thanks

Mar 4 '07 #1

The . (dot) does not match a newline character by default.
Another problem is that .* matches as much as possible,
so if you have the text "/* comment */ code ... /* second comment */", your regex
will match the whole string from start to end.

/\*(.|\n)*?\*/ seems to work better.
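A minimal sketch of both problems and the fix, assuming Python's re module rather than the .NET engine the thread was using (the constructs involved, greedy versus lazy quantifiers and the dot's newline behavior, work the same way there). The sample strings are made up for illustration.

```python
import re

# Hypothetical sample inputs: both comments on one line, and a case
# where the second comment spans two lines.
one_line = "/* comment */ code ... /* second comment */"
multi_line = "/* comment */ code ... /* second\ncomment */ more code"

original = r"/\*.*\*/"        # greedy, and '.' does not match '\n' by default
lazy = r"/\*(?:.|\n)*?\*/"    # Artur's pattern, with the group made non-capturing

# Greedy: one match from the first /* to the last */, code included.
print(re.findall(original, one_line))
# '.' stops at the newline, so the multi-line comment is not found at all.
print(re.findall(original, multi_line))
# Lazy and newline-aware: each comment is matched on its own.
print(re.findall(lazy, one_line))
print(re.findall(lazy, multi_line))
```

Python's re.DOTALL flag (RegexOptions.Singleline in .NET) makes the dot match newlines too, so r"/\*.*?\*/" with that flag behaves like the lazy pattern above.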
Mar 4 '07 #2
Artur Borecki wrote:
> /\*(.|\n)*?\*/ seems to work better.
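Addition:

Note that (.|\n) cannot simply be written [.\n]: inside a character class the dot is a literal period, so [.\n] only matches a period or a newline.

A set that matches any character can instead be made by combining any two
complementary sets. I usually use [\w\W].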
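A quick check of the character-class point, sketched with Python's re module (assumed to treat character classes the same way as the .NET engine for this case). The two-line input is hypothetical.

```python
import re

text = "/* a\nb */"   # hypothetical two-line comment

# Inside [...] the dot is a literal '.', so [.\n] matches only '.' or newline;
# the space and letters inside the comment stop the match.
print(re.findall(r"/\*[.\n]*?\*/", text))
# [\w\W] is a class of two complementary sets, so it matches every character.
print(re.findall(r"/\*[\w\W]*?\*/", text))
```

The first call finds nothing; the second finds the whole comment, newline included. [\s\S] works the same way as [\w\W].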

--
Göran Andersson
_____
http://www.guffa.com
Mar 4 '07 #3
Bob
Hi Yoni,
Have a play with this (shown as a C# string literal, hence the doubled backslashes):
(\\x2F\\x2A\\x20[\\w\\s]+\\s)(\\x2A\\x20*[\\w\\s]+\\s)+(\\x2A\\x2F)
It seemed to work in RegexBuddy with this test text:
/* Comment 1
* comment two
* comment three
*Comment 4
*/
Code
2*3=3;
int r = 2*3;

/* Comment A
* comment B
* comment C
*/

The regex captures 3 different types of lines in 3 different groups:
1) Comment Begin
2) Comment body
3) Comment end

However, while RegexBuddy listed all comment lines as matched when I
listed all matches, only the last body line of each comment showed up
in the individual listing for the Comment body group.
So there may be a problem.
hth
Bob
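What RegexBuddy showed is standard behavior rather than a bug: when a capturing group is repeated by a quantifier, the group reports only the text of its last iteration. .NET additionally keeps every iteration in Group.Captures; Python's re has no equivalent, so this sketch (assuming Python's re, not the engine used in the thread) recovers the body lines by splitting the whole match instead.

```python
import re

# Bob's pattern with single backslashes (his post shows it as a C# string
# literal, where every backslash is doubled).
bob = re.compile(r"(\x2F\x2A\x20[\w\s]+\s)(\x2A\x20*[\w\s]+\s)+(\x2A\x2F)")

sample = (
    "/* Comment 1\n"
    "* comment two\n"
    "* comment three\n"
    "*Comment 4\n"
    "*/\n"
    "Code\n"
    "2*3=3;\n"
    "int r = 2*3;\n"
    "\n"
    "/* Comment A\n"
    "* comment B\n"
    "* comment C\n"
    "*/\n"
)

matches = list(bob.finditer(sample))
for m in matches:
    # A quantified group remembers only its LAST iteration.
    print(repr(m.group(2)))
    # One way to get every body line: split the whole match and drop
    # the opening and closing lines.
    print(m.group(0).splitlines()[1:-1])
```

Group 2 reports only "*Comment 4" for the first comment and "* comment C" for the second, matching what Bob observed.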

Mar 5 '07 #4
Bob wrote:
> Have a play with
> (\\x2F\\x2A\\x20[\\w\\s]+\\s)(\\x2A\\x20*[\\w\\s]+\\s)+(\\x2A\\x2F)

I think that you are over-complicating it. Also you are assuming things
that are not at all required in a comment. There is no space required
after the start of the comment, and it doesn't have to contain lines
that begin with asterisks.

/*This is a perfectly legal comment*/
/*So
is
this*/
/*And the following too:*/
/**/
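Göran's examples can be checked directly. A sketch assuming Python's re module: Bob's multi-line pattern (written with single backslashes) requires "/* " followed by a space, so none of these legal comments match, while a lazy "match anything" pattern makes no assumptions about what the comment contains.

```python
import re

# Göran's examples of legal comments.
legal = (
    "/*This is a perfectly legal comment*/\n"
    "/*So\nis\nthis*/\n"
    "/*And the following too:*/\n"
    "/**/\n"
)

# Bob's multi-line pattern insists on "/* " with a space after the opener.
bob = r"(\x2F\x2A\x20[\w\s]+\s)(\x2A\x20*[\w\s]+\s)+(\x2A\x2F)"
# Lazy match of any characters, newlines included, up to the first */.
lazy = r"/\*[\s\S]*?\*/"

print(len(re.findall(bob, legal)))   # 0
print(re.findall(lazy, legal))
```

The lazy pattern returns all four comments, including the empty /**/.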

--
Göran Andersson
Mar 5 '07 #5
Bob
Hi Goran,
Yep, it is not yet robust.
But the main thing IMO is that the overall comment structure is broken down
into its component parts, and each part is addressed by a group.
Of the extra examples you gave, only the single-line comment with text
failed.
This can be addressed by (Single Line Comment)|(Multiline Comment),
the Multiline Comment part being my original post.
My Single Line offering is (\\x2F\\x2A\\s*[\\w\\s]*\\*/)

I can't see how you can simplify it much further without running into
the problem mentioned earlier, namely matching code as well as comments.

My expression fails if you have a string assignment in code that imitates a
comment, e.g. string s = "/* Some text */";
So a 'not inside quotes' condition (negative lookahead?) should be put at the
front of all groups. I tried it but couldn't stop the match.
The trouble with this empirical approach is that you find holes and patch them,
but you can't be sure you have found all the holes, unless you're a regex
expert, which I am not.
If you can come up with a simpler, robust regex that picks out the comments
and leaves the 'code', I would like to see it.

The new test text now follows.
regards
Bob
/* Comment Single Line space at front abd*/
/*Comment Single Line spaceless abd*/
/* comment two
I am a plain line
So am I
* comment three
*Comment 4
*/
Code Begins
int r = 2*3;
x = 5/3;//Inline Comment fails but do we want to grab it?
y=2^6;
string s = "/* this is a failing test string*/";
/* Comment A
* comment B
* comment C
*/
/* */
/**/
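Bob's string-literal problem is usually handled not with a lookahead but by letting the regex consume string literals as well: match either a double-quoted string or a block comment, keep the former and drop the latter. Because the engine scans left to right, a /* that sits inside a string is swallowed as part of the string before the comment alternative ever sees it. This Python sketch assumes only double-quoted strings with backslash escapes; it ignores character literals, C# verbatim strings (@"...") and // comments, so it is not a complete C# lexer. strip_block_comments is a hypothetical name.

```python
import re

# Alternative 1: a double-quoted string literal (backslash escapes allowed).
# Alternative 2: a block comment, matched lazily; DOTALL lets '.' cross newlines.
TOKEN = re.compile(r'"(?:\\.|[^"\\\n])*"|/\*.*?\*/', re.DOTALL)


def strip_block_comments(source):
    """Hypothetical helper: remove /* ... */ comments, keep string literals."""
    def replace(m):
        token = m.group(0)
        # Strings are returned unchanged; comments are replaced by nothing.
        return token if token.startswith('"') else ""
    return TOKEN.sub(replace, source)


bob_test = """/* Comment Single Line space at front abd*/
/*Comment Single Line spaceless abd*/
/* comment two
I am a plain line
So am I
* comment three
*Comment 4
*/
Code Begins
int r = 2*3;
x = 5/3;//Inline Comment fails but do we want to grab it?
y=2^6;
string s = "/* this is a failing test string*/";
/* Comment A
* comment B
* comment C
*/
/* */
/**/
"""

print(strip_block_comments(bob_test))
```

On Bob's test text, the string assignment survives intact while all six block comments are removed.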
Mar 5 '07 #6

Bob wrote:
> The new test text now follows.
My regex /\*(.|\n)*?\*/ works fine for this example.
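A sketch checking Artur's pattern against Bob's new test text, assuming Python's re (a different engine from RegexBuddy, so this says nothing definitive about RegexBuddy's behavior). It finds all six real comments, and also the comment-like text inside the string literal, which is exactly the case Bob raised.

```python
import re

bob_test = """/* Comment Single Line space at front abd*/
/*Comment Single Line spaceless abd*/
/* comment two
I am a plain line
So am I
* comment three
*Comment 4
*/
Code Begins
int r = 2*3;
x = 5/3;//Inline Comment fails but do we want to grab it?
y=2^6;
string s = "/* this is a failing test string*/";
/* Comment A
* comment B
* comment C
*/
/* */
/**/
"""

# Artur's pattern, with the group made non-capturing.
artur = re.compile(r"/\*(?:.|\n)*?\*/")
found = [m.group(0) for m in artur.finditer(bob_test)]

print(len(found))   # 7
print(found[3])     # the text inside the string literal
```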
Mar 5 '07 #7
Bob
Hi Artur,
Difference in engines maybe?
Running your expression in RegexBuddy, some lines are missed:
it only picks up the first two comments, the string assignment and the last two.
regards
Bob
Mar 5 '07 #8
On Mar 5, 4:38 am, "Artur Borecki" <aboreckiDONTWANTS...@tenbit.pl>
wrote:
my regex /\*(.|\n)*?\*/ works fine for this example.
Yours fails on:

sdgfsdfgsdgsd/*dfgdfgd/*sfsdf*/dgdf*/

I don't know how the others do on this (type of) example.

:)

Mar 5 '07 #9
sherifffruitfly wrote:
> Yours fails on:
>
> sdgfsdfgsdgsd/*dfgdfgd/*sfsdf*/dgdf*/

It should match the first /* and the first */, doesn't it?
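C-style block comments do not nest: a comment ends at the first */ after its opening /*, and a /* inside a comment has no special meaning. A lazy match reproduces exactly that, as this Python sketch shows (assuming Python's re); the trailing dgdf*/ is left behind, just as a C or C# compiler would treat it as code rather than comment.

```python
import re

text = "sdgfsdfgsdgsd/*dfgdfgd/*sfsdf*/dgdf*/"
lazy = r"/\*[\s\S]*?\*/"

print(re.search(lazy, text).group(0))   # /*dfgdfgd/*sfsdf*/
print(re.sub(lazy, "", text))           # sdgfsdfgsdgsddgdf*/
```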

--
Göran Andersson
Mar 5 '07 #10
On Mar 5, 1:23 pm, Göran Andersson <g...@guffa.com> wrote:
> It should match the first /* and the first */, doesn't it?

Oops, yes it does. I had an incorrect concept of *failure*: I was
under the erroneous impression that the *outer* comment delimiters
would define a comment. In fact it's the first matching pair, scanning
from left to right, that constitutes a comment.

Shorter version: never mind.

:)

Mar 5 '07 #11
