a quick regexp question

Hey,
I am trying to get the right regexp to remove everything that's a
multi-line comment, in other words everything between /*...*/. My
expression is:

/\*.*\*/

Doesn't work... Does anybody see anything wrong with that? Thanks

Mar 4 '07 #1

. (dot) does not seem to match the newline character.
Another problem is that .* will match as much as possible.
So if you have the text "/* comment */ code ... /* second comment */" your regex
will match the whole string from start to end.

/\*(.|\n)*?\*/ seems to work better.
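
To make the two problems concrete, here is a minimal sketch using Python's re module. Python is my choice for illustration only; the thread never names the language in use, and as far as I know these particular patterns behave the same way in .NET. The sample strings and variable names are made up.

```python
import re

# Python's re is used purely for illustration; sample text is invented.
greedy = re.compile(r"/\*.*\*/")        # the original attempt
lazy = re.compile(r"/\*(.|\n)*?\*/")    # the suggested replacement

# Problem 1: greedy .* runs to the LAST */ on the line,
# swallowing the code between the two comments.
one_line = "/* a */ x = 1; /* b */"
greedy_match = greedy.search(one_line).group(0)

# Problem 2: the dot does not match a newline, so a comment
# that spans two lines is not matched at all.
multi = "/* spans\ntwo lines */"
greedy_multi = greedy.search(multi)

# The lazy (.|\n)*? version stops at the first */ and crosses newlines.
text = "/* first */ code(); /* second\n   line */ more();"
stripped = lazy.sub("", text)

print(repr(greedy_match))
print(greedy_multi)
print(repr(stripped))
```

The same effect can usually be had by keeping the dot and switching on the engine's "dot matches newline" option (re.DOTALL in Python), together with the lazy quantifier.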
Mar 4 '07 #2
Artur Borecki wrote:
> /\*(.|\n)*?\*/ seems to work better.
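Addition:

(.|\n) can also be written as a character class. Note that [.\n] will
not do it: inside a class the dot is a literal dot, so that set only
matches a dot or a newline.

A set that matches any character can be made by combining any two
complementary sets. I usually use [\w\W].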
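
A quick check of these character classes, sketched in Python's re module (my choice for illustration; the thread does not say which language is in use). Inside square brackets the dot loses its special meaning, so [.\n] accepts only a literal dot or a newline, while [\w\W] accepts any character at all.

```python
import re

# Python's re is an assumption for illustration; the sample text is invented.
sample = "/* a\n b */"

# [\w\W] is "word character or non-word character", i.e. anything,
# newline included.
any_char = re.search(r"/\*[\w\W]*?\*/", sample)

# [.\n] accepts only a literal '.' or a newline, so it fails on
# ordinary comment text...
dot_class = re.search(r"/\*[.\n]*?\*/", sample)

# ...but it does match a "comment" made only of dots and newlines.
dots_only = re.search(r"/\*[.\n]*?\*/", "/*..\n.*/")

print(any_char.group(0) == sample)
print(dot_class)
print(dots_only.group(0) == "/*..\n.*/")
```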

--
Göran Andersson
_____
http://www.guffa.com
Mar 4 '07 #3
Bob
Hi Yoni,
Have a play with
(\\x2F\\x2A\\x20[\\w\\s]+\\s)(\\x2A\\x20*[\\w\\s]+\\s)+(\\x2A\\x2F)
It seemed to work in RegexBuddy with a test text of:
/* Comment 1
* comment two
* comment three
*Comment 4
*/
Code
2*3=3;
int r = 2*3;

/* Comment A
* comment B
* comment C
*/

The regex
captures 3 different types of lines in 3 different groups:
1) Comment begin
2) Comment body
3) Comment end

However, while RegexBuddy listed all comment lines as matched when I
listed all matches, only the last body line of each comment showed up in the
individual listing for the comment body group.
So there may be a problem.
hth
Bob
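
The behaviour described here, where only the last body line shows up in the body group, is what a repeated capturing group normally does: each iteration overwrites the previous capture, so the group ends up holding only the final one. Below is a minimal sketch in Python's re module (used purely for illustration; the patterns are simplified stand-ins, not the expression from the post). In .NET every iteration is also kept in the group's Captures collection, which Python does not offer.

```python
import re

# Simplified stand-in patterns and invented sample text, for illustration only.
comment = "/* one\n * two\n * three\n */"

# Group 2 is repeated once per body line.
pattern = re.compile(r"(/\*[^\n]*\n)( \*[^\n/]*\n)+( \*/)")
m = pattern.match(comment)

# Only the last iteration of the repeated group survives.
last_body = m.group(2)

# To collect every body line, capture the whole body once
# and split it afterwards.
whole = re.match(r"/\*[^\n]*\n((?: \*[^\n/]*\n)+) \*/", comment)
body_lines = whole.group(1).splitlines()

print(repr(last_body))
print(body_lines)
```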

Mar 5 '07 #4
Bob wrote:
> Have a play with
> (\\x2F\\x2A\\x20[\\w\\s]+\\s)(\\x2A\\x20*[\\w\\s]+\\s)+(\\x2A\\x2F)

I think that you are over-complicating it. Also you are assuming things
that are not at all required in a comment. There is no space required
after the start of the comment, and it doesn't have to contain lines
that begin with asterisks.

/*This is a perfectly legal comment*/
/*So
is
this*/
/*And the following too:*/
/**/

--
Göran Andersson
_____
http://www.guffa.com
Mar 5 '07 #5
Bob
Hi Goran,
Yep,
it is not yet robust.
But the main thing IMO is that the overall comment structure is broken down
into its component parts, and each part is addressed by a group.
Of the extra examples you gave, only the single-line comment with text
failed.
This can be addressed by (Single Line Comment) | (Multiline Comment),
the multiline comment being my original post.
My single-line offering is (\\x2F\\x2A\\s*[\\w\\s]*\\*/)

I can't see how you can simplify it much further without running into
the problem you mentioned earlier, namely matching code as well as comments.

My expression fails if you have a string assignment in code that imitates a
comment, e.g. string s ="/* Some text */";
So a 'not quotes' check (negative lookahead?) should be put on the front of all
groups. I tried it but couldn't stop the match.
The trouble with this empirical approach is that you find holes and patch them,
but you can't be sure you have found all the holes.
Unless you're a regex expert, which I am not.
If you can come up with a simpler robust regex that picks out the comments
and leaves the 'code', I would like to see it.
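
One common way to handle the string-assignment case described above is to let the regex match string literals as well as comments, then put the strings back unchanged while dropping the comments. The following is a minimal sketch in Python's re module, not something proposed in the thread. The name strip_block_comments is made up, and the sketch assumes double-quoted strings with backslash escapes only (no verbatim @"..." strings, no character literals, no // comments).

```python
import re

# Hypothetical helper for illustration; not from the thread.
# Alternative 1: a double-quoted string literal (kept as-is).
# Alternative 2: a /* ... */ block comment (removed).
TOKEN = re.compile(r'("(?:\\.|[^"\\\n])*")|/\*[\s\S]*?\*/')

def strip_block_comments(source):
    """Remove /* */ comments but leave string literals untouched."""
    def keep_strings(match):
        # Group 1 is set only when the string alternative matched.
        return match.group(1) or ""
    return TOKEN.sub(keep_strings, source)

code = 'string s = "/* not a comment */"; /* real\ncomment */ int r = 2*3;'
cleaned = strip_block_comments(code)
print(cleaned)
```

Because the engine scans from left to right, an opening quote is always reached before the /* inside it, so the string alternative wins and the fake comment is never considered on its own.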

The new test text now follows.
regards
Bob
/* Comment Single Line space at front abd*/
/*Comment Single Line spaceless abd*/
/* comment two
I am a plain line
So am I
* comment three
*Comment 4
*/
Code Begins
int r = 2*3;
x = 5/3;//Inline Comment fails but do we want to grab it?
y=2^6;
string s = "/* this is a failing test string*/";
/* Comment A
* comment B
* comment C
*/
/* */
/**/
Mar 5 '07 #6

"Bob" <bo*@nowhere.com> wrote in message
news:eD****************@TK2MSFTNGP06.phx.gbl...
> The new test text now follows.
my regex /\*(.|\n)*?\*/ works fine for this example.
Mar 5 '07 #7
Bob
Hi Artur,
Difference in engines maybe?
Running your expression in RegexBuddy, some lines are missed.
It only picks up the first two, the string assignment and the last two.
regards
Bob

Mar 5 '07 #8
On Mar 5, 4:38 am, "Artur Borecki" <aboreckiDONTWANTS...@tenbit.pl> wrote:
> my regex /\*(.|\n)*?\*/ works fine for this example.

Yours fails on:

sdgfsdfgsdgsd/*dfgdfgd/*sfsdf*/dgdf*/

I don't know how the others do on this (type of) example.

:)

Mar 5 '07 #9
sherifffruitfly wrote:
> Yours fails on:
>
> sdgfsdfgsdgsd/*dfgdfgd/*sfsdf*/dgdf*/

It should match the first /* and the first */, doesn't it?

--
Göran Andersson
_____
http://www.guffa.com
Mar 5 '07 #10
On Mar 5, 1:23 pm, Göran Andersson <g...@guffa.com> wrote:
> It should match the first /* and the first */, doesn't it?

Oops - yes it does - I had an incorrect concept of *failure*. I was
under the erroneous impression that the *outer* comment delimiters
would define a comment. In fact it's the first-from-left-to-right
matching pair that constitutes a comment.

Shorter version: Never mind.

:)
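
The point about nesting can be checked directly. In C-style languages block comments do not nest, so a comment ends at the first */, and the lazy pattern from earlier in the thread behaves the same way. A small sketch in Python's re module (chosen only for illustration):

```python
import re

# Python's re is used for illustration; the input is the example from the thread.
lazy = re.compile(r"/\*(.|\n)*?\*/")
line = "sdgfsdfgsdgsd/*dfgdfgd/*sfsdf*/dgdf*/"

# The match runs from the first /* to the first */.
match = lazy.search(line)
comment = match.group(0)

# What is left after removal still contains the stray closing */,
# just as a compiler would see it.
remainder = lazy.sub("", line)

print(comment)
print(remainder)
```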

Mar 5 '07 #11
