
What's your opinion RegEx Gurus?

P: n/a
Hi,

I've written this regex to catch the start of a valid multiline comment
("/*") in, e.g., T-SQL code:
"(?<=^(?:[^'\r]*'[^'\r]*')*?[^'\r]*)(?<!^(?:[^'\r]*'[^'\r]*')*?--.*)/\*.*?$"
with the Multiline option on.

As we know, a T-SQL single-line comment starts with "--" and the string
delimiter is "'".
Given that, of the lines below the pattern will only catch "/* D" and
"/* E", i.e. they are the only valid starts of a multiline comment.
''' /* A
''-- /* B
--'' /* C
'--' /* D
'--''' /* E

Now, I'm not really happy with this regex, and I suspect there is a shorter
one that covers this, but I don't know what else to use. It could actually
be shorter IF it were possible to exclude the string "--" right in the first
lookbehind expression, like this (look at the two "(--)"):
(?<=^(?:[^'\r(--)]*'[^'\r]*')*?[^'\r(--)]*)/\*.*?$
Unfortunately, a [^...] construct can only exclude single characters, so the
regex above is not correct. (It would be great if we could someday use
something like [^(--)] to exclude "--".)

Anyway, what other regex can be used to accomplish the task of this regex?
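
On the wish for a [^(--)] construct: one common idiom is to put a negative lookahead in front of a single-character class, so that each step consumes one character only if "--" does not start there. Below is a minimal sketch of that idea in Python. Python's re module does not allow variable-length lookbehind, so this version anchors at the start of the line and captures the comment instead of looking behind. The names NOT_QUOTE_OR_DASHDASH, VALID_COMMENT_START and valid_comment_starts are made up for illustration. The same sub-pattern should carry over to the .NET lookbehind, but that is an assumption and was not tested here.

```python
import re

# Hypothetical helper name. Matches ONE character that is not a quote and
# is not the first dash of a "--" sequence (the "[^(--)]" that was wished for).
NOT_QUOTE_OR_DASHDASH = r"(?:(?!--)[^'\r\n])"

VALID_COMMENT_START = re.compile(
    # balanced '...' pairs, then unquoted text without "--", then the comment
    rf"^(?:{NOT_QUOTE_OR_DASHDASH}*'[^'\r\n]*')*?{NOT_QUOTE_OR_DASHDASH}*?(/\*[^\r\n]*)",
    re.MULTILINE,
)


def valid_comment_starts(sql):
    """Return the text from each valid "/*" to the end of its line."""
    return [m.group(1) for m in VALID_COMMENT_START.finditer(sql)]


sample = "\n".join([
    "''' /* A",
    "''-- /* B",
    "--'' /* C",
    "'--' /* D",
    "'--''' /* E",
])
print(valid_comment_starts(sample))
```

Because the pattern is anchored with ^, this sketch reports at most one "/*" per line.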
--
Thanks in advance
Ali Eghtebas Sweden
Jul 19 '05 #1
5 Replies


P: n/a
I am not sure I fully understand what you want the expression to match, as
I am not familiar with T-SQL, but what about the following expression:
@"(?<=^[^\-].*?)/\*.*?$"

It will match every /* on lines that do not start with '-'.

It looks to me like you want to find all instances of /* except those on
lines that begin with '-', and I think this expression does that.

The expression should not match the following line:
-should NOT match /* foo

And it would match this line:
should match /* foo

Let me know if this works for you or if I misunderstood what you wanted.

Ryan Byington [MS]

This posting is provided "AS IS" with no warranties, and confers no rights.
Use of included script samples are subject to the terms specified at
http://www.microsoft.com/info/cpyright.htm

Jul 21 '05 #2

P: n/a
Thank you, Ryan, for replying; you are the only one who has replied to this
post so far.
However, I must say that you've misunderstood what I wanted.
The expression you've provided only excludes matches on lines starting
with a single -.
What I want is to match any /* that is not on a line commented out by --,
where -- starts the comment and is not necessarily right at the start of
the line. E.g.:
-- This is a comment.
code code code -- This is a comment.
Furthermore, /* should not be matched if it is between two string delimiter
characters, which in T-SQL is the character ', e.g.
' This is just a string expression.'.
'/*' <= not a valid match
'' /* <= is a valid match since /* is not within the string expression.
And there is more: if the line comment marker -- is itself within '',
then it is just part of a string expression and not the start of a comment.
The following examples illustrate all of this:
''' /* A <= Not a match, since the third ' is the start of a string
expression; hence /* A is assumed to be within a string.
''-- /* B <= Not a match, since /* B is on a line commented out by --
--'' /* C <= Not a match, since /* C is on a line commented out by --
'--' /* D <= A match, since /* D is neither within a string expression
nor commented out by --, since the -- is itself within a string expression
between ''.
'--''' /* E <= A match, since /* E is neither within a string
expression nor commented out by --, since the -- is itself within a string
expression between ''. And the last '' is just an empty string.
'--'' /* F <= Not a match, since the last ' is the start of a string
expression; hence /* F is assumed to be within a string.

Using the expression below will only catch /* D and /* E.
"(?<=^(?:[^'\r]*'[^'\r]*')*?[^'\r]*)(?<!^(?:[^'\r]*'[^'\r]*')*?--.*)/\*.*?$"
With Multiline option on.
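
The rules above can also be written down without a regex, as a small per-line scanner. The following is only a sketch in Python of the rules as stated in this post (a ' toggles string state, a -- outside a string comments out the rest of the line, and a /* outside both is a valid start). The function name comment_starts is made up for illustration, and the sketch does not track what happens after a multiline comment has been opened.

```python
def comment_starts(line):
    """Return the column of every "/*" that is outside strings and "--" comments."""
    starts = []
    in_string = False
    i = 0
    while i < len(line):
        two = line[i:i + 2]
        if line[i] == "'":
            # Each quote opens or closes a string expression.
            in_string = not in_string
            i += 1
        elif not in_string and two == "--":
            # The rest of the line is a single-line comment.
            break
        elif not in_string and two == "/*":
            starts.append(i)
            i += 2
        else:
            i += 1
    return starts


samples = [
    "''' /* A",
    "''-- /* B",
    "--'' /* C",
    "'--' /* D",
    "'--''' /* E",
    "'--'' /* F",
]
for s in samples:
    print(repr(s), comment_starts(s))
```

Only the D and E lines produce a non-empty result, which agrees with the list of examples above.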

--
Thanks in advance
Ali Eghtebas Sweden
Jul 21 '05 #3

P: n/a
Thanks for all of the information. I think I might have an expression that
will work for you. It at least works for all of the examples you provided.
There may be some problems if newlines are allowed inside strings.

@"^((?>[^'\-]*?'.*?')*)([^'\-]*)/\*"

This should consume all characters from the beginning of the line that are
neither ' nor -, unless the - appears inside a matched pair of '.

I tested this with version 1.1 of the framework.

Again, let me know if this does not work for you.

Thanks,

Ryan Byington [MS]


Jul 21 '05 #4

P: n/a
Your pattern is interesting, and I've modified it, as you can see below, to
do what I want. However, you have missed an important detail, which is the
reason I posted my question here in the first place:
the single-line comment marker is --, a double -, and not a single -.
To demonstrate with examples again (note: all the lines together make up
the input string to which the pattern is applied):
''' /* A
''-- /* B
--'' /* C
'' /* D
'--''' /* E
- /* F <= Should be matched too.
Using my modified version of your pattern,
"(?<=^(?>[^'\-]*?'.*?')*?(?:[^'\-]*))/\*.*?$"
with the Multiline option on, I get 2 matches: '/* D' at index 35 and
'/* E' at index 48, each with a carriage return at the end of the match.
But '/* F' at index 56 should also be matched, since a single - is not a
valid single-line comment marker. If you read my first post, you'll see
what I mean by this.
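
One way to handle the single - versus -- distinction is to let the regex consume strings and -- comments as alternatives and capture /* only when neither of those matched first. The following is a sketch of that idea in Python, not the .NET pattern discussed above. The names TOKEN and comment_start_indexes are made up for illustration, and the input is assumed to use \r\n line endings so that the indexes line up with the ones quoted in this post.

```python
import re

TOKEN = re.compile(
    r"'[^'\r\n]*'?"   # a string, or an unterminated one running to end of line
    r"|--[^\r\n]*"    # a "--" comment; a single "-" does not match here
    r"|(/\*)"         # a "/*" outside both: the only alternative that captures
)


def comment_start_indexes(sql):
    """Return the index of each "/*" that is not inside a string or "--" comment."""
    return [m.start(1) for m in TOKEN.finditer(sql) if m.group(1) is not None]


text = "\r\n".join([
    "''' /* A",
    "''-- /* B",
    "--'' /* C",
    "'' /* D",
    "'--''' /* E",
    "- /* F",
])
print(comment_start_indexes(text))
```

On this input the sketch reports indexes 35, 48 and 56, i.e. D, E and also F.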
--
Regards
Ali Eghtebas Sweden
Jul 21 '05 #5

P: 1
Try this

(\/\*(\s*|.*)*\*\/)|(--.*)

Got it from
http://www.dotnetslackers.com/Patter...rocedures.aspx
Jul 25 '06 #6
