What's your opinion RegEx Gurus?

Hi,

I've made this regex to catch the start of a valid multiline comment such as
"/*" in e.g. T-SQL code.
"(?<=^(?:[^'\r]*'[^'\r]*')*?[^'\r]*)(?<!^(?:[^'\r]*'[^'\r]*')*?--.*)/\*.*?$"
With Multiline option on.

As we know the T-SQL single line comment starts with a "--" and the string
character is a "'".
Considering all this, from these lines below the pattern will only catch "/*
D" and "/* E" i.e. they are the only valid start of a multiline comment.
''' /* A
''-- /* B
--'' /* C
'--' /* D
'--''' /* E

Now I'm not really happy with this regex and know there should be a shorter
regex to cover this. But I don't know what else to use. This regex could
actually be shorter IF it was possible to exclude the string "--" right in
the first lookbehind expression like this... look at the 2 "(--)":
(?<=^(?:[^'\r(--)]*'[^'\r]*')*?[^'\r(--)]*)/\*.*?$
But we know that unfortunately you can only exclude single characters in a
[^.] construct, hence the regex above is not correct. (Would be great if we
could use something like [^(--)] some day to exclude "--").
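
A negative lookahead placed in front of a single-character match is a common way to get the "anything except the sequence --" behaviour wished for above. The sketch below uses Python's re module rather than .NET (Python's re does not support the variable-length lookbehind used in this thread), but the (?!--) construct itself should carry over to .NET. The names CODE_BEFORE_LINE_COMMENT and code_part are made up for illustration, and the sketch deliberately ignores quotes:

```python
import re

# (?!--) is a negative lookahead: it succeeds only where the next two
# characters are not "--". Pairing it with "." consumes one character at a
# time and stops right before the first "--" on the line.
CODE_BEFORE_LINE_COMMENT = re.compile(r"^((?:(?!--).)*)", re.MULTILINE)


def code_part(line: str) -> str:
    """Return the part of a line that precedes the first "--" (quotes ignored)."""
    return CODE_BEFORE_LINE_COMMENT.match(line).group(1)


print(repr(code_part("code code code -- This is a comment.")))  # 'code code code '
print(repr(code_part("a - b")))  # 'a - b' (a lone dash is not excluded)
```

Unlike a character class, this excludes the two-character sequence without excluding a single - on its own.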

Anyway, what other regex can be used to accomplish the task of this regex?
--
Thanks in advance
Ali Eghtebas Sweden
Jul 19 '05 #1
I am not sure I fully understand what you want the expression to match,
as I am not familiar with T-SQL, but what about the following
expression:
@"(?<=^[^\-].*?)/\*.*?$"

It matches every /* on a line that does not start with '-'.

It looks to me like you want to find all instances of /* except those
on lines that begin with '-', and I think this expression does that.

The expression should not match the following line:
-should NOT match /* foo

And it would match this line:
should match /* foo

Let me know if this works for you or if I misunderstood what you wanted.

Ryan Byington [MS]

This posting is provided "AS IS" with no warranties, and confers no rights.
Use of included script samples are subject to the terms specified at
http://www.microsoft.com/info/cpyright.htm
--------------------
| From: "Ali Eghtebas" <al***@home.se>
| Subject: What's your opinion RegEx Gurus?
| Date: Sat, 23 Aug 2003 18:02:43 +0200

Jul 21 '05 #2
Thank you, Ryan, for replying; you are the only one who has replied to this
post so far.
However, I must say that you've misunderstood what I wanted.
The expression you've provided only refuses to match lines starting with
a single -.
What I want is to match any /* that is not inside a line commented out
by --. The -- starts a line comment, and it does not have to be right at
the start of the line. E.g.:
-- This is a comment.
code code code -- This is a comment.
Furthermore, /* should not be matched if it sits between two string
delimiter characters, which in T-SQL is the character ', e.g.
' This is just a string expression.'.
'/*' <= not a valid match
'' /* <= is a valid match since /* is not within the string expression.
And there is more: if the line comment marker -- is itself within ''
then it is just part of a string expression and not the start of a comment.
The following examples illustrate all of this:
''' /* A <= Not a match since the third ' is the start of a string
expression, hence /* A is assumed to be within a string.
''-- /* B <= Not a match since /* B is in a line commented out by --
--'' /* C <= Not a match since /* C is in a line commented out by --
'--' /* D <= A match since /* D is neither within a string expression
nor commented out by --, because the -- is itself within a string
expression between ''.
'--''' /* E <= A match since /* E is neither within a string
expression nor commented out by --, because the -- is itself within a
string expression between ''. And the last '' is just an empty string.
'--'' /* F <= Not a match since the last ' is the start of a string
expression, hence /* F is assumed to be within a string.
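
The rules above can also be checked without a regex. The following is a minimal Python sketch of a per-line scanner, not the pattern from this thread. It assumes that strings never span lines and that no block comment is already open when the line starts; the function name find_block_comment_start is made up for illustration.

```python
def find_block_comment_start(line: str) -> int:
    """Return the index of the first valid "/*" on one line, or -1.

    Assumptions (not from the thread): strings do not span lines and no
    block comment is already open when the line starts.
    """
    in_string = False
    for i, ch in enumerate(line):
        if ch == "'":
            # Every quote toggles the state, so '' opens and closes an
            # empty string and an odd quote leaves a string open.
            in_string = not in_string
        elif not in_string:
            pair = line[i:i + 2]
            if pair == "--":
                return -1  # the rest of the line is a line comment
            if pair == "/*":
                return i   # outside strings and not commented out
    return -1


samples = ["''' /* A", "''-- /* B", "--'' /* C",
           "'--' /* D", "'--''' /* E", "'--'' /* F"]
for s in samples:
    print(s, "=>", find_block_comment_start(s))
```

Only the D and E lines yield an index (5 and 7); the other four give -1, which agrees with the list above.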

Using the expression below will only catch /* D and /* E.
"(?<=^(?:[^'\r]*'[^'\r]*')*?[^'\r]*)(?<!^(?:[^'\r]*'[^'\r]*')*?--.*)/\*.*?$"
With Multiline option on.

--
Thanks in advance
Ali Eghtebas Sweden
Jul 21 '05 #3
Thanks for all of the information. I think I might have an expression that
will work for you. It at least works for all of the examples you provided.
There may be some problems if newlines are allowed inside string literals.

@"^((?>[^'\-]*?'.*?')*)([^'\-]*)/\*"

This should eat up all characters from the beginning of the line that are
neither ' nor -, unless the ' or - appears inside a matching pair of '.

I tested this with version 1.1 of the framework.

Again let me know if this does not work for you.

Thanks,

Ryan Byington [MS]

--------------------
| From: "Ali Eghtebas" <al***@home.se>
| Subject: Re: What's your opinion RegEx Gurus?
| Date: Thu, 11 Sep 2003 11:38:35 +0200

Jul 21 '05 #4
Your pattern is interesting, and I've modified it, as you can see below,
to do what I want. However, you have missed an important detail, which is
the reason I posted my question here in the first place:
the single-line comment marker is --, a double -, and
not a single -.
To demonstrate it with examples again (note: all the lines together
make up the input string to which the pattern is applied):
''' /* A
''-- /* B
--'' /* C
'' /* D
'--''' /* E
- /* F <= Should be matched too.
Your modified pattern: "(?<=^(?>[^'\-]*?'.*?')*?(?:[^'\-]*))/\*.*?$"
with the Multiline option on
finds 2 matches, '/* D' at index 35 and '/* E' at index 48, each with a
carriage return at the end of the match.
But '/* F' at index 56 should also be matched, since a single - is not a
valid single-line comment marker. If you read my first post you'll
see what I mean by this.
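
The difference between excluding a single - and excluding only -- can be reproduced with a small Python sketch. It is a forward-matching rewrite, not the lookbehind pattern quoted above, because Python's re does not support variable-length lookbehind. The names SINGLE_DASH, DOUBLE_DASH and comment_starts are made up for illustration, and the sample is assumed to use \r\n line endings, as suggested by the carriage return mentioned above.

```python
import re

# The six sample lines, joined with Windows line endings (an assumption).
SAMPLE = "\r\n".join([
    "''' /* A",
    "''-- /* B",
    "--'' /* C",
    "'' /* D",
    "'--''' /* E",
    "- /* F",
])

# A complete single-quoted string that stays on one line.
STRING = r"'[^'\r\n]*'"

# Single-dash version: any '-' outside a string stops the scan.
SINGLE_DASH = re.compile(rf"^(?:{STRING}|[^'\-\r\n])*?(/\*)", re.MULTILINE)

# Double-dash version: only the sequence "--" outside a string stops it.
DOUBLE_DASH = re.compile(rf"^(?:{STRING}|(?!--)[^'\r\n])*?(/\*)", re.MULTILINE)


def comment_starts(pattern: re.Pattern, text: str) -> list[int]:
    """Return the index of every "/*" captured by the pattern."""
    return [m.start(1) for m in pattern.finditer(text)]


print(comment_starts(SINGLE_DASH, SAMPLE))  # [35, 48]
print(comment_starts(DOUBLE_DASH, SAMPLE))  # [35, 48, 56]
```

Because each quoted string is consumed as a whole, a -- or /* inside quotes is never examined, and the lookahead (?!--) lets a lone - through while still stopping at a real line comment.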
--
Regards
Ali Eghtebas Sweden
Jul 21 '05 #5
Try this

(\/\*(\s*|.*)*\*\/)|(--.*)

Got it from
http://www.dotnetslackers.com/Patter...rocedures.aspx
Jul 25 '06 #6