my head is spinning with regex

First, if MSFT is listening I'll say IMO the MSDN material is sorely lacking
in this area... it's just a whole bunch of information thrown at you and
you're left to yourself as to organizing it in your head. Typical learning
starts with basics and progresses through increasingly complex information -
I think given the inherent confusion-inducing ability of regex that kind of
documentation would be very valuable.

But anyway, I'm trying to write a regex that will parse a line of code from
a .cs file into the code and comments portions if there is a // somewhere in
the line. I realize this may need to be more complex down the road to
handle special occurrences of // other than in a comment (like in a string
literal), but I'm trying to start with the basics. So I have...

Regex regex = new Regex(@"(?<code>.+)//(?<comments>.+)",
    RegexOptions.Compiled);
regexMatch = regex.Match(original);
if (regexMatch.Success)
{
    code = regexMatch.Result("${code}");
    comments = regexMatch.Result("${comments}");
}

This works fine on
// A basic comment line
or
a = b; // code line with comment afterward

but on the line
/// <summary>

I end up with
code = "/"
comments = " <summary>"

I'm not understanding why the "//" in my regex seems to match the "last"
occurrence of the pattern and "skips" the match on the first two slashes of
the three. I thought by definition the first occurrence would be matched.
Indeed, if my original line is "//// something" I end up with "//" and "
something".

Who can clarify this for me? And who can point me to a *good* resource for
regex edumucation?
Nov 15 '05 #1
Try this instead:

(?<code>.*?)//(?<comments>.*)

First of all, changing the +s to *s allows the regex to match even if
there are no characters before or after the "//". Also, adding the "?" after
the ".*" in the code group makes it match the first occurrence of "//"
instead of the last. A plain ".*" is greedy: it consumes all that it can
and gives characters back only as needed to match the rest of the
expression. ".*?" is lazy: it matches only as much as it must for the rest
of the expression to match.
Brian Davis
www.knowdotnet.com
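
To make the greedy/lazy difference concrete, here is a minimal sketch that runs both patterns from this thread against the sample lines from the question. The class and method names (CommentSplitter, SplitGreedy, SplitLazy) are hypothetical and only for illustration; the patterns themselves are the ones quoted above.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of any library.
public static class CommentSplitter
{
    // Greedy pattern from the original question.
    private static readonly Regex Greedy =
        new Regex(@"(?<code>.+)//(?<comments>.+)", RegexOptions.Compiled);

    // Lazy pattern suggested in the reply.
    private static readonly Regex Lazy =
        new Regex(@"(?<code>.*?)//(?<comments>.*)", RegexOptions.Compiled);

    // Returns { code, comments }, or null when the line does not match.
    private static string[] Split(Regex regex, string line)
    {
        Match m = regex.Match(line);
        if (!m.Success) return null;
        return new[] { m.Groups["code"].Value, m.Groups["comments"].Value };
    }

    public static string[] SplitGreedy(string line) { return Split(Greedy, line); }
    public static string[] SplitLazy(string line) { return Split(Lazy, line); }

    public static void Main()
    {
        foreach (string line in new[] { "/// <summary>", "//// something" })
        {
            string[] g = SplitGreedy(line);
            string[] l = SplitLazy(line);
            Console.WriteLine("greedy: code=[{0}] comments=[{1}]", g[0], g[1]);
            Console.WriteLine("lazy:   code=[{0}] comments=[{1}]", l[0], l[1]);
        }
        // For "/// <summary>":
        //   greedy: code=[/] comments=[ <summary>]
        //   lazy:   code=[] comments=[/ <summary>]
    }
}
```

Note that with the lazy pattern the third slash of "///" ends up at the start of the comments group, so a caller that wants to treat doc comments specially would still need to check for it.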


Nov 15 '05 #2
Thanks Brian, the *? did it. My error there seems like a DOH! moment now.

Also, I found a cool tool that lets you experiment with regex in real time:
http://www.weitz.de/regex-coach/

It has a "step" capability that lets you watch the regex work its magic
step by step. Indeed, just as you describe, stepping through my original
.+ expression, the first .+ gobbles up the whole input string (greedy) and
then gives back only what is necessary to match the //, starting at the END
of the string (so it gives up as little as possible). Hence it "gave up"
the // at the end of /// and not the beginning. With the *? it starts at
the beginning of the input string and eats characters only until it
reaches the first //, which is of course what I wanted. Very interesting to
see this behavior played out step by step.
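
The stepping behavior described above can be imitated by hand. The following is a simplified sketch, not how the .NET engine is actually implemented: it only models the attempt that starts at the beginning of a single line, and the names BacktrackSketch, GreedySplit and LazySplit are made up for this example. Each method returns the length of the "code" capture, or -1 if no split is found.

```csharp
using System;

// Hypothetical illustration of the order in which split points are tried.
public static class BacktrackSketch
{
    private static bool SlashesAt(string line, int i)
    {
        return line[i] == '/' && line[i + 1] == '/';
    }

    // Imitates (?<code>.+)//(?<comments>.+): start with the longest "code"
    // that could still leave room for "//" plus one comment character,
    // then give back one character at a time from the END.
    public static int GreedySplit(string line)
    {
        for (int codeLen = line.Length - 3; codeLen >= 1; codeLen--)
        {
            if (SlashesAt(line, codeLen)) return codeLen;
        }
        return -1;
    }

    // Imitates (?<code>.*?)//(?<comments>.*): start with an empty "code"
    // and take one more character only when "//" does not follow yet.
    public static int LazySplit(string line)
    {
        for (int codeLen = 0; codeLen <= line.Length - 2; codeLen++)
        {
            if (SlashesAt(line, codeLen)) return codeLen;
        }
        return -1;
    }

    public static void Main()
    {
        Console.WriteLine(GreedySplit("/// <summary>")); // 1: code is "/"
        Console.WriteLine(LazySplit("/// <summary>"));   // 0: code is empty
    }
}
```

The greedy loop counts down and the lazy loop counts up, which is why the same input yields the last "//" in one case and the first in the other.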
Nov 15 '05 #3
