I need to create a Regex to extract all strings (including quotations) from
a C# or C++ source file. After being unsuccessful myself, I found this
sample on the internet:
@"@?""""|@?"".*?(?!\\).""|''|'.*?(?!\\).'"
I am passing in the entire source file as one string and using the pattern
with RegexOptions.Singleline. This works OK unless the string ends with a
back-slash. For example: "This is a test\\". Can anybody see how to fix
this sample so that back-slashes are considered?
Thanks
"Bob" <no****@nowhere.com> wrote: I need to create a Regex to extract all strings (including quotations) from a C# or C++ source file.
Well, it's not possible. You'd need a complete C# parser to extract
strings in a foolproof way. Here's one of the simpler examples that
can't be distinguished from a real string using a regex alone:
// "I am a comment but I look like a string"
Eq.
Nope, it very well is possible...
Regex codeRegex = new
Regex(@"(/\*.*?\*/|//.*?(?=\r|\n))|(@?""""|@?"".*?(?!\\).""|''|'.*?(?!\\).')",
RegexOptions.Singleline);
String result = codeRegex.Replace(input, new MatchEvaluator(MatchEval));
public String MatchEval(Match match)
{
if(match.Groups[1].Success) { } //comment
if(match.Groups[2].Success) { } //string literal
...
}
Back to my original question, if anybody knows why the regex isn't correctly
watching for back-slashes followed by a quotation, any input is appreciated.
"Paul E Collins" <fi******************@CL4.org> wrote in message
news:dv**********@nwrdmz02.dmz.ncs.ea.ibs-infra.bt.com... Well, it's not possible. You'd need a complete C# parser to extract strings in a foolproof way.
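For readers following along, here is a minimal, self-contained sketch of the comment-first approach described in the post above. It is not the pattern from the post: the string part is swapped for an assumed alternative in which `\\.` consumes a back-slash together with whatever it escapes. The original `(?!\\).` only checks that the character just before the closing quote is not a back-slash, so it cannot tell `\"` from `\\"`. The names `LiteralFinder` and `FindLiterals` are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LiteralFinder
{
    // Group 1: /* ... */ or // ... comments.
    // Group 2: "..." or '...' literals, where \\. consumes a back-slash
    // together with the character it escapes.
    // The literal part is an assumed replacement, not the pattern from the thread.
    private static readonly Regex CodeRegex = new Regex(
        @"(/\*.*?\*/|//[^\r\n]*)|(""(?:\\.|[^""\\])*""|'(?:\\.|[^'\\])*')",
        RegexOptions.Singleline);

    // Returns every string or character literal, quotes included, skipping comments.
    public static List<string> FindLiterals(string source)
    {
        var literals = new List<string>();
        CodeRegex.Replace(source, new MatchEvaluator(match =>
        {
            if (match.Groups[2].Success)
                literals.Add(match.Value);   // string or char literal
            return match.Value;              // leave the source text unchanged
        }));
        return literals;
    }
}
```

Because comments are matched first, a quote inside `// ...` or `/* ... */` is consumed as part of group 1 and never starts a literal. Verbatim `@"..."` strings are not handled in this sketch.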
Bob wrote: This works OK unless the string ends with a back-slash. For example: "This is a test\\". Can anybody see how to fix this sample so that back-slashes are considered?
Without examples of desired behaviour, here's what I came up with, using backreferences:
Regex regex = new Regex(@"(([""']).+\<2>)",
(RegexOptions) 0);
Sample input:
"This is a test\\"
This is also a test
Here's another "test"
'Now for another\\'
Using 'single quotes'
// Here 's a comment.
// And a "quoted" one.
Sample output:
Matching: "This is a test\\"
1 =»"This is a test\\"«=
2 =»"«=
Matching: This is also a test
No Match
Matching: Here's another "test"
1 =»"test"«=
2 =»"«=
Matching: 'Now for another\\'
1 =»'Now for another\\'«=
2 =»'«=
Matching: Using 'single quotes'
1 =»'single quotes'«=
2 =»'«=
Matching: // Here 's a comment.
No Match
Matching: // And a "quoted" one.
1 =»"quoted"«=
2 =»"«=
You'd want the group 1....
--
Take care,
Ken
(to reply directly, remove the cool car. <sigh>)
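A small sketch of the backreference idea in the post above, for anyone trying it out. The `\<2>` in that pattern refers back to whatever group 2 captured (the opening quote); `\2` and, for named groups, `\k<name>` are the spellings documented for .NET, so this example assumes `\2` is an equivalent substitute. `QuoteMatcher` and `FirstQuoted` are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class QuoteMatcher
{
    // Same shape as the pattern above, with \2 in place of \<2>.
    // Group 2 captures the opening quote; \2 must then match that same character.
    private static readonly Regex Quoted = new Regex(@"(([""']).+\2)");

    // Returns group 1 (the quoted text with its quotes),
    // or an empty string if the line contains no quoted text.
    public static string FirstQuoted(string line)
    {
        Match m = Quoted.Match(line);
        return m.Success ? m.Groups[1].Value : string.Empty;
    }
}
```

Note that `.+` is greedy, so on a line holding two literals with the same quote character the match runs from the first opening quote to the last closing one. That is fine for the one-literal-per-line samples above but would need `.+?` or a stricter body for real source files.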
Your Regex works very well Ken, thanks. Can you explain what exactly the
<2> does? It looks like a grouping construct, but it isn't in the format of
(?<group>.*?). I couldn't find any reference to this at http://msdn.microsoft.com/library/en...geelements.asp.
Thanks again.
"Ken Arway" <ka****@jaguar.att.net> wrote in message
news:%2****************@TK2MSFTNGP09.phx.gbl... Without examples of desired behaviour, here's what I came up with, using backreferences:
Regex regex = new Regex(@"(([""']).+\<2>)", (RegexOptions) 0);
Also, I prepended your pattern to test for comments first:
@"(/\*.*?\*/|//.*?(?=\r|\n))|(([""']).+\<2>)"
After prefixing the commenting part, comments are picked up but your literal
string part is completely ignored. For example:
Nothing is matched (should have gotten the "C"):
String str = "extern \"C\"\r\n";
The whole line is correctly matched for a comment:
String str = "//extern \"C\"\r\n";
Strangely enough the old pattern did work in this aspect:
@"(/\*.*?\*/|//.*?(?=\r|\n))|(@?""""|@?"".*?(?!\\).""|''|'.*?(?!\\).')"
Unfortunately it fails to correctly end literal strings ending with a
back-slash (unlike yours, which does work).
Thanks
I figured out what it is... the \<2> is a back reference to group number 2,
and prefixing the comment group shifted the group numbers, so it no longer
pointed at the quote group. I went ahead and named the group, and now I have this:
@"(/\*.*?\*/|//.*?(?=\r|\n))|(@?(?<comment>[""']).+?\<comment>)"
The only problem now is that it doesn't take into account escaped quotations
and double quotations when using the @ string literal prefix in C# files.
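The numbering problem described above can be reproduced with a short sketch. Groups are numbered by their opening parenthesis, so once a comment group is prepended, group 2 becomes the whole literal and the quote becomes group 3. A numbered reference to 2 then points at a group that is still open, and in .NET (outside ECMAScript mode) a reference to a group that has not captured anything does not match. `GroupShiftDemo` and its field names are hypothetical, and `\2` / `\k<quote>` are used here as the documented spellings.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GroupShiftDemo
{
    private const string Comment = @"(/\*.*?\*/|//[^\r\n]*)";

    // After prepending the comment group, group 2 is the whole literal and the
    // quote is group 3, so \2 refers to a group that has not finished capturing.
    public static readonly Regex Shifted =
        new Regex(Comment + @"|(([""']).+?\2)", RegexOptions.Singleline);

    // A named group keeps its name wherever it sits in the pattern.
    public static readonly Regex Named =
        new Regex(Comment + @"|((?<quote>[""']).+?\k<quote>)", RegexOptions.Singleline);
}
```

With the input `s = "abc";`, `Shifted` finds nothing while `Named` matches and exposes the literal in `Groups[2]` and the quote character in `Groups["quote"]`.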
So here is what I've gotten so far:
@"(/\*.*?\*/|//.*?(?=\r|\n))|((?:@(?<c1>[""'])(?:""""|.)*?\<c1>)|(?:(?<c2>[""'])(?:\\.|.)*?\<c2>))"
I am using non-capturing groups for a specific reason not seen here, just
ignore those.
Anyway, the first part is for comments, the second part is for literal
strings starting with @, the third part is for literal strings with
potential escape characters. Everything seems to work now except for
supporting double-quotation marks in literal strings starting with @. For
example, this input sample:
String str = "before @\"a\"\"b\"\"c\" after \"ok\"";
Captures:
@"a"
"b"
"c"
"ok"
When it should capture:
@"a""b""c"
"ok"
I tested making the capture non-lazy, but then it captures:
@"a""b""c" after "ok"
It is like it is going to the second option instead of doing the first, even
though the first is available:
(?:""""|.)*?
If you know why this might be, please share...
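One possible reading of the behaviour above: because `*?` is lazy, the engine tries the closing-quote backreference before it tries another repetition, so the first `"` after `a` ends the match and the two-quote branch never gets a chance. Making the loop greedy does not help while `.` can still match a quote. A sketch of one way around this, assuming only double-quoted `@` strings need support, is to exclude the quote from the any-character branch so that a quote can only be passed as a doubled pair. The class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class CSharpLiterals
{
    private static readonly Regex Literal = new Regex(
        @"(/\*.*?\*/|//[^\r\n]*)"            // group 1: comments
        + @"|(@""(?:""""|[^""])*"""          // group 2: @"..." where "" is an embedded quote
        + @"|""(?:\\.|[^""\\])*"""           //          "..." with back-slash escapes
        + @"|'(?:\\.|[^'\\])*')",            //          '...' character literals
        RegexOptions.Singleline);

    // Returns each literal (quotes and any @ prefix included) in source order.
    public static List<string> Find(string source)
    {
        var result = new List<string>();
        foreach (Match m in Literal.Matches(source))
            if (m.Groups[2].Success)
                result.Add(m.Value);
        return result;
    }
}
```

Run against the sample input from the post, this sketch returns `@"a""b""c"` followed by `"ok"`. The backreferences are gone because each quote style now has its own branch, which also avoids any dependence on group numbering.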
Bob wrote: It is like it is going to the second option instead of doing the first, even though the first is available: (?:""""|.)*?
I'm out of ideas on this one. Probably something to do with not considering groups/patterns available for backreferencing if they're in an OR statement.
What I'd do is try to simplify the processing -- break your parsing into more than one pass to make the resulting strings more digestible. You might even find that regex isn't the best option -- string functions could wind up being more appropriate.
--
Take care,
Ken
(to reply directly, remove the cool car. <sigh>)
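As a rough illustration of the "string functions" route suggested above, here is a hand-written single-pass scanner. It is only a sketch under stated assumptions: it knows about `//` and `/* */` comments, `@"..."` verbatim strings with doubled quotes, and back-slash escapes in regular string and character literals, and nothing else (no C++ raw strings, no preprocessor handling). `LiteralScanner` and `Scan` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public static class LiteralScanner
{
    // Walks the source once and returns each string or character literal,
    // quotes included. Comments are skipped so quotes inside them are ignored.
    public static List<string> Scan(string src)
    {
        var found = new List<string>();
        int i = 0;
        while (i < src.Length)
        {
            char c = src[i];
            char next = i + 1 < src.Length ? src[i + 1] : '\0';

            if (c == '/' && next == '/')              // line comment
            {
                int end = src.IndexOf('\n', i);
                i = end < 0 ? src.Length : end + 1;
            }
            else if (c == '/' && next == '*')         // block comment
            {
                int end = src.IndexOf("*/", i + 2, StringComparison.Ordinal);
                i = end < 0 ? src.Length : end + 2;
            }
            else if (c == '@' && next == '"')         // verbatim string
            {
                int j = i + 2;
                while (j < src.Length)
                {
                    if (src[j] != '"') { j++; }
                    else if (j + 1 < src.Length && src[j + 1] == '"') { j += 2; } // "" pair
                    else { break; }                   // closing quote
                }
                int stop = Math.Min(j + 1, src.Length);
                found.Add(src.Substring(i, stop - i));
                i = stop;
            }
            else if (c == '"' || c == '\'')           // regular string or char literal
            {
                int j = i + 1;
                while (j < src.Length && src[j] != c)
                    j += src[j] == '\\' ? 2 : 1;      // skip an escaped character
                int stop = Math.Min(j + 1, src.Length);
                found.Add(src.Substring(i, stop - i));
                i = stop;
            }
            else
            {
                i++;
            }
        }
        return found;
    }
}
```

An unterminated literal is returned as everything up to the end of the input, which may or may not be the behaviour wanted. Compared with a single regex, each case lives in its own branch, so adding a new literal form means adding a branch rather than reworking one long pattern.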