* Paul Wu wrote, On 26-7-2007 20:48:
I need to replace the ASCII strings in VC++ source code with Unicode-
compatible strings. That is, I want to replace "abc" with _T("abc"), excluding
the strings on #include lines or already inside _T("..").
I have these regular expressions:
1. {[^_T\(]"([^"\\]*(\\.[^"\\]*)*)"} // get strings without '_T(' prefix
2. {[^include ]"([^"\\]*(\\.[^"\\]*)*)"} //get strings without 'include '
prefix.
How can I get the intersection of these two sets so that the matched strings
can be replaced by '_T(\1)'?
To make sure #include doesn't appear earlier on the line, use a negative
lookbehind:
(?<!#include.*)
To make sure the string isn't already wrapped in _T("..."), use another
negative lookbehind (the character just before the quote is the opening
parenthesis, so that has to be part of it):
(?<!_T\()
All other strings can be replaced (don't match the newline either,
because a string can't span lines in C++ as far as I know):
"[^"\n]*"
Combine:
(?<!#include.*)(?<!_T\()"[^\n"]*"
One thing you haven't looked at is an escaped ". You'll probably need to
handle those as well. That's a bit harder, as \\\" ends in an escaped quote
but \\\\" doesn't: the quote is escaped only when an odd number of
backslashes precedes it. I don't know of a really clean regex for that, but
you could try:
(^|[^\\])(\\\\)*\\"
Which would lead to:
(?<!#include.*)(?<!_T\()"((^|[^\\])(\\\\)*\\"|[^"\n])*"
Which should do the trick in most cases.
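For comparison, the string pattern from the original question, "([^"\\]*(\\.[^"\\]*)*)", already copes with escaped quotes by consuming each backslash together with the character after it. Below is a minimal sketch of that idea in Python, which is used here only for illustration. The #include and _T( checks are left out, and find_literals is a made-up helper name.

```python
import re

# String-literal pattern based on the one in the question: runs of ordinary
# characters, interleaved with backslash escapes (a backslash plus whatever
# character follows it). \n is excluded so a match never spans lines.
STRING_LITERAL = re.compile(r'"([^"\\\n]*(?:\\.[^"\\\n]*)*)"')

def find_literals(line):
    """Return every string literal found on one line of source text."""
    return [m.group(0) for m in STRING_LITERAL.finditer(line)]

# An escaped quote inside the literal does not end the match.
print(find_literals(r'puts("say \"hi\"");'))
# An escaped backslash right before the closing quote does end it.
print(find_literals(r'f("a\\", "b");'))
```

The first call should report one literal and the second call two, because \\ is consumed as a pair and leaves the following quote unescaped.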
This regex is written using the .NET System.Text.RegularExpressions syntax
and won't work in the Visual Studio find & replace window. You could
probably write a simple command-line tool to do the trick.
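Such a command-line tool could look roughly like the sketch below. It is in Python rather than .NET, purely as an assumption for illustration. Python's re module does not allow variable-length lookbehind, so the sketch takes a different route from the regex above: #include lines are skipped with a separate check, and existing _T("...") wrappers are matched whole and left alone. That also keeps the closing quote of an already wrapped string from starting a new match. The names wrap_line, TOKEN and INCLUDE are made up for this sketch, and it knows nothing about comments, character literals such as '"', or L"..." literals.

```python
import re
import sys

# A string literal: ordinary characters or backslash escapes, never a newline.
LITERAL = r'"(?:[^"\\\n]|\\.)*"'

# Two alternatives, tried in this order at each position:
#   1. an existing _T("...") wrapper, matched whole and left untouched
#   2. a bare string literal, captured in the group named "bare"
TOKEN = re.compile(r'_T\(\s*' + LITERAL + r'\s*\)|(?P<bare>' + LITERAL + r')')

# Lines such as: #include "stdafx.h"
INCLUDE = re.compile(r'^\s*#\s*include\b')

def wrap_line(line):
    """Wrap the bare string literals on one source line in _T(...)."""
    if INCLUDE.match(line):
        return line

    def repl(match):
        bare = match.group('bare')
        if bare is None:
            return match.group(0)  # already wrapped, keep as is
        return '_T(' + bare + ')'

    return TOKEN.sub(repl, line)

def main(paths):
    """Print the converted contents of each file to standard output."""
    for path in paths:
        with open(path, encoding='utf-8') as source:
            for line in source:
                sys.stdout.write(wrap_line(line))

if __name__ == '__main__':
    main(sys.argv[1:])
```

Saved under a name of your choosing and run with a source file as its argument, it writes the converted text to standard output, so the result can be reviewed before anything is overwritten.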
Jesse