Hi James,
I don't believe that regular expressions alone are your solution here. First
of all, you haven't thought about all of the possible combinations of values
that you might need to capture. For example, your example assumes that all
of the columns are text columns, and therefore that all values will be
surrounded by single quotes. But what if a column is of a numeric or datetime
type? Then there will be no single quotes. In that case, a pattern loose
enough to catch unquoted values will match the column names in the first set
of parentheses as well as the values in the VALUES clause. In fact, it is
necessary to discriminate between values that are contained within
parentheses and values that are not. In addition, there are quite a few
possible combinations of characters that would have to be accounted for
within the parentheses alone, and with capturing groups, any given group in a
match exposes only one item through its Value property.
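To make the first point concrete, here is a small C# sketch (the class and method names are hypothetical) that runs Jesse's pattern from further down this thread, with RegexOptions.IgnoreCase added, against your statement and against a hypothetical variant in which the second column is numeric. Because the pattern only knows about quoted values, the second statement produces no match at all.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper: does Jesse's quote-only pattern match this statement?
public static class QuoteOnlyPatternCheck
{
    // Jesse's pattern, unchanged: every value must be wrapped in single quotes.
    private const string QuotedValuesPattern =
        @"values\s*\(\s*(?:'(?<value>(?:[^']|'')+)'\s*,\s*)*'\s*(?<value>(?:[^']|'')+)'\s*\)";

    public static bool Matches(string sql)
    {
        // IgnoreCase because the pattern says "values" and the SQL says "VALUES".
        return Regex.IsMatch(sql, QuotedValuesPattern, RegexOptions.IgnoreCase);
    }
}
```

With this sketch, Matches returns true for "INSERT INTO tblTest (colone, coltwo, colthree) VALUES ('valone ','valtwo' , 'valthree')" but false for "INSERT INTO tblTest (colone, coltwo) VALUES ('valone', 42)", because nothing in the pattern accepts the unquoted 42.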
Here are some of the possible combinations you would have to deal with:
('value'
( 'value'
( value
(value
,'value'
, 'value'
,value
, value
'value')
'value' )
value)
value )
,value)
, value)
, value )
,'value')
, 'value')
,'value' )
, 'value' )
And you would need to strip out the extraneous punctuation, which would
require either grouping or positive and negative lookarounds. Note that
half of these combinations would also match within the first set of
parentheses, the column names. And don't forget the syntax rules of SQL: a
text value may contain single quotes, but they must be escaped by doubling
them, and it may also contain commas. A numeric or datetime value may
contain neither commas nor single quotes (unless it is in European format,
in which case you would have to allow for commas).
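Those rules are arguably easier to honour with a small hand-written scanner than with a pattern. The following C# sketch swaps in that technique (a character scanner, not a regular expression; the class and method names are hypothetical). It walks the text between the VALUES parentheses one character at a time, treats everything between single quotes as literal text (commas included), turns each doubled single quote back into one, and splits on commas only outside quotes. It assumes unquoted values never contain commas, so European-format numbers are not handled, and that whitespace outside quotes is only padding.

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical helper: splits the text between the parentheses of a
// VALUES (...) clause into individual values without using a regex.
// Assumptions: unquoted (numeric/datetime) values contain no commas, and
// whitespace outside quotes is only padding, so it is skipped.
public static class SqlValueListSplitter
{
    public static List<string> Split(string list)
    {
        var values = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false;

        for (int i = 0; i < list.Length; i++)
        {
            char c = list[i];
            if (inQuotes)
            {
                if (c != '\'')
                {
                    current.Append(c);            // commas, spaces and ')' are literal text here
                }
                else if (i + 1 < list.Length && list[i + 1] == '\'')
                {
                    current.Append('\'');         // a doubled quote is one escaped quote
                    i++;                          // skip the second quote of the pair
                }
                else
                {
                    inQuotes = false;             // closing quote of the text value
                }
            }
            else if (c == '\'')
            {
                inQuotes = true;                  // opening quote of a text value
            }
            else if (c == ',')
            {
                values.Add(current.ToString());   // a comma outside quotes ends a value
                current.Clear();
            }
            else if (!char.IsWhiteSpace(c))
            {
                current.Append(c);                // part of an unquoted value
            }
        }
        values.Add(current.ToString());           // the last value has no trailing comma
        return values;
    }
}
```

For example, Split("'a,b', 'It''s', 42") yields a,b then It's then 42, and the trailing space inside 'valone ' is preserved because it sits inside the quotes. The scanner does not record whether a value was quoted, so NULL and 'NULL' come back the same.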
Now, a kluge could be worked out by ORing together a large number of
patterns with no grouping, but at that point a single regular expression
becomes much less efficient than you would want.
Instead, it would be wiser to simply
(1) Isolate the VALUES clause. THIS could be done with a regular expression:
(?i)(?<=VALUES\s*\()(?:(?:[^)]*)|(?<=')(?:(?:''|[^']|\))*)(?=')(?=\)))
This basically covers all the SQL rules. It states that, case-insensitively,
the match must be preceded (positive look-behind) by the string "VALUES",
optional whitespace, and a left parenthesis, and that the match consists of
either (a) any run of characters that are not right parentheses, or (b) a
run that is preceded by a single quote and made up of zero or more
repetitions of (i) a doubled single quote (an escaped single quote in a SQL
statement), (ii) any character that is not a single quote, or (iii) a right
parenthesis. Alternative (b) also asserts that it is followed by a single
quote, and that the entire match is followed by a right parenthesis. This is
because a text value may contain single quotes that are escaped (by
doubling them), and may contain a right parenthesis, but is always
surrounded by single quotes. Numeric and DateTime values will not be
surrounded by single quotes, and will not contain parentheses or single
quotes.
In your example, this would result in:
'valone ','valtwo' , 'valthree'
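Here is a minimal C# sketch that applies the step (1) pattern, exactly as quoted, to the statement from your example using .NET's Regex class, which supports the variable-length look-behind the pattern starts with. The class and method names are hypothetical. It also includes a hypothetical quote-aware variant (not the pattern above) that consumes each quoted string as a single unit.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helpers for step (1): pull out the text inside VALUES (...).
public static class ValuesClauseIsolator
{
    // The step (1) pattern exactly as quoted above. .NET allows the
    // variable-length look-behind (?<=VALUES\s*\() that it starts with.
    private const string Step1Pattern =
        @"(?i)(?<=VALUES\s*\()(?:(?:[^)]*)|(?<=')(?:(?:''|[^']|\))*)(?=')(?=\)))";

    // Hypothetical quote-aware variant: a quoted string ('...' with '' escapes)
    // is consumed as one unit, so a ')' inside it does not end the clause.
    // Assumes no unquoted parentheses (e.g. no function calls) in the list.
    private const string QuoteAwarePattern =
        @"(?i)VALUES\s*\(((?:'(?:[^']|'')*'|[^'()])*)\)";

    public static string Isolate(string sql)
    {
        Match m = Regex.Match(sql, Step1Pattern);
        return m.Success ? m.Value : null;
    }

    public static string IsolateQuoteAware(string sql)
    {
        Match m = Regex.Match(sql, QuoteAwarePattern);
        return m.Success ? m.Groups[1].Value : null;
    }
}
```

Both methods return 'valone ','valtwo' , 'valthree' for your statement. Note that, with the pattern exactly as written, the first alternative [^)]* can always match (even an empty string), so it is the alternative the engine uses; a text value containing a right parenthesis, such as 'a)b', therefore ends the isolated string early ('a). The quote-aware variant returns 'a)b', 'c' for the same input.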
(2) Get the matches from the resulting match string using:
(?i)(?:(?:(?<=')(?:''|[^',])*(?='))|(?<!')[^,']+(?!'))*
This, again, covers all the possible SQL rules. It states that a match
consists of zero or more repetitions of either (1) a value that must be
preceded by a single quote and followed by a single quote, made up of 0 or
more sequences of either (a) a doubled (escaped) single quote, or (b) any
character other than a single quote or a comma, or (2) a run of one or more
characters that are neither single quotes nor commas, and that is neither
preceded nor followed by a single quote (for numeric and datetime values).
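A minimal C# sketch that applies the step (2) pattern, exactly as quoted, to the string step (1) produced (the class and method names are hypothetical). Because the whole pattern sits inside (...)*, it can also match the empty string, and Regex.Matches reports those zero-length matches, so the helper keeps only non-empty ones; that filtering is an addition, not part of the pattern.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper for step (2): pull the individual values out of the
// string that step (1) produced.
public static class ValueListExtractor
{
    // The step (2) pattern exactly as quoted above.
    private const string Step2Pattern =
        @"(?i)(?:(?:(?<=')(?:''|[^',])*(?='))|(?<!')[^,']+(?!'))*";

    public static List<string> Extract(string valueList)
    {
        var values = new List<string>();
        foreach (Match m in Regex.Matches(valueList, Step2Pattern))
        {
            // The pattern can match the empty string, so Matches() also
            // reports zero-length hits between values; skip them.
            if (m.Length > 0)
            {
                values.Add(m.Value);
            }
        }
        return values;
    }
}
```

With 'valone ','valtwo' , 'valthree' this yields valone (with its trailing space), valtwo and valthree. For 'x', 42 it yields x and " 42" (the unquoted value keeps its leading space), and an escaped quote comes back still doubled ('It''s' gives It''s), so a Replace("''", "'") is still needed afterwards. As written, a quoted value that itself contains a comma, such as 'a,b', produces no non-empty match.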
--
HTH,
Kevin Spencer
Microsoft MVP
Professional Chicken Salad Alchemist
Sequence, Selection, Iteration.
"pigeonrandle" <pi**********@hotmail.com> wrote in message
news:11**********************@h48g2000cwc.googlegroups.com...
Jesse,
Thank you for replying. Unfortunately, I cannot seem to get it to work...
here is the code I'm using:
private void button1_Click(object sender, System.EventArgs e)
{
    String sSQL = @"INSERT INTO tblTest (colone, coltwo, colthree) VALUES
('valone ','valtwo' , 'valthree')";
    String sRX =
        @"values\s*\(\s*(?:'(?<value>(?:[^']|'')+)'\s*,\s*)*'\s*(?<value>(?:[^']|'')+*)'\s*\)";
    Regex rex = new Regex(sRX);
    foreach (Match m in rex.Matches(sSQL))
    {
        MessageBox.Show(m.Value);
    }
}
Thanks again,
James.
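For reference, here is a minimal sketch of what the snippet above appears to be aiming for, written as a plain method rather than a button handler (the class and method names are hypothetical). Three things stop the snippet from working: the stray * after the final + makes the pattern invalid (.NET rejects it as a nested quantifier when the Regex is constructed); the pattern says values while the statement says VALUES, so RegexOptions.IgnoreCase is needed; and m.Value is the whole matched text, whereas the individual values are the captures of the value group. Like Jesse's pattern, it still only handles quoted values.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: reads the quoted values of an INSERT ... VALUES (...).
public static class InsertValueReader
{
    // Jesse's pattern, with the stray '*' after the final '+' removed.
    private const string Pattern =
        @"values\s*\(\s*(?:'(?<value>(?:[^']|'')+)'\s*,\s*)*'\s*(?<value>(?:[^']|'')+)'\s*\)";

    public static List<string> ReadValues(string sql)
    {
        var values = new List<string>();
        // The SQL uses upper-case VALUES, so match case-insensitively.
        var rex = new Regex(Pattern, RegexOptions.IgnoreCase);
        foreach (Match m in rex.Matches(sql))
        {
            // Both groups are named "value", so every captured value ends up
            // in this one Captures collection, in the order it was matched.
            foreach (Capture c in m.Groups["value"].Captures)
            {
                values.Add(c.Value);
            }
        }
        return values;
    }
}
```

In the button handler, the loop body would then call MessageBox.Show on each returned value. Since \s also matches line breaks, the newline inside the verbatim SQL string is not a problem for this pattern.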
Jesse Houwing wrote:
* pigeonrandle wrote, On 24-7-2006 21:43:
Hello,
Does anyone know what RegEx I can use to extract the 'values' from an
INSERT INTO ... VALUES ('one','two','three')?
This is obviously a simple example; a real value might have a single
quote (') in it.
I'm not sure how to get RegEx to ignore things...
Cheers,
James Randle.
This should about do it:
values\s*\(\s*(?:'(?<value>(?:[^']|'')+)'\s*,\s*)*'\s*(?<value>(?:[^']|'')+)'\s*\)
It puts all values in the 'value' named group.
You might have to do some extra work to get around newlines...
Jesse