I couldn't help but bite on this one. It is a very challenging problem. Here
is your solution:
(?i)(?:(?<function>Write|Read)\s*\()\s*|(?<=(?:(?:Write|Read)\s*\(\s*)|(?:(?:[\d\w]+\s*,\s*)))(?<parameter>[\d\w]+)(?=,\s*|\s*\))
Let me break it down a bit. First, I used (?i) to make the match
case-insensitive.
Next, I had the problem of identifying *both* function names and parameters
in the same Regular Expression.
The function name Regular Expression is:
(?:(?<function>Write|Read)\s*\(\s*)
"function" is the name of the capturing group, which captures only the
function name. The rest of the match is to identify it as a function.
It will match only if the function name is "Read" or "Write" and is followed
by an opening parenthesis. I assumed that any token may have any number of
white-space characters before and after it. This was not too tricky.
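As a rough illustration, here is the function-name half tried out in Python rather than .NET. This is only a sketch: Python spells named groups (?P<name>...) instead of (?<name>...), and the helper name function_name is made up for the example.

```python
import re

# Python writes named groups as (?P<name>...); .NET writes (?<name>...).
# Otherwise this is the function-name pattern described above.
FUNCTION_RE = re.compile(r"(?i)(?:(?P<function>Write|Read)\s*\(\s*)")


def function_name(line):
    """Return the name of the first Read/Write call on the line, or None.

    Hypothetical helper, used only to exercise the pattern.
    """
    match = FUNCTION_RE.search(line)
    return match.group("function") if match else None
```

A line such as "varName2 = read ( varName1 )" yields "read", because (?i) ignores case and \s* allows the space before the parenthesis. A bare "Write one byte" yields nothing, because no open parenthesis follows.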
The second one is a bit trickier:
(?<=(?:(?:Write|Read)\s*\(\s*)|(?:(?:[\d\w]+\s*,\s*)))(?<parameter>[\d\w]+)(?=,\s*|\s*\))
The trick here is to identify a parameter from inside a set of function
parameters.
The rules break down as:
1. A parameter is always preceded by a function name followed by an open
parenthesis, as in:
Write (
2. It may be preceded by another parameter followed by a comma.
Write(param1,
- or -
Write(.......param3,
3. It is always followed by either a comma or an end-parenthesis.
param1,
- or -
param2 )
So, starting with the third rule, we get:
(?<parameter>[\d\w]+)(?=,\s*|\s*\))
"parameter" is the name of the capturing group, which according to these
rules is an alphanumeric token. The rest of it is how the parameter is
matched. It is a positive look-ahead, which means that it *must* be followed
by either a comma or an end parenthesis.
However, the problem here is that *any* word in the string that is not a
function and is followed by a comma or an end parenthesis will match this,
as in:
Read( 0x55, 5 ) <- Write one byte, to (address 0x55)
In this line, "byte" and the "0x55" inside "(address 0x55)" will match too.
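To see the problem concretely, the look-ahead-only pattern can be run against that line. The sketch below uses Python, where only the named-group spelling differs from .NET; the variable names are made up for the example.

```python
import re

# Look-ahead only: a token counts as a "parameter" merely because a comma
# or a closing parenthesis follows it. Nothing checks what precedes it.
LOOKAHEAD_ONLY = re.compile(r"(?P<parameter>[\d\w]+)(?=,\s*|\s*\))")

line = "Read( 0x55, 5 ) <- Write one byte, to (address 0x55)"

# findall returns the text of the single capturing group for each match.
tokens = LOOKAHEAD_ONLY.findall(line)
```

Here tokens holds the two real parameters plus two tokens picked up from the trailing comment, which is why a look-behind is needed as well.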
So, how do we eliminate non-parameters? Well, obviously, a parameter is
defined as being inside the parentheses of a function call. So, first, use a
positive look-behind to see if it is preceded by a function call. We need to
identify the function, using the same syntax as before:
(?:(?:Write|Read)\s*\(\s*)
However, it may have a parameter before it, instead of the function call. So
we use an OR "|" operator to indicate that it may be preceded by:
(?:(?:[\d\w]+\s*,\s*))
Note that we have changed the rule slightly. Any parameter which precedes
another parameter will *not* be followed by an end-parenthesis. It will
*always* be followed by a comma.
So, we use the Positive Lookbehind syntax (?<=) coupled with an OR operator
("|"), and get:
(?<=(?:(?:Write|Read)\s*\(\s*)|(?:(?:[\d\w]+\s*,\s*)))(?<parameter>[\d\w]+)(?=,\s*|\s*\))
Translated: Match any alphanumeric set of tokens which is followed by either
a comma or an end parenthesis, and is preceded either by a function call or
by another parameter.
Now to put them together, we use the OR operator:
(?i)(?:(?<function>Write|Read)\s*\()\s*|(?<=(?:(?:Write|Read)\s*\(\s*)|(?:(?:[\d\w]+\s*,\s*)))(?<parameter>[\d\w]+)(?=,\s*|\s*\))
The function name will be captured into the "function" group, and all of the
parameters will be captured into the "parameter" group. This could be stated
as:
Match any token that is either "Read" or "Write" followed by an open
parenthesis, and call it "function," OR Match any alphanumeric set of tokens
which is followed by either a comma or an end parenthesis, and is preceded
either by a function call or by another parameter, and call it "parameter."
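One portability note: the combined pattern depends on a variable-length look-behind, which .NET supports but Python's standard re module rejects (it insists on fixed-width look-behinds). For anyone trying this outside .NET, the sketch below uses a different, two-step technique to get the same two captures. It is not the single-pattern approach above: the names CALL_RE and parse_call are made up, and it does not check that each parameter is alphanumeric.

```python
import re

# Step 1: grab the function name and everything between the parentheses.
# [^)]* stops at the first ")", so nested parentheses are not handled.
CALL_RE = re.compile(r"(?i)(?P<function>Write|Read)\s*\((?P<args>[^)]*)\)")


def parse_call(line):
    """Return (function, [parameters]) for the first call, or None.

    Hypothetical helper illustrating a two-step alternative.
    """
    match = CALL_RE.search(line)
    if match is None:
        return None
    # Step 2: split the argument text on commas and trim the white space.
    params = [p.strip() for p in match.group("args").split(",") if p.strip()]
    return match.group("function"), params
```

Because only the text inside the first matched pair of parentheses is split, words in a trailing comment are never mistaken for parameters.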
You sure picked a doozy to start out with!
--
HTH,
Kevin Spencer
Microsoft MVP
Professional Numbskull
Hard work is a medication for which
there is no placebo.
<Lo*****@hotmail.com> wrote in message
news:11*********************@u72g2000cwu.googlegroups.com...
Hello all,
I am attempting to create a small scripting application to be used
during testing. To extract the commands from the script file, I was going
to tokenize each line, as one of the requirements is that there is one
command per line. I have always wanted to learn Regular Expressions, so
I was hoping I might do this using Regular Expressions. A fair
number of the commands will have syntax like
Write( 0x123, 0x12, 25, 100 ) <- Write three bytes to address 0x123
Write(varName1, 0x12) <- Write one bytes to address
expressed by the value of
varName1
Read( 0x55, 5 ) <- Write one bytes to address 0x55
Read(0x3456, 0x12) <- Read eighteen bytes to address
0x3456
varName2 = Read( varName1 ) <- Read one byte from address
expressed by the value of varName1
and store that read value to
varName2
I know that the regular expression (^[a-zA-Z]*) will find the
initial keywords or variable names, which I can use for an initial check
to make sure they are valid or the variable has been declared already,
but the hard part is creating a regular expression to match the various
forms of the syntax. How would I create a regular expression for the first
and last script commands? I think with those I can attempt to determine
the others. The spaces between the arguments are optional and may be
omitted if the user so desires.
For the first script command I was attempting to craft one that looks
like..
(^[a-zA-Z]*)('\(')(['0x',0-9][a-zA-Z]*)(',')(['0x',0-9][a-zA-Z]*)
but this obviously doesn't work. Any help is greatly appreciated.
Mark