
Trivial string substitution/parser

Hi,

How would you implement a simple parser for the following string:

---
In this string $variable1 is substituted, while \$variable2 is not.
---

I know how to write a parser, but I am looking for an elegant (and lazy)
way. Any idea?

-Samuel
Jun 17 '07 #1
Samuel <ne********@debain.org> wrote:
> Hi,
>
> How would you implement a simple parser for the following string:
>
> ---
> In this string $variable1 is substituted, while \$variable2 is not.
> ---
>
> I know how to write a parser, but I am looking for an elegant (and lazy)
> way. Any idea?

The elegant and lazy way would be to change your specification so that $
characters are escaped by $$ not by backslashes. Then you can write:

>>> from string import Template
>>> t = Template("In this string $variable1 is substituted, while "
...              "$$variable2 is not.")
>>> t.substitute(variable1="hello", variable2="world")
'In this string hello is substituted, while $variable2 is not.'

If you must insist on using backslash escapes (which introduces the
question of how you get backslashes into the output: do they have to be
escaped as well?) then use string.Template with a custom pattern.
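
One possible answer to the backslash question, sketched under the assumption that a backslash escapes whatever single character follows it (so \\ gives a literal backslash and \$ a literal dollar). The names TOKEN and expand, and the sample values, are made up for illustration and do not come from the thread:

```python
import re

# Either a backslash followed by any one character, or "$" plus an identifier.
TOKEN = re.compile(r'\\(.)|\$([A-Za-z_]\w*)')

def expand(text, values):
    """Expand $name from values; a backslash escapes the next character."""
    def repl(match):
        if match.group(1) is not None:
            return match.group(1)          # escaped character, backslash dropped
        return str(values[match.group(2)])  # variable lookup
    return TOKEN.sub(repl, text)

print(expand(r'$a, \$a, \\$a', {'a': 'X'}))  # prints: X, $a, \X
```

Note that this drops the escaping backslash from the output, which is a design choice; keeping it would need a different replacement rule.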

Jun 17 '07 #2
On Sun, 17 Jun 2007 11:00:58 +0000, Duncan Booth wrote:
> The elegant and lazy way would be to change your specification so that $
> characters are escaped by $$ not by backslashes. Then you can write:
>
> >>> from string import Template
> ...

Thanks, however, turns out my specification of the problem was
incomplete: In addition, the variable names are not known at compilation
time.
I just did it that way, this looks fairly easy already:

-------------------
import re

def variable_sub_cb(match):
    # group(1) is the character in front of "$" (empty at start of input)
    prepend = match.group(1)
    varname = match.group(2)
    value = get_variable(varname)  # get_variable() is defined elsewhere
    return prepend + value

string_re = re.compile(r'(^|[^\\])\$([a-z][\w_]+\b)', re.I)

input = r'In this string $variable1 is substituted, '
input += r'while \$variable2 is not.'

print(string_re.sub(variable_sub_cb, input))
-------------------

-Samuel
Jun 17 '07 #3
Samuel wrote:
> On Sun, 17 Jun 2007 11:00:58 +0000, Duncan Booth wrote:
>> The elegant and lazy way would be to change your specification so that $
>> characters are escaped by $$ not by backslashes. Then you can write:
>>
>> >>> from string import Template
>> ...
>
> Thanks, however, turns out my specification of the problem was
> incomplete: In addition, the variable names are not known at compilation
> time.

You mean at edit-time.

>>> t.substitute(variable1="hello", variable2="world")

Can be replaced by...

>>> t.substitute(**vars)

...as per the standard **kwargs passing semantics.

- Josiah
Jun 18 '07 #4
Samuel wrote:
> Thanks, however, turns out my specification of the problem was
> incomplete: In addition, the variable names are not known at compilation
> time.
> I just did it that way, this looks fairly easy already:
>
> -------------------
> import re
>
> def variable_sub_cb(match):
>     # group(1) is the character in front of "$" (empty at start of input)
>     prepend = match.group(1)
>     varname = match.group(2)
>     value = get_variable(varname)  # get_variable() is defined elsewhere
>     return prepend + value
>
> string_re = re.compile(r'(^|[^\\])\$([a-z][\w_]+\b)', re.I)
>
> input = r'In this string $variable1 is substituted, '
> input += r'while \$variable2 is not.'
>
> print(string_re.sub(variable_sub_cb, input))
> -------------------

It gets easier:

import re

def variable_sub_cb(match):
    return get_variable(match.group(1))

# The negative lookbehind (?<!\\) skips any "$" preceded by a backslash.
string_re = re.compile(r'(?<!\\)\$([A-Za-z]\w+)')

def get_variable(varname):
    # Demo only: look the name up among this module's globals.
    return globals()[varname]

variable1 = 'variable 1'

input = r'In this string $variable1 is substituted, '
input += r'while \$variable2 is not.'

print(string_re.sub(variable_sub_cb, input))

or even

import re

def variable_sub_cb(match):
    return globals()[match.group(1)]

variable1 = 'variable 1'
input = (r'In this string $variable1 is substituted, '
         r'while \$variable2 is not.')

print(re.sub(r'(?<!\\)\$([A-Za-z]\w+)', variable_sub_cb, input))

Graham

Jun 18 '07 #5
Josiah Carlson <jo************@sbcglobal.net> wrote:
> Samuel wrote:
>> On Sun, 17 Jun 2007 11:00:58 +0000, Duncan Booth wrote:
>>> The elegant and lazy way would be to change your specification so
>>> that $ characters are escaped by $$ not by backslashes. Then you can
>>> write:
>>>
>>> >>> from string import Template
>>> ...
>>
>> Thanks, however, turns out my specification of the problem was
>> incomplete: In addition, the variable names are not known at
>> compilation time.
>
> You mean at edit-time.
>
> >>> t.substitute(variable1="hello", variable2="world")
>
> Can be replaced by...
>
> >>> t.substitute(**vars)
>
> ...as per the standard **kwargs passing semantics.

You don't even need to do that. substitute will accept a dictionary as a
positional argument:

t.substitute(vars)

If you use both forms then the keyword arguments take priority.

Also, of course, vars just needs to be something which quacks like a dict:
it can do whatever it needs to do such as looking up a database or querying
a server to generate the value only when it needs it, or even evaluating
the name as an expression; in the OP's case it could call get_variable.
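
A small sketch of the two points above: a plain dict passed positionally, and a keyword argument overriding the same key. The dict is called values here only to avoid shadowing the built-in vars(); its contents are made up:

```python
from string import Template

t = Template("In this string $variable1 is substituted, while $$variable2 is not.")

# Values gathered at run time (illustrative data only).
values = {"variable1": "hello"}

from_mapping = t.substitute(values)                 # mapping as positional argument
overridden = t.substitute(values, variable1="bye")  # keyword takes priority

print(from_mapping)
print(overridden)
```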

Anyway, the question seems to be moot since the OP's definition of 'elegant
and lazy' includes regular expressions and reinvented wheels.

... and in another message Graham Breed wrote:
> def get_variable(varname):
>     return globals()[varname]

Doesn't the mere thought of creating global variables with unknown names
make you shudder?

Jun 18 '07 #6
Duncan Booth wrote:
> Also, of course, vars just needs to be something which quacks like a dict:
> it can do whatever it needs to do such as looking up a database or querying
> a server to generate the value only when it needs it, or even evaluating
> the name as an expression; in the OP's case it could call get_variable.

And in case that sounds difficult, the code is

class VariableGetter:
    def __getitem__(self, key):
        return get_variable(key)
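
Put together with a template, that could look roughly like this. The get_variable() below is a hypothetical stand-in backed by a dict, since the OP's real function is not shown anywhere in the thread; as far as I can tell, substitute() only needs item access on the mapping when no keyword arguments are passed:

```python
from string import Template

# Hypothetical stand-in for the OP's get_variable(); real lookup logic unknown.
_STORE = {"variable1": "hello"}

def get_variable(name):
    return _STORE[name]

class VariableGetter:
    """Dict-like object that fetches each value only when it is asked for."""
    def __getitem__(self, key):
        return get_variable(key)

t = Template("In this string $variable1 is substituted, while $$variable2 is not.")
result = t.substitute(VariableGetter())
print(result)
```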

> Anyway, the question seems to be moot since the OP's definition of 'elegant
> and lazy' includes regular expressions and reinvented wheels.

Your suggestion of subclassing string.Template will also require a
regular expression -- and a fairly hairy one as far as I can work out
from the documentation. There isn't an example and I don't think it's
the easiest way of solving this problem. But if Samuel really wants
backslash escaping it'd be easier to do a replace('$$','$$$$') and
replace('\\$', '$$') (or replace('\\$','\\$$') if he really wants the
backslash to persist) before using the template.

Then, if he really does want to reject single letter variable names,
or names beginning with a backslash, he'll still need to subclass
Template and supply a regular expression, but a simpler one.
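
A minimal sketch of the two-replace idea, for the variant where the backslash is dropped from the output. The helper name backslash_to_template and the sample values are illustrative only:

```python
from string import Template

def backslash_to_template(text):
    # Order matters: protect existing "$$" first, then turn "\$" into "$$".
    return text.replace('$$', '$$$$').replace('\\$', '$$')

source = r'In this string $variable1 is substituted, while \$variable2 is not.'
result = Template(backslash_to_template(source)).substitute(variable1='hello')
print(result)
```

Doubling "$$" first means a literal "$$" in the input comes out of the template unchanged.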

> ... and in another message Graham Breed wrote:
>> def get_variable(varname):
>>     return globals()[varname]
>
> Doesn't the mere thought of creating global variables with unknown names
> make you shudder?

Not at all. It works, it's what the shell does, and it's easy to test
interactively. Obviously the application code wouldn't look like
that.
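
For what it's worth, application code along those lines might pass the values in explicitly instead of reading globals(). A minimal sketch reusing the lookbehind pattern from earlier in the thread; VAR_RE, substitute_vars and the sample dict are illustrative names, not from any post:

```python
import re

# "$name" that is not preceded by a backslash (same pattern as earlier).
VAR_RE = re.compile(r'(?<!\\)\$([A-Za-z]\w+)')

def substitute_vars(text, values):
    """Replace each unescaped $name in text with values[name]."""
    return VAR_RE.sub(lambda m: str(values[m.group(1)]), text)

text = r'In this string $variable1 is substituted, while \$variable2 is not.'
print(substitute_vars(text, {'variable1': 'variable 1'}))
```

As with the original regex, the backslash in front of $variable2 stays in the output.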
Graham

Jun 19 '07 #7
Duncan Booth wrote:
> If you must insist on using backslash escapes (which introduces the
> question of how you get backslashes into the output: do they have to be
> escaped as well?) then use string.Template with a custom pattern.

If anybody wants this, I worked out the following regular expression
which seems to work:

    (?P<escaped>\\)\$ |                   # backslash escape pattern
    \$(?:
      (?P<named>[_a-z][_a-z0-9]*)      |  # delimiter and Python identifier
      {(?P<braced>[_a-z][_a-z0-9]*)}   |  # delimiter and braced identifier
      (?P<invalid>)                       # Other ill-formed delimiter exprs
    )

The clue is string.Template.pattern.pattern

So you compile that with verbose and case-insensitive flags and set it
to "pattern" in a string.Template subclass. (In fact you don't have
to compile it, but that behaviour's undocumented.) Something like

>>> import re
>>> from string import Template
>>> regexp = r"""
... (?P<escaped>\\)\$ |                  # backslash escape pattern
... \$(?:
...   (?P<named>[_a-z][_a-z0-9]*)    |   # delimiter and identifier
...   {(?P<braced>[_a-z][_a-z0-9]*)} |   # ... and braced identifier
...   (?P<invalid>)                      # Other ill-formed delimiter exprs
... )
... """
>>> class BackslashEscape(Template):
...     pattern = re.compile(regexp, re.I | re.X)
...
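
A usage sketch for the subclass, with one assumption worth flagging: on the Python 3 releases I know of, Template compiles the pattern attribute itself (verbose, plus the class's flags, which are case-insensitive by default) and appears to reject an already compiled object. So this version leaves the pattern as a plain string, the route mentioned above as undocumented at the time. The sample text and value are made up:

```python
from string import Template

class BackslashEscape(Template):
    # Plain string on purpose: Template is assumed to compile it for us.
    pattern = r"""
    (?P<escaped>\\)\$ |                    # backslash escape pattern
    \$(?:
      (?P<named>[_a-z][_a-z0-9]*)      |   # delimiter and identifier
      {(?P<braced>[_a-z][_a-z0-9]*)}   |   # delimiter and braced identifier
      (?P<invalid>)                        # other ill-formed delimiter exprs
    )
    """

t = BackslashEscape(
    r"In this string $variable1 is substituted, while \$variable2 is not.")
result = t.substitute(variable1="hello")
print(result)
```

With this pattern a backslash-escaped dollar comes out as a bare "$", and "$$" is no longer an escape.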
Graham

Jun 19 '07 #8
