Bytes IT Community

Enumerating formatting strings

I was messing about with formatting and realized that the right kind of
object could quite easily tell me exactly what accesses are made to the
mapping in a string % mapping operation. This is a fairly well-known
technique, modified to tell me what keys would need to be present in any
mapping used with the format.

class Everything:
    def __init__(self, format="%s", discover=False):
        self.names = {}
        self.values = []
        self.format=format
        self.discover = discover
    def __getitem__(self, key):
        x = self.format % key
        if self.discover:
            self.names[key] = self.names.get(key, 0) + 1
        return x
    def nameList(self):
        if self.names:
            return ["%-20s %d" % i for i in self.names.items()]
        else:
            return self.values
    def __getattr__(self, name):
        print "Attribute", name, "requested"
        return None
    def __repr__(self):
        return "<Everything object at 0x%x>" % id(self)

def nameCount(template):
    et = Everything(discover=True)
    p = template % et
    nlst = et.nameList()
    nlst.sort()
    return nlst

for s in nameCount("%(name)s %(value)s %(name)s"):
    print s

The result of this effort is:

name                 2
value                1
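
For readers on Python 3, the same key-recording trick can be sketched as follows. This is a minimal sketch, not the code from the post: the names KeyRecorder and name_count are made up for illustration, and returning 0 for every key is an assumption chosen so that both %s and numeric conversions such as %d accept the value.

```python
from collections import Counter


class KeyRecorder:
    """Hypothetical mapping stand-in that records every key a %-format asks for."""

    def __init__(self):
        self.counts = Counter()

    def __getitem__(self, key):
        self.counts[key] += 1
        return 0  # accepted by %s, %d, %f and friends


def name_count(template):
    """Return {key: number of lookups} for a mapping-style template."""
    recorder = KeyRecorder()
    template % recorder  # the result is discarded; only the lookups matter
    return dict(recorder.counts)


print(name_count("%(name)s %(value)s %(name)s"))
```

Because the recorder answers every lookup, the template never fails on a missing key, which is what makes the discovery pass possible.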

I've been wondering whether it's possible to perform a similar analysis
on non-mapping-type format strings, so as to know how long a tuple to
provide, or whether I'd be forced to lexical analysis of the form string.

regards
Steve
--
Steve Holden +1 703 861 4237 +1 800 494 3119
Holden Web LLC http://www.holdenweb.com/
Python Web Programming http://pydish.holdenweb.com/

Jul 19 '05 #1
11 Replies


On Mon, 18 Apr 2005 16:24:39 -0400, Steve Holden <st***@holdenweb.com> wrote:
I was messing about with formatting and realized that the right kind of
object could quite easily tell me exactly what accesses are made to the
mapping in a string % mapping operation. This is a fairly well-known
technique, modified to tell me what keys would need to be present in any
mapping used with the format.
<snip code>
I've been wondering whether it's possible to perform a similar analysis
on non-mapping-type format strings, so as to know how long a tuple to
provide, or whether I'd be forced to lexical analysis of the form string.

When I was playing with formatstring % mapping I thought it could
be useful if you could get the full format specifier info and do your own
complete formatting, even for invented format specifiers. This could be
done without breaking backwards compatibility if str.__mod__ looked for
a __format__ method on the other-wise-mapping-or-tuple-object. If found,
it would call the method, which would expect

    def __format__(self,
            ix,     # index from 0 counting every %... format
            name,   # from %(name) or ''
            width,  # from %width.prec
            prec,   # ditto
            fc,     # the format character F in %(x)F
            all     # just a copy of whatever is between % and including F
        ): ...

This would obviously let you handle non-mapping as you want, and more.

The most popular use would probably be intercepting width in %(name)<width>s
and doing custom formatting (e.g. centering in available space) for the object
and returning the right size string.

Since ix is an integer and doesn't help find the right object without the normal
tuple, you could give your formatting object's __init__ method keyword arguments
to specify arguments for anonymous slots in the format string, conventionally
naming them a0, a1, a2 etc. Then later when you get an ix with no name, you could
write self.kw.get('a%s' % ix) to get the value, as in use like

    '%(name)s %s' % Formatter(a1=thevalue)

(Formatter, as a base class, knows how to do the name lookup.)
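
Since str.__mod__ has no such hook, the proposal can only be emulated in ordinary code. The following is a rough Python 3 sketch under several assumptions: format_with_hook and Centered are hypothetical names, the regular expression accepts only flat names and numeric width and precision (no flags, no "*"), and the last parameter is called spec rather than all to avoid shadowing the builtin.

```python
import re

# Simplified grammar (an assumption of this sketch): flat names without nested
# parentheses, numeric width and precision only, no flags and no "*".
SPEC = re.compile(
    r"%(?:\((?P<name>[^)]*)\))?(?P<width>\d*)(?:\.(?P<prec>\d+))?(?P<fc>[a-zA-Z%])"
)


def format_with_hook(template, hook):
    """Call hook(ix, name, width, prec, fc, spec) for every specifier."""
    counter = 0

    def replace(match):
        nonlocal counter
        if match.group("fc") == "%":
            return "%"  # "%%" is a literal percent sign, not a slot
        ix = counter
        counter += 1
        width = int(match.group("width")) if match.group("width") else None
        prec = int(match.group("prec")) if match.group("prec") else None
        return hook(ix, match.group("name") or "", width, prec,
                    match.group("fc"), match.group(0)[1:])

    return SPEC.sub(replace, template)


class Centered:
    """Hypothetical formatter that centres each value in the requested width."""

    def __init__(self, **values):
        self.values = values

    def __call__(self, ix, name, width, prec, fc, spec):
        # Unnamed slots are looked up as a0, a1, ... as suggested above.
        text = str(self.values[name or "a%d" % ix])
        return text.center(width or 0)


print(format_with_hook("[%(name)9s] %s %%", Centered(name="abc", a1="x")))
```

The hook sees the width before any padding is applied, which is the point of the proposal: the object decides how to fill the available space.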

Or is this just idearrhea?

Regards,
Bengt Richter
Jul 19 '05 #2

Steve Holden wrote:
I was messing about with formatting and realized that the right kind of
object could quite easily tell me exactly what accesses are made to the
mapping in a string % mapping operation. This is a fairly well-known
technique, modified to tell me what keys would need to be present in any
mapping used with the format.
....
I've been wondering whether it's possible to perform a similar analysis
on non-mapping-type format strings, so as to know how long a tuple to
provide, or whether I'd be forced to lexical analysis of the form string.


PyString_Format() in stringobject.c determines the tuple length, then starts
the formatting process and finally checks whether all items were used -- so
no, it's not possible to feed it a tweaked (auto-growing) tuple like you
did with the dictionary.

Here's a brute-force equivalent to nameCount(), inspired by a post by Hans
Nowak (http://mail.python.org/pipermail/pyt...y/230392.html).

def countArgs(format):
    args = (1,) * (format.count("%") - 2*format.count("%%"))
    while True:
        try:
            format % args
        except TypeError, e:
            args += (1,)
        else:
            return len(args)

samples = [
    ("", 0),
    ("%%", 0),
    ("%s", 1),
    ("%%%s", 1),
    ("%%%*.*d", 3),
    ("%%%%%*s", 2),
    ("%s %*s %*d %*f", 7)]
for f, n in samples:
    f % ((1,)*n)
    assert countArgs(f) == n

Not tested beyond what you see.
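
A Python 3 rendering of the same brute-force idea might look like the sketch below. The name count_args and the limit parameter are additions for illustration; the limit is an assumption introduced so that a template that can never accept a tuple (for example one using %(name)s) ends in an error instead of looping forever.

```python
def count_args(fmt, limit=64):
    """Find the tuple length a %-template accepts by trying 0, 1, 2, ... items."""
    for n in range(limit + 1):
        try:
            fmt % ((1,) * n)
        except TypeError:
            continue  # too few arguments (or not a positional template)
        return n
    raise ValueError("no tuple of up to %d items fits %r" % (limit, fmt))


for sample in ["", "%%", "%s", "%%%*.*d", "%s %*s %*d %*f"]:
    print(repr(sample), count_args(sample))
```

Starting from zero means the first length that succeeds is the exact one, because both "not enough arguments" and "not all arguments converted" are reported as TypeError.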

Peter

Jul 19 '05 #3

Steve Holden wrote:
I've been wondering whether it's possible to perform a similar analysis
on non-mapping-type format strings, so as to know how long a tuple to
provide,


I just tried an experiment, and it doesn't seem to be possible.

The problem seems to be that it expects the arguments to be
in the form of a tuple, and if you give it something else,
it wraps it up in a 1-element tuple and uses that instead.

This seems to happen even with a custom subclass of tuple,
so it must be doing an exact type check.

So it looks like you'll have to parse the format string.

--
Greg Ewing, Computer Science Dept,
University of Canterbury,
Christchurch, New Zealand
http://www.cosc.canterbury.ac.nz/~greg
Jul 19 '05 #4

Greg Ewing wrote:
Steve Holden wrote:
I've been wondering whether it's possible to perform a similar analysis
on non-mapping-type format strings, so as to know how long a tuple to
provide,
I just tried an experiment, and it doesn't seem to be possible.

The problem seems to be that it expects the arguments to be
in the form of a tuple, and if you give it something else,
it wraps it up in a 1-element tuple and uses that instead.

This seems to happen even with a custom subclass of tuple,
so it must be doing an exact type check.


No, it doesn't do an exact type check, but always calls the tuple method:
>>> class Tuple(tuple):
...     def __getitem__(self, index):
...         return 42
...
>>> "%r %r" % Tuple("ab") # would raise an exception if wrapped
"'a' 'b'"
So it looks like you'll have to parse the format string.


Indeed.
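
The two observations can be checked side by side on CPython 3 with a short sketch (Tuple42 is a made-up name): a tuple subclass is accepted as the argument tuple but its overridden __getitem__ is never consulted, while a non-tuple such as a list is treated as a single value.

```python
class Tuple42(tuple):
    """Tuple subclass whose item access always lies."""

    def __getitem__(self, index):
        return 42


# The subclass is unpacked as a tuple, but the items are read directly,
# so the overridden __getitem__ is bypassed.
subclass_result = "%r %r" % Tuple42("ab")

# A list is not a tuple, so it fills exactly one slot.
list_result = "%s" % [1, 2]

try:
    "%s %s" % [1, 2]
    list_fills_two_slots = True
except TypeError:
    list_fills_two_slots = False

print(subclass_result, list_result, list_fills_two_slots)
```

This is why an auto-growing tuple cannot play the role that the recording mapping played for named formats.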

Peter
Jul 19 '05 #5

On Wed, 20 Apr 2005 09:14:40 +0200, Peter Otten <__*******@web.de> wrote:
Greg Ewing wrote:
Steve Holden wrote:
I've been wondering whether it's possible to perform a similar analysis
on non-mapping-type format strings, so as to know how long a tuple to
provide,


I just tried an experiment, and it doesn't seem to be possible.

The problem seems to be that it expects the arguments to be
in the form of a tuple, and if you give it something else,
it wraps it up in a 1-element tuple and uses that instead.

This seems to happen even with a custom subclass of tuple,
so it must be doing an exact type check.


No, it doesn't do an exact type check, but always calls the tuple method:
>>> class Tuple(tuple):
...     def __getitem__(self, index):
...         return 42
...
>>> "%r %r" % Tuple("ab") # would raise an exception if wrapped
"'a' 'b'"
So it looks like you'll have to parse the format string.


Indeed.

Parse might be a big word for
>>> def tupreq(fmt): return sum(map(lambda s:list(s).count('%'), fmt.split('%%')))
...
>>> tupreq('%s this %(x)s not %% but %s')

(if it works in general ;-)

Or maybe clearer and faster:
>>> def tupreq(fmt): return sum(1 for c in fmt.replace('%%','') if c=='%')
...
>>> tupreq('%s this %(x)s not %% but %s')
3

Regards,
Bengt Richter
Jul 19 '05 #6

Bengt Richter wrote:
Parse might be a big word for
>>> def tupreq(fmt): return sum(map(lambda s:list(s).count('%'), fmt.split('%%')))
...
>>> tupreq('%s this %(x)s not %% but %s')
(if it works in general ;-)
Which it doesn't:
>>> def tupreq(fmt): return sum(map(lambda s:list(s).count('%'), fmt.split('%%')))
...
>>> fmt = "%*d"
>>> fmt % ((1,) * tupreq(fmt))
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: not enough arguments for format string
Or maybe clearer and faster:
>>> def tupreq(fmt): return sum(1 for c in fmt.replace('%%','') if c=='%')
...
>>> tupreq('%s this %(x)s not %% but %s')
3


Mixed formats show some "interesting" behaviour:
>>> "%s %(x)s" % (1,2)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: format requires a mapping
>>> class D:
...     def __getitem__(self, key):
...         return "D[%s]" % key
...
>>> "%s %(x)s" % D()
'<__main__.D instance at 0x402aaf2c> D[x]'
>>> "%s %(x)s %s" % D()
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: not enough arguments for format string
>>> "%s %(x)s %(y)s" % D()
'<__main__.D instance at 0x402aad8c> D[x] D[y]'

That is as far as I got. So under what circumstances is
'%s this %(x)s not %% but %s' a valid format string?

Peter

Jul 19 '05 #7

On Wed, 20 Apr 2005 11:01:28 +0200, Peter Otten <__*******@web.de> wrote:
Bengt Richter wrote:
Parse might be a big word for
>>> def tupreq(fmt): return sum(map(lambda s:list(s).count('%'), fmt.split('%%')))
...
>>> tupreq('%s this %(x)s not %% but %s')


(if it works in general ;-)


Which it doesn't:

D'oh. (My subconscious knew that one, and prompted the "if" ;-)
>>> def tupreq(fmt): return sum(map(lambda s:list(s).count('%'),fmt.split('%%')))
...
>>> fmt = "%*d"
>>> fmt % ((1,) * tupreq(fmt))
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: not enough arguments for format string
But that one it totally spaced on ;-/
Or maybe clearer and faster:
>>> def tupreq(fmt): return sum(1 for c in fmt.replace('%%','') if c=='%')
...
>>> tupreq('%s this %(x)s not %% but %s')
3

Mixed formats show some "interesting" behaviour:
>>> "%s %(x)s" % (1,2)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: format requires a mapping
>>> class D:
...     def __getitem__(self, key):
...         return "D[%s]" % key
...
>>> "%s %(x)s" % D()
'<__main__.D instance at 0x402aaf2c> D[x]'
>>> "%s %(x)s %s" % D()
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: not enough arguments for format string
>>> "%s %(x)s %(y)s" % D()
'<__main__.D instance at 0x402aad8c> D[x] D[y]'

That is as far as I got. So under what circumstances is
'%s this %(x)s not %% but %s' a valid format string?

Yeah, I got that far too, some time ago while playing with % mapping, and
I thought they just didn't allow for mixed formats. My thought then
was that they could pass integer positional keys to another method
(say __format__) on a mapping object that wants to handle mixed formats.
If you wanted the normal str or repr representation of a mapping
object that had a __format__ method, you'd have to do it on the args
side with str(theobject), but you'd have a way. And normal mapping objects
would need no special handling for '%s' in a mixed format context.

Regards,
Bengt Richter
Jul 19 '05 #8

Bengt Richter wrote:
On Wed, 20 Apr 2005 11:01:28 +0200, Peter Otten <__*******@web.de> wrote:

....
>"%s %(x)s %(y)s" % D()
My experiments suggest that you can have a maximum of one unnamed argument in a
mapping template - this unnamed value evaluates to the map itself
...
So under what circumstances is
'%s this %(x)s not %% but %s' a valid format string?


Based on the above experiments, never.
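
The rule can be reproduced on CPython 3 with a small sketch. The class D mirrors the one used in the experiments; the __str__ method and the helper try_format are additions made here so that the output does not depend on a memory address.

```python
class D:
    """Mapping stand-in: every key lookup succeeds."""

    def __getitem__(self, key):
        return "D[%s]" % key

    def __str__(self):
        return "<D>"  # fixed text instead of the default repr with an address


def try_format(template):
    """Apply the template to D() and report a TypeError as text."""
    try:
        return template % D()
    except TypeError as exc:
        return "TypeError: %s" % exc


for template in ["%s %(x)s", "%s %(x)s %(y)s", "%s %(x)s %s", "%s %s"]:
    print(repr(template), "->", try_format(template))
```

A single unnamed %s is filled with the mapping object itself; a second unnamed slot finds nothing left to consume, so a template with two of them cannot be satisfied by a mapping.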

I have wrapped up my current understanding in the following class:
>>> s = StringFormatInfo('%s %*.*d %*s')
>>> s
POSITIONAL Template: %s %*.*d %*s
Arguments: ('s', 'width', 'precision', 'd', 'width', 's')
>>> s = StringFormatInfo('%(arg1)s %% %(arg2).*f %()s %s')
>>> s
MAPPING Template: %(arg1)s %% %(arg2).*f %()s %s
Arguments: {'': 's', 'arg1': 's', 'arg2': 'f', None: 's'}


import re

class StringFormatInfo(object):
    """Wraps a template string and provides information about the number and
    kinds of arguments that must be supplied. Call with % to apply the
    template to data"""

    parse_format = re.compile(r'''
        \%                                  # placeholder
        (?:\((?P<name>[\w]*)\))?            # 0 or 1 named groups
        (?P<conversion>[\#0\-\+]?)          # 0 or 1 conversion flags
        (?P<width>[\d]* | \*)               # optional minimum conversion width
        (?:\.(?P<precision>[\d]+ | \*))?    # optional precision (literal dot)
        (?P<lengthmodifier>[hlL]?)          # optional length modifier
        (?P<type>[diouxXeEfFgGcrs]{1})      # conversion type - note %% omitted
        ''',
        re.VERBOSE
        )

    def __init__(self, template):
        self.template = template
        self.formats = formats = [m.groupdict() for m in
                                  self.parse_format.finditer(template)]

        # Any named specifier makes this a mapping template
        for format in formats:
            if format['name']:
                self.format_type = "MAPPING"
                self.format_names = dict((format['name'], format['type'])
                                         for format in formats)
                break
        else:
            self.format_type = "POSITIONAL"
            format_names = []
            for format in formats:
                if format['width'] == '*':
                    format_names.append('width')
                if format['precision'] == '*':
                    format_names.append('precision')
                format_names.append(format['type'])
            self.format_names = tuple(format_names)

    def __mod__(self, values):
        return self.template % values

    def __repr__(self):
        return "%s Template: %s\nArguments: %s" % \
            (self.format_type, self.template, self.format_names)

Michael

Jul 19 '05 #9

Michael Spencer wrote:
I have wrapped up my current understanding in the following class:


I see you assume that only \w+ can fit inside of a %()
in a format string. The actual Python code allows anything
up to the balanced closed parens.
>>> class Show:
...     def __getitem__(self, text):
...         print "Want", repr(text)
...
>>> "%(this(is)a.--test!)s" % Show()
Want 'this(is)a.--test!'
'None'


I found this useful for a templating library I once wrote
that allowed operations through a simple pipeline, like

%(doc.text|reformat(68)|indent(4))s
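
The same behaviour can be observed on CPython 3. In the sketch below, KeyLog is a made-up class that stores each requested key; the pipeline string is passed through untouched as one key, and interpreting it would be entirely up to the mapping.

```python
class KeyLog:
    """Hypothetical mapping that remembers every key it is asked for."""

    def __init__(self):
        self.seen = []

    def __getitem__(self, key):
        self.seen.append(key)
        return ""  # the formatted result is not of interest here


log = KeyLog()
"%(this(is)a.--test!)s" % log
"%(doc.text|reformat(68)|indent(4))s" % log
print(log.seen)
```

Everything between the opening parenthesis and its balancing close is handed over as a single string, nested parentheses included.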

Andrew
da***@dalkescientific.com

Jul 19 '05 #10

Andrew Dalke wrote:
I see you assume that only \w+ can fit inside of a %()
in a format string. The actual Python code allows anything
up to the balanced closed parens.

Gah! I guess that torpedoes the regexp approach, then.

Thanks for looking at this

Michael

Jul 19 '05 #11

Steve Holden wrote:
Michael Spencer wrote:
Andrew Dalke wrote:
I see you assume that only \w+ can fit inside of a %()
in a format string. The actual Python code allows anything
up to the balanced closed parens.

Gah! I guess that torpedoes the regexp approach, then.

Thanks for looking at this

Michael

While Andrew may have found the "fatal flaw" in your scheme, it's worth
pointing out that it works just fine for my original use case.

regards
Steve


Thanks. Here's a version that overcomes the 'fatal' flaw.

import re

class StringFormatInfo(object):

    def __init__(self, template):
        self.template = template
        self.parse()

    def tokenizer(self):
        lexer = TinyLexer(self.template)
        self.format_type = "POSITIONAL"
        while lexer.search("\%"):
            if lexer.match("\%"):
                continue        # "%%" is a literal percent sign
            format = {}
            name = lexer.takeparens()
            if name is not None:
                self.format_type = "MAPPING"
            format['name'] = name
            format['conversion'] = lexer.match("[\#0\-\+]")
            format['width'] = lexer.match("\d+|\*")
            format['precision'] = lexer.match("\.") and \
                lexer.match("\d+|\*") or None
            format['lengthmodifier'] = lexer.match("[hlL]")
            ftype = lexer.match('[diouxXeEfFgGcrs]')
            if not ftype:
                raise ValueError
            else:
                format['type'] = ftype
                yield format

    def parse(self):
        self.formats = formats = list(self.tokenizer())
        if self.format_type == "MAPPING":
            self.format_names = dict((format['name'], format['type'])
                                     for format in formats)
        else:
            format_names = []
            for format in formats:
                if format['width'] == '*':
                    format_names.append('width')
                if format['precision'] == '*':
                    format_names.append('precision')
                format_names.append(format['type'])
            self.format_names = tuple(format_names)

    def __mod__(self, values):
        return self.template % values

    def __repr__(self):
        return "%s Template: %s\nArguments: %s" % \
            (self.format_type, self.template, self.format_names)
    __str__ = __repr__

SFI = StringFormatInfo

def tests():
    print SFI('%(arg1)s %% %(arg2).*f %()s %s')
    print SFI('%s %*.*d %*s')
    print SFI('%(this(is)a.--test!)s')

class TinyLexer(object):
    def __init__(self, text):
        self.text = text
        self.ptr = 0
        self.len = len(text)
        self.re_cache = {}

    def match(self, regexp, consume = True, anchor = True):
        # Compile pattern strings once and reuse them
        if isinstance(regexp, basestring):
            cache = self.re_cache
            if regexp not in cache:
                cache[regexp] = re.compile(regexp)
            regexp = cache[regexp]
        matcher = anchor and regexp.match or regexp.search
        match = matcher(self.text, self.ptr)
        if not match:
            return None
        if consume:
            self.ptr = match.end()
        return match.group()

    def search(self, regexp, consume = True):
        # Pass the caller's consume flag through (it was hard-coded to True)
        return self.match(regexp, consume=consume, anchor=False)

    def takeparens(self):
        start = self.ptr
        # Guard against a template that ends in a bare "%"
        if start >= self.len or self.text[start] != '(':
            return None
        out = ''
        level = 1
        self.ptr += 1
        while self.ptr < self.len:
            nextchar = self.text[self.ptr]
            level += (nextchar == '(') - (nextchar == ')')
            self.ptr += 1
            if level == 0:
                return out
            out += nextchar
        raise ValueError, "Unmatched parentheses"
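
For comparison, the balanced-parenthesis scan can be condensed into a single Python 3 function. This is a sketch under assumptions, not a port of the class above: scan_template and TAIL are made-up names, the set of conversion characters is the one used in this thread plus "a", and "*" is counted as one extra tuple item.

```python
import re

# Flags, width, precision, length modifier and conversion character that
# follow the optional "(name)" part of a specifier.
TAIL = re.compile(r"[#0\- +]*(\*|\d+)?(?:\.(\*|\d+)?)?[hlL]?([diouxXeEfFgGcrsa%])")


def scan_template(template):
    """Return (names, slots): mapping keys used and tuple items consumed."""
    names, slots, pos = [], 0, 0
    while True:
        pos = template.find("%", pos)
        if pos < 0:
            return names, slots
        pos += 1
        named = template.startswith("(", pos)
        if named:
            # Walk to the parenthesis that balances the opening one.
            depth, start = 1, pos + 1
            pos = start
            while depth:
                if pos >= len(template):
                    raise ValueError("unmatched parenthesis in key")
                depth += (template[pos] == "(") - (template[pos] == ")")
                pos += 1
            names.append(template[start:pos - 1])
        tail = TAIL.match(template, pos)
        if tail is None:
            raise ValueError("incomplete format specifier")
        pos = tail.end()
        width, precision, conversion = tail.groups()
        if conversion == "%":
            continue  # literal percent sign
        slots += (width == "*") + (precision == "*") + (not named)


print(scan_template("%s %*.*d %*s"))
print(scan_template("%(arg1)s %% %(this(is)a.--test!)s"))
```

For a purely positional template the slot count can be cross-checked by formatting a tuple of that many integers, as the brute-force approach earlier in the thread does.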

Jul 19 '05 #12
