
Enumerating formatting strings

I was messing about with formatting and realized that the right kind of
object could quite easily tell me exactly what accesses are made to the
mapping in a string % mapping operation. This is a fairly well-known
technique, modified to tell me what keys would need to be present in any
mapping used with the format.

class Everything:
    def __init__(self, format="%s", discover=False):
        self.names = {}
        self.values = []
        self.format=format
        self.discover = discover
    def __getitem__(self, key):
        x = self.format % key
        if self.discover:
            self.names[key] = self.names.get(key, 0) + 1
        return x
    def nameList(self):
        if self.names:
            return ["%-20s %d" % i for i in self.names.items()]
        else:
            return self.values
    def __getattr__(self, name):
        print "Attribute", name, "requested"
        return None
    def __repr__(self):
        return "<Everything object at 0x%x>" % id(self)

def nameCount(template):
    et = Everything(discover=True)
    p = template % et
    nlst = et.nameList()
    nlst.sort()
    return nlst

for s in nameCount("%(name)s %(value)s %(name)s"):
    print s

The result of this effort is:

name                 2
value                1
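For readers on Python 3, the same recording-mapping trick can be sketched as below. KeyRecorder and key_counts are hypothetical names, and returning 0 from __getitem__ (rather than a string, as Everything does) is an assumption made so that numeric conversions such as %(n)d also succeed.

```python
from collections import Counter


class KeyRecorder:
    """Stand-in mapping that records every key a %-format looks up."""

    def __init__(self):
        self.counts = Counter()

    def __getitem__(self, key):
        self.counts[key] += 1
        return 0  # 0 is acceptable to %s, %r, %d, %x, %f, %c and friends


def key_counts(template):
    """Return {key: number of lookups} for a mapping-style template."""
    recorder = KeyRecorder()
    template % recorder  # the formatted result is discarded; only lookups matter
    return dict(recorder.counts)


print(sorted(key_counts("%(name)s %(value)s %(name)s").items()))
# [('name', 2), ('value', 1)]
```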

I've been wondering whether it's possible to perform a similar analysis
on non-mapping-type format strings, so as to know how long a tuple to
provide, or whether I'd be forced to lexical analysis of the form string.

regards
Steve
--
Steve Holden +1 703 861 4237 +1 800 494 3119
Holden Web LLC http://www.holdenweb.com/
Python Web Programming http://pydish.holdenweb.com/

Jul 19 '05 #1
On Mon, 18 Apr 2005 16:24:39 -0400, Steve Holden <st***@holdenweb.com> wrote:
I was messing about with formatting and realized that the right kind of
object could quite easily tell me exactly what accesses are made to the
mapping in a string % mapping operation. This is a fairly well-known
technique, modified to tell me what keys would need to be present in any
mapping used with the format.
<snip code>
I've been wondering whether it's possible to perform a similar analysis
on non-mapping-type format strings, so as to know how long a tuple to
provide, or whether I'd be forced to lexical analysis of the form string.

When I was playing with formatstring % mapping I thought it could
be useful if you could get the full format specifier info and do your own
complete formatting, even for invented format specifiers. This could be
done without breaking backwards compatibility if str.__mod__ looked for
a __format__ method on the other-wise-mapping-or-tuple-object. If found,
it would call the method, which would expect

def __format__(self,
        ix,     # index from 0 counting every %... format
        name,   # from %(name) or ''
        width,  # from %width.prec
        prec,   # ditto
        fc,     # the format character F in %(x)F
        all     # just a copy of whatever is between % and including F
    ): ...

This would obviously let you handle non-mapping as you want, and more.

The most popular use would probably be intercepting width in %(name)<width>s
and doing custom formatting (e.g. centering in available space) for the object
and returning the right size string.

Since ix is an integer and doesn't help find the right object without the normal
tuple, you could give your formatting object's __init__ method keyword arguments
to specify arguments for anonymous slots in the format string, conventionally
naming them a0, a1, a2 etc. Then later when you get an ix with no name, you could
write self.kw.get('a%s' % ix) to get the value, as in use like
'%(name)s %s' % Formatter(a1=thevalue) # Formatter as base class knows how to do name lookup

Or is this just idearrhea?
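Python's % operator never grew such a hook, but the proposal can be emulated by scanning the template yourself. The sketch below (Python 3; format_with_hook, SPEC and center are hypothetical names, and the pattern is a simplification that assumes a key never contains ')') calls a hook with roughly the arguments listed above and splices its return value into the result, including the centering use case.

```python
import re

# Simplified %-specifier pattern (assumption: a key never contains ')').
SPEC = re.compile(
    r"%(?:\((?P<name>[^)]*)\))?"   # optional %(name)
    r"[-#0 +]*"                     # flags, visible to the hook only via `spec`
    r"(?P<width>\*|\d*)"            # width: digits, '*' or empty
    r"(?:\.(?P<prec>\*|\d*))?"      # optional .precision
    r"(?P<fc>[a-zA-Z%])"            # the format character F
)


def format_with_hook(template, hook):
    """Replace every specifier with hook(ix, name, width, prec, fc, spec)."""
    out, pos, ix = [], 0, 0
    for m in SPEC.finditer(template):
        out.append(template[pos:m.start()])
        if m.group(0) == "%%":
            out.append("%")  # literal percent sign; does not advance ix
        else:
            out.append(hook(ix, m.group("name") or "", m.group("width"),
                            m.group("prec"), m.group("fc"), m.group(0)[1:]))
            ix += 1
        pos = m.end()
    out.append(template[pos:])
    return "".join(out)


# Usage: centre a named value in the width given by the format string.
values = {"title": "Menu"}

def center(ix, name, width, prec, fc, spec):
    text = str(values[name])
    return text.center(int(width)) if width.isdigit() else text

print(format_with_hook("[%(title)10s]", center))  # [   Menu   ]
```

Doing the scan in Python rather than in str.__mod__ means the hook sees every specifier, named or not, which is exactly the information a tuple-length or key-discovery tool needs.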

Regards,
Bengt Richter
Jul 19 '05 #2
Steve Holden wrote:
I was messing about with formatting and realized that the right kind of
object could quite easily tell me exactly what accesses are made to the
mapping in a string % mapping operation. This is a fairly well-known
technique, modified to tell me what keys would need to be present in any
mapping used with the format.
....
I've been wondering whether it's possible to perform a similar analysis
on non-mapping-type format strings, so as to know how long a tuple to
provide, or whether I'd be forced to lexical analysis of the form string.


PyString_Format() in stringobject.c determines the tuple length, then starts
the formatting process and finally checks whether all items were used -- so
no, it's not possible to feed it a tweaked (auto-growing) tuple like you
did with the dictionary.
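A quick way to see the "length is fixed up front" behaviour in CPython 3 is a tuple subclass that lies about its size and contents. EndlessTuple and try_format are hypothetical names for this illustration; the point is that formatting reads the tuple's real stored items and ignores the overridden methods.

```python
class EndlessTuple(tuple):
    """Hypothetical tuple subclass that claims to hold any number of items."""

    def __len__(self):
        return 10 ** 6

    def __getitem__(self, index):
        return "x"


def try_format(template, args):
    try:
        return template % args
    except TypeError as exc:
        return "TypeError: %s" % exc


print(try_format("%s", EndlessTuple(("a",))))     # the real item 'a', not 'x'
print(try_format("%s %s", EndlessTuple(("a",))))  # TypeError: not enough arguments ...
```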

Here's a brute-force equivalent to nameCount(), inspired by a post by Hans
Nowak (http://mail.python.org/pipermail/pyt...y/230392.html).

def countArgs(format):
    args = (1,) * (format.count("%") - 2*format.count("%%"))
    while True:
        try:
            format % args
        except TypeError, e:
            args += (1,)
        else:
            return len(args)

samples = [
    ("", 0),
    ("%%", 0),
    ("%s", 1),
    ("%%%s", 1),
    ("%%%*.*d", 3),
    ("%%%%%*s", 2),
    ("%s %*s %*d %*f", 7)]
for f, n in samples:
    f % ((1,)*n)
    assert countArgs(f) == n

Not tested beyond what you see.

Peter

Jul 19 '05 #3
Steve Holden wrote:
I've been wondering whether it's possible to perform a similar analysis
on non-mapping-type format strings, so as to know how long a tuple to
provide,


I just tried an experiment, and it doesn't seem to be possible.

The problem seems to be that it expects the arguments to be
in the form of a tuple, and if you give it something else,
it wraps it up in a 1-element tuple and uses that instead.

This seems to happen even with a custom subclass of tuple,
so it must be doing an exact type check.

So it looks like you'll have to parse the format string.
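The wrapping described here is easy to observe for an argument that is not a tuple at all, such as a list. This Python 3 snippet is only an illustration of that case: the list fills a single specifier, and converting it with tuple() spreads its items across the specifiers.

```python
values = [1, 2]

# A list is not a tuple, so it fills a single specifier on its own.
print("%s" % values)            # [1, 2]

# Converting it to a tuple spreads the items over the specifiers.
print("%s %s" % tuple(values))  # 1 2

# Two specifiers with the list itself: only one argument is available.
try:
    "%s %s" % values
except TypeError as exc:
    print("TypeError:", exc)
```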

--
Greg Ewing, Computer Science Dept,
University of Canterbury,
Christchurch, New Zealand
http://www.cosc.canterbury.ac.nz/~greg
Jul 19 '05 #4
Greg Ewing wrote:
Steve Holden wrote:
I've been wondering whether it's possible to perform a similar analysis
on non-mapping-type format strings, so as to know how long a tuple to
provide,
I just tried an experiment, and it doesn't seem to be possible.

The problem seems to be that it expects the arguments to be
in the form of a tuple, and if you give it something else,
it wraps it up in a 1-element tuple and uses that instead.

This seems to happen even with a custom subclass of tuple,
so it must be doing an exact type check.


No, it doesn't do an exact type check, but always calls the tuple method:
>>> class Tuple(tuple):
...     def __getitem__(self, index):
...         return 42
...
>>> "%r %r" % Tuple("ab") # would raise an exception if wrapped
"'a' 'b'"
So it looks like you'll have to parse the format string.


Indeed.

Peter
Jul 19 '05 #5
On Wed, 20 Apr 2005 09:14:40 +0200, Peter Otten <__*******@web.de> wrote:
Greg Ewing wrote:
Steve Holden wrote:
I've been wondering whether it's possible to perform a similar analysis
on non-mapping-type format strings, so as to know how long a tuple to
provide,


I just tried an experiment, and it doesn't seem to be possible.

The problem seems to be that it expects the arguments to be
in the form of a tuple, and if you give it something else,
it wraps it up in a 1-element tuple and uses that instead.

This seems to happen even with a custom subclass of tuple,
so it must be doing an exact type check.


No, it doesn't do an exact type check, but always calls the tuple method:
>>> class Tuple(tuple):
...     def __getitem__(self, index):
...         return 42
...
>>> "%r %r" % Tuple("ab") # would raise an exception if wrapped
"'a' 'b'"
So it looks like you'll have to parse the format string.


Indeed.

Parse might be a big word for
>>> def tupreq(fmt): return sum(map(lambda s:list(s).count('%'), fmt.split('%%')))
...
>>> tupreq('%s this %(x)s not %% but %s')


(if it works in general ;-)

Or maybe clearer and faster:
>>> def tupreq(fmt): return sum(1 for c in fmt.replace('%%','') if c=='%')
...
>>> tupreq('%s this %(x)s not %% but %s')
3
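Counting '%' characters misses the fact that a '*' width or precision consumes an extra tuple item. A sketch of a counter that accounts for that, in Python 3, is below; SPEC and tuple_length are hypothetical names, and the pattern is a simplification that ignores %(name) keys entirely. The sample strings are the ones from Peter's countArgs post in this thread.

```python
import re

# One positional specifier, or a literal '%%'. Simplified: mapping keys are not handled.
SPEC = re.compile(r"%%|%[-#0 +]*(\*|\d*)(?:\.(\*|\d*))?[hlL]?[diouxXeEfFgGcrsa]")


def tuple_length(fmt):
    """Number of tuple items fmt consumes: one per specifier plus one per '*'."""
    count = 0
    for m in SPEC.finditer(fmt):
        if m.group(0) != "%%":
            count += 1 + (m.group(1) == "*") + (m.group(2) == "*")
    return count


for fmt in ["", "%%", "%s", "%%%s", "%%%*.*d", "%%%%%*s", "%s %*s %*d %*f", "%*d"]:
    n = tuple_length(fmt)
    fmt % ((1,) * n)  # raises TypeError if n were wrong
    print(repr(fmt), n)
```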

Regards,
Bengt Richter
Jul 19 '05 #6
Bengt Richter wrote:
Parse might be a big word for
>>> def tupreq(fmt): return sum(map(lambda s:list(s).count('%'), fmt.split('%%')))
...
>>> tupreq('%s this %(x)s not %% but %s')
(if it works in general ;-)
Which it doesn't:
>>> def tupreq(fmt): return sum(map(lambda s:list(s).count('%'), fmt.split('%%')))
...
>>> fmt = "%*d"
>>> fmt % ((1,) * tupreq(fmt))
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: not enough arguments for format string
Or maybe clearer and faster:
>>> def tupreq(fmt): return sum(1 for c in fmt.replace('%%','') if c=='%')
...
>>> tupreq('%s this %(x)s not %% but %s')
3


Mixed formats show some "interesting" behaviour:

>>> "%s %(x)s" % (1,2)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: format requires a mapping
>>> class D:
...     def __getitem__(self, key):
...         return "D[%s]" % key
...
>>> "%s %(x)s" % D()
'<__main__.D instance at 0x402aaf2c> D[x]'
>>> "%s %(x)s %s" % D()
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: not enough arguments for format string
>>> "%s %(x)s %(y)s" % D()
'<__main__.D instance at 0x402aad8c> D[x] D[y]'

That is as far as I got. So under what circumstances is
'%s this %(x)s not %% but %s' a valid format string?

Peter

Jul 19 '05 #7
On Wed, 20 Apr 2005 11:01:28 +0200, Peter Otten <__*******@web.de> wrote:
Bengt Richter wrote:
Parse might be a big word for
>>> def tupreq(fmt): return sum(map(lambda s:list(s).count('%'), fmt.split('%%')))
...
>>> tupreq('%s this %(x)s not %% but %s')


(if it works in general ;-)


Which it doesn't:

D'oh. (My subconscious knew that one, and prompted the "if" ;-)
>>> def tupreq(fmt): return sum(map(lambda s:list(s).count('%'),fmt.split('%%')))
...
>>> fmt = "%*d"
>>> fmt % ((1,) * tupreq(fmt))
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: not enough arguments for format string
But that one it totally spaced on ;-/
Or maybe clearer and faster:
>>> def tupreq(fmt): return sum(1 for c in fmt.replace('%%','') if c=='%')
...
>>> tupreq('%s this %(x)s not %% but %s')
3

Mixed formats show some "interesting" behaviour:

>>> "%s %(x)s" % (1,2)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: format requires a mapping
>>> class D:
...     def __getitem__(self, key):
...         return "D[%s]" % key
...
>>> "%s %(x)s" % D()
'<__main__.D instance at 0x402aaf2c> D[x]'
>>> "%s %(x)s %s" % D()
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: not enough arguments for format string
>>> "%s %(x)s %(y)s" % D()
'<__main__.D instance at 0x402aad8c> D[x] D[y]'

That is as far as I got. So under what circumstances is
'%s this %(x)s not %% but %s' a valid format string?

Yeah, I got that far too, some time ago playing % mapping, and
I thought they just didn't allow for mixed formats. My thought then
was that they could pass integer positional keys to another method
(say __format__) on a mapping object that wants to handle mixed formats.
If you wanted the normal str or repr representation of a mapping
object that had a __format__ method, you'd have to do it on the args
side with str(theobject), but you'd have a way. And normal mapping objects
would need no special handling for '%s' in a mixed format context.
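In the spirit of passing integer positional keys to a mapping, a mixed template can already be made to work by rewriting each unnamed specifier into a named one keyed by its position before formatting. mixed_format is a hypothetical helper written in Python 3; it does not support '*' widths or precisions, and its simple regex can be confused by a '%' inside nested parentheses in a key.

```python
import itertools
import re


def mixed_format(template, *args, **kwargs):
    """Hypothetical helper: number the unnamed specifiers, then format by mapping."""
    counter = itertools.count()

    def rename(m):
        text = m.group(0)
        if text == "%%" or text.startswith("%("):
            return text                      # literal percent or already named
        return "%%(%d)" % next(counter)      # '%' becomes '%(0)', '%(1)', ...

    named = re.sub(r"%%|%\([^)]*\)|%", rename, template)
    values = {str(i): a for i, a in enumerate(args)}
    values.update(kwargs)
    return named % values


print(mixed_format("%s this %(x)s not %% but %s", "A", "B", x="X"))
# A this X not % but B
```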

Regards,
Bengt Richter
Jul 19 '05 #8
Bengt Richter wrote:
On Wed, 20 Apr 2005 11:01:28 +0200, Peter Otten <__*******@web.de> wrote:

....
>"%s %(x)s %(y)s" % D()
My experiments suggest that you can have a maximum of one unnamed argument in a
mapping template - this unnamed value evaluates to the map itself
...
So under what circumstances is
'%s this %(x)s not %% but %s' a valid format string?


Based on the above experiments, never.
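The observation that a single unnamed specifier receives the mapping itself can be re-checked in Python 3 with a variant of Peter's D class; the __str__ method is added here only so the output does not depend on an object address.

```python
class D:
    """Mapping stand-in, as in Peter's session, with a readable str()."""

    def __getitem__(self, key):
        return "D[%s]" % key

    def __str__(self):
        return "<D>"


d = D()
print("%s %(x)s" % d)        # <D> D[x]: the unnamed %s receives the mapping itself
print("%s %(x)s %(y)s" % d)  # <D> D[x] D[y]
try:
    "%s %(x)s %s" % d        # a second unnamed specifier has nothing left to use
except TypeError as exc:
    print("TypeError:", exc)
```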

I have wrapped up my current understanding in the following class:
>>> s = StringFormatInfo('%s %*.*d %*s')
>>> s
POSITIONAL Template: %s %*.*d %*s
Arguments: ('s', 'width', 'precision', 'd', 'width', 's')
>>> s = StringFormatInfo('%(arg1)s %% %(arg2).*f %()s %s')
>>> s
MAPPING Template: %(arg1)s %% %(arg2).*f %()s %s
Arguments: {'': 's', 'arg1': 's', 'arg2': 'f', None: 's'}


import re

class StringFormatInfo(object):
    """Wraps a template string and provides information about the number and
    kinds of arguments that must be supplied. Call with % to apply the
    template to data"""

    parse_format = re.compile(r'''
        \%                                  # placeholder
        (?:\((?P<name>[\w]*)\))?            # 0 or 1 named groups
        (?P<conversion>[\#0\-\+]?)          # 0 or 1 conversion flags
        (?P<width>[\d]* | \*)               # optional minimum conversion width
        (?:\.(?P<precision>[\d]+ | \*))?    # optional precision (escaped dot)
        (?P<lengthmodifier>[hlL]?)          # optional length modifier
        (?P<type>[diouxXeEfFgGcrs]{1})      # conversion type - note %% omitted
        ''',
        re.VERBOSE
        )

    def __init__(self, template):
        self.template = template
        self.formats = formats = [m.groupdict() for m in
                                  self.parse_format.finditer(template)]

        for format in formats:
            if format['name']:
                self.format_type = "MAPPING"
                self.format_names = dict((format['name'], format['type'])
                                         for format in formats)
                break
        else:
            self.format_type = "POSITIONAL"
            format_names = []
            for format in formats:
                if format['width'] == '*':
                    format_names.append('width')
                if format['precision'] == '*':
                    format_names.append('precision')
                format_names.append(format['type'])
            self.format_names = tuple(format_names)

    def __mod__(self, values):
        return self.template % values

    def __repr__(self):
        return "%s Template: %s\nArguments: %s" % \
               (self.format_type, self.template, self.format_names)

Michael

Jul 19 '05 #9
Michael Spencer wrote:
I have wrapped up my current understanding in the following class:


I see you assume that only \w+ can fit inside of a %()
in a format string. The actual Python code allows anything
up to the balanced closed parens.
>>> class Show:
...     def __getitem__(self, text):
...         print "Want", repr(text)
...
>>> "%(this(is)a.--test!)s" % Show()
Want 'this(is)a.--test!'
'None'


I found this useful for a templating library I once wrote
that allowed operations through a simple pipeline, like

%(doc.text|reformat(68)|indent(4))s
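Because the key may contain anything up to the balanced closing parenthesis, a mapping can interpret it as a small pipeline. The sketch below is a hypothetical Python 3 reconstruction, not Andrew's library: PipelineMapping, the dotted-path lookup into nested dicts, and the reformat and indent filters (built on textwrap.fill and textwrap.indent) are all assumptions. Splitting on '|' also means a filter argument cannot itself contain '|'.

```python
import re
import textwrap


class PipelineMapping:
    """Hypothetical mapping for keys like 'doc.text|reformat(68)|indent(4)'.

    The part before the first '|' is a dotted path into nested dicts; each
    later part names a filter with an optional integer argument.
    """

    STEP = re.compile(r"(\w+)(?:\((\d+)\))?")

    def __init__(self, namespace, filters):
        self.namespace = namespace
        self.filters = filters

    def __getitem__(self, key):
        path, *steps = key.split("|")
        value = self.namespace
        for part in path.split("."):
            value = value[part]
        for step in steps:
            m = self.STEP.fullmatch(step.strip())
            if m is None:
                raise KeyError(key)
            func = self.filters[m.group(1)]
            value = func(value, int(m.group(2))) if m.group(2) else func(value)
        return value


filters = {
    "reformat": lambda text, width: textwrap.fill(text, width),
    "indent": lambda text, n: textwrap.indent(text, " " * n),
}
ns = {"doc": {"text": "the quick brown fox jumps over the lazy dog"}}
print("%(doc.text|reformat(20)|indent(2))s" % PipelineMapping(ns, filters))
```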

Andrew
da***@dalkescientific.com

Jul 19 '05 #10
Andrew Dalke wrote:
I see you assume that only \w+ can fit inside of a %()
in a format string. The actual Python code allows anything
up to the balanced closed parens.

Gah! I guess that torpedoes the regexp approach, then.

Thanks for looking at this

Michael

Jul 19 '05 #11
Steve Holden wrote:
Michael Spencer wrote:
Andrew Dalke wrote:
I see you assume that only \w+ can fit inside of a %()
in a format string. The actual Python code allows anything
up to the balanced closed parens.

Gah! I guess that torpedoes the regexp approach, then.

Thanks for looking at this

Michael

While Andrew may have found the "fatal flaw" in your scheme, it's worth
pointing out that it works just fine for my original use case.

regards
Steve


Thanks. Here's a version that overcomes the 'fatal' flaw.

import re

class StringFormatInfo(object):

    def __init__(self, template):
        self.template = template
        self.parse()

    def tokenizer(self):
        lexer = TinyLexer(self.template)
        self.format_type = "POSITIONAL"
        while lexer.search(r"\%"):
            if lexer.match(r"\%"):      # '%%' is a literal percent sign
                continue
            format = {}
            name = lexer.takeparens()   # handles nested parentheses in keys
            if name is not None:
                self.format_type = "MAPPING"
            format['name'] = name
            format['conversion'] = lexer.match(r"[\#0\-\+]")
            format['width'] = lexer.match(r"\d+|\*")
            format['precision'] = lexer.match(r"\.") and \
                                  lexer.match(r"\d+|\*") or None
            format['lengthmodifier'] = lexer.match(r"[hlL]")
            ftype = lexer.match(r'[diouxXeEfFgGcrs]')
            if not ftype:
                raise ValueError("Missing or unsupported conversion type")
            else:
                format['type'] = ftype
            yield format

    def parse(self):
        self.formats = formats = list(self.tokenizer())
        if self.format_type == "MAPPING":
            self.format_names = dict((format['name'], format['type'])
                                     for format in formats)
        else:
            format_names = []
            for format in formats:
                if format['width'] == '*':
                    format_names.append('width')
                if format['precision'] == '*':
                    format_names.append('precision')
                format_names.append(format['type'])
            self.format_names = tuple(format_names)

    def __mod__(self, values):
        return self.template % values

    def __repr__(self):
        return "%s Template: %s\nArguments: %s" % \
               (self.format_type, self.template, self.format_names)
    __str__ = __repr__

SFI = StringFormatInfo

def tests():
    print SFI('%(arg1)s %% %(arg2).*f %()s %s')
    print SFI('%s %*.*d %*s')
    print SFI('%(this(is)a.--test!)s')

class TinyLexer(object):
    def __init__(self, text):
        self.text = text
        self.ptr = 0
        self.len = len(text)
        self.re_cache = {}

    def match(self, regexp, consume = True, anchor = True):
        if isinstance(regexp, basestring):
            cache = self.re_cache
            if regexp not in cache:
                cache[regexp] = re.compile(regexp)
            regexp = cache[regexp]
        matcher = anchor and regexp.match or regexp.search
        match = matcher(self.text, self.ptr)
        if not match:
            return None
        if consume:
            self.ptr = match.end()
        return match.group()

    def search(self, regexp, consume = True):
        return self.match(regexp, consume=consume, anchor=False)

    def takeparens(self):
        start = self.ptr
        if not self.text.startswith('(', start):   # also safe at end of text
            return None
        out = ''
        level = 1
        self.ptr += 1
        while self.ptr < self.len:
            nextchar = self.text[self.ptr]
            level += (nextchar == '(') - (nextchar == ')')
            self.ptr += 1
            if level == 0:
                return out
            out += nextchar
        raise ValueError("Unmatched parentheses")

Jul 19 '05 #12
