Enumerating formatting strings

I was messing about with formatting and realized that the right kind of
object could quite easily tell me exactly what accesses are made to the
mapping in a string % mapping operation. This is a fairly well-known
technique, modified to tell me what keys would need to be present in any
mapping used with the format.

class Everything:
    def __init__(self, format="%s", discover=False):
        self.names = {}
        self.values = []
        self.format=format
        self.discover = discover
    def __getitem__(self, key):
        x = self.format % key
        if self.discover:
            self.names[key] = self.names.get(key, 0) + 1
        return x
    def nameList(self):
        if self.names:
            return ["%-20s %d" % i for i in self.names.items()]
        else:
            return self.values
    def __getattr__(self, name):
        print "Attribute", name, "requested"
        return None
    def __repr__(self):
        return "<Everything object at 0x%x>" % id(self)

def nameCount(template):
    et = Everything(discover=True)
    p = template % et
    nlst = et.nameList()
    nlst.sort()
    return nlst

for s in nameCount("%(name)s %(value)s %(name)s"):
    print s

The result of this effort is:

name                 2
value                1
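
A rough Python 3 rendering of the same idea, for readers without a Python 2 interpreter. The names `KeyRecorder` and `name_count` are illustrative and do not come from the post; the recorder returns 0 for every key on the assumption that 0 is acceptable to the common conversion types.

```python
from collections import Counter


class KeyRecorder:
    """Stand-in mapping that counts every key the % operator looks up."""

    def __init__(self):
        self.counts = Counter()

    def __getitem__(self, key):
        self.counts[key] += 1
        return 0  # assumption: 0 formats under %s, %r, %d, %f, %x and %c


def name_count(template):
    """Return {key: number of lookups} for a mapping-style template."""
    recorder = KeyRecorder()
    template % recorder  # result discarded; only the lookups matter
    return dict(recorder.counts)


print(name_count("%(name)s %(value)s %(name)s"))
```

Returning a plain dict rather than pre-formatted strings leaves the presentation to the caller.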

I've been wondering whether it's possible to perform a similar analysis
on non-mapping-type format strings, so as to know how long a tuple to
provide, or whether I'd be forced to resort to lexical analysis of the
format string.

regards
Steve
--
Steve Holden +1 703 861 4237 +1 800 494 3119
Holden Web LLC http://www.holdenweb.com/
Python Web Programming http://pydish.holdenweb.com/

Jul 19 '05 #1
On Mon, 18 Apr 2005 16:24:39 -0400, Steve Holden <st***@holdenweb.com> wrote:
> I was messing about with formatting and realized that the right kind of
> object could quite easily tell me exactly what accesses are made to the
> mapping in a string % mapping operation. This is a fairly well-known
> technique, modified to tell me what keys would need to be present in any
> mapping used with the format.
>
> <snip code>
>
> I've been wondering whether it's possible to perform a similar analysis
> on non-mapping-type format strings, so as to know how long a tuple to
> provide, or whether I'd be forced to lexical analysis of the form string.

When I was playing with formatstring % mapping I thought it could
be useful if you could get the full format specifier info and do your own
complete formatting, even for invented format specifiers. This could be
done without breaking backwards compatibility if str.__mod__ looked for
a __format__ method on the otherwise-mapping-or-tuple object. If found,
it would call the method, which would expect

def __format__(self,
        ix,     # index from 0 counting every %... format
        name,   # from %(name) or ''
        width,  # from %width.prec
        prec,   # ditto
        fc,     # the format character F in %(x)F
        all     # just a copy of whatever is between % and including F
    ): ...

This would obviously let you handle non-mapping as you want, and more.

The most popular use would probably be intercepting width in %(name)<width>s
and doing custom formatting (e.g. centering in available space) for the object
and returning the right size string.

Since ix is an integer and doesn't help find the right object without the normal
tuple, you could give your formatting object's __init__ method keyword arguments
to specify arguments for anonymous slots in the format string, conventionally
naming them a0, a1, a2 etc. Then later when you get an ix with no name, you could
write self.kw.get('a%s' % ix) to get the value, used like

    '%(name)s %s' % Formatter(a1=thevalue) # Formatter as base class knows how to do name lookup

Or is this just idearrhea?
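
The hook described above was never part of str.__mod__, so it can only be emulated. The following Python 3 sketch does that with a deliberately simplified regular expression (no `*` widths, no nested parentheses). `format_with_hook` and `Centered` are hypothetical names, and the hook is a plain callable rather than a `__format__` method, since Python 3 later gave `__format__` a different meaning.

```python
import itertools
import re

# Simplified conversion spec: no '*' widths and no nested parentheses.
_SPEC = re.compile(
    r"%(?:\((?P<name>[^)]*)\))?"  # optional (name)
    r"(?P<flags>[-#0 +]*)"        # optional flags
    r"(?P<width>\d*)"             # optional width
    r"(?:\.(?P<prec>\d+))?"       # optional .precision
    r"(?P<fc>[A-Za-z%])"          # format character, or % for a literal %
)


def format_with_hook(template, hook):
    """Replace each conversion with hook(ix, name, width, prec, fc, spec)."""
    index = itertools.count()  # numbers the conversions, skipping %%

    def replace(match):
        if match.group(0) == "%%":
            return "%"
        prec = match.group("prec")
        return hook(
            next(index),
            match.group("name") or "",
            int(match.group("width") or 0),
            None if prec is None else int(prec),
            match.group("fc"),
            match.group(0)[1:],  # everything after the leading %
        )

    return _SPEC.sub(replace, template)


class Centered:
    """Hook that centres each value in the requested width."""

    def __init__(self, **kw):
        self.kw = kw

    def __call__(self, ix, name, width, prec, fc, spec):
        # Anonymous slots are looked up as a0, a1, a2, ... by position.
        value = self.kw[name] if name else self.kw["a%s" % ix]
        return str(value).center(width)


print(repr(format_with_hook("%(name)10s %s", Centered(name="spam", a1="x"))))
```

The anonymous `%s` is the second conversion, so it is looked up as `a1`, matching the naming convention suggested in the post.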

Regards,
Bengt Richter
Jul 19 '05 #2
Steve Holden wrote:
> I was messing about with formatting and realized that the right kind of
> object could quite easily tell me exactly what accesses are made to the
> mapping in a string % mapping operation. This is a fairly well-known
> technique, modified to tell me what keys would need to be present in any
> mapping used with the format.
> ....
> I've been wondering whether it's possible to perform a similar analysis
> on non-mapping-type format strings, so as to know how long a tuple to
> provide, or whether I'd be forced to lexical analysis of the form string.


PyString_Format() in stringobject.c determines the tuple length, then starts
the formatting process and finally checks whether all items were used -- so
no, it's not possible to feed it a tweaked (auto-growing) tuple like you
did with the dictionary.
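
The both-ends check is easy to observe from Python code. A small Python 3 sketch, assuming the Python 3 implementation keeps the same checks; `try_format` is an illustrative helper, not something from the thread:

```python
def try_format(fmt, args):
    """Apply fmt % args and report a TypeError as text instead of raising."""
    try:
        return fmt % args
    except TypeError as exc:
        return "TypeError: %s" % exc


print(try_format("%s %s", (1,)))       # too few items
print(try_format("%s %s", (1, 2)))     # exact fit
print(try_format("%s %s", (1, 2, 3)))  # too many items
```

Only the exact-length tuple succeeds, which is why a tuple cannot be padded in advance the way the dictionary stand-in could answer any key.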

Here's a brute-force equivalent to nameCount(), inspired by a post by Hans
Nowak (http://mail.python.org/pipermail/pyt...y/230392.html).

def countArgs(format):
    args = (1,) * (format.count("%") - 2*format.count("%%"))
    while True:
        try:
            format % args
        except TypeError, e:
            args += (1,)
        else:
            return len(args)

samples = [
    ("", 0),
    ("%%", 0),
    ("%s", 1),
    ("%%%s", 1),
    ("%%%*.*d", 3),
    ("%%%%%*s", 2),
    ("%s %*s %*d %*f", 7)]
for f, n in samples:
    f % ((1,)*n)
    assert countArgs(f) == n

Not tested beyond what you see.
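
One caveat with the loop above: a template that can never accept a tuple, such as "%(x)s", raises TypeError for every length, so the loop would not terminate. A Python 3 variant with an upper bound is sketched below. The bound of three items per "%" (width, precision, value) and the name `count_args` are assumptions of this sketch, not part of the original post.

```python
def count_args(fmt):
    """Return the tuple length that fmt accepts, found by trial and error."""
    upper = 3 * fmt.count("%")  # each spec takes at most width, precision, value
    for n in range(upper + 1):
        try:
            fmt % ((1,) * n)
        except TypeError:
            continue  # too few or too many items; try the next length
        return n
    raise ValueError("no tuple length fits %r" % (fmt,))


samples = [
    ("", 0),
    ("%%", 0),
    ("%s", 1),
    ("%%%s", 1),
    ("%%%*.*d", 3),
    ("%%%%%*s", 2),
    ("%s %*s %*d %*f", 7),
]
for f, n in samples:
    assert count_args(f) == n
```

Starting from zero instead of an estimate costs a few extra attempts but removes the need to reason about "%%" when choosing the starting length.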

Peter

Jul 19 '05 #3
Steve Holden wrote:
> I've been wondering whether it's possible to perform a similar analysis
> on non-mapping-type format strings, so as to know how long a tuple to
> provide,


I just tried an experiment, and it doesn't seem to be possible.

The problem seems to be that it expects the arguments to be
in the form of a tuple, and if you give it something else,
it wraps it up in a 1-element tuple and uses that instead.

This seems to happen even with a custom subclass of tuple,
so it must be doing an exact type check.

So it looks like you'll have to parse the format string.

--
Greg Ewing, Computer Science Dept,
University of Canterbury,
Christchurch, New Zealand
http://www.cosc.canterbury.ac.nz/~greg
Jul 19 '05 #4
Greg Ewing wrote:
> Steve Holden wrote:
> > I've been wondering whether it's possible to perform a similar analysis
> > on non-mapping-type format strings, so as to know how long a tuple to
> > provide,
>
> I just tried an experiment, and it doesn't seem to be possible.
>
> The problem seems to be that it expects the arguments to be
> in the form of a tuple, and if you give it something else,
> it wraps it up in a 1-element tuple and uses that instead.
>
> This seems to happen even with a custom subclass of tuple,
> so it must be doing an exact type check.

No, it doesn't do an exact type check, but always calls the tuple method:

>>> class Tuple(tuple):
...     def __getitem__(self, index):
...         return 42
...
>>> "%r %r" % Tuple("ab") # would raise an exception if wrapped
"'a' 'b'"

> So it looks like you'll have to parse the format string.


Indeed.
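
Both observations can be reproduced in a few lines. This is a Python 3 sketch of the experiment, on the assumption that the behaviour carried over from the Python 2 implementation discussed here:

```python
class Tuple(tuple):
    def __getitem__(self, index):
        return 42  # never reached by the % operator


print("%r %r" % Tuple("ab"))  # items are read directly from the tuple
print("%s" % [1, 2])          # a non-tuple counts as a single argument

try:
    "%s %s" % [1, 2]          # so it cannot fill two slots
except TypeError as exc:
    print("TypeError:", exc)
```

A tuple subclass is accepted as a tuple, but its overridden `__getitem__` is bypassed, so it cannot report how many items were requested.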

Peter
Jul 19 '05 #5
On Wed, 20 Apr 2005 09:14:40 +0200, Peter Otten <__*******@web.de> wrote:
> Greg Ewing wrote:
> > Steve Holden wrote:
> > > I've been wondering whether it's possible to perform a similar analysis
> > > on non-mapping-type format strings, so as to know how long a tuple to
> > > provide,
> >
> > I just tried an experiment, and it doesn't seem to be possible.
> >
> > The problem seems to be that it expects the arguments to be
> > in the form of a tuple, and if you give it something else,
> > it wraps it up in a 1-element tuple and uses that instead.
> >
> > This seems to happen even with a custom subclass of tuple,
> > so it must be doing an exact type check.
>
> No, it doesn't do an exact type check, but always calls the tuple method:
>
> >>> class Tuple(tuple):
> ...     def __getitem__(self, index):
> ...         return 42
> ...
> >>> "%r %r" % Tuple("ab") # would raise an exception if wrapped
> "'a' 'b'"
>
> > So it looks like you'll have to parse the format string.
>
> Indeed.

Parse might be a big word for

>>> def tupreq(fmt): return sum(map(lambda s:list(s).count('%'), fmt.split('%%')))
...
>>> tupreq('%s this %(x)s not %% but %s')


(if it works in general ;-)

Or maybe clearer and faster:

>>> def tupreq(fmt): return sum(1 for c in fmt.replace('%%','') if c=='%')
...
>>> tupreq('%s this %(x)s not %% but %s')
3

Regards,
Bengt Richter
Jul 19 '05 #6
Bengt Richter wrote:
> Parse might be a big word for
>
> >>> def tupreq(fmt): return sum(map(lambda s:list(s).count('%'), fmt.split('%%')))
> ...
> >>> tupreq('%s this %(x)s not %% but %s')
>
> (if it works in general ;-)

Which it doesn't:

>>> def tupreq(fmt): return sum(map(lambda s:list(s).count('%'), fmt.split('%%')))
...
>>> fmt = "%*d"
>>> fmt % ((1,) * tupreq(fmt))
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: not enough arguments for format string

> Or maybe clearer and faster:
>
> >>> def tupreq(fmt): return sum(1 for c in fmt.replace('%%','') if c=='%')
> ...
> >>> tupreq('%s this %(x)s not %% but %s')
> 3

Mixed formats show some "interesting" behaviour:

>>> "%s %(x)s" % (1,2)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: format requires a mapping
>>> class D:
...     def __getitem__(self, key):
...         return "D[%s]" % key
...
>>> "%s %(x)s" % D()
'<__main__.D instance at 0x402aaf2c> D[x]'
>>> "%s %(x)s %s" % D()
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: not enough arguments for format string
>>> "%s %(x)s %(y)s" % D()
'<__main__.D instance at 0x402aad8c> D[x] D[y]'

That is as far as I got. So under what circumstances is
'%s this %(x)s not %% but %s' a valid format string?

Peter

Jul 19 '05 #7
On Wed, 20 Apr 2005 11:01:28 +0200, Peter Otten <__*******@web.de> wrote:
> Bengt Richter wrote:
> > Parse might be a big word for
> >
> > >>> def tupreq(fmt): return sum(map(lambda s:list(s).count('%'), fmt.split('%%')))
> > ...
> > >>> tupreq('%s this %(x)s not %% but %s')
> >
> > (if it works in general ;-)
>
> Which it doesn't:

D'oh. (My subconscious knew that one, and prompted the "if" ;-)

> >>> def tupreq(fmt): return sum(map(lambda s:list(s).count('%'),fmt.split('%%')))
> ...
> >>> fmt = "%*d"
> >>> fmt % ((1,) * tupreq(fmt))
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> TypeError: not enough arguments for format string

But that one it totally spaced on ;-/

> > Or maybe clearer and faster:
> >
> > >>> def tupreq(fmt): return sum(1 for c in fmt.replace('%%','') if c=='%')
> > ...
> > >>> tupreq('%s this %(x)s not %% but %s')
> > 3
>
> Mixed formats show some "interesting" behaviour:
>
> >>> "%s %(x)s" % (1,2)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> TypeError: format requires a mapping
> >>> class D:
> ...     def __getitem__(self, key):
> ...         return "D[%s]" % key
> ...
> >>> "%s %(x)s" % D()
> '<__main__.D instance at 0x402aaf2c> D[x]'
> >>> "%s %(x)s %s" % D()
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> TypeError: not enough arguments for format string
> >>> "%s %(x)s %(y)s" % D()
> '<__main__.D instance at 0x402aad8c> D[x] D[y]'
>
> That is as far as I got. So under what circumstances is
> '%s this %(x)s not %% but %s' a valid format string?

Yeah, I got that far too, some time ago playing with % mapping, and
I thought they just didn't allow for mixed formats. My thought then
was that they could pass integer positional keys to another method
(say __format__) on a mapping object that wants to handle mixed formats.
If you wanted the normal str or repr representation of a mapping
object that had a __format__ method, you'd have to do it on the args
side with str(theobject), but you'd have a way. And normal mapping objects
would need no special handling for "%s" in a mixed format context.

Regards,
Bengt Richter
Jul 19 '05 #8
Bengt Richter wrote:
> On Wed, 20 Apr 2005 11:01:28 +0200, Peter Otten <__*******@web.de> wrote:
> ....
> > >>> "%s %(x)s %(y)s" % D()

My experiments suggest that you can have a maximum of one unnamed argument in a
mapping template - this unnamed value evaluates to the map itself

> ...
> > So under what circumstances is
> > '%s this %(x)s not %% but %s' a valid format string?


Based on the above experiments, never.
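
The experiments behind that conclusion can be condensed into a short Python 3 sketch, assuming Python 3 treats mixed templates the same way as the Python 2 sessions quoted in this thread; `attempt` is an illustrative helper:

```python
class D:
    def __getitem__(self, key):
        return "D[%s]" % key


def attempt(fmt, args):
    """Apply fmt % args and report a TypeError as text instead of raising."""
    try:
        return fmt % args
    except TypeError as exc:
        return "TypeError: %s" % exc


print(attempt("%s %(x)s", (1, 2)))  # a tuple cannot serve the named slot
print(attempt("%s %(x)s", D()))     # the unnamed slot receives D() itself
print(attempt("%s %(x)s %s", D()))  # a second unnamed slot finds nothing
```

A tuple fails on the named slot and a mapping fails on the second unnamed slot, so a template with two unnamed slots and one named slot has no argument that satisfies it.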

I have wrapped up my current understanding in the following class:
>>> s = StringFormatInfo('%s %*.*d %*s')
>>> s
POSITIONAL Template: %s %*.*d %*s
Arguments: ('s', 'width', 'precision', 'd', 'width', 's')
>>> s = StringFormatInfo('%(arg1)s %% %(arg2).*f %()s %s')
>>> s
MAPPING Template: %(arg1)s %% %(arg2).*f %()s %s
Arguments: {'': 's', 'arg1': 's', 'arg2': 'f', None: 's'}


import re

class StringFormatInfo(object):
    """Wraps a template string and provides information about the number and
    kinds of arguments that must be supplied. Call with % to apply the
    template to data"""

    parse_format = re.compile(r'''
        \%                                  # placeholder
        (?:\((?P<name>[\w]*)\))?            # 0 or 1 named groups
        (?P<conversion>[\#0\-\+]?)          # 0 or 1 conversion flags
        (?P<width>[\d]* | \*)               # optional minimum conversion width
        (?:\.(?P<precision>[\d]+ | \*))?    # optional precision (literal dot)
        (?P<lengthmodifier>[hlL]?)          # optional length modifier
        (?P<type>[diouxXeEfFgGcrs]{1})      # conversion type - note %% omitted
        ''',
        re.VERBOSE
        )

    def __init__(self, template):
        self.template = template
        self.formats = formats = [m.groupdict() for m in
                                  self.parse_format.finditer(template)]

        # Any named specifier makes the whole template a mapping template.
        for format in formats:
            if format['name']:
                self.format_type = "MAPPING"
                self.format_names = dict((format['name'], format['type'])
                                         for format in formats)
                break
        else:
            # No named specifier: list the tuple items in the order consumed.
            self.format_type = "POSITIONAL"
            format_names = []
            for format in formats:
                if format['width'] == '*':
                    format_names.append('width')
                if format['precision'] == '*':
                    format_names.append('precision')
                format_names.append(format['type'])
            self.format_names = tuple(format_names)

    def __mod__(self, values):
        return self.template % values

    def __repr__(self):
        return "%s Template: %s\nArguments: %s" % \
            (self.format_type, self.template, self.format_names)

Michael

Jul 19 '05 #9
Michael Spencer wrote:
> I have wrapped up my current understanding in the following class:

I see you assume that only \w+ can fit inside of a %()
in a format string. The actual Python code allows anything
up to the balanced closed parens.

>>> class Show:
...     def __getitem__(self, text):
...         print "Want", repr(text)
...
>>> "%(this(is)a.--test!)s" % Show()
Want 'this(is)a.--test!'
'None'


I found this useful for a templating library I once wrote
that allowed operations through a simple pipeline, like

%(doc.text|reformat(68)|indent(4))s
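
The templating library itself is not shown in the thread, so the following Python 3 sketch is only a guess at how such a pipeline key might be interpreted. `Pipeline`, the filter table, and the restriction to a single integer argument are all assumptions made for illustration.

```python
import textwrap


class Pipeline:
    """Mapping whose keys are 'name|filter|filter(arg)' expressions."""

    def __init__(self, values, filters):
        self.values = values
        self.filters = filters

    def __getitem__(self, key):
        head, *stages = key.split("|")
        value = self.values[head]
        for stage in stages:
            name, _, arg = stage.partition("(")
            # Assumption: a filter takes at most one integer argument.
            args = [int(arg.rstrip(")"))] if arg else []
            value = self.filters[name](value, *args)
        return value


filters = {
    "upper": lambda text: text.upper(),
    "indent": lambda text, n: textwrap.indent(text, " " * n),
}
doc = Pipeline({"doc.text": "hello"}, filters)
print("%(doc.text|upper|indent(4))s" % doc)
```

Because the % operator hands over everything up to the balancing parenthesis, the whole pipeline expression arrives in `__getitem__` as one key.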

Andrew
da***@dalkescientific.com

Jul 19 '05 #10
Andrew Dalke wrote:
> I see you assume that only \w+ can fit inside of a %()
> in a format string. The actual Python code allows anything
> up to the balanced closed parens.

Gah! I guess that torpedoes the regexp approach, then.

Thanks for looking at this

Michael

Jul 19 '05 #11
Steve Holden wrote:
> Michael Spencer wrote:
> > Andrew Dalke wrote:
> > > I see you assume that only \w+ can fit inside of a %()
> > > in a format string. The actual Python code allows anything
> > > up to the balanced closed parens.
> >
> > Gah! I guess that torpedoes the regexp approach, then.
> >
> > Thanks for looking at this
> >
> > Michael
>
> While Andrew may have found the "fatal flaw" in your scheme, it's worth
> pointing out that it works just fine for my original use case.
>
> regards
> Steve

Thanks. Here's a version that overcomes the 'fatal' flaw.

import re

class StringFormatInfo(object):

    def __init__(self, template):
        self.template = template
        self.parse()

    def tokenizer(self):
        lexer = TinyLexer(self.template)
        self.format_type = "POSITIONAL"
        while lexer.search(r"\%"):
            if lexer.match(r"\%"):
                continue            # "%%" is a literal percent sign
            format = {}
            name = lexer.takeparens()
            if name is not None:
                self.format_type = "MAPPING"
            format['name'] = name   # None for an unnamed specifier
            format['conversion'] = lexer.match(r"[\#0\-\+]")
            format['width'] = lexer.match(r"\d+|\*")
            format['precision'] = lexer.match(r"\.") and \
                lexer.match(r"\d+|\*") or None
            format['lengthmodifier'] = lexer.match("[hlL]")
            ftype = lexer.match('[diouxXeEfFgGcrs]')
            if not ftype:
                raise ValueError
            else:
                format['type'] = ftype
            yield format

    def parse(self):
        self.formats = formats = list(self.tokenizer())
        if self.format_type == "MAPPING":
            self.format_names = dict((format['name'], format['type'])
                                     for format in formats)
        else:
            format_names = []
            for format in formats:
                if format['width'] == '*':
                    format_names.append('width')
                if format['precision'] == '*':
                    format_names.append('precision')
                format_names.append(format['type'])
            self.format_names = tuple(format_names)

    def __mod__(self, values):
        return self.template % values

    def __repr__(self):
        return "%s Template: %s\nArguments: %s" % \
            (self.format_type, self.template, self.format_names)
    __str__ = __repr__

SFI = StringFormatInfo

def tests():
    print SFI('%(arg1)s %% %(arg2).*f %()s %s')
    print SFI('%s %*.*d %*s')
    print SFI('%(this(is)a.--test!)s')

class TinyLexer(object):
    def __init__(self, text):
        self.text = text
        self.ptr = 0
        self.len = len(text)
        self.re_cache = {}

    def match(self, regexp, consume = True, anchor = True):
        if isinstance(regexp, basestring):
            cache = self.re_cache
            if regexp not in cache:
                cache[regexp] = re.compile(regexp)
            regexp = cache[regexp]
        matcher = anchor and regexp.match or regexp.search
        match = matcher(self.text, self.ptr)
        if not match:
            return None
        if consume:
            self.ptr = match.end()
        return match.group()

    def search(self, regexp, consume = True):
        return self.match(regexp, consume=consume, anchor=False)

    def takeparens(self):
        start = self.ptr
        # Guard against a template that ends right after a "%".
        if start >= self.len or self.text[start] != '(':
            return None
        out = ''
        level = 1
        self.ptr += 1
        while self.ptr < self.len:
            nextchar = self.text[self.ptr]
            level += (nextchar == '(') - (nextchar == ')')
            self.ptr += 1
            if level == 0:
                return out
            out += nextchar
        raise ValueError("Unmatched parentheses")
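
For readers on Python 3, where `basestring` and the print statement are gone, the same scan can be sketched as a single function without the lexer class. This is a condensed illustration rather than a port of the class above: `scan_format` is a hypothetical name, unnamed slots are simply left out of the result for mapping templates, and the accepted type characters are an assumption.

```python
FLAGS = "#0- +"
DIGITS = "0123456789"
TYPES = "diouxXeEfFgGcrsa"  # assumption: the conversions to accept


def scan_format(template):
    """Return ("MAPPING", {name: type}) or ("POSITIONAL", (slot, ...))."""
    names, slots = {}, []
    i, n = 0, len(template)
    while i < n:
        if template[i] != "%":
            i += 1
            continue
        i += 1
        if template[i:i + 1] == "%":  # "%%" is a literal percent sign
            i += 1
            continue
        name = None
        if template[i:i + 1] == "(":  # take text up to the balancing ")"
            start, depth = i + 1, 1
            i += 1
            while i < n and depth:
                depth += (template[i] == "(") - (template[i] == ")")
                i += 1
            if depth:
                raise ValueError("unmatched parentheses")
            name = template[start:i - 1]
        while i < n and template[i] in FLAGS:
            i += 1
        if template[i:i + 1] == "*":  # '*' width consumes one tuple item
            slots.append("width")
            i += 1
        else:
            while i < n and template[i] in DIGITS:
                i += 1
        if template[i:i + 1] == ".":
            i += 1
            if template[i:i + 1] == "*":  # '*' precision consumes one too
                slots.append("precision")
                i += 1
            else:
                while i < n and template[i] in DIGITS:
                    i += 1
        if template[i:i + 1] in ("h", "l", "L"):
            i += 1
        if i >= n or template[i] not in TYPES:
            raise ValueError("bad conversion at index %d" % i)
        if name is None:
            slots.append(template[i])
        else:
            names[name] = template[i]
        i += 1
    if names:
        return "MAPPING", names  # unnamed slots are not reported here
    return "POSITIONAL", tuple(slots)


print(scan_format("%s %*.*d %*s"))
print(scan_format("%(this(is)a.--test!)s"))
```

For a positional template, the length of the returned tuple is the number of items the argument tuple needs, which answers the question that opened the thread.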

Jul 19 '05 #12
