
Reverse string-formatting (maybe?)

Is there any builtin function or module with a function similar to my
made-up, not-written deformat function as follows? I can't imagine it
would be too easy to write, but possible...
>>> template = 'I am %s, and he %s last %s.'
>>> values = ('coding', 'coded', 'week')
>>> formatted = template % values
>>> formatted
'I am coding, and he coded last week.'
>>> deformat(formatted, template)
('coding', 'coded', 'week')

expanded (for better visual):
>>> deformat('I am coding, and he coded last week.', 'I am %s, and he %s last %s.')
('coding', 'coded', 'week')

It would return a tuple of strings, since it has no way of telling what
the original type of each item was.
Any input? I've looked through the documentation of the string module
and re module, did a search of the documentation and a search of this
group, and come up empty-handed.

Oct 14 '06 #1
11 Replies


Dustan wrote:
> Is there any builtin function or module with a function similar to my
> made-up, not-written deformat function as follows? I can't imagine it
> would be too easy to write, but possible...
>
> >>> template = 'I am %s, and he %s last %s.'
> >>> values = ('coding', 'coded', 'week')
> >>> formatted = template % values
> >>> formatted
> 'I am coding, and he coded last week.'
> >>> deformat(formatted, template)
> ('coding', 'coded', 'week')
>
> expanded (for better visual):
> >>> deformat('I am coding, and he coded last week.', 'I am %s, and he %s last %s.')
> ('coding', 'coded', 'week')
>
> It would return a tuple of strings, since it has no way of telling what
> the original type of each item was.
> Any input? I've looked through the documentation of the string module
> and re module, did a search of the documentation and a search of this
> group, and come up empty-handed.

Simple, but unreliable:

>>> import re
>>> template = "I am %s, and he %s last %s."
>>> values = ("coding", "coded", "week")
>>> formatted = template % values
>>> def deformat(formatted, template):
...     r = re.compile("(.*)".join(template.split("%s")))
...     return r.match(formatted).groups()
...
>>> deformat(formatted, template)
('coding', 'coded', 'week')

Peter
Oct 14 '06 #2
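
One way to see what "unreliable" can mean for the approach above: the template text is handed to the regex engine unchanged, so any regex metacharacters in it change the pattern. The sketch below is not from Peter's post; `deformat_naive` and `deformat_escaped` are names made up for illustration, and the second variant simply escapes the literal pieces with `re.escape` before joining them.

```python
import re

def deformat_naive(formatted, template):
    # The template is used as a regex as-is, so '(' and ')' become groups.
    match = re.match("(.*)".join(template.split("%s")), formatted)
    return match.groups() if match else None

def deformat_escaped(formatted, template):
    # Escape each literal piece so it can only match itself.
    pattern = "(.*)".join(re.escape(piece) for piece in template.split("%s"))
    match = re.match(pattern, formatted)
    return match.groups() if match else None

template = "%s (%s)"
formatted = template % ("Tom", "cat")

print(deformat_naive(formatted, template))    # ('Tom', '(cat)', '(cat)')
print(deformat_escaped(formatted, template))  # ('Tom', 'cat')
```

Escaping removes only this one failure mode; ambiguous values remain a problem either way.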

> >>> template = 'I am %s, and he %s last %s.'
> >>> values = ('coding', 'coded', 'week')
> >>> formatted = template % values
> >>> formatted
> 'I am coding, and he coded last week.'
> >>> deformat(formatted, template)
> ('coding', 'coded', 'week')
>
> expanded (for better visual):
> >>> deformat('I am coding, and he coded last week.', 'I am %s, and he %s last %s.')
> ('coding', 'coded', 'week')
>
> It would return a tuple of strings, since it has no way of telling what
> the original type of each item was.
>
> Any input? I've looked through the documentation of the string module
> and re module, did a search of the documentation and a search of this
> group, and come up empty-handed.

Yes, in the trivial case you provide, it can be done fairly
easily using the re module:
>>> import re
>>> template = 'I am %s, and he %s last %s.'
>>> values = ('coding', 'coded', 'week')
>>> formatted = template % values
>>> # Relies on re.escape() leaving '%' unescaped (true since Python 3.7)
>>> unformat_re = re.escape(template).replace('%s', '(.*)')
>>> # unformat_re = unformat_re.replace('%i', '([0-9]+)')
>>> r = re.compile(unformat_re)
>>> r.match(formatted).groups()
('coding', 'coded', 'week')

Things get crazier when you have things like

>>> answer = 'format values into a string'
>>> template = 'The formatting string %%s is used to %s' % answer

or

>>> template = 'The value is %0*.*f'
>>> values = (10, 4, 3.14159)
>>> formatted = template % values
>>> formatted
'The value is 00003.1416'

or

>>> template = 'Dear %(name)s, Thank you for the %(gift)s. It was very %(adj)s.'
>>> formatted = template % {'name': 'Grandma', 'gift': 'sweater', 'adj': 'nice'}

Additionally, things get a little tangled when the replacement
values duplicate text from the template. Should the unformatting
of "I am tired, and he didn't last last All Saint's Day" be
parsed as ('tired', "didn't last", "All Saint's Day") or
('tired', "didn't", "last All Saint's Day")? The /intent/ is
likely the former, but getting a computer to understand intent is
a non-trivial task ;)

Just a few early-morning thoughts...

-tkc


Oct 14 '06 #3
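
The "last last" ambiguity described above can be made concrete. In this minimal sketch (the helper name `deformat` and its `group` parameter are assumptions for illustration, and a trailing period is added to the sentence so that it fits the template), a greedy group picks one reading and a non-greedy group picks the other. Neither is "right"; the regex engine just applies its matching rules.

```python
import re

def deformat(formatted, template, group="(.*)"):
    # Escape the literal pieces, then join them with the capture group.
    pattern = group.join(re.escape(piece) for piece in template.split("%s"))
    match = re.fullmatch(pattern, formatted)
    return match.groups() if match else None

template = "I am %s, and he %s last %s."
text = "I am tired, and he didn't last last All Saint's Day."

# Greedy groups take as much text as they can.
print(deformat(text, template))            # ('tired', "didn't last", "All Saint's Day")
# Non-greedy groups take as little text as they can.
print(deformat(text, template, "(.*?)"))   # ('tired', "didn't", "last All Saint's Day")
```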


Peter Otten wrote:
> Dustan wrote:
> > Is there any builtin function or module with a function similar to my
> > made-up, not-written deformat function as follows? I can't imagine it
> > would be too easy to write, but possible...
> >
> > >>> template = 'I am %s, and he %s last %s.'
> > >>> values = ('coding', 'coded', 'week')
> > >>> formatted = template % values
> > >>> formatted
> > 'I am coding, and he coded last week.'
> > >>> deformat(formatted, template)
> > ('coding', 'coded', 'week')
> >
> > expanded (for better visual):
> > >>> deformat('I am coding, and he coded last week.', 'I am %s, and he %s last %s.')
> > ('coding', 'coded', 'week')
> >
> > It would return a tuple of strings, since it has no way of telling what
> > the original type of each item was.
> > Any input? I've looked through the documentation of the string module
> > and re module, did a search of the documentation and a search of this
> > group, and come up empty-handed.
>
> Simple, but unreliable:
>
> >>> import re
> >>> template = "I am %s, and he %s last %s."
> >>> values = ("coding", "coded", "week")
> >>> formatted = template % values
> >>> def deformat(formatted, template):
> ...     r = re.compile("(.*)".join(template.split("%s")))
> ...     return r.match(formatted).groups()
> ...
> >>> deformat(formatted, template)
> ('coding', 'coded', 'week')
>
> Peter

Trying to figure out the 'unreliable' part of your statement...

I'm sure two '%s' placeholders in a row would be a bad idea, and if the
substituted values contain text similar to what surrounds the '%s'
placeholders in the template, that would cause difficulty. Is there any
other reason it might not work properly?

My template outside of the '%s' characters contains only commas and
spaces, and within, neither commas nor spaces. Given that information,
is there any reason it might not work properly?

Oct 15 '06 #5

> My template outside of the '%s' characters contains only commas and
> spaces, and within, neither commas nor spaces. Given that information,
> is there any reason it might not work properly?

Given this new (key) information along with the assumption that
you're doing straight string replacement (not dictionary
replacement of the form "%(key)s" or other non-string types such
as "%05.2f"), then yes, a reversal is possible. To make it more
explicit, one would do something like

>>> template = '%s, %s, %s'
>>> values = ('Tom', 'Dick', 'Harry')
>>> formatted = template % values
>>> import re
>>> unformat_string = template.replace('%s', '([^, ]+)')
>>> unformatter = re.compile(unformat_string)
>>> extracted_values = unformatter.search(formatted).groups()

using '[^, ]+' to mean "one or more characters that aren't a
comma or a space".

-tkc


Oct 15 '06 #6
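
The character-class idea above generalizes to any set of characters the values are known not to contain. The following is only a sketch under that assumption; `deformat_excluding` and its `forbidden` parameter are hypothetical names, and the literal template pieces are escaped so they cannot act as regex syntax.

```python
import re

def deformat_excluding(formatted, template, forbidden=", "):
    # Each value is one or more characters that are NOT in `forbidden`.
    group = "([^%s]+)" % re.escape(forbidden)
    pattern = group.join(re.escape(piece) for piece in template.split("%s"))
    match = re.fullmatch(pattern, formatted)
    return match.groups() if match else None

print(deformat_excluding("Tom, Dick, Harry", "%s, %s, %s"))  # ('Tom', 'Dick', 'Harry')
print(deformat_excluding("Tom, Dick Harry", "%s, %s, %s"))   # None
```

Returning None when the text does not fit the template avoids the AttributeError that calling `.groups()` on a failed match would raise.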

Tim Chase wrote:
> > My template outside of the '%s' characters contains only commas and
> > spaces, and within, neither commas nor spaces. Given that information,
> > is there any reason it might not work properly?
>
> Given this new (key) information along with the assumption that
> you're doing straight string replacement (not dictionary
> replacement of the form "%(key)s" or other non-string types such
> as "%05.2f"), then yes, a reversal is possible. To make it more
> explicit, one would do something like
>
> >>> template = '%s, %s, %s'
> >>> values = ('Tom', 'Dick', 'Harry')
> >>> formatted = template % values
> >>> import re
> >>> unformat_string = template.replace('%s', '([^, ]+)')
> >>> unformatter = re.compile(unformat_string)
> >>> extracted_values = unformatter.search(formatted).groups()
>
> using '[^, ]+' to mean "one or more characters that aren't a
> comma or a space".
>
> -tkc

Thanks.

One more thing (I forgot to mention this other situation earlier)
The %s characters are ints, and outside can be anything except int
characters. I do have one situation of '%s%s%s', but I can change it to
'%s', and change the output into the needed output, so that's not
important. Think something along the lines of "abckdaldj iweo%s
qwierxcnv !%sjd".

Oct 15 '06 #7
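
For the situation described above (integer values, no digits in the surrounding text), a digit-only group removes the ambiguity. This is a sketch, not code from the thread: `deformat_ints` is a made-up name, it assumes non-negative integers, and it does not cover back-to-back placeholders such as '%s%s%s', where the digits cannot be split unambiguously.

```python
import re

def deformat_ints(formatted, template):
    # Values are runs of digits; everything else must match the template literally.
    pattern = r"(\d+)".join(re.escape(piece) for piece in template.split("%s"))
    match = re.fullmatch(pattern, formatted)
    if match is None:
        raise ValueError("text does not fit the template")
    return tuple(int(group) for group in match.groups())

template = "abckdaldj iweo%s qwierxcnv !%sjd"
formatted = template % (42, 7)
print(deformat_ints(formatted, template))  # (42, 7)
```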

Dustan wrote:
> Tim Chase wrote:
> > > My template outside of the '%s' characters contains only commas and
> > > spaces, and within, neither commas nor spaces. Given that information,
> > > is there any reason it might not work properly?
> >
> > Given this new (key) information along with the assumption that
> > you're doing straight string replacement (not dictionary
> > replacement of the form "%(key)s" or other non-string types such
> > as "%05.2f"), then yes, a reversal is possible. To make it more
> > explicit, one would do something like
> >
> > >>> template = '%s, %s, %s'
> > >>> values = ('Tom', 'Dick', 'Harry')
> > >>> formatted = template % values
> > >>> import re
> > >>> unformat_string = template.replace('%s', '([^, ]+)')
> > >>> unformatter = re.compile(unformat_string)
> > >>> extracted_values = unformatter.search(formatted).groups()
> >
> > using '[^, ]+' to mean "one or more characters that aren't a
> > comma or a space".
> >
> > -tkc
>
> Thanks.
>
> One more thing (I forgot to mention this other situation earlier)
> The %s characters are ints, and outside can be anything except int
> characters. I do have one situation of '%s%s%s', but I can change it to
> '%s', and change the output into the needed output, so that's not
> important. Think something along the lines of "abckdaldj iweo%s
> qwierxcnv !%sjd".

That was written in haste. All the information is true. The question:
I've already created a function to do this, using your original
deformat function. Is there any way in which it might go wrong?

Oct 15 '06 #8

Dustan wrote:
> Dustan wrote:
> > Tim Chase wrote:
> > > > My template outside of the '%s' characters contains only commas and
> > > > spaces, and within, neither commas nor spaces. Given that information,
> > > > is there any reason it might not work properly?
> > >
> > > Given this new (key) information along with the assumption that
> > > you're doing straight string replacement (not dictionary
> > > replacement of the form "%(key)s" or other non-string types such
> > > as "%05.2f"), then yes, a reversal is possible. To make it more
> > > explicit, one would do something like
> > >
> > > >>> template = '%s, %s, %s'
> > > >>> values = ('Tom', 'Dick', 'Harry')
> > > >>> formatted = template % values
> > > >>> import re
> > > >>> unformat_string = template.replace('%s', '([^, ]+)')
> > > >>> unformatter = re.compile(unformat_string)
> > > >>> extracted_values = unformatter.search(formatted).groups()
> > >
> > > using '[^, ]+' to mean "one or more characters that aren't a
> > > comma or a space".
> > >
> > > -tkc
> >
> > Thanks.
> >
> > One more thing (I forgot to mention this other situation earlier)
> > The %s characters are ints, and outside can be anything except int
> > characters. I do have one situation of '%s%s%s', but I can change it to
> > '%s', and change the output into the needed output, so that's not
> > important. Think something along the lines of "abckdaldj iweo%s
> > qwierxcnv !%sjd".
>
> That was written in haste. All the information is true. The question:
> I've already created a function to do this, using your original
> deformat function. Is there any way in which it might go wrong?

Again, haste. I used Peter's deformat function.

Oct 15 '06 #9

On 14 Oct 2006 05:35:02 -0700,
"Dustan" <Du**********@gmail.com> wrote:
> Is there any builtin function or module with a function similar to my
> made-up, not-written deformat function as follows? I can't imagine it
> would be too easy to write, but possible...
> [ snip ]
> Any input? I've looked through the documentation of the string module
> and re module, did a search of the documentation and a search of this
> group, and come up empty-handed.

Track down pyscanf. (Google is your friend, but I can't find any sort
of licensing/copyright information, and the web addresses in the source
code aren't available, so I hesitate to post my ancient copy.)

HTH,
Dan

--
Dan Sommers
<http://www.tombstonezero.net/dan/>
"I wish people would die in alphabetical order." -- My wife, the genealogist
Oct 15 '06 #10
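
pyscanf itself is not shown in this thread, so the following is not its API. It is only a rough sketch of the scanf-style idea: each conversion in the template becomes a regex group plus a type cast. The `scanf` function and `_CONVERSIONS` table are made up for illustration, the regexes follow the "Simulating scanf()" table in the `re` module documentation, and only %d, %f and %s are handled.

```python
import re

# Conversion specifier -> (regex group, function that converts the matched text)
_CONVERSIONS = {
    "%d": (r"([-+]?\d+)", int),
    "%f": (r"([-+]?(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][-+]?\d+)?)", float),
    "%s": (r"(\S+)", str),
}

def scanf(template, text):
    pattern, casts = [], []
    # The capturing group in the split keeps the specifiers in the result.
    for token in re.split(r"(%[dfs])", template):
        if token in _CONVERSIONS:
            regex, cast = _CONVERSIONS[token]
            pattern.append(regex)
            casts.append(cast)
        else:
            pattern.append(re.escape(token))
    match = re.fullmatch("".join(pattern), text)
    if match is None:
        return None
    return tuple(cast(group) for cast, group in zip(casts, match.groups()))

print(scanf("%s is %d years and %f m", "Tom is 30 years and 1.85 m"))  # ('Tom', 30, 1.85)
```

Unlike the string-only approaches earlier in the thread, this recovers typed values, but only because the template says which type each slot holds.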

P: n/a
>> >>template = '%s, %s, %s'
>> >>values = ('Tom', 'Dick', 'Harry')
>>formatted = template % values
>>import re
>>unformat_string = template.replace('%s', '([^, ]+)')
>>unformatter = re.compile(unformat_string)
>>extracted_values = unformatter.search(formatted).groups()

using '[^, ]+' to mean "one or more characters that aren't a
comma or a space".

One more thing (I forgot to mention this other situation earlier)
The %s characters are ints, and outside can be anything except int
characters. I do have one situation of '%s%s%s', but I can change it to
'%s', and change the output into the needed output, so that's not
important. Think something along the lines of "abckdaldj iweo%s
qwierxcnv !%sjd".

That was written in haste. All the information is true. The question:
I've already created a function to do this, using your original
deformat function. Is there any way in which it might go wrong?
Only you know what anomalies will be found in your data-sets. If
you know/assert that

-the only stuff in the formatting string is one set of characters

-that stuff in the replacement-values can never include any of
your format-string characters

-that you're not using funky characters/formatting in your format
string (such as "%%" possibly followed by an "s" to get the
resulting text of "%s" after formatting, or trying to use other
formatters such as the aforementioned "%f" or possibly "%i")

then you should be safe. It could also be possible (with my
original replacement of "(.*)") if your values will never include
any substring of your format string. If you can't guarantee
these conditions, you're trying to make a cow out of hamburger.
Or a pig out of sausage. Or a whatever out of a hotdog. :)

Conventional wisdom would tell you to create a test-suite of
format-strings and sample values (preferably worst-case funkiness
in your expected format-strings/values), and then have a test
function that will assert that the unformatting of every
formatted string in the set returns the same set of values that
went in. Something like

import re

tests = {
    'I was %s but now I am %s': [
        ('hot', 'cold'),
        ('young', 'old'),
    ],
    'He has 3 %s and 2 %s': [
        ('brothers', 'sisters'),
        ('cats', 'dogs'),
    ],
}

for format_string, values in tests.items():
    # Build the unformatting regex from the format string
    unformatter = re.compile(format_string.replace('%s', '(.*)'))
    for value_tuple in values:
        formatted = format_string % value_tuple
        unformatted = unformatter.search(formatted).groups()
        if unformatted != value_tuple:
            print("%s doesn't match %s when unformatting %s" % (
                unformatted,
                value_tuple,
                format_string))

-tkc


Oct 15 '06 #11
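
The "%%" case from the list above can be handled by tokenizing the format string instead of blindly replacing '%s'. The sketch below is an illustration only; `deformat_percent` is a hypothetical name, and it understands just '%%' and '%s', not '%f', '%i' or '%(key)s'.

```python
import re

def deformat_percent(formatted, template):
    parts = []
    # '%%' is listed first so that '%%s' is read as a literal '%' followed by 's'.
    for token in re.split(r"(%%|%s)", template):
        if token == "%s":
            parts.append("(.*)")
        elif token == "%%":
            parts.append(re.escape("%"))  # '%%' formats to a single literal '%'
        else:
            parts.append(re.escape(token))
    match = re.fullmatch("".join(parts), formatted)
    return match.groups() if match else None

template = 'The formatting string %%s is used to %s'
formatted = template % 'format values into a string'
print(formatted)
print(deformat_percent(formatted, template))  # ('format values into a string',)
```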

> Only you know what anomalies will be found in your data-sets. If
> you know/assert that
>
> -the only stuff in the formatting string is one set of characters
>
> -that stuff in the replacement-values can never include any of
> your format-string characters
>
> -that you're not using funky characters/formatting in your format
> string (such as "%%" possibly followed by an "s" to get the
> resulting text of "%s" after formatting, or trying to use other
> formatters such as the aforementioned "%f" or possibly "%i")
>
> then you should be safe. It could also be possible (with my
> original replacement of "(.*)") if your values will never include
> any substring of your format string. If you can't guarantee
> these conditions, you're trying to make a cow out of hamburger.
> Or a pig out of sausage. Or a whatever out of a hotdog. :)
>
> Conventional wisdom would tell you to create a test-suite of
> format-strings and sample values (preferably worst-case funkiness
> in your expected format-strings/values), and then have a test
> function that will assert that the unformatting of every
> formatted string in the set returns the same set of values that
> went in. Something like
>
> import re
>
> tests = {
>     'I was %s but now I am %s': [
>         ('hot', 'cold'),
>         ('young', 'old'),
>     ],
>     'He has 3 %s and 2 %s': [
>         ('brothers', 'sisters'),
>         ('cats', 'dogs'),
>     ],
> }
>
> for format_string, values in tests.items():
>     unformatter = re.compile(format_string.replace('%s', '(.*)'))
>     for value_tuple in values:
>         formatted = format_string % value_tuple
>         unformatted = unformatter.search(formatted).groups()
>         if unformatted != value_tuple:
>             print("%s doesn't match %s when unformatting %s" % (
>                 unformatted,
>                 value_tuple,
>                 format_string))
>
> -tkc

Thanks for all your help. I've gotten the idea.

Oct 15 '06 #12
