Bytes | Software Development & Data Engineering Community

Problem with Lexical Scope

I am not completely knowledgeable about the status of lexical scoping in
Python, but it was my understanding that this was added a long time
ago, around Python 2.1-2.2.

I am using Python 2.4 and the following code throws a "status variable
not found" error in the inner-most function, even when I try to "global" it.

def collect(fields, reducer):
    def rule(record):
        status = True
        def _(x, y):
            cstat = reducer(x, y)
            if status and not cstat:
                status = False
            return y
        return reduce(_, [record[field] for field in fields])
    return rule

What gives?

Dec 12 '05 #1
Well, I think I found a nasty little hack to get around it, but I still
don't see why it doesn't work in the regular way.

def collect(fields, reducer):
    def rule(record):
        # Nasty hack b/c can't get lexical scoping of status to work
        status = [True]
        def _(x, y, s=status):
            cstat = reducer(x, y)
            if s[0] and not cstat:
                s[0] = False
            return y
        reduce(_, [record[field] for field in fields])
        return status[0]
    return rule

Dec 12 '05 #2
js******@gmail.com wrote:
I am not completely knowledgeable about the status of lexical scoping in
Python, but it was my understanding that this was added a long time
ago, around Python 2.1-2.2.

I am using Python 2.4 and the following code throws a "status variable
not found" error in the inner-most function, even when I try to "global" it.

def collect(fields, reducer):
    def rule(record):
        status = True
        def _(x, y):
            cstat = reducer(x, y)
            if status and not cstat:
                status = False
            return y
        return reduce(_, [record[field] for field in fields])
    return rule

What gives?

What's happening is that the interpreter, when it compiles the inner
function _(), sees an assignment to status and so assumes it is local to
the _() function. Consequently, since you reference it inside _() before
assignment you get (I presume) an exception reporting an unbound local
variable.

The scoping rules do work when you obey them:
>>> def f1(a, b):
...     s = a+b
...     def _(x):
...         return s+x
...     return _
...
>>> func = f1(12, 13)
>>> func(10)
35


Here the nested lexical scopes rule isn't that helpful given the
overriding nature of assignment within an inner scope. Using global will
simply put the status variable at *module* scope, which also isn't what
you want.
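Worth noting: Python 3.0 later added the nonlocal statement (PEP 3104) for exactly this rebinding. A sketch of the original collect using it, returning the status flag (which the follow-up workaround suggests was the intent):

```python
from functools import reduce  # reduce moved to functools in Python 3

def collect(fields, reducer):
    def rule(record):
        status = True
        def _(x, y):
            nonlocal status            # rebind status in rule's scope
            if status and not reducer(x, y):
                status = False
            return y
        reduce(_, [record[field] for field in fields])
        return status
    return rule

rule = collect(('birth_date', 'hire_date'), lambda x, y: x < y)
print(rule({'birth_date': 19740201, 'hire_date': 19840721}))  # True
```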

regards
Steve
--
Steve Holden +44 150 684 7255 +1 800 494 3119
Holden Web LLC www.holdenweb.com
PyCon TX 2006 www.python.org/pycon/

Dec 12 '05 #3
That does make sense. So there is no way to modify a variable in an
outer lexical scope? Is the use of a mutable data type the only way?

I'm trying to think of a case that would create semantic ambiguity when
being able to modify a variable reference in an outer scope, but I
cannot (which is probably short-sighted). Is this behavior the way it is
because of performance, or perhaps because of a certain design of the
interpreter?

Dec 12 '05 #4
js******@gmail.com wrote:
I am using Python 2.4 and the following code throws a "status variable
not found" error in the inner-most function, even when I try to "global" it.

def collect(fields, reducer):
    def rule(record):
        status = True
        def _(x, y):
            cstat = reducer(x, y)
            if status and not cstat:
                status = False
            return y
        return reduce(_, [record[field] for field in fields])
    return rule

What gives?


You rebind status in the inner function. Binding to a name makes it local
to that function (unless declared global). A name in an enclosing function
is neither local nor global so you cannot rebind it.

Ways round this include storing the status in a mutable value, making it an
attribute of a class, or rewriting the code to not require setting an outer
variable in the first place.

The last of these is the easiest in this case: status is never actually
used or returned anywhere so you could just remove all references to it
with no impact on the code. In fact, so long as reducer has no side
effects, you could just do this:

def collect(fields, reducer):
    def rule(record):
        if fields:
            return record[fields[-1]]
    return rule

Alternatively, if I assume you actually wanted rule to return the status as
well as y, then the outer assignment disappears quite easily:

def collect(fields, reducer):
    def rule(record):
        def _((x, status), y):
            cstat = reducer(x, y)
            return (y, status and cstat)
        return reduce(_, [record[field] for field in fields], (0, True))
    return rule

If reducer has no side effects this can be further reduced:

def collect(fields, reducer):
    def rule(record):
        def _((x, status), y):
            return (y, status and reducer(x,y))
        return reduce(_, [record[field] for field in fields], (0, True))
    return rule

Given that the final y returned is simply the last record[field] maybe you
only wanted the status value? In that case expanding the reduce and _ into
inline code is likely to make things clearer:

def collect(fields, reducer):
    def rule(record):
        prev = 0
        for field in fields:
            if not reducer(prev, record[field]):
                return False
            prev = record[field]
        return True
    return rule
Dec 12 '05 #5
Thanks for the example code. Definitely provided a few different ways
of doing the construction.

Actually, the status variable was the only thing that mattered for the
inner function. The first code post had a bug b/c I was first just
trying to use reduce. However, I realized that the boolean reducer
ended up using the boolean result instead of the next field for
subsequent operations.

The effect I was going for was a logical AND of all the returned
values from reducer. However, the inner function is supposed to use the
same type of algorithm as the built-in reduce function.

reducer does have no side effects, so I suppose short-circuiting it would
be the best thing. I think the only thing about the last example is
that it starts things off with a zero. I think that would boink it.

The way that this thing works is it's a general algorithm for boolean
operations that are applied to fields in a dict.

def lt(*fields):
    return collect(fields, lambda x, y: x < y)

data = {
    'birth_date' : 19740201,
    'hire_date' : 19840721,
    'fire_date' : 19850123
}

rule = lt('birth_date', 'hire_date')
assert rule(data) == True
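Under Python 3, where reduce lives in functools, the mutable-cell workaround from the second post plus this lt runs as-is; a sketch:

```python
from functools import reduce

def collect(fields, reducer):
    def rule(record):
        status = [True]                    # mutable cell, as in the earlier workaround
        def _(x, y, s=status):
            if s[0] and not reducer(x, y):
                s[0] = False
            return y
        reduce(_, [record[field] for field in fields])
        return status[0]
    return rule

def lt(*fields):
    return collect(fields, lambda x, y: x < y)

data = {
    'birth_date': 19740201,
    'hire_date': 19840721,
    'fire_date': 19850123,
}

rule = lt('birth_date', 'hire_date')
assert rule(data) == True
```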

Dec 12 '05 #6

js******@gmail.com wrote:
That does make sense. So there is no way to modify a variable in an
outer lexical scope? Is the use of a mutable data type the only way?

I'm trying to think of a case that would create semantic ambiguity when
being able to modify a variable reference in an outer scope, but I
cannot (which is probably short-sighted). Is this behavior correct
because of performance or perhaps because of a certain design of the
interpreter?


That is by design and IMO a good thing, as you don't need to declare
things (the '=' sort of does it for you) in Python. I think you can do
something like globals()['s'], but that is ugly and should be
discouraged.

Just use a mutable object like s=[1] or a Ref class or a similar
implementation if you need this "shared write" functionality. Just
reading is perfectly fine.
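A Ref class here is not a standard-library thing; a minimal sketch of the idea:

```python
class Ref:
    """A tiny mutable cell: share one rebindable value across scopes."""
    def __init__(self, value):
        self.value = value

def make_counter(start):
    count = Ref(start)
    def bump():
        count.value += 1   # mutating the cell, not rebinding the name
        return count.value
    return bump

bump = make_counter(10)
print(bump())  # 11
print(bump())  # 12
```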

Dec 12 '05 #7

js******@gmail.com wrote:
Thanks for the example code. Definitely provided a few different ways
of doing the construction.

Actually, the status variable was the only thing that mattered for the
inner function. The first code post had a bug b/c I was first just
trying to use reduce. However, I realized that the boolean reducer
ended up using the boolean result instead of the next field for
subsequent operations.

The effect I was going for was a logical AND of all the returned
values from reducer. However, the inner function is supposed to use the
same type of algorithm as the built-in reduce function.

reducer does have no side effects, so I suppose short-circuiting it would
be the best thing. I think the only thing about the last example is
that it starts things off with a zero. I think that would boink it.

The way that this thing works is it's a general algorithm for boolean
operations that are applied to fields in a dict.

def lt(*fields):
    return collect(fields, lambda x, y: x < y)

data = {
    'birth_date' : 19740201,
    'hire_date' : 19840721,
    'fire_date' : 19850123
}

rule = lt('birth_date', 'hire_date')
assert rule(data) == True


I usually solve this kind of thing by giving reduce a tuple (or n-tuple)
as the accumulator, then factor out only the needed final result, sans
these "temp variables".

Dec 12 '05 #8
On Mon, 12 Dec 2005 08:26:59 +0000, Steve Holden <st***@holdenweb.com> wrote:
[...]

The scoping rules do work when you obey them:
>>> def f1(a, b): ... s = a+b
... def _(x):
... return s+x
... return _
... >>> func = f1(12, 13)
>>> func(10)35 >>>


Here the nested lexical scopes rule isn't that helpful given the
overriding nature of assignment within an inner scope. Using global will
simply put the status variable at *module* scope, which also isn't what
you want.

Is external rebindability being considered at all?

What if it were a new builtin that could do it in C, e.g. rebind(targetname, expr),
where the target could be in any enclosing scope (including globals(),
but not extending to __builtins__), and it would be a NameError if the target
didn't exist. Thus this would be possible (untested and currently impossible ;-)

def mkbumper(initval, incr):
    def _():
        rebind('initval', initval+incr) # vs initval := initval+incr
        return initval
    return _

bump3 = mkbumper(8, 3)
bump3() => 11
bump3() => 14

I wonder if a function would fly better than a punctuation tweak on
bare name assignment ;-)
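As it turned out, Python 3.0's nonlocal statement (PEP 3104) provides exactly this rebinding in an enclosing function scope; the same bumper, runnable today as a sketch:

```python
def mkbumper(initval, incr):
    def bump():
        nonlocal initval          # rebinds in mkbumper's scope, as rebind() would
        initval += incr
        return initval
    return bump

bump3 = mkbumper(8, 3)
print(bump3())  # 11
print(bump3())  # 14
```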

Regards,
Bengt Richter
Dec 12 '05 #9
js******@gmail.com wrote:
reducer does have no side effects, so I suppose short-circuiting it would
be the best thing. I think the only thing about the last example is
that it starts things off with a zero. I think that would boink it.


In that case, and assuming that fields contains at least one entry:

def collect(fields, reducer):
    def rule(record):
        f = iter(fields)
        prev = record[f.next()]
        for field in f:
            if not reducer(prev, record[field]):
                return False
            prev = record[field]
        return True
    return rule
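In Python 3 the only change needed is next(f) in place of f.next(); for instance, a sketch:

```python
def collect(fields, reducer):
    def rule(record):
        f = iter(fields)
        prev = record[next(f)]             # f.next() became next(f) in Python 3
        for field in f:
            if not reducer(prev, record[field]):
                return False               # short-circuits on the first failure
            prev = record[field]
        return True
    return rule

rule = collect(('birth_date', 'hire_date', 'fire_date'), lambda x, y: x < y)
assert rule({'birth_date': 1, 'hire_date': 2, 'fire_date': 3})
assert not rule({'birth_date': 2, 'hire_date': 1, 'fire_date': 3})
```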
Dec 12 '05 #10
On 12 Dec 2005 01:23:48 -0800, "js******@gmail.com" <js******@gmail.com> wrote:
Thanks for the example code. Definitely provided a few different ways
of doing the construction.

Actually, the status variable was the only thing that mattered for the
inner function. The first code post had a bug b/c I was first just
trying to use reduce. However, I realized that the boolean reducer
ended up using the boolean result instead of the next field for
subsequent operations.

The effect I was going for was a logical AND of all the returned
values from reducer. However, the inner function is supposed to use the
same type of algorithm as the built-in reduce function.

reducer does have no side effects, so I suppose short-circuiting it would
be the best thing. I think the only thing about the last example is
that it starts things off with a zero. I think that would boink it.

The way that this thing works is it's a general algorithm for boolean
operations that are applied to fields in a dict.

def lt(*fields):
    return collect(fields, lambda x, y: x < y)

data = {
    'birth_date' : 19740201,
    'hire_date' : 19840721,
    'fire_date' : 19850123
}

rule = lt('birth_date', 'hire_date')
assert rule(data) == True

I suspect you are trying to win an obfuscation contest ;-)
It appears that in this case you want lt to have the same effect
as if it were written
>>> def lt(*fields):
...     def _(dct):
...         return dct[fields[0]] < dct[fields[1]]
...     return _
...
>>> data = {
...     'birth_date' : 19740201,
...     'hire_date' : 19840721,
...     'fire_date' : 19850123
... }
>>> rule = lt('birth_date', 'hire_date')
>>> rule
<function _ at 0x02F986BC>
>>> rule(data)
True
>>> assert rule(data)
>>> assert rule(data) == True


But why do you need to "collect" with reduce and all that cryptic cruft?
I think we need a use case where you have other than two fields to operate on.
but in that case, what would reduce do? Something seems bogus with that.

I'm guessing you want to loop through a bunch of "records" that are extracted from
a database in the form of dicts, and you want to check them in various ways.
So you want to extract fields in given order and apply various checks, something like (untested)

for data in dbrecordsupply:
    for fields, check, msg in record_supply_checklist:
        assert check(*[data[field] for field in fields]), msg

where record_supply_checklist is something like (untested) (and eschewing lambdas for a change ;-)

def before(x, y): return x<y
def before_now(x): return x < time.time()
def during_company_existence(x): return company_founding_time <= x < time.time()

record_supply_checklist = [
    (['birth_date', 'hire_date'], before, 'born at or after hire'),
    (['birth_date'], before_now, 'not yet born'),
    (['hire_date'], during_company_existence, 'bogus hire date')
    # etc ...
]

I suppose different checklists would go with different recordsupply's.
Is this too simple? Or have I misread between the lines? ;-)
(Of course you might want to log stuff instead of blowing up on the first assert.)
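Filled in with stub data, the sketch above runs end to end (dbrecordsupply, company_founding_time, and the checklist names are the hypothetical ones from the sketch):

```python
import time

company_founding_time = 0  # stub epoch value for the sketch

def before(x, y): return x < y
def before_now(x): return x < time.time()
def during_company_existence(x): return company_founding_time <= x < time.time()

record_supply_checklist = [
    (['birth_date', 'hire_date'], before, 'born at or after hire'),
    (['birth_date'], before_now, 'not yet born'),
    (['hire_date'], during_company_existence, 'bogus hire date'),
]

dbrecordsupply = [{'birth_date': 19740201, 'hire_date': 19840721}]

for data in dbrecordsupply:
    for fields, check, msg in record_supply_checklist:
        assert check(*[data[field] for field in fields]), msg
```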

Regards,
Bengt Richter
Dec 12 '05 #11
Well, the comparison operations are just a special case really. I
don't know about the obfuscation contest. I've attempted to make an
extensible library.

def lt(*fields):
    return collect(fields, lambda x, y: x < y)

def gt(*fields):
    return collect(fields, lambda x, y: x > y)

def gte(*fields):
    """ gte(field, ...) -> rule
    """
    return collect(fields, lambda x, y: x >= y)

etc...

The purpose of writing the collect function was just to be able to
easily apply the logic to whatever function was wanted.

the added ability is to be able to take as many arguments as it wants.

hire_rule = lt('birth_date', 'hire_date', 'fire_date')
cube_rule = eq('height', 'width', 'depth')

The reason that it's written like this is because I'm attempting to
make things as declarative as possible. I could use classes and
override __special__ operators and things like that but just applying
rules using prefix notation seems to work alright.

I've basically got a simple call signature... takes a dict, returns a bool,
that plugs into the little validation routine.

It turns out that invoking the rules is easily expressed with a
closure, though I might switch to types if it needs that added
complexity.

But your assumption is correct, it's for dictionaries of data to be
validated. I haven't decided if I want to take advantage of the dict's
mutability to include formatting as well.

Here's a bit from my unit test.

rule = when(all(have('length'), have('width')),
            check(['length', 'width'], lambda x, y: x == y))
assert rule({'length' : '2', 'width' : '2'}) == True
assert rule({'length' : '2', 'width' : '1'}) == False
assert rule({'length' : '1', 'width' : '2'}) == False

I've also got a "collectable" equality function so that can be shown
as.

box_rule = when(all(have('length'), have('width')), eq('length', 'width'))

Which basically means, if we have both a length and a width, then be
sure they are equal.

Of course, there is a little function that performs a conjunction of a
complete list of rules on a dict and returns the rules that failed.

I've also got a little adapter that translates functions that take a
string and return a bool into ones that fit the call signature, called
match.

match(is_ssn, 'social-security_number')

Like I said, it may be considered more readable if using operator
overloading so that it uses python's native syntax. Considering the
added complexity, I don't know if it would be worth it. I'd probably
put a simple little declarative language on top of it to translate the
function calls before that.
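To make the combinators referenced in this post concrete, here is one self-contained sketch of when/all/have/check consistent with the unit test above (the real library may differ):

```python
import builtins

def have(*fields):
    def rule(record):
        return builtins.all(f in record for f in fields)
    return rule

def all(*rules):  # deliberately shadows the builtin, matching the post's naming
    def rule(record):
        return builtins.all(r(record) for r in rules)
    return rule

def when(predicate, expr):
    def rule(record):
        return expr(record) if predicate(record) else True
    return rule

def check(fields, test):
    def rule(record):
        return test(*[record[f] for f in fields])
    return rule

rule = when(all(have('length'), have('width')),
            check(['length', 'width'], lambda x, y: x == y))
assert rule({'length': '2', 'width': '2'}) == True
assert rule({'length': '2', 'width': '1'}) == False
assert rule({'height': '2'}) == True   # predicate fails, so vacuously true
```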

Dec 16 '05 #12
On 15 Dec 2005 18:45:17 -0800, "js******@gmail.com" <js******@gmail.com> wrote:
Well, the comparison operations are just a special case really. I
don't know about the obfuscation contest. I've attempted to make an
extensible library.

Sorry about "obfuscation contest," I just balked at the reduce code, which seemed
like premature overgeneralization ;-)
def lt(*fields):
    return collect(fields, lambda x, y: x < y)

def gt(*fields):
    return collect(fields, lambda x, y: x > y)

def gte(*fields):
    """ gte(field, ...) -> rule
    """
    return collect(fields, lambda x, y: x >= y)

etc...

DRY ? ;-)

The purpose of writing the collect function was just to be able to
easily apply the logic to whatever function was wanted.

the added ability is to be able to take as many arguments as it wants.
Yes, but does collect(fields, fun) always know that fun takes two args? And that
it makes sense to apply fun pairwise as a boolean relation to reduce to a single boolean value?
You could say that by definition it does. In that case, will your future library extender need
to add some other kind of collect-like function?
hire_rule = lt('birth_date', 'hire_date', 'fire_date')
cube_rule = eq('height', 'width', 'depth')
I see where the idea of using reduce would occur ;-)

The reason that it's written like this is because I'm attempting to
make things as declarative as possible. I could use classes and
override __special__ operators and things like that but just applying
rules using prefix notation seems to work alright.

Applying or generating? But yeah, prefix should work fine. Or infix.

I've basically got a simple call signature...takes dict returns bool
that plugs into the little validation routine.

Simple is good.
It turns out that envoking the rules is easistly expressed with a
closure, though I might switch to types if it needs that added
complexity.

It's good that you have a syntax that can be implemented either way.

But your assumption is correct, it's for dictionaries of data to be
validated. I haven't decided if I want to take advantage of the dict's
mutability to include formatting as well.

You might consider putting formatting in dict subclasses that have appropriate
__str__ and/or __repr__ methods, and doing print Fmtxxx(record_dict) or such?

Here's a bit from my unit test.

Looks nice. And the syntax is regular enough that you can probably
write an optimizing rule generator when/if you need it.

rule = when(all(have('length'), have('width')),
            check(['length', 'width'], lambda x, y: x == y))
assert rule({'length' : '2', 'width' : '2'}) == True
assert rule({'length' : '2', 'width' : '1'}) == False
assert rule({'length' : '1', 'width' : '2'}) == False
But what about when the "when" clause says the rule does not apply?
Maybe return NotImplemented, (which passes as True in an if test) e.g.,

assert rule({'length' : '1', 'height' : '2'}) is NotImplemented
I've also got a "collectable" equality function so that can be shown
as.

box_rule = when(all(have('length'), have('width')), eq('length', 'width'))

Which basically means, if we have both a length and a width, then be
sure they are equal.

Either way, I think it would be better to give tests names, i.e., instead
of the lambda, pass a function def eq(x,y): return x==y and then also
change check to have signature def check(testfun, *fields):...
But "collectable" has the possibility of inline code generation (see below ;-)

Of course, there is a little function that performs a conjunction of a
complete list of rules on a dict and returns the rules that failed.

I've also got a little adapter that translates functions that take a
string and returns bool into one that fits the call signature called
match.

match(is_ssn, 'social-security_number')

Like I said, it may be considered more readable if using operator
overloading so that it uses python's native syntax. Considering the
added complexity, I don't know if it would be worth it. I'd probably
put a simple little declarative language on top of it to translate the
function calls before that.


Note that you could treat your when, all, have, and check as code source generators
and compile a rule as the last part of "when" processing, e.g.,
(fresh off the griddle, only tested as far as you see ;-)

----< jslowery.py >------------------------------------------
def all(*tests):
    return '(' + ' and '.join(tests) + ')'

def have(*fields):
    return '(' + ' and '.join('"%s" in record'%field for field in fields) + ')'

def check(tfun, *fields):
    return '%s(%s)' % (tfun.func_name, ', '.join('record[%r]'%field for field in fields))

def when(cond, test):
    g = globals()
    d = dict((name, g[name]) for name in __testfuns__)
    src = """\
def rule(record):
    if not (%s): return NotImplemented
    return (%s)
""" %(cond, test)
    print src # XXX debug
    exec src in d
    return d['rule']

def eq(*fields): # "collectable"
    return '(' + ' == '.join('record[%r]'%(field,) for field in fields) + ')'

def eq_test(x, y): # not "collectable" in itself, use with check
    return x==y

__testfuns__ = ['eq_test']

#rule = when(all(have('length'), have('width')),
#            check(['length', 'width'], lambda x, y: x == y))

# rewritten with changed check, differentiating eq_test from "collectable" eq
rule = when(all(have('length'), have('width')),
            check(eq_test, 'length', 'width'))
box_rule = when(all(have('length'), have('width')), eq('length', 'width'))
-------------------------------------------------------------

Then: (I left the debug print in, which prints the rule source code, with name "rule"
all the time. You could always change that)
>>> import jslowery
def rule(record):
    if not ((("length" in record) and ("width" in record))): return NotImplemented
    return (eq_test(record['length'], record['width']))

def rule(record):
    if not ((("length" in record) and ("width" in record))): return NotImplemented
    return ((record['length'] == record['width']))

So the "collectable" eq for box_rule (second debug output) generated in-line code,
which will run faster.
>>> jslowery.rule({'length':2, 'width':2})
True
>>> jslowery.rule({'length':2, 'width':1})
False
>>> jslowery.rule({'height':2, 'width':1})
NotImplemented
>>> jslowery.box_rule({'height':2, 'width':1})
NotImplemented
>>> jslowery.box_rule({'length':2, 'width':1})
False
>>> jslowery.box_rule({'length':2, 'width':2})
True

Looks like I didn't need all the parens ;-)

Anyway, this is not well tested, but rules constructed this way should run
a lot faster than doing all the nested calling at run time. There are always
security issues in exec-ing or eval-ing, so I am not recommending the above
as secure from malicious rule-writers, obviously.
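For anyone running this today: Python 3 spells the pieces differently (exec(src, d) instead of "exec src in d", tfun.__name__ instead of tfun.func_name, print()). A minimal sketch of the port, covering just the inline "collectable" eq path:

```python
def have(*fields):
    return '(' + ' and '.join('"%s" in record' % f for f in fields) + ')'

def all_(*tests):  # named all_ here to avoid shadowing the builtin
    return '(' + ' and '.join(tests) + ')'

def eq(*fields):   # "collectable": generates inline comparison source
    return '(' + ' == '.join('record[%r]' % (f,) for f in fields) + ')'

def when(cond, test):
    src = "def rule(record):\n"
    src += "    if not %s: return NotImplemented\n" % cond
    src += "    return %s\n" % test
    d = {}
    exec(src, d)              # Python 3 spelling of "exec src in d"
    return d['rule']

box_rule = when(all_(have('length'), have('width')), eq('length', 'width'))
assert box_rule({'length': 2, 'width': 2}) is True
assert box_rule({'length': 2, 'width': 1}) is False
assert box_rule({'height': 2}) is NotImplemented
```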

Regards,
Bengt Richter
Dec 16 '05 #13
>Sorry about "obfuscation contest," I just balked at the reduce code, which seemed
like premature overgeneralization ;-)
It's a personal preference, but the only thing I consider premature is
optimization, not generalization. I prefer to try and capture any
concept I can as its own abstracted procedure. In practice this usually
means using a lot of lazy data structures at the end of the day :)
Yes, but does collect(fields, fun) always know that fun takes two args? And that
it makes sense to apply fun pairwise as a boolean relation to reduce to a single boolean value?
You could say that by definition it does. In that case, will your future library
extender need to add some other kind of collect-like function?


The collect does assume that the function takes 2 parameters and the
logic is to apply to the result of each call and the next item in the
list.

The only more general way I can think of making this would be to do
inspection on the function to determine the number of arguments. To make a
general algorithm that does something like:

let n = number of elements a given function f takes

apply n elements from the list to the function
given a result of the function r, apply [r, *(n-1)] recursively to the
function.

def f(a, b, c):
    return (b+c)*a

map_col([1, 2, 3, 4, 5], f) => 45
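That recursive application can be sketched with inspect to discover the arity (map_col is only the hypothetical name from above):

```python
import inspect

def map_col(seq, f):
    """Fold seq through f, feeding the running result back as f's first argument."""
    n = len(inspect.signature(f).parameters)   # how many elements f consumes
    seq = list(seq)
    result = f(*seq[:n])                       # first application uses n elements
    i = n
    while i < len(seq):
        result = f(result, *seq[i:i + n - 1])  # then result plus the next n-1
        i += n - 1
    return result

def f(a, b, c):
    return (b + c) * a

print(map_col([1, 2, 3, 4, 5], f))  # 45
```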

However, I don't know how applicable that would be in this context,
since this is a boolean system and in the simple binary setup, it makes
sense to return the last operand to the beginning of the next
evaluation. In a more general case like this, the value of the
evalution would be the most useful, not the boolean truth value.

I do have one other generic routine besides the booleans and collect
collectors, which is called check:

def check(fields, test):
    """ check(list fields, func test) -> rule
    """
    def rule(record):
        values = [record[field] for field in fields]
        return test(*values)
    return rule

it basically lets you pull out whatever fields you want and pass them
to the provided function.

def do_complicated_test(a, b, tolerance=0.001):
    return a - ((a+b)/b) < a*tolerance

rule = check(('x', 'y'), do_complicated_test)

Now that I think of it, I should probably swap this to be
check(do_complicated_test, 'x', 'y')

When I first wrote this thing it was 3 stage construction, not 2. I
wanted to abstract out specific tests and then JUST apply fields to
them in the actual rules for the business code. I then figured out if I
standardize having the fields as the last arguments on the check
routines, then I could use a simple curry technique to build the
abstraction.

from functional import curry

comp_rule = curry(check)(do_complicated_test)

Then in the actual business code, can use it like this:

comp_rule(('a', 'b'))

Or:

ssn_rule = curry(match)(is_ssn)

ssn_rule('my-social-security-field')

For this to work properly and easily for the business logic, I do need
to get down some more solid rule invocation standards.
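The functional module mentioned above is third-party; the stdlib functools.partial gives the same currying effect for the test-first signature (a sketch, with eq_test as an illustrative test function):

```python
from functools import partial

def check(test, *fields):
    """check(func test, field, ...) -> rule, with the test argument first."""
    def rule(record):
        return test(*[record[f] for f in fields])
    return rule

def eq_test(x, y):
    return x == y

eq_rule = partial(check, eq_test)   # plays the role of curry(check)(eq_test)
rule = eq_rule('length', 'width')
assert rule({'length': 2, 'width': 2}) is True
assert rule({'length': 2, 'width': 1}) is False
```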

rule = when(all(have('length'), have('width')),
            check(['length', 'width'], lambda x, y: x == y))
assert rule({'length' : '2', 'width' : '2'}) == True
assert rule({'length' : '2', 'width' : '1'}) == False
assert rule({'length' : '1', 'width' : '2'}) == False


But what about when the "when" clause says the rule does not apply?
Maybe return NotImplemented, (which passes as True in an if test) e.g.,


I don't really see why a special exception needs to be here because
"when" easily falls into the logic. When is basically the truth
function, "if A therefore B" and when(A, B) => NOT A OR B

If A is false, then the expression is always true.

def when(predicate, expr):
    """ when(rule predicate, rule expr) -> rule
    """
    def rule(record):
        if predicate(record):
            return expr(record)
        else:
            return True
    return rule

So if the "test" fails, then the expression is True OR X, which is
always True.

Or, from another POV, the entire system is based on conjunction.
Returning "true" means that it doesn't force the entire validation
False. When the predicate fails on the When expression, then the
"therefore" expression does not matter because the predicate failed.
Therefore, the expression is true no matter what.

Trying not to be condescending, I just think that the typical
propositional logic works well in this case.
Because this fits well into the system, can you give me an example of why
this should make an exception to the boolean logic and raise an
exception?

The inline code generation is an interesting concept. It would
definitely run faster. Hmm, I did this little test.

This one does arbitrary recursion calls:

>>> def doit(v, i):
...     if not i:
...         return v
...     else:
...         return doit(v, i-1)

# Simple Adding while stressing function calls
>>> def run(times, depth):
...     for i in range(times):
...         doit(i, depth/2) + doit(i, depth/2)

# benchmark
>>> def ezrun(times):
...     for i in range(times):
...         i + i

# timer
>>> def efunc(f, *a, **k):
...     s = time.time()
...     f(*a, **k)
...     return time.time() - s
...
>>> efunc(run, 10000, 8)
0.061595916748046875
>>> efunc(run, 10000, 4)
0.038624048233032227
>>> efunc(ezrun, 10000)
0.0024700164794921875
>>> efunc(run, 1000, 4)
0.0038750171661376953
>>> efunc(ezrun, 1000)
0.00024008750915527344
>>> efunc(run, 1000, 2)
0.002780914306640625
>>> efunc(ezrun, 1000)
0.00023913383483886719
Even in the best case where there are 2 function calls per evaluation,
then it's at least 10^1 slower.

However, given a more realistic example with the kind of domain I'm
going to apply this to:

Hmm, I'd say that for practical application a record with 500 different
fields with an average of 3 rules per field is a common case upper
bound. using the product of that for a profile:
>>> efunc(run, 1500, 4)
0.0058319568634033203
>>> efunc(ezrun, 1500)
0.00034809112548828125
I can probably live with those results for now, though optimiziation in
the future is always an option :)
>>> jslowery.rule({'length':2, 'width':2})
True
>>> jslowery.rule({'length':2, 'width':1})
False
>>> jslowery.rule({'height':2, 'width':1})
NotImplemented
>>> jslowery.box_rule({'height':2, 'width':1})
NotImplemented
>>> jslowery.box_rule({'length':2, 'width':1})
False
>>> jslowery.box_rule({'length':2, 'width':2})
True


You know, the first implementation of this thing I did had the fields
separated from the complete rules. When I realized that this was just a
basic logic system, it became a problem having the fields separated
from the rules.

In practice, I'm looking to use this for easily collecting many rules
together and applying them to a data set.

def test_validator2(self):
    from strlib import is_ssn
    rules = [
        have('first_name'),
        have('last_name'),
        all(have('ssn'), match(is_ssn, 'ssn')),
        when(all(have('birth_date'), have('hire_date')),
             lt('birth_date', 'hire_date'))
    ]
    v = validator(rules)
    data = {'first_name' : 'John',
            'last_name' : 'Smith',
            'ssn' : '123-34-2343',
            'birth_date' : 0, 'hire_date' : 0}

    assert v(data) == []

    data = {'first_name' : 'John',
            'last_name' : 'Smith',
            'ssn' : '12-34-2343',
            'birth_date' : 0, 'hire_date' : 0}
    assert v(data) == [rules[2]]
The thing I'm wrestling with now is the identity of rules and fields.
I'd like to fully support localization. Making a localization library
that will provide string tables for "error messages". Could use
positions to do this.

rules = [
    have('first_name'),
    have('last_name'),
    all(have('ssn'), match(is_ssn, 'ssn')),
    when(all(have('birth_date'), have('hire_date')),
         lt('birth_date', 'hire_date'))
]

msgs = [
    locale('MISSING_VALUE', 'FIRST_NAME'),
    locale('MISSING_VALUE', 'LAST_NAME'),
    locale('INVALID_SSN'),
    locale('INVALID_BIRTH_HIRE_DATE'),
]

# Could use a functional construction to get all of the locale
# references out of here if wanted

and then use a generic routine to join them:

error_map = assoc_list(rules, msgs)

errors = validate(record)

for e in errors:
    print locale.lookup(error_map[e])
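The validator itself never appears in the thread; a minimal sketch consistent with the unit test above (have is re-sketched here so the example is self-contained):

```python
def validator(rules):
    """Conjunction over a rule list; returns the subset of rules that failed."""
    def validate(record):
        return [rule for rule in rules if not rule(record)]
    return validate

def have(*fields):
    def rule(record):
        return all(f in record for f in fields)
    return rule

rules = [have('first_name'), have('last_name')]
v = validator(rules)
assert v({'first_name': 'John', 'last_name': 'Smith'}) == []
assert v({'first_name': 'John'}) == [rules[1]]
```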

Dec 17 '05 #14
def lt(*fields):
    return collect(fields, lambda x, y: x < y)

def gt(*fields):
    return collect(fields, lambda x, y: x > y)

def gte(*fields):
    """ gte(field, ...) -> rule
    """
    return collect(fields, lambda x, y: x >= y)

etc...

DRY ? ;-)


Do you mean by the doc string or the fact that I'm having to redeclare
built-in operations?
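On the DRY point: the stdlib operator module already names these comparisons, so the wrappers collapse; a sketch with a pairwise collect:

```python
import operator

def collect(fields, reducer):
    def rule(record):
        vals = [record[f] for f in fields]
        # apply the relation to each adjacent pair, ANDing the results
        return all(reducer(x, y) for x, y in zip(vals, vals[1:]))
    return rule

def lt(*fields):  return collect(fields, operator.lt)
def gt(*fields):  return collect(fields, operator.gt)
def gte(*fields): return collect(fields, operator.ge)

assert lt('birth_date', 'hire_date')({'birth_date': 19740201, 'hire_date': 19840721})
assert gte('height', 'width')({'height': 3, 'width': 3})
```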

Dec 17 '05 #15
On 16 Dec 2005 16:55:52 -0800, "js******@gmail.com" <js******@gmail.com> wrote:
[...]
When I first wrote this thing it was 3 stage construction, not 2. I
wanted to abstract out specific tests and then JUST apply fields to
them in the actual rules for the business code. I then figured out if I
standardize having the fields as the last arguments on the check
routines, then I could use a simple curry technique to build the
abstraction.

from functional import curry

I'm not familiar with that module, but I wrote a byte-code-munging curry
as a decorator that actually modified the decorated function to eliminate
parameter(s) from the signature and preset the parameter values inside the code.
I mention this because pure-python currying that I've seen typically creates a wrapper
function that calls the original function, and slows things down with the extra calling
instead of speeding things up, at least until some future version of python.
Looks nice on the surface though.

comp_rule = curry(check)(do_complicated_test)

Then in the actual business code, can use it like this:

comp_rule(('a', 'b'))

Or:

ssn_rule = curry(match)(is_ssn)

ssn_rule('my-social-security-field')

For this to work properly and easily for the business logic. I do need
to get down some more solid rule invocation standards.

rule = when(all(have('length'), have('width')),
            check(['length', 'width'], lambda x, y: x == y))
assert rule({'length' : '2', 'width' : '2'}) == True
assert rule({'length' : '2', 'width' : '1'}) == False
assert rule({'length' : '1', 'width' : '2'}) == False
But what about when the "when" clause says the rule does not apply?
Maybe return NotImplemented, (which passes as True in an if test) e.g.,


I don't really see why a special exception needs to be made here because
"when" falls easily out of the logic. When is basically material
implication, "if A then B", and when(A, B) => NOT A OR B

I was suggesting a distinction between testing and production failure detection.
The latter of course can treat not-applicable as logically the same as
applicable-and-did-not-fail.

Note that NotImplemented is not the same as NotImplementedError -- and I wasn't suggesting
raising an exception, just returning a distinguishable "True" value, so that
a test suite (which I think you said the above was from) can test that the "when"
guard logic is working vs just passing back a True which can't distinguish whether
the overall test passed or was inapplicable.

Of course, to assert a did-not-fail condition, bool(NotImplemented) tests as True,
so the assertion (if that's what you want to do) would be written

assert rule(...), 'You only see this if rule was applicable and did not pass'
# since assert NotImplented succeeds and assert True succeeds
# w/o raising AssertionError and False means rule was applicable and failed.

for testing purposes you can distinguish results:

assert rule(...) is True, 'You only see this if rule was not applicable or test failed'
assert rule(...) is False, 'You only see this if rule was not applicable or test succeeded'
assert rule(...) is NotImplemented, 'You see this if rule WAS applicable, and either passed or failed'

OTOH, I can see returning strictly True or False. It's a matter of defining the semantics.
If A is false, then the expression is always true.

def when(predicate, expr):
""" when(rule predicate, rule expr) -> rule
"""
def rule(record):
if predicate(record):
return expr(record)
else:
return True
return rule

So if the "test" fails, then the expression is True OR X, which is
always True.

Sure.

Or, form another POV, the entire system is based on conjunction.
Returning "true" means that it doesn't force the entire validation
False. When the predicate fails on the When expression, then the
"therefore" expression does not matter because the predicate failed.
Therefore, the expression is true no matter what.

Trying not to be condescending, I just think that the typical
propositional logic works well in this case.
Because this fits well into the system, can you give me an example why
this should make an exception to the boolean logic and raise an
exception?

First, I didn't suggest raising an exception, I suggested returning an alternate
value (NotImplemented) that Python also considers logically true in an "assert value"
or "if value" context, so it could be distinguished, if desired, in a test suite (as I believe
you said your snippets were from).

Of course if you want to write "assert expr == True" as opposed to
just "assert expr", you are writing a narrower assertion. Note:

>>> assert NotImplemented, 'assertion failed'
>>> assert True, 'assertion failed'
>>> assert False, 'assertion failed'
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
AssertionError: assertion failed

If your rule was returning NotImplemented as a "true" value, you wouldn't use
the narrowed assertion in a production context, you'd just use the normal
bare python-truth assertion. I.e., (using the dummy rule to illustrate)
>>> def rulereturning(x): return x
...

you would not write

>>> assert rulereturning(NotImplemented)==True, 'assertion failed'
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
AssertionError: assertion failed

You would write

>>> assert rulereturning(NotImplemented), 'assertion failed'

(That passed)

BTW, note that unless you use 'is', expr==True is not as narrow as you might think:

>>> assert 1.0==True, 'assertion failed'

or

>>> assert rulereturning(1.0)==True, 'assertion failed'

(both passed sans exception)

The inline code generation is an interesting concept. It would
definitely run faster. Hmm, I did this little test.
[...]
BTW, is this data coming from an actual DBMS system? If so, wouldn't
there be native facilities to do all this validation? Or is Python
just too much more fun than SQL? ;-)
[...]
The thing I'm wrestling with now is the identity of rules and fields.
I'd like to fully support localization. Making a localization library
that will provide string tables for "error messages". Could use
positions to do this.

IWT dict keys would be more robust. Maybe just use the English string as the
key for looking up alternates? BTW strings could also name data to be
formatted, as in '%(this)s and %(that_num)d' % mappingobject,
where mappingobject can be anything that looks up values per
mappingobject.__getitem__(name) where name would be 'this' or 'that' here.
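A tiny sketch of that dict-keyed idea, with the English string as the lookup key and %(name)s placeholders filled from a mapping (the MESSAGES table and localize helper are made-up names for illustration):

```python
# English message strings double as lookup keys; unknown locales or
# messages fall back to the English text itself.
MESSAGES = {
    'es': {'Missing %(field)s': 'Falta %(field)s'},
}

def localize(msg, lang, **values):
    table = MESSAGES.get(lang, {})
    return table.get(msg, msg) % values
```

So `localize('Missing %(field)s', 'es', field='first_name')` yields the Spanish text, while an unsupported locale just formats the English key.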

rules = [
have('first_name'),
have('last_name'),
all(have('ssn'), match(is_ssn, 'ssn')),
when(all(have('birth_date'), have('hire_date')), lt('birth_date',
'hire_date'))
]

msgs = [
locale('MISSING_VALUE', 'FIRST_NAME'),
locale('MISSING_VALUE', 'LAST_NAME'),
locale('INVALID_SSN'),
locale('INVALID_BIRTH_HIRE_DATE'),
]

Not to distract you, but I'm wondering what your rules would look like represented in simple
rule-per-line text, if you assume that all names are data base field names
unless otherwise declared, and just e.g. "name!" would be short for "have(name)",
then just use some format like (assuming 'rules' is a name for a particular group of rules)

functions:: "is_ssn" # predeclare user-defined function names
operators:: "and", "<" # might be implicit

rules: first_name!; last_name!; ssn! and is_ssn(ssn); birth_date! and hire_date! and (birth_date < hire_date)

or with indentation, and messages after a separating comma a la assert

rules:
first_name!, 'Missing first name' # or ('MISSING_VALUE', 'FIRST_NAME') tuple as args for your locale thing
last_name!, 'Missing last name'
ssn! and is_ssn(ssn), 'Invalid ssn' # => 'ssn' in record and is_ssn(record['ssn'])
birth_date! and hire_date! and (birth_date < hire_date) # => 'birth_date' in record and 'hire_date' in record and
# record['birth_date'] < record['hire_date']
more_rules:
field_name!, 'field name is missing'
etc ...

Seem like it would be fairly easy to translate to code in a function per rule group,
without any fancy parsing.
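For instance, a minimal sketch of that translation (helper names hypothetical), handling only the bare name! shorthand with an optional message after a comma; 'and', '<', and the rest would need the real translation step:

```python
def have(name):
    # Minimal have(): true when the field is present and non-empty.
    def rule(record):
        return bool(record.get(name))
    return rule

def compile_line(line):
    # Translate one "field!, 'message'" line into a (rule, message) pair.
    expr, _, msg = line.partition(',')
    field = expr.strip().rstrip('!')
    return have(field), msg.strip().strip("'") or ('%s is missing' % field)

rule, message = compile_line("first_name!, 'Missing first name'")
```

A rule group would then just be a list of `compile_line` results, one per text line.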
I don't like XML that much, but that might be a possible representation too.

Regards,
Bengt Richter
Dec 17 '05 #16
>Note that NotImplemented is not the same as NotImplementedError -- and I wasn't suggesting
raising an exception, just returning a distinguishable "True" value, so that
a test suite (which I think you said the above was from) can test that the "when"
guard logic is working vs just passing back a True which can't distinguish whether
the overall test passed or was inapplicable.

Ahh, sorry about that. I got ya. That's not a bad idea.
BTW, is this data coming from an actual DBMS system? If so, wouldn't
there be native facilities to do all this validation? Or is Python
just too much more fun than SQL? ;-)
Well, the "first usage" is going to come from a RDBMS, yeap. We use
MySQL 4.1.x right now. I know with Oracle/MSSQL, it's easy to put all
kinds of fancy constraints on data. I'm aware of no such facilities in
MySQL though I would love it if there was one to apply these kinds of
flexible validation routines.

Worst case scenario I can use it to do testing of what is pulled out of
the database :)
Not to distract you, but I'm wondering what your rules would look like represented in simple
rule-per-line text, if you assume that all names are data base field names
unless otherwise declared, and just e.g. "name!" would be short for "have(name)",
then just use some format like (assuming 'rules' is a name for a particular group of rules)

functions:: "is_ssn" # predeclare user-defined function names
operators:: "and", "<" # might be implicit

rules: first_name!; last_name!; ssn! and is_ssn(ssn); birth_date! and hire_date! and (birth_date < hire_date)

or with indentation, and messages after a separating comma a la assert


That's really a great idea!

When I was trying to figure out how to combine these rules and figure
out what kind of flexibility I wanted, I actually did write some
declarative strings. It was a lot more verbose than what you wrote, but
it helped with the concepts. Your approach is good.

My little pseduo-lang was like:

REQUIRE last_name or first_name
WHEN HAVE last_name, surname THEN REQUIRE last_name != surname

Well, I didn't go on and work with the error strings. I decided that
this thing wouldn't be very useful without being able to work with
hierarchical data.

So I abstracted out the idea of referencing the record by key and
implemented a little "data resolution" protocol using PLY. I like the
protocol because I think it will be something that can be reused fairly
often, especially for mini-langs.

def testNestedData(self):
data = {
'first_name' : 'John',
'last_name' : 'Smith',
'surname' : '',
'phone' : '205-442-0841',
'addresses' : {
'home' : {
'street' : '123 Elm St',
'city' : 'Birmingham',
'state' : 'AL',
'postal' : '35434'
},
'work' : {
'street' : '101 Century Square',
'city' : 'Centre',
'state' : 'AL',
'postal' : '35344'
}
},
'aliases' : [
'John Smith',
'Frank Beanz'
]
}

rule = when(have('addresses.work'), all(
have('addresses.work.city'),
have('addresses.work.state'),
have('addresses.work.postal')))

assert rule(data) == True
data['addresses']['work']['city'] = ''
assert rule(data) == False
data['addresses']['work']['city'] = 'Centre'
It works the same as the rules do:

def match(test, field):
""" match uses a function that takes a single string and returns a
boolean
and converts it into a validation rule.
"""
field = drep(field)
def rule(record):
value = field(record)
if value:
return bool(test(value))
else:
return True
return rule

def check(test, fields):
""" check uses a function that takes a variable arguments and
applies the given
fields to that function for the test.
"""
fields = [drep(field) for field in fields]
def rule(record):
values = [field(record) for field in fields]
return test(*values)
return rule
The data resolution algorithm uses dots and square braces. Dots
resolve on dicts and square braces resolve on lists. It doesn't support
slicing, but it easily could. There is a special [:] notation that
turns the return into a list; all rules after the [:] are then applied
successively to each list entry.
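A minimal re-creation of that resolution scheme (the real drep uses a PLY lexer/parser; this regex sketch handles only the cases shown in the tests below):

```python
import re

def drep(path):
    # Build a resolver for a dotted/indexed path like 'b.A[1]' or
    # 'foo[:].bar': dots resolve on dicts, '[n]' indexes lists, and
    # '[:]' maps the remaining path over every list element.
    tokens = re.findall(r"[^.\[\]]+|\[\d+\]|\[:\]", path)

    def resolve(value, toks):
        for i, tok in enumerate(toks):
            if tok == '[:]':
                rest = toks[i + 1:]
                return [resolve(item, rest) for item in value]
            elif tok.startswith('['):
                value = value[int(tok[1:-1])]
            else:
                value = value[tok]
        return value

    return lambda record: resolve(record, tokens)
```

This reproduces the behavior exercised in testDeep and testListResolution, minus error handling for missing keys.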

def testDeep(self):
data = {
'a' : 'eh-yah',
'b' : {'x' : 'xylophone',
'y': 'yo-yo',
'z': 'zebra',
'A': [
1, 2, 3, 4
]}
}
assert drep('b')(data) == {'y': 'yo-yo', 'x': 'xylophone', 'z':
'zebra', 'A': [1, 2, 3, 4]}
assert drep('b.x')(data) == 'xylophone'
assert drep('b.A[:]')(data) == [1, 2, 3, 4]
assert drep('b.A')(data) == [1, 2, 3, 4]
assert drep('b.A[1]')(data) == 2
def testListResolution(self):
data = {"foo" : [
{"bar" : 1},
{"bar" : 2},
{"bar" : 3}
]}
assert drep("foo[:].bar")(data) == [1, 2, 3]
assert drep("foo[2].bar")(data) == 3

I know the code COULD also use some funky thing to break apart dicts as
well, but I didn't think of that when I was writing the lexer/parser.

Combining this with a little mini-language like the one you described
would be nice.

Dec 17 '05 #17
>>from functional import curry
I'm not familiar with that module, but I wrote a byte-code-munging curry
as a decorator that actually modified the decorated function to eliminate
parameter(s) from the signature and preset the parameter values inside the code.
I mention this because pure-python currying that I've seen typically creates a wrapper
function that calls the original function, and slows things down with the extra calling
instead of speeding things up, at least until some future version of python.
Looks nice on the surface though.


Sounds very useful. Is this open source? All of the curry functions
I've seen in the wild use a wrapper as well. I don't care for this
either, mainly because it hides away the function attributes, and when I
use decorators on functions that munge attributes, it doesn't work.

Dec 17 '05 #18
On 17 Dec 2005 01:16:31 -0800, "js******@gmail.com" <js******@gmail.com> wrote:
from functional import curry
I'm not familiar with that module, but I wrote a byte-code-munging curry
as a decorator that actually modified the decorated function to eliminate
parameter(s) from the signature and preset the parameter values inside the code.
I mention this because pure-python currying that I've seen typically creates a wrapper
function that calls the original function, and slows things down with the extra calling
instead of speeding things up, at least until some future version of python.
Looks nice on the surface though.


Sounds very useful. Is this open source? All of the curry functions
I've seen in the wild use a wrapper as well. I don't care for this
either mainly because it hides away the function attributes and when I
use decorators on functions that munge attributes, it doesn't work.

The curry part works like:
(first unmodified)

>>> import inspect
>>> def foo(x, y): return x*y
...
>>> import dis
>>> dis.dis(foo)
  1           0 LOAD_FAST                0 (x)
              3 LOAD_FAST                1 (y)
              6 BINARY_MULTIPLY
              7 RETURN_VALUE
>>> inspect.getargspec(foo)
(['x', 'y'], None, None, None)

(modified)

>>> from ut.presets import presets, curry
>>> @curry(y=111)
... def foo(x, y): return x*y
...
>>> dis.dis(foo)
  1           0 LOAD_CONST               1 (111)
              3 STORE_FAST               1 (y)

  3           6 LOAD_FAST                0 (x)
              9 LOAD_FAST                1 (y)
             12 BINARY_MULTIPLY
             13 RETURN_VALUE
>>> inspect.getargspec(foo)
(['x'], None, None, None)

The presets decorator does the same except doesn't modify the signature
(it came first ;-)

>>> def foo():
...     return now()
...
>>> dis.dis(foo)
  2           0 LOAD_GLOBAL              0 (now)
              3 CALL_FUNCTION            0
              6 RETURN_VALUE
>>> import time
>>> @presets(now=time.ctime)
... def foo():
...     return now()
...
>>> dis.dis(foo)
  1           0 LOAD_CONST               1 (<built-in function ctime>)
              3 STORE_FAST               0 (now)

  3           6 LOAD_FAST                0 (now)
              9 CALL_FUNCTION            0
             12 RETURN_VALUE
>>> foo()
'Sat Dec 17 02:49:06 2005'

Humph. Good night ;-/

I googled and found a post of mine, which I think may be my last version, at

http://groups.google.com/group/comp....27a53385e89924

where you can look to see if it's interesting. I think it's probably more fruitful
to put work into doing it by way of rewriting the AST, which would potentially
allow folding constant expressions that may be formed when parameters
become constants, etc. So I haven't pursued byte code munging much further.
The version above only had minor testing, and only on python 2.4. IOW, basically
a proof of concept.

I based my presets/curry decorator on the pattern of a decorator Raymond Hettinger wrote,
which optimizes global accesses and other stuff. No warranties, mind! ;-)

Regards,
Bengt Richter
Dec 17 '05 #19
