Bytes IT Community

Suggesting a new feature - "Inverse Generators"

First, a disclaimer. I am a second year Maths and Computer Science
undergraduate, and this is my first time ever on Usenet (I guess I'm
part of the http generation). On top of that, I have been using Python
for a grand total of about a fortnight now. Hence, I apologise if what
follows is a stupid suggestion, or if it's already been made somewhere
else, or if this is not the appropriate place to make it, etc. But I
did honestly do some background checking through the Python
documentation, www.python.org, FAQs, and so forth before making this
post. Basically what I'm trying to say here is, please don't flame this
post :)

Anyway, I have a suggestion to make for a rather radical new feature in
Python. Rather than just struggling to explain and justify it straight
out, I'll give the background of how I thought up the feature - and how
I worked through the ideas I came up with. The example may seem a bit
trivial to start, but bear with me, because I think there's at least
some possibility the suggestion has merit.

I am writing a project in Python atm, and one thing it needs to do is
parse a very simple text file to get a list of records. It's a
straightforward kind of thing that I (and I'm sure you) have done many
times before in many languages.

Each bit of interesting data in the file occurs over 3 or 4 lines of
text. On top of this there are lines of random uninteresting junk as
well.

I decided to write a generator to filter out the junk lines, strip
excess whitespace and the like (generators are one of my favourite
features in Python). So the main body of code looks like this:

def filterJunk(lines):
    for line in lines:
        # Do filtering & cleaning up stuff on line.
        # ...
        yield line

def getData(filename):
    recordList = []
    input = file(filename)
    lines = input.readlines()
    input.close()
    for line in filterJunk(lines):
        # ... create records and add to recordList

So far, so good. Next comes the business of actually creating the
records from the lines of interesting text.

You can probably see the problem I ran into - I need to look at 3 or 4
lines at a time to create the record. There's no easy way out of this
requirement, because whether the record is contained in 3 or 4 lines is
conditional on the value of the second line. But the for loop isn't
going to allow me to do that.
So I rewrote the for loop above as:

while True:
    try:
        # ....do stuff repeatedly with lines.next() to add to recordList[]
    except StopIteration:
        break

This works. (And yes, there are also a multitude of other reasonably
simple ways to approach this problem, some of which I've used in the
past. And probably some fancy ways involving map and reduce as well.)
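For comparison, here is one of those 'reasonably simple ways' spelled out as a runnable sketch (in modern Python syntax): keep a for loop over an explicit iterator and pull the extra lines with next() inside the body. The record layout and the trailing-'+' test are invented stand-ins for the real file format and condition():

```python
def filter_junk(lines):
    # Stand-in for the post's filterJunk: drop blank and comment lines.
    for line in lines:
        line = line.strip()
        if line and not line.startswith("#"):
            yield line

def get_records(lines):
    records = []
    it = filter_junk(lines)
    for first in it:                 # the for loop drives the iteration
        second = next(it)            # pull extra lines as needed
        # Invented rule: a trailing '+' on line 2 means a 4-line record.
        optional = next(it) if second.endswith("+") else None
        last = next(it)
        records.append((first, second, optional, last))
    return records

print(get_records(["A", "B+", "opt", "C", "# junk", "D", "E", "F"]))
# → [('A', 'B+', 'opt', 'C'), ('D', 'E', None, 'F')]
```

Note that a truncated file would surface as a StopIteration escaping from next(); real code would want to catch that and report a parse error.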

But it seems ugly to me somehow. And I'm suspicious of anything that
hints of ugliness (in Computer Science, anyway.) I suppose that
Python's getting me into the habit of assuming there's always an
elegant looking solution.

I tried, and failed, to think of some sort of Generator-based concept
(I really do love generators! So simple, but so useful!) that would
allow me to retain a for loop. I guess that, essentially, I feel this
task is really just iterating over the lines, and that iterating over
something in Python is a job for a for loop. Generators allow you to
use for loops for all regular iteration. So I wanted to have something
"vaguely Generator-like" that would allow me to get back to using a
for loop.

It occurred to me - what if I had some sort of "inverse generator" - a
function with a resumable call mechanism that retained state, but that
would resume when accepting input rather than returning output. For
want of a better term, call it an Acceptor.

If you think of a Generator as a method acting as (an abstraction of) a
source of sequential data, then the Acceptor correspondingly is a sink.
I started playing around with the Acceptor idea, thinking about how it
might work. I imagined a simple mirroring of Generator syntax - using a
new keyword, accept, to take in a value, just as Generators use yield
to return a value. Here's my first attempt at 'coding' my precious for
loop using an imaginary Acceptor:

def combineIntoRecord(): # This is an acceptor function
    optionalline = None  # We may not get given a value for this line
    accept firstline
    accept secondline
    if condition(secondline):
        accept optionalline
    accept lastline
    r = createRecord(firstline, secondline, lastline, optionalline)
    return r

recordlist = []
for line in lines:
    recordlist.append(combineIntoRecord(line))

I've since thought about it somewhat, and it's clear I got this first
try just plain wrong. It's got a number of hideous defects. First,
combineIntoRecord() is 'accepting' its values by taking a magical
argument, line, that does not appear in the argument list.

Second, in this particular syntax, do
combineIntoRecord(x) at line 50
and
combineIntoRecord(y) at line 200
want x and y to be combined into the same record, using the old state,
or is the second trying to start a "new call" to the method? For an
extreme example, imagine calls made in different modules. Basically,
the jump into combineIntoRecord code needs to somehow specify an
'instance' of the function, with a given existing state. The method
call needs to be bound to something, somehow.

Worst of all, though, since combineIntoRecord(line) doesn't always
return a value, how does the interpreter know when to call append? It
seems the 'magic' of the acceptor is spilling out into the surrounding
code, making it impossible to understand or intelligently parse. The
magicness needs to be contained within the acceptor itself.

How to resolve these issues?
Well, thinking about how generators achieve their magic-like effects:
- A generator, despite being a function, doesn't return a value in
the usual sense.
- Calls to a generator don't result in a value, they result in an
Iterator object.
- The magic happens at the 'yield' keyword.
So, analogously:
- Perhaps a call to combineIntoRecord() shouldn't return a value
- It should return an acceptor object instance instead
- Every call results in a different object - solves problem 2 above
- Acceptor objects have a special method for feeding them values, which
takes care of problem 1 above
BUT:
- When and what does an acceptor object 'return'?
- We don't actually want to restrict what an acceptor *function*
returns.
- Acceptors (unlike generators) should be able to return just as per
ordinary functions
- It's not the output of the function that's 'magic', it's the input to
the function
Thus:
- We must confine the magic to the input the function receives - its
argument list.
- Have a 'magic argument' that 'takes in' the accepted values.
- This is fine - it keeps the magic within the call of the acceptor
function
- If an acceptor returns like an ordinary function, it needs to
return when called
- Hence it needs to know its accepted values "all at once"

From our earlier thinking, an acceptor fundamentally accepts a sequence
of data.

So let the first argument to an acceptor function call be a "magic
sequence" that contains all the values we want to be accepted. Other
arguments may be passed as normal.

A disadvantage: the power of the acceptor is restricted in this
version, since you can't have elements Accepted at any arbitrary point
in the code. This is really a good thing, though - with the old version
it was the case that too much power was making the idea totally
unworkable. What we've done is essentially bound our Acceptor to
something, namely a sequence. Note, though, that we can use any
abstract source of sequential data - including a Generator - so there's
still quite a lot of flexibility in how it gets data for accepting.

To illustrate this model of an Acceptor, here's an example of a simple
Acceptor that returns the first element of a sequence that satisfies
some condition (which is presumed to be a function):

def firstThatSatisfies(condition):
    accept element
    if condition(element):
        return element

To use, we simply call:

first = firstThatSatisfies(somelist, condition)

somelist, not appearing in the argument list, gets fed into the accept
statement by the 'acceptor magic'.

Note how explicit iteration has been removed from this canonical
algorithm altogether. I don't know about you, but I think it looks kind
of neat.
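As an aside, this particular canonical algorithm already has an iteration-free spelling in today's Python: the built-in next() over a generator expression (in the 2.4-era of this thread, itertools.ifilter(...).next() played the same role). The list and condition below are just illustrative:

```python
somelist = [3, 8, 1, 12, 7]
condition = lambda x: x > 10  # an arbitrary example condition

# First element satisfying the condition; the second argument to next()
# is a default returned instead of raising StopIteration on no match.
first = next((element for element in somelist if condition(element)), None)
print(first)  # → 12
```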

A Slight Digression:

Clearly it might be true that condition(element) == 0 for every element
in some list. This brings us to the issue of an accept statement that
doesn't get fed any data.

There are two possible ways to deal with this. One is to insist every
accept statement must get called, and raise an exception if this
doesn't happen. However, I prefer the seemingly more flexible option
where you simply say "Bad luck, once the magic sequence is empty, any
remaining acceptor code doesn't get executed, and any acceptor state is
lost." This means an acceptor might not always give a well defined
return value, but this is no worse than something like

def f():
    x = someCalculation()
    bool = someConditionThatWillAlwaysBeFalse()
    if bool: return x

This conventional example has a return value of None, and an Acceptor
that doesn't reach an explicit return statement can be treated exactly
the same way.

End of digression.

Now, going back to my original problem and the first Acceptor I wrote:

def combineIntoRecord(): # This is an acceptor function
    optionalline = None  # We may not get given a value for this line
    accept firstline
    accept secondline
    if condition(secondline):
        accept optionalline
    accept lastline
    r = createRecord(firstline, secondline, lastline, optionalline)
    return r

This doesn't work anymore - it produces and returns the first record we
want, and then stops. We want a list of all records.

Maybe we could use a loop. The 'accepting magic' would need to be
confined to this loop. And the loop needs to know when the magic stream
of data comes to an end, to know when to return the final value. Maybe
use another special keyword:

def combineIntoRecord():
    recordList = []
    while accepting:
        # ....accept values, append createRecord results to recordList

Perhaps. I don't really like the idea of having any magic new keywords
other than accept itself; if it needs this sort of explicit while loop,
maybe it shouldn't be in an Acceptor at all (the whole original point
of this idea was to get rid of the while loop, after all!)

So are Acceptors pretty useless, after all that? Maybe. Then again,
maybe not.

What is this combineIntoRecord(), that I have written using
non-existent language features, trying to do, in a general sense?

It wants to take some sequence (the lines in a file) and produce
another sequence that in some way depends on the first. However, it
wants to be able to define basically any arbitrary dependence between
the sequences, taking the elements from the first and producing output
to the second at its leisure.

The 'taking at its leisure' comes by virtue of being an Acceptor.
'Producing at its leisure' is a feature already available in Python.
It's called a Generator.

Behold:

# An Acceptor/Generator!!!
def combineIntoRecords():
    optionalline = None  # We may not get given a value for this line
    accept firstline
    accept secondline
    if condition(secondline):
        accept optionalline
    accept lastline
    yield createRecord(firstline, secondline, optionalline, lastline)

def getData(lines):
    return list(combineIntoRecords(filterJunk(lines)))

So, alas, my beloved for loop was not saved after all.

Fortunately, the dreaded While loop has been vanquished. In fact all
loopiness has disappeared. Along with any other semblance of a main
method.

I think I like the finished result. I'll leave it for you to decide if
you like it too.
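Worth noting: PEP 342 (accepted for Python 2.5) turns yield into an expression, so a generator can also accept values via its send() method - essentially the Acceptor imagined above. A sketch of combineIntoRecords in that style; the trailing-'+' test is an invented stand-in for condition():

```python
def combine_into_records(out):
    # Consumer: values arrive via .send(); finished records go to 'out'.
    while True:
        first = yield
        second = yield
        # Invented rule: a trailing '+' on line 2 means a 4-line record.
        optional = (yield) if second.endswith("+") else None
        last = yield
        out.append((first, second, optional, last))

records = []
acceptor = combine_into_records(records)
next(acceptor)          # prime the consumer: run up to the first yield
for line in ["A", "B+", "opt", "C", "D", "E", "F"]:
    acceptor.send(line)
print(records)
# → [('A', 'B+', 'opt', 'C'), ('D', 'E', None, 'F')]
```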

Jul 18 '05 #1
15 Replies


> I'll try to reduce my pages of ranting to a single question. In
> posting, I was wondering if the "syntactic sugar" (Acceptors) that I
> invented to implement the solution is of any general interest. So are
> there maybe examples less straightforward than this one, where
> Acceptors work better? Or can you always just "turn the generator
> inside out" in the way you have done here?
>
> If you can always do it your way, well, that's a powerful design
> pattern, and it just goes to show my faith in Generators was justified
> :) And that I wasn't thinking hard/clearly enough about how to use
> them.
>
> There are other issues, like "Does the Acceptor syntax, although
> perhaps functionally equivalent to other methods, ever make the code
> more readable, easier to parse, etc?" But they're a lot less important
> I'd say.

To me your acceptors look like micro-threads or similar concepts that
have popped up here every now and then - but so far it seems they
didn't end up being included. I can't say for what reasons though.

Just yesterday I looked into stackless python to grasp what it does, and
while it is not syntactically different to standard python, it seems to
make coding the way you intend to do possible.
--
Regards,

Diez B. Roggisch
Jul 18 '05 #2

Jordan Rastrick wrote:
> Wow, if I'm going to get replies (with implemented solutions!) this
> quickly, I'll post here more often :-)

That is indeed typical of this most attentive group :-)

> Its taken me a while to get a rough understanding of this code, but I
> think I have some idea.

It is just an example jotted in 2 min - no doubt it could be made
clearer.

> Correct me if I'm wrong.
> groupby groups based on value of line(record)

No, groupby groups on the value of record(item), where item is given by
iterating over linesource.

You should check the itertools documentation:
http://docs.python.org/lib/itertools-functions.html

> Record returns 1 for the first line, 1 for the second, 1 for the 3rd,
> then 2 for the 4th because seq[0] gets incremented since
> len(line) > 20

In this case, it doesn't matter what record returns, as long as it is
equal for successive values of item that should be grouped.

> OK that's fair enough. But how does record retain state between calls?
> How is that related to the fact you're storing your value as a
> singleton list, instead of just an int?

You are asking about the fundamental behavior of Python: argument
passing, mutable objects and scopes.

> It seems a little confusing to be honest, probably mainly due to my
> unfamiliarity with groupby. Retaining state between method calls is
> part of what interests me so much about the Generator/Acceptor case.
> Here you're retaining state between calls with none of the special
> syntax used for example in Generators.
>
> How? Is it a side effect of the way groupby uses record? If so, then
> that's a little oblique and unreadable for my liking.

No, it's nothing special about groupby. record simply stores its state
in a mutable default parameter. This isn't in general good practice: at
least you have to be careful with it. You can see the behavior in the
following example:

>>> def accumulate(value, accum = []):
...     accum.append(value)
...     return accum
...
>>> accumulate(1)
[1]
>>> accumulate(2)
[1, 2]
>>> accumulate(6)
[1, 2, 6]

> ....
> Still, this is fascinating.... going to have to spend some time
> experimenting with groupby as soon as I get a chance....

Experimenting is good. So is the documentation:
http://docs.python.org/tut/tut.html

Michael

Jul 18 '05 #3

Michael Spencer wrote:
> > Still, this is fascinating.... going to have to spend some time
> > experimenting with groupby as soon as I get a chance....
>
> Experimenting is good. So is the documentation:
> http://docs.python.org/tut/tut.html

Reading documentation is a good idea, but I think your example would
be more clear for Jordan if you used function attributes:

def record(item):
    if len(item) > 20:
        record.seq += 1
    return record.seq
record.seq = 0

Serge.

Jul 18 '05 #4

Terminology: To me and some (most?) others posting here and, I believe,
both the docs and the common meaning of 'generator', a Python generator is
the particular kind of iterator that produces multiple values on request
and which is created by the generator function that you write.

Acceptors (consumers): The concept is known to both CS and Python
developers. According to Tim Peters, Python generators constitute
'semi-coroutines'. Any full coroutine mechanism (Stackless, for
instance) will allow the reverse (not inverse, which would undo rather
than complement). For various reasons, the Python developers chose that
data chains should be written as consumer code getting data from a
generator, which might in turn get data from a generator.

In your example, opening and reading the lines of the data file could be
done by filterjunk, not the main function, but I can see reasons both ways.

For your specific problem, you could, I believe, use an intermediate
generator (which could also be combined with filterjunk) that combines
lines into complete text records. Something like (ignoring any fussy
details left out):

def textrecord(file):
    trlines = []
    for line in filterjunk(file):
        trlines.append(line)
        if complete(trlines):
            yield ''.join(trlines)
            trlines = []
# where 'complete' is code to determine if we have all the lines in a
# record or not

Terry J. Reedy

Jul 18 '05 #5

On Fri, 25 Mar 2005 08:46:12 -0800, Michael Spencer <ma**@telcopartners.com> wrote:
> Tim Hochberg wrote:
> > Jordan Rastrick wrote:
>
> itertools.groupby enables you to do this, you just need to define a
> suitable grouping function, that stores its state:
>
> For example, if short lines should be appended to the previous line:
>
> from itertools import groupby
>
> linesource = """\
> Here is a long line, long line, long line
> and this is short
> and this is short
> Here is a long line, long line, long line
> and this is short""".splitlines()
>
> def record(item, seq = [0]):
>     if len(item) > 20:
>         seq[0] += 1
>     return seq[0]
>
> >>> for groupnum, lines in groupby(linesource, record):
> ...     print "".join(lines)
> ...
> Here is a long line, long line, long lineand this is shortand this is short
> Here is a long line, long line, long lineand this is short
> >>>

Nice, but I think "record" is a bit opaque semantically. How about
group_id or generate_incrementing_unique_id_for_each_group_to_group_by
or such?

Regards,
Bengt Richter
Jul 18 '05 #6
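Michael's stateful-key trick also adapts to the original 3-or-4-line record problem: let the key function count lines and bump a group id at each record boundary. A sketch in modern Python - the trailing-'+' rule is an invented stand-in for the post's condition(), and make_record_key gives each parse its own fresh state:

```python
from itertools import groupby

def make_record_key():
    # Fresh state per parse: current group id, lines seen, expected size.
    state = {"gid": 0, "count": 0, "size": 3}
    def key(line):
        if state["count"] == 0:      # first line of a new record
            state["gid"] += 1
            state["size"] = 3
        state["count"] += 1
        # Invented rule: '+' at the end of line 2 means a 4-line record.
        if state["count"] == 2 and line.endswith("+"):
            state["size"] = 4
        if state["count"] == state["size"]:
            state["count"] = 0       # record complete; reset for the next
        return state["gid"]
    return key

lines = ["A", "B+", "opt", "C", "D", "E", "F"]
records = [tuple(group) for _, group in groupby(lines, make_record_key())]
print(records)
# → [('A', 'B+', 'opt', 'C'), ('D', 'E', 'F')]
```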

Michael Spencer wrote:
> itertools.groupby enables you to do this, you just need to define a
> suitable grouping function, that stores its state:

Michael, this would make a great Python Cookbook Recipe.

--Scott David Daniels
Sc***********@Acm.Org

Jul 18 '05 #7

Scott David Daniels wrote:
> Michael Spencer wrote:
> > itertools.groupby enables you to do this, you just need to define a
> > suitable grouping function, that stores its state:
>
> Michael, this would make a great Python Cookbook Recipe.

OK, will do. What would you call it? Something like: "Stateful grouping of
iterable items"

[Bengt]:
> Nice, but I think "record" is a bit opaque semantically.
> How about group_id or
> generate_incrementing_unique_id_for_each_group_to_group_by or such?
>
> Regards,
> Bengt Richter

Agreed, it's an issue. I think the most natural name is groupby - but
that would cause more trouble. What do you think about 'grouping'?
I would use 'generate_incrementing_unique_id_for_each_group_to_group_by',
but then people might think I'm trying to outdo Bob Ippolito :-)

[Serge]:
> I think your example would be more clear for Jordan if you used
> function attributes:
>
> def record(item):
>     if len(item) > 20:
>         record.seq += 1
>     return record.seq
> record.seq = 0


That does read better than the mutable default argument hack. Is this use of
function attributes generally encouraged? (I tend to think of func_dict for
meta-data, used only outside the function) Thoughts?

Michael

Jul 18 '05 #8

Michael Spencer wrote:
> OK, will do. What would you call it? Something like: "Stateful
> grouping of iterable items"

How about "Using groupby to fill lines"?

--Scott David Daniels
Sc***********@Acm.Org
Jul 18 '05 #9

On Fri, 25 Mar 2005 12:07:23 -0800, Michael Spencer <ma**@telcopartners.com> wrote:
> Scott David Daniels wrote:
> > Michael Spencer wrote:
> > > itertools.groupby enables you to do this, you just need to define a
> > > suitable grouping function, that stores its state:
> >
> > Michael, this would make a great Python Cookbook Recipe.
>
> OK, will do. What would you call it? Something like: "Stateful grouping
> of iterable items"
>
> [Bengt]:
> > Nice, but I think "record" is a bit opaque semantically.
> > How about group_id or
> > generate_incrementing_unique_id_for_each_group_to_group_by or such?
>
> Agreed, it's an issue. I think the most natural name is groupby - but
> that would cause more trouble. What do you think about 'grouping'?
> I would use 'generate_incrementing_unique_id_for_each_group_to_group_by',
> but then people might think I'm trying to outdo Bob Ippolito :-)
>
> [Serge]:
> > I think your example would be more clear for Jordan if you used
> > function attributes:
> >
> > def record(item):
> >     if len(item) > 20:
> >         record.seq += 1
> >     return record.seq
> > record.seq = 0
>
> That does read better than the mutable default argument hack. Is this
> use of function attributes generally encouraged? (I tend to think of
> func_dict for meta-data, used only outside the function) Thoughts?

Personally, I don't like depending on an externally bound (and
rebindable) (usually global) name as a substitute for "self." You could
always use a class to carry state, e.g. (untested)

class Grouper(object):
    def __init__(self):
        self.group_id = 0
    def __call__(self, item):
        if len(item) > 20:
            self.group_id += 1  # advance id to next group
        return self.group_id
# ...
grouper = Grouper()
# ...
for groupnum, lines in groupby(linesource, grouper):
    print "".join(lines)

Or (guess I better actually try this one ;-)

>>> linesource = """\
... Here is a long line, long line, long line
... and this is short
... and this is short
... Here is a long line, long line, long line
... and this is short""".splitlines()
>>> for groupnum, lines in groupby(linesource, type('',(),{'n':0,
...         '__call__':lambda s,i: setattr(s,'n', s.n+(i>20)) or s.n})()):
...     print "".join(lines)
...
Here is a long line, long line, long line
and this is short
and this is short
Here is a long line, long line, long line
and this is short

Regards,
Bengt Richter
Jul 18 '05 #10

> No, it's nothing special about groupby. record simply stores its state
> in a mutable default parameter. This isn't in general good practice:
> at least you have to be careful with it. You can see the behavior in
> the following example:
>
> >>> def accumulate(value, accum = []):
> ...     accum.append(value)
> ...     return accum
> ...
> >>> accumulate(1)
> [1]
> >>> accumulate(2)
> [1, 2]
> >>> accumulate(6)
> [1, 2, 6]

Wow.... I'd never seen this kind of thing in examples of Python code.
Although it's really neat, it doesn't really make sense, intuitively,
to me. Why does accum remember its state - I suppose it's to do with
the scope of arguments (as opposed to method variables) or something
like that?

Still, that's powerful. But I see why it's not standard use - it could
be easily abused!

Jul 18 '05 #11

Jordan Rastrick wrote:
> > No, it's nothing special about groupby. record simply stores its
> > state in a mutable default parameter. This isn't in general good
> > practice: at least you have to be careful with it. You can see the
> > behavior in the following example:
> >
> > >>> def accumulate(value, accum = []):
> > ...     accum.append(value)
> > ...     return accum
> > ...
> > >>> accumulate(1)
> > [1]
> > >>> accumulate(2)
> > [1, 2]
> > >>> accumulate(6)
> > [1, 2, 6]
>
> Wow.... I'd never seen this kind of thing in examples of Python code.
> Although it's really neat, it doesn't really make sense, intuitively,
> to me. Why does accum remember its state - I suppose it's to do with
> the scope of arguments (as opposed to method variables) or something
> like that?

Michael's accumulator uses the fact that default arguments are only
evaluated once -- when the function is created. The behaviour shown
above is actually a common trap every newbie has to experience once
until he learns the workaround:

def accumulate(value, a=None):
    if a is None:
        a = []
    a.append(value)
    return a

> Still, that's powerful. But I see why it's not standard use - it could
> be easily abused!


There are limitations, too. If you want more than one accumulator you have
to pass the accum argument explicitly or wrap accumulate() into a factory:

def make_accumulator():
    def accumulate(value, a=[]):
        a.append(value)
        return a
    return accumulate

Still, you cannot get hold of the result of the accumulation without
modifying it. One way to fix that:

>>> def make_accumulator():
...     a = []
...     def accumulate(value):
...         a.append(value)
...     return a, accumulate
...
>>> items1, accu1 = make_accumulator()
>>> for i in range(4): accu1(i)
...
>>> items1
[0, 1, 2, 3]
>>> items2, accu2 = make_accumulator()
>>> for i in "abc": accu2(i)
...
>>> items2
['a', 'b', 'c']
>>> items1
[0, 1, 2, 3]

Now this is all nice and dandy to play around with and learn something about
Python's scoping rules, but you can get the same functionality in a
straightforward way with a callable object (like Bengt Richter's Grouper)
and that is what I would recommend.

Peter
Jul 18 '05 #12

"Jordan Rastrick" <jr*******@student.usyd.edu.au> wrote in message
news:11**********************@l41g2000cwc.googlegr oups.com...
> But I'm not so much interested in alternate solutions to the problem
> itself, which is to be honest trivial. I'm interested in the
> implications of the imaginary solution of the Acceptor function.

Of course.

But you'll get more people interested in your alternatives if you can
come up with some use cases that the existing language can't handle
easily.
Jul 18 '05 #13

Hmmm, I like the terminology consumers better than acceptors.

The ability to have 'full coroutines', or at least more 'coroutiney
behaviour' than is provided by generators alone, was I think what I was
trying to get at with my original idea, although things got a bit
sidetracked by the way I focused on the particular (and not especially
interesting) example I provided. Another factor, I think, was that I
restricted Acceptors far too much in the second version by demanding
they receive all their input at once - that meant they weren't much
more interesting than a function taking a generator or the like and
using its output. The ability for them to accept input and resume
execution at any point is an essential part of what would make such a
feature interesting. Maybe if they gave you a callable object, and just
avoided the whole issue of return value altogether, something like

# I wish I could think of a more interesting example :)
def combineIntoRecords(listToAppend):
accept firstline
# etc
listToAppend.append(record)
# No return value, avoid issues of non-evaluation

recordList = []
combine = combineIntoRecords(recordList)
for line in filterJunk(lines):
combine(line)

This achieves the same result as before, but of course now you're free
to use combine anywhere else in the code if you wanted.
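For completeness, the callable-object flavour of this needs no new syntax at all - the 'accept' state just lives in the instance, in the spirit of Bengt's Grouper. A sketch, again with a made-up trailing-'+' rule standing in for the real condition:

```python
class CombineIntoRecords(object):
    # Feed lines in one at a time; completed records land in 'out'.
    def __init__(self, out):
        self.out = out
        self.buf = []   # lines accepted so far for the current record
        self.size = 3   # expected record length (may grow to 4)

    def __call__(self, line):
        self.buf.append(line)
        # Made-up rule: '+' at the end of line 2 means a 4-line record.
        if len(self.buf) == 2 and line.endswith("+"):
            self.size = 4
        if len(self.buf) == self.size:
            optional = self.buf[2] if self.size == 4 else None
            self.out.append((self.buf[0], self.buf[1], optional, self.buf[-1]))
            self.buf, self.size = [], 3   # reset for the next record

recordList = []
combine = CombineIntoRecords(recordList)
for line in ["A", "B+", "opt", "C", "D", "E", "F"]:
    combine(line)
print(recordList)
# → [('A', 'B+', 'opt', 'C'), ('D', 'E', None, 'F')]
```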

I still think this kind of thing would allow you to do things that
can't be done using the other techniques brought up in this
discussion... maybe Terry or somebody else familiar with Consumers
might know a good example?

Anyway this has all been very interesting, and I've learned a lot. I
never really had a clue what Stackless Python was before, and now I
think I have a vague idea of some of the implications. Not to mention
the little tidbit about default argument evaluation. And groupby looks
very interesting.

Thanks everyone for the host of replies, which have all been of an
excellent standard IMO. I only wish being a newbie in other languages
was as much fun as it is in Python :)

Now to actually finish implementing my text file parser. The only
problem is deciding between the dozens of potential different methods...

Jul 18 '05 #14

> The ability to have 'full coroutines', or at least more 'coroutiney
> behaviour' than is provided by generators alone, was I think what I
> was

Jordan,

This is somewhat off-topic, but perhaps you might be interested in
taking a look at the Lua language (http://www.lua.org/). It supports
coroutines, and is similar to Python in that it is a dynamically typed
language with support for a nice associative dictionary-like data
structure. It doesn't cover the same solution space that Python does,
but it is an interesting language to play with.

Phil Schmidt

Jul 18 '05 #15

"Jordan Rastrick" <jr*******@student.usyd.edu.au> wrote in message news:<11**********************@o13g2000cwo.googleg roups.com>...
> Hmmm, I like the terminology consumers better than acceptors.


Here's an implementation of Python consumers using generators:
http://groups.google.co.uk/gr*******...***@python.org

Oren
Jul 18 '05 #16
