request help with Pipe class in iterwrap.py

While studying iterators and generator expressions, I started wishing I
had some tools for processing the values. I wanted to be able to chain
together a set of functions, sort of like the "pipelines" you can make
with command-line programs.

So, I wrote a module called iterwrap.py. You can download it from here:

http://home.blarg.net/~steveha/iterwrap.tar.gz

iterwrap has functions that "wrap" an iterator; when you call the
.next() method on a wrapped iterator, it will get the .next() value
from the original iterator, apply a function to it, and return the new
value. Of course, a wrapped iterator is itself an iterator, so you can
wrap it again: you can build up a "chain" of wrappers that will do the
processing you want.
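A minimal sketch of the wrapping idea, not the actual iterwrap code. The name `wrap` is hypothetical, and the sketch is written as a Python 3 generator, where the `.next()` method is spelled `next(it)`:

```python
def wrap(func, iterable):
    """Return an iterator that applies func to each value of iterable."""
    for value in iterable:
        yield func(value)

# A wrapped iterator is itself an iterator, so wrappers can be nested.
doubled = wrap(lambda x: x * 2, [1, 2, 3])
shifted = wrap(lambda x: x + 1, doubled)

first = next(shifted)   # pulls one value through both wrappers
rest = list(shifted)    # drains whatever is left
```

Nothing is computed until a value is requested, which is what lets a long chain run without building intermediate lists.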

As an example, here's a command-line pipeline:

cat mylist | sort | uniq > newlist

Here's the analogous example from iterwrap:

newlist = list(iterwrap.uniq(iterwrap.sort(mylist)))

You need to call list() because every wrapper function in iterwrap
returns an iterator. That final list() forces the iterator
returned by uniq() to be expanded out to a list.
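As an illustration only, here is one way `sort` and `uniq` could be written so that both return iterators. These are assumed implementations, not the ones in iterwrap.py:

```python
def sort(iterable):
    """Return an iterator over the sorted values (all input is read first)."""
    return iter(sorted(iterable))

def uniq(iterable):
    """Yield values while dropping adjacent duplicates, like the uniq command."""
    marker = object()   # sentinel that cannot equal any real value
    previous = marker
    for value in iterable:
        if previous is marker or value != previous:
            yield value
        previous = value

mylist = ["b", "a", "b", "c", "a"]
newlist = list(uniq(sort(mylist)))
```

Like the command-line tool, this `uniq` only removes adjacent duplicates, which is why it is normally preceded by `sort`.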

iterwrap.py defines functions based on many common command-line tools:
sort, uniq, tr, grep, cat, head, tail, and tee. Plus it defines some
other functions that seemed like they would be useful.

Well, it doesn't take very many nested function calls before the call gets
visually confusing, with lots of parentheses at the end. To avoid this,
you can arrange the calls in a vertical chain, like this:

temp = iterwrap.sort(mylist)
temp = iterwrap.uniq(temp)
newlist = list(temp)

But I wanted to provide a convenience class to allow "dot-chaining". I
wanted something like this to work:

from iterwrap import *
newlist = Pipe(mylist).sort.uniq.list()

I have actually coded up two classes. One, Pipe, works as shown above.
The other, which I unimaginatively called "IW" (for "iterwrap"), works in
a right-to-left order:

from iterwrap import *
iw = IW()
newlist = iw.list.uniq.sort(mylist)

Hear now my cry for help:

Both IW() and Pipe() have annoying problems. I'd like to have one class
that just works.

The problem with Pipe() is that it will act very differently depending on
whether the user remembers to put the "()" on the end. For all the
dot-chained functions in the middle of the chain, you don't need to put
parentheses; it will just work. However, for the function at the end of
the dot-chain, you really ought to put the parentheses.

In the given example, if the user remembers to put the parentheses, newlist
will be set to a list; otherwise, newlist will be set to an instance of
class Pipe.

An instance of class Pipe works as an iterator, so in this example:

itr = Pipe(mylist).sort.uniq

...then the user really need not care whether there are parentheses after
uniq or not. Which of course will make it all the more confusing when
the list() case breaks.

In comparison with Pipe, IW is clean and elegant. The user cannot forget
the parenthetical expression on the end, since that's where the initial
sequence (list or iterator) is provided! The annoying thing about IW is
that the dot-chained functions cannot have extra arguments passed in.
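A rough sketch of how a right-to-left class like IW might behave. This is a guess at the mechanism, not the real iterwrap code; the `FUNCS` table and the helper functions are assumptions made so the example is self-contained:

```python
import itertools

def sort(iterable):
    return iter(sorted(iterable))

def uniq(iterable):
    return (key for key, _group in itertools.groupby(iterable))

def grep(iterable, pattern, flags=""):
    invert = "v" in flags   # "v" inverts the match, as in grep -v
    return (s for s in iterable if (pattern in s) != invert)

FUNCS = {"sort": sort, "uniq": uniq, "grep": grep, "list": list}

class IW:
    """Collect function names by attribute access; apply them right to left."""

    def __init__(self, names=()):
        self._names = names

    def __getattr__(self, name):
        if name not in FUNCS:
            raise AttributeError(name)
        return IW(self._names + (name,))

    def __call__(self, iterable, *args):
        result = iterable
        for position, name in enumerate(reversed(self._names)):
            # Only the rightmost function can receive the extra arguments.
            extra = args if position == 0 else ()
            result = FUNCS[name](result, *extra)
        return result

iw = IW()
mylist = ["larch b", "parrot larch", "larch a", "larch b"]
newlist = iw.list.uniq.sort(mylist)
temp = iw.grep(mylist, "larch")
filtered = iw.list.grep(temp, "parrot", "v")
```

Because the single call at the end is the only place arguments can be supplied, only one function in the chain can take options.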

This example works correctly:

newlist = Pipe(mylist).grep("larch").grep("parrot", "v").list()

newlist will be set to a list of all strings from mylist that contain the
string "larch" but do not contain the string "parrot". There is no way to
do this example with IW, because IW expects just one call to its
__call__() function. The best you could do with IW is:

temp = iw.grep(mylist, "larch")
newlist = iw.list.grep(temp, "parrot", "v")

Since it *is* legal to pass extra arguments to the one permitted
__call__(), this works, but it's really not very much of an advantage over
the vertical chain:

temp = grep(mylist, "larch")
temp = grep(temp, "parrot", "v")
newlist = list(temp)

The key point here is that, when processing a dot-chain, my code doesn't
actually know whether it's looking at the end of the dot-chain. If you
had

newlist = Pipe(mylist).foo.bar.baz

and if my code could somehow know that baz is the last thing in the chain,
it could treat baz specially (and do the right thing whether there are
parentheses on it, or not). I wish there were a special method __set__
called when an expression is being assigned somewhere; that would make
this trivial.

What is the friendliest and most Pythonic way to write a Pipe class for
iterwrap?

P.S. I have experimented with overloading the | operator to allow this
syntax:

newlist = Pipe(mylist) | sort | uniq | list()

Personally, I much prefer the dot-chaining syntax. The above is just too
tricky.
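For reference, a small sketch of what overloading | could look like. It is an assumption about the approach, not the experiment described above; in particular it ends with an ordinary list() call around the whole expression rather than a trailing `| list()`:

```python
import itertools

def uniq(iterable):
    """Drop adjacent duplicates, like the uniq command."""
    return (key for key, _group in itertools.groupby(iterable))

class Pipe:
    """Wrap an iterable so that `pipe | func` feeds it through func."""

    def __init__(self, iterable):
        self._iterable = iterable

    def __or__(self, func):
        # Apply func to the current iterable and wrap the result again,
        # so that another `| func` can follow.
        return Pipe(func(self._iterable))

    def __iter__(self):
        return iter(self._iterable)

mylist = ["b", "a", "b", "c", "a"]
newlist = list(Pipe(mylist) | sorted | uniq)
```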
--
Steve R. Hastings "Vita est"
st***@hastings.org http://www.blarg.net/~steveha

May 2 '06 #1
In <pa****************************@hastings.org>, Steve R. Hastings wrote:
> What is the friendliest and most Pythonic way to write a Pipe class for
> iterwrap?

Maybe one with less "magic" syntax. What about using a function that
takes the iterators and an iterable and returns an iterator of the chained
iterators::

new_list = pipe(grep('larch'), grep('parrot', 'v'), list)(my_list)
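One way to read this suggestion, sketched under the assumption that grep here is a factory: calling it with its options returns a function that still waits for the iterable. Both `grep` and `pipe` below are illustrative, not part of iterwrap:

```python
import re

def grep(pattern, flags=""):
    """Return a filter function; "v" in flags inverts the match."""
    regex = re.compile(pattern)
    invert = "v" in flags

    def apply(iterable):
        return (s for s in iterable if bool(regex.search(s)) != invert)

    return apply

def pipe(*stages):
    """Compose the stages left to right into one reusable function."""
    def run(iterable):
        result = iterable
        for stage in stages:
            result = stage(result)
        return result

    return run

my_list = ["larch tree", "larch and parrot", "dead parrot"]
new_list = pipe(grep("larch"), grep("parrot", "v"), list)(my_list)
```

The object returned by pipe() can be stored and called again on other input.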
Ciao,
Marc 'BlackJack' Rintsch
May 3 '06 #2
On Wed, 03 May 2006 08:01:12 +0200, Marc 'BlackJack' Rintsch wrote:
> Maybe one with less "magic" syntax. What about using a function that
> takes the iterators and an iterable and returns an iterator of the chained
> iterators::
>
> new_list = pipe(grep('larch'), grep('parrot', 'v'), list)(my_list)

This is a good idea. But not quite magic enough, I'm afraid.

One of the features of Pipe() is that it automatically pastes in the first
argument of each function call (namely, the iterator returned by the
previous function call). It is able to do this because of a special
__getattr__ that grabs the function reference but returns the "self"
instance of the Pipe class, to allow the dot-chain to continue.

Any supplied options will then be pasted in after that first argument.
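The mechanism just described can be sketched roughly as follows. This is a simplified reconstruction under assumptions, not the real Pipe class: the `FUNCS` table, the helper functions, and the rule "return the plain result once it is no longer a lazy iterator" are all invented for the illustration.

```python
import itertools

def sort(iterable):
    return iter(sorted(iterable))

def uniq(iterable):
    return (key for key, _group in itertools.groupby(iterable))

def grep(iterable, pattern, flags=""):
    invert = "v" in flags
    return (s for s in iterable if (pattern in s) != invert)

FUNCS = {"sort": sort, "uniq": uniq, "grep": grep, "list": list}

class Pipe:
    def __init__(self, iterable):
        self._value = iterable
        self._pending = None   # function grabbed by the last attribute access

    def _flush(self, *args):
        # Paste the current iterator in as the first argument, then the options.
        if self._pending is not None:
            func, self._pending = self._pending, None
            self._value = func(self._value, *args)

    def __getattr__(self, name):
        if name not in FUNCS:
            raise AttributeError(name)
        self._flush()                  # previous function had no parentheses
        self._pending = FUNCS[name]
        return self                    # lets the dot-chain continue

    def __call__(self, *args):
        self._flush(*args)
        # Assumed rule: keep chaining while the value is still a lazy iterator.
        return self if hasattr(self._value, "__next__") else self._value

    def __iter__(self):
        self._flush()
        return iter(self._value)

mylist = ["larch b", "larch a", "parrot larch", "larch a"]
with_parens = Pipe(mylist).sort.uniq.list()
without_parens = Pipe(mylist).sort.uniq.list
filtered = Pipe(mylist).grep("larch").grep("parrot", "v").list()
```

Here `with_parens` is a list, while `without_parens` is still a Pipe instance that only produces values when iterated.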

In your example, "grep('larch')" is going to be evaluated by Python, and
immediately called. And it will then complain because its first argument
is not an iterator. I cannot see any way to modify this call before it
happens.

If we take your basic idea, and apply just a little bit of magic, we could
do this:

new_list = Pipe(my_list, grep, 'larch', grep, ('parrot', 'v'), list)

The rules would be:

* the first argument to Pipe is always the initial iterable sequence.

* each argument after that is tested to see if it is callable. If it is,
it's remembered; if not, it is presumed to be an argument for the
remembered callable. Multiple arguments must be packaged up into a tuple
or list. Once Pipe() has a callable and an argument or sequence of
arguments, Pipe() can paste in all arguments and make the call;
alternatively, once Pipe() sees another callable, it can safely assume
that there aren't going to be any extra arguments for the remembered
callable, and paste in that one iterator argument and make the call.

Now Pipe always knows when it has reached the last callable, because it
will have reached the end of the supplied arguments! Then it can safely
assume there aren't going to be any extra arguments, and make the call to
the last remembered callable.
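The rules above can be sketched as a plain function. The name `run_pipe` and the sample `grep` are hypothetical, chosen only to make the example runnable:

```python
def grep(iterable, pattern, flags=""):
    invert = "v" in flags
    return (s for s in iterable if (pattern in s) != invert)

def run_pipe(iterable, *items):
    result = iterable
    pending = None   # the remembered callable, if any
    for item in items:
        if callable(item):
            if pending is not None:
                # Another callable arrived, so the remembered one
                # gets no extra arguments.
                result = pending(result)
            pending = item
        else:
            if pending is None:
                raise TypeError("argument %r has no callable before it" % (item,))
            # Multiple arguments come packaged in a tuple or list.
            extra = item if isinstance(item, (tuple, list)) else (item,)
            result = pending(result, *extra)
            pending = None
    if pending is not None:
        result = pending(result)   # end of the arguments: last callable
    return result

my_list = ["larch tree", "larch and parrot", "dead parrot"]
new_list = run_pipe(my_list, grep, "larch", grep, ("parrot", "v"), list)
```

One limitation of this rule: a callable can never be passed as an argument to another stage, because it would be mistaken for the next stage.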

However, I remain fond of the dot-chaining syntax. For interactively
playing around with data, I think the dot-chaining syntax is more natural
for most people.

newlist = Pipe(mylist).sort.uniq.list()

newlist = Pipe(mylist, sort, uniq, list)

Hmmm. The second one really isn't bad... Also, the second one doesn't
require my tricky e_eval() to work; it just lets Python figure out all the
function references.

Thinking about it, I realize that "list" is a very common thing with
which to end a dot-chain. I think perhaps if my code would just notice
that the last function reference is "list", which takes exactly one
argument and thus cannot be waiting for additional arguments, it could
just call list() right away.

If there were a general way to know that a function reference only expects
a single argument, I could generalize this idea. But it may be enough to
just do the special case for list().

I think I'll keep Pipe(), hackish as it is, but I will also add a new one
based on your idea. Maybe I'll call it "Chain()".

newlist = Chain(mylist, sort, uniq, list)

I did kind of want a way to make a "reusable pipe". If you come up with a
useful chain, it might be nice if you could use it again with convenient
syntax. Maybe like so:

sort_u = [sort, uniq, list]
newlist = Chain(mylist, sort_u)
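A minimal sketch of that reusable form, assuming Chain is a plain function and that a list or tuple among the stages is simply expanded in place. Extra arguments for individual stages are left out here to keep the idea visible:

```python
import itertools

def sort(iterable):
    return iter(sorted(iterable))

def uniq(iterable):
    return (key for key, _group in itertools.groupby(iterable))

def Chain(iterable, *stages):
    """Feed iterable through each stage; a list of stages is expanded in place."""
    result = iterable
    for stage in stages:
        funcs = stage if isinstance(stage, (list, tuple)) else [stage]
        for func in funcs:
            result = func(result)
    return result

sort_u = [sort, uniq, list]
mylist = ["b", "a", "b", "c", "a"]
newlist = Chain(mylist, sort_u)
same = Chain(mylist, sort, uniq, list)
```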

Thank you very much for making a helpful suggestion!
--
Steve R. Hastings "Vita est"
st***@hastings.org http://www.blarg.net/~steveha

May 3 '06 #3
