
porting shell scripts: system(list), system_pipe(lists)

One of my recent projects has involved taking an accretion of sh and
perl scripts and "doing them right" - making them modular, improving
the error reporting, making it easier to add even more features to
them. "Of course," I'm redoing them in python - much of the cut&paste
reuse has become common functions, which then get made more robust and
have a common style and are callable from other (python) tools
directly, instead of having to exec scripts to get at them. The usual
"glorious refactoring."

Most of it has been great - os.listdir() plus str.endswith() instead of
globbing, exception handling instead of "exit 1", that sort of thing.
I've run into one weakness, though: executing programs.

Python has, of course, os.fork and os.exec* corresponding to the raw
unix functions. It also has the higher-level os.system, popen,
expect, and commands.get* functions. The former need a bunch of
stylized operations performed; the latter *all* involve passing in
strings, which leads to quoting issues - a serious risk in some
applications.
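As a concrete illustration of the quoting risk (this sketch uses the
subprocess module that appeared in Python 2.4; the filename is a made-up
hostile value):

```python
import os
import subprocess

# A value a shell would happily misparse: the spaces split it into
# several arguments and the ';' starts a second command.
name = "file with spaces; echo gotcha"

# String form:  os.system("echo " + name)  hands the whole line to /bin/sh,
# so the ';' is interpreted as a command separator.
# List form: each element becomes exactly one argv entry - no shell at all.
devnull = open(os.devnull, "w")
rc = subprocess.call(["echo", name], stdout=devnull)
devnull.close()
# rc is echo's exit status, 0; the hostile value was printed as one argument.
```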

Perl had one very helpful interface for this kind of thing: system and
exec will both take array arguments:
$ perl -e 'system("echo", "*")'
*
$ perl -e 'exec("echo", "*")'
*
versus
$ perl -e 'exec("echo *")'
#.newsrc-dribble# CVS stuff ...
This has always struck me as "correct" - not the overloading,
necessarily, but the use of a list.

So, implementing system this way is easy enough:

import os
import traceback

def system(cmd):
    pid = os.fork()
    if pid > 0:
        p, st = os.waitpid(pid, 0)
        if st == 0:
            return
        raise ExecFailed(str(cmd), st)
    elif pid == 0:
        try:
            os.execvp(cmd[0], cmd)
        except OSError, e:
            traceback.print_exc()
        os._exit(113)

[The try/except is an interesting issue: if cmd[0] isn't found,
os.execvp throws -- but it is already in the child, and this walks up
the stack to any surrounding try/except, which then continues,
possibly disastrously, whatever that code had been doing *in a
duplicate process*. The _exit explicitly short cuts this.]
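The ExecFailed exception used above isn't defined in the post; a minimal
stand-in (hypothetical - the name and fields are just one reasonable
choice) could be:

```python
class ExecFailed(Exception):
    """Hypothetical error for a child that could not run or exited nonzero.

    Carries the stringified argv and the raw status word from waitpid,
    so callers can decode the exit code or signal number if they care."""
    def __init__(self, cmd, status):
        Exception.__init__(self, "%s failed with status %d" % (cmd, status))
        self.cmd = cmd
        self.status = status
```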

So, this makes a big difference when porting simple bits of shell (and
usually, just in passing, fixing quoting bugs - if you had code that
used to do "ci -l $foo" and it is now "system(['ci', '-l', foo])"
you now properly handle spaces and punctuation in the value of foo,
"for free".) However, the other thing you tend to find in
"advanced"[1] shell scripts is lengthy pipelines. (Sure, you find
while loops and case statements and such - but python's control
structures handle those fine.)

Implementing pipelines takes rather a bit more work, and one might
(not unreasonably) throw up one's hands and just use os.system and
some re.sub's to do the quoting. However, I had enough cases where
the goal really was to run a complex shell pipeline (I also had cases
where the pipeline converted nicely to some inline python code,
especially with the help of the gzip module) that I sat down and
cooked up a pipeline class.

The interface I ended up with is pretty simple:
g_pipe = pipeline()
g_pipe.stdin(open("blort.gz", "r"))
g_pipe.append(["gunzip"])
g_pipe.append(["sort", "-u"])
g_pipe.append(["wc", "-l"])
g_pipe.stdout(open("blort.count", "w"))
print g_pipe.run()

is equivalent to the sh:
gunzip < blort.gz | sort -u | wc -l > blort.count

pipeline also has obvious stderr and chdir methods; pipeline.run
actually returns an array with the return status of *each* pipeline
element (which leads to "if filter(None, st): deal_with_error" being a
useful idiom for noticing failures that a shell script would typically
miss.)
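For comparison, the same blort pipeline can also be chained by hand with
Popen objects from the subprocess module (new in Python 2.4). This is a
sketch of the general technique, not the pipeline class itself; it
creates its own small input file so it is self-contained:

```python
import gzip
import subprocess

# Create a small input so the sketch is self-contained.
f = gzip.open("blort.gz", "wb")
f.write(b"beta\nalpha\nbeta\n")
f.close()

src = open("blort.gz", "rb")
dst = open("blort.count", "wb")
# Equivalent of: gunzip < blort.gz | sort -u | wc -l > blort.count
p1 = subprocess.Popen(["gunzip"], stdin=src, stdout=subprocess.PIPE)
p2 = subprocess.Popen(["sort", "-u"], stdin=p1.stdout, stdout=subprocess.PIPE)
p3 = subprocess.Popen(["wc", "-l"], stdin=p2.stdout, stdout=dst)
status = [p.wait() for p in (p1, p2, p3)]  # one exit status per stage
dst.close()
```

Like pipeline.run, this yields the exit status of *each* stage, rather
than only the last one as sh does.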

This has led me to a few questions:

1. Am I being dense? Are there already common modules (included or
otherwise) that do this, or solve the problem some other way?
2. Is there a more pythonic way of expressing the construction?
Would exposing the internal array of commands make more sense,
possibly by "passing through" various array operations on the
class to the internal array (as the use of "append" hints at)? Or
maybe "exec" objects that a "pipe" combiner operates on?
3. Should an interface like this be in a "battery" somewhere? shutil
didn't seem to quite match...
4. Any reason to even try porting this interface to non-unix systems?
Is there a close enough match to os.pipe/os.fork/os.exec/os.wait,
or some other construct that works on microsoft platforms?

_Mark_ <ei****@metacarta.com>

[1] in the Invader Zim sense :)
Jul 18 '05 #1
2 Replies


ei****@metacarta.com wrote:
<<SNIP>>
The interface I ended up with is pretty simple:
g_pipe = pipeline()
g_pipe.stdin(open("blort.gz", "r"))
g_pipe.append(["gunzip"])
g_pipe.append(["sort", "-u"])
g_pipe.append(["wc", "-l"])
g_pipe.stdout(open("blort.count", "w"))
print g_pipe.run()

is equivalent to the sh:
gunzip < blort.gz | sort -u | wc -l > blort.count
<<SNIP>>


I think that your pipeline code looks nothing like the original sh
pipeline, which to me counts heavily against it.
Just playing at the cygwin prompt...
$ ls -l|wc -l > /tmp/lines_in_dir
$ cat /tmp/lines_in_dir
465
$ python
>>> from os import system
>>> system(r'''/bin/ls -l|/bin/wc -l > /tmp/lines_in_dir2''')
0
>>> system(r'''/bin/cat /tmp/lines_in_dir2''')
463
0

I prefer the above because it looks like the original sh command.
Of course, if script security is very important then you may want to
change the way things are implemented again.

Cheers, Paddy.

Jul 18 '05 #2

Quoth ei****@metacarta.com:
....
| 1. Am I being dense? Are there already common modules (included or
| otherwise) that do this, or solve the problem some other way?

I can't tell you whether any of them has come to be common, but
there have been a handful of efforts along these lines - process
and pipeline creation.

| 2. Is there a more pythonic way of expressing the construction?
| Would exposing the internal array of commands make more sense,
| possibly by "passing through" various array operations on the
| class to the internal array (as the use of "append" hints at)? Or
| maybe "exec" objects that a "pipe" combiner operates on?

Only thing that comes to mind is error handling. It certainly is
not characteristic of Python functions to return an error status,
rather they typically raise exceptions. Ideally, I would think
the exception type for this would carry the exit status, other
information in the status word, and text from error/diagnostic
output. That last one is particularly important and particularly
awkward to get.

See appended example for a trick to deal with the special case
where a Python exception is caught in the fork.

| 3. Should an interface like this be in a "battery" somewhere? shutil
| didn't seem to quite match...

No one ever likes anyone else's version of this, so it's typically
reinvented as required.

| 4. Any reason to even try porting this interface to non-unix systems?
| Is there a close enough match to os.pipe/os.fork/os.exec/os.wait,
| or some other construct that works on microsoft platforms?

There's os.spawnv, if you haven't noticed that.
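os.spawnv takes an argv list (no shell quoting involved) and exists on
Windows as well as unix; a minimal sketch, using sys.executable so the
demo stays portable:

```python
import os
import sys

# P_WAIT blocks until the child exits and returns its exit status;
# P_NOWAIT would return the child's pid (or handle, on Windows) instead.
status = os.spawnv(os.P_WAIT, sys.executable, [sys.executable, "-c", "pass"])
# status is 0, since the child did nothing and exited cleanly
```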

Donn Cave, do**@drizzle.com
-----------
import fcntl
import posix
import sys
import pickle

def spawn_wnw(wait, file, args, env):
    p0, p1 = posix.pipe()
    pid = posix.fork()
    if pid:
        posix.close(p1)
        ps = posix.read(p0, 1024)
        posix.close(p0)
        if wait:
            junk, ret = posix.waitpid(pid, 0)
        else:
            ret = pid
        if ps:
            e, v = pickle.loads(ps)
            raise e, v
        else:
            return ret
    else:
        try:
            # Close-on-exec: if execve succeeds, the write end closes and
            # the parent reads EOF; if it fails, the pickled exception
            # travels up the pipe to be re-raised in the parent.
            fcntl.fcntl(p1, fcntl.F_SETFD, fcntl.FD_CLOEXEC)
            posix.close(p0)
            posix.execve(file, args, env)
        except:
            e, v, t = sys.exc_info()
            s = pickle.dumps((e, v))
            posix.write(p1, s)
            posix._exit(117)

def spawnw(file, args, env):
    return spawn_wnw(1, file, args, env)

def spawn(file, args, env):
    return spawn_wnw(0, file, args, env)

pid = spawn('/bin/bummer', ['bummer', '-ever', 'summer'], posix.environ)
Jul 18 '05 #3
