incorrect(?) shlex behaviour

Consider:
import shlex
shlex.split('$(which sh)')

['$(which', 'sh)']

Is this behavior correct? It seems that I should
either get one token, or the list
['$','(','which','sh',')'],
but certainly breaking it the way it does is
erroneous.

Can anyone explain why the string is being split
that way?

Jul 19 '05 #1
bill wrote:
Consider:
import shlex
shlex.split('$(which sh)')
['$(which', 'sh)']

Is this behavior correct? It seems that I should
either get one token, or the list
['$','(','which','sh',')'],
but certainly breaking it the way it does is
erroneous.

Can anyone explain why the string is being split
that way?

This may help.
http://www.python.org/dev/doc/devel/...ule-shlex.html
This works on Python 2.4:

>>> import shlex
>>> sh = shlex.shlex('$(which sh)')
>>> sh.get_token()
'$'
>>> sh.get_token()
'('
>>> sh.get_token()
'which'
>>> sh.get_token()
'sh'
>>> sh.get_token()
')'
>>> sh.get_token()
etc...

Python 2.2 and maybe lower:

>>> import shlex
>>> import StringIO
>>> s = StringIO.StringIO('$(which sh)')
>>> sh = shlex.shlex(s)
>>> sh.get_token()
'$'
>>> sh.get_token()
'('
>>> sh.get_token()
'which'
>>> sh.get_token()
'sh'
>>> sh.get_token()
')'
>>> sh.get_token()

etc...
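
For readers on Python 3, the same contrast can be sketched as follows. This is a minimal illustration, assuming the standard-library shlex of a current Python 3 release rather than the 2.2/2.4 versions used above. A shlex object is iterable there, so list() collects what repeated get_token() calls would return.

```python
import shlex

text = '$(which sh)'

# shlex.split() runs in POSIX mode with whitespace_split enabled,
# so tokens are separated on whitespace only.
split_tokens = shlex.split(text)

# A plain shlex.shlex() instance (non-POSIX by default) treats each
# non-word character such as '$', '(' and ')' as its own token.
lexer_tokens = list(shlex.shlex(text))

print(split_tokens)
print(lexer_tokens)
```

So both of the results the original post expected are available; which one you get depends on whether you call split() or drive the lexer directly.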

Hth,
M.E.Farmer

Jul 19 '05 #2
It gets worse:

>>> from shlex import StringIO
>>> from shlex import shlex
>>> t = shlex(StringIO("2>&1"))
>>> while True:
...     b = t.read_token()
...     if not b: break
...     print b
...
2
&
1    <----------- where's the '>' !?

>>> import shlex
>>> print shlex.split("2>&1")
['2>&1']

It strikes me that split should be behaving exactly the same way as
read_token, but that may be a misunderstanding on my part of what split
is doing.

However, it is totally bizarre that read_token discards the '>' symbol
in the string! I don't know much about lexical analysis, but it
strikes me that discarding characters is a bad thing.

Jul 19 '05 #3
bill wrote:
It gets worse:

>>> from shlex import StringIO
>>> from shlex import shlex
>>> t = shlex(StringIO("2>&1"))
>>> while True:
...     b = t.read_token()
...     if not b: break
...     print b
...
2
&
1    <----------- where's the '>' !?

>>> import shlex
>>> print shlex.split("2>&1")
['2>&1']

It strikes me that split should be behaving exactly the same way as
read_token, but that may be a misunderstanding on my part of what split is doing.

However, it is totally bizarre that read_token discards the '>' symbol in the string! I don't know much about lexical analysis, but it
strikes me that discarding characters is a bad thing.

From the docs:

split(s[, comments])
Split the string s using shell-like syntax. If comments is False
(the default), the parsing of comments in the given string will be
disabled (setting the commenters member of the shlex instance to the
empty string). This function operates in POSIX mode. New in version
2.3.
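
As a rough illustration of that description, here is a sketch assuming Python 3's shlex.split(), which also accepts a posix argument that the 2.3 signature quoted above does not show:

```python
import shlex

# With comments left at its default of False, '#' is ordinary text.
no_comments = shlex.split('ls -l # list files')

# With comments=True, everything from '#' to the end of the line is dropped.
with_comments = shlex.split('ls -l # list files', comments=True)

# POSIX mode (the default) uses quotes for grouping and then removes them.
posix_quoted = shlex.split('echo "hello world"')

# Non-POSIX mode keeps the quote characters as part of the token.
non_posix_quoted = shlex.split('echo "hello world"', posix=False)
```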

Maybe looking at the string method split might help:

>>> "$(which sh)".split()
['$(which', 'sh)']

From the docs:

read_token()
Read a raw token. Ignore the pushback stack, and do not interpret
source requests. (This is not ordinarily a useful entry point, and is
documented here only for the sake of completeness.)
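
The "ignore the pushback stack" part is what accounts for the missing '>'. A minimal sketch, assuming Python 3's shlex, where the lexer keeps pushed-back tokens in a deque attribute named pushback (an implementation detail, not something the docs quoted here promise):

```python
import shlex

lexer = shlex.shlex("2>&1")

# Reading the word '2' stops at '>', which the lexer pushes back.
first = lexer.read_token()
pending = list(lexer.pushback)

# read_token() never consults the pushback stack, so '>' is skipped.
raw_rest = []
while True:
    tok = lexer.read_token()
    if not tok:
        break
    raw_rest.append(tok)

# get_token() (used by iteration) checks the pushback stack first,
# so a fresh lexer returns every token.
all_tokens = list(shlex.shlex("2>&1"))

print(first, pending, raw_rest, all_tokens)
```

In other words the character is not discarded; it is waiting on the pushback stack, and only get_token() looks there.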

# Just like in my first post
>>> from StringIO import StringIO
>>> from shlex import shlex
>>> t = shlex(StringIO("2>&1"))
>>> t.get_token()
'2'
>>> t.get_token()
'>'
>>> t.get_token()
'&'
>>> t.get_token()
'1'
>>> t.get_token()
''

# Your way
>>> t = shlex(StringIO("2>&1"))
>>> t.read_token()
'2'
>>> t.read_token()
'&'
>>> t.read_token()
'1'
>>> t.read_token()
''


Hth,
M.E.Farmer

Jul 19 '05 #4
In article <11**********************@g49g2000cwa.googlegroups .com>,
"bill" <bi**********@gmail.com> wrote:
Consider:
import shlex
shlex.split('$(which sh)')
['$(which', 'sh)']

Is this behavior correct? It seems that I should
either get one token, or the list
['$','(','which','sh',')'],
but certainly breaking it the way it does is
erroneous.

Can anyone explain why the string is being split
that way?


Python 2.3.5 (#1, Mar 20 2005, 20:38:20)
[GCC 3.3 20030304 (Apple Computer, Inc. build 1809)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import shlex
>>> print shlex.__doc__
A lexical analyzer class for simple shell-like syntaxes.

This has a little potential to mislead. Bourne shell
syntax is naturally "shell-like", but it is not "simple" -
as grammars go, it's a notorious mess. In theory, someone
could certainly write Python code to accurately parse Bourne
shell statements, but that doesn't appear to have been the
intention here. The "Parsing Rules" section of the documentation
describes what you can expect, and right off hand I don't see
how the result you got was erroneous.
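
To make that concrete, here is a small sketch, assuming Python 3's shlex.split(). The lexer attaches no meaning to $( ... ), so the command substitution ends up in a single token only when quoting groups it, not because of any shell grammar.

```python
import shlex

# Unquoted: split on whitespace; '$', '(' and ')' are just characters.
unquoted = shlex.split('echo $(which sh)')

# Double-quoted: the quotes group the text and are then removed.
quoted = shlex.split('echo "$(which sh)"')

print(unquoted)
print(quoted)
```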

Donn Cave, do**@u.washington.edu
Jul 19 '05 #5
