split a string of space separated substrings - elegant solution?

Hi,

I'm looking for an elegant solution to the following (quite common)
problem:

Given a string of substrings separated by white space,
split this into tuple/list of elements.
The problem is quoted substrings like

abc "xy z" "1 2 3" "a \" x"

should be split into ('abc','xy z','1 2 3','a " x')

For that, one probably has to protect the white space between
quotes, then split on white space, and finally convert the
'protected white space' back to normal white space.
Is there an elegant solution, perhaps without using a lexer
or something similar? With regular expressions alone it seems
clumsy.
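
For reference, the regular-expression route can be sketched roughly as follows. This is only a minimal sketch, not something posted in the thread: it tokenizes directly instead of protecting and restoring white space, and the names `TOKEN` and `split_quoted` are made up for illustration. It assumes quotes always start a token, so input like `abc"d e"` is not handled.

```python
import re

# Hypothetical pattern: the first alternative matches a double-quoted token
# (backslash escapes allowed inside), the second matches a bare word.
TOKEN = re.compile(r'"((?:\\.|[^"\\])*)"|(\S+)')

def split_quoted(text):
    """Split text on white space, keeping double-quoted substrings whole."""
    tokens = []
    for match in TOKEN.finditer(text):
        quoted, bare = match.groups()
        if quoted is not None:
            # Drop the backslash from escape pairs such as \" inside quotes.
            tokens.append(re.sub(r'\\(.)', r'\1', quoted))
        else:
            tokens.append(bare)
    return tokens

print(split_quoted(r'abc "xy z" "1 2 3" "a \" x"'))
# prints ['abc', 'xy z', '1 2 3', 'a " x']
```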

Many thanks for a hint,

Helmut Jarausch

Lehrstuhl fuer Numerische Mathematik
RWTH - Aachen University
D 52056 Aachen, Germany
Jul 31 '07 #1
Helmut Jarausch wrote:
> Given a string of substrings separated by white space,
> split this into tuple/list of elements.
> The problem is quoted substrings like
>
> abc "xy z" "1 2 3" "a \" x"
>
> should be split into ('abc','xy z','1 2 3','a " x')

import csv

s = 'abc "xy z" "1 2 3" "a \\" x"'
# Treat the string as a one-line file: space delimiter, backslash escape.
r = csv.reader([s], delimiter=" ", escapechar="\\")
print(next(r))

w.
Jul 31 '07 #2
On Tue, 2007-07-31 at 22:30 +0200, Helmut Jarausch wrote:
> Given a string of substrings separated by white space,
> split this into tuple/list of elements.
> The problem is quoted substrings like
>
> abc "xy z" "1 2 3" "a \" x"
>
> should be split into ('abc','xy z','1 2 3','a " x')

>>> import shlex
>>> shlex.split('abc "xy z" "1 2 3" "a \\" x"')
['abc', 'xy z', '1 2 3', 'a " x']

I hope that's elegant enough ;)
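
A couple of edge cases may be worth knowing about with this approach. The sketch below assumes the default POSIX mode of `shlex.split`; the wrapper name `split_shell_style` is only for illustration. It shows that runs of white space collapse, that single quotes work like double quotes, and that an unterminated quote raises `ValueError`.

```python
import shlex

def split_shell_style(text):
    """Split text the way a POSIX shell would (hypothetical wrapper name)."""
    return shlex.split(text)

# Runs of spaces and tabs are treated as a single separator.
print(split_shell_style('abc   "xy z"\t\'1 2 3\''))
# prints ['abc', 'xy z', '1 2 3']

try:
    split_shell_style('abc "unterminated')
except ValueError as exc:
    # shlex refuses input whose quotes do not balance.
    print("cannot split:", exc)
```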

--
Carsten Haese
http://informixdb.sourceforge.net
Jul 31 '07 #3
On 7/31/07, Helmut Jarausch <ja******@skynet.be> wrote:
> Given a string of substrings separated by white space,
> split this into tuple/list of elements.
> The problem is quoted substrings like
>
> abc "xy z" "1 2 3" "a \" x"
>
> should be split into ('abc','xy z','1 2 3','a " x')

Using the csv module gets you most of the way there. For instance:
>>> import csv
>>> text = r'abc "xy z"  "1 2 3"  "a \" x"'
>>> reader = csv.reader([text], delimiter=" ", escapechar='\\')
>>> for row in reader:
...     print(row)
...
['abc', 'xy z', '', '1 2 3', '', 'a " x']
>>>
That does leave you with empty elements where you had double spaces
between items, though. You could fix that with something like:
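>>> # The first loop exhausted the reader, so create a fresh one.
>>> reader = csv.reader([text], delimiter=" ", escapechar='\\')
>>> for row in reader:
...     row = [element for element in row if element != '']
...     print(row)
...
['abc', 'xy z', '1 2 3', 'a " x']
>>>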
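
Wrapped up as a function, the same idea might look like the sketch below. The name `split_with_csv` is hypothetical. One caveat that follows from the filter: an explicitly empty quoted token (`""`) is dropped along with the empty fields produced by runs of spaces.

```python
import csv

def split_with_csv(text):
    """Split one line on spaces via csv, dropping empty fields."""
    reader = csv.reader([text], delimiter=" ", escapechar="\\")
    row = next(reader)
    # Runs of spaces yield '' fields; this also drops an explicit "" token.
    return [field for field in row if field != ""]

print(split_with_csv(r'abc "xy z"  "1 2 3"  "a \" x"'))
# prints ['abc', 'xy z', '1 2 3', 'a " x']
print(split_with_csv('abc "" def'))
# prints ['abc', 'def']
```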
The csv module can handle many kinds of delimited data other than
quote- and comma-delimited. See the docs at:
http://docs.python.org/lib/module-csv.html and PEP 305:
http://www.python.org/dev/peps/pep-0305/
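
As a small illustration of that flexibility, here is a sketch that reads a pipe-delimited line using single quotes as the quote character. The sample line is made up; `delimiter` and `quotechar` are standard `csv.reader` formatting parameters.

```python
import csv

# Hypothetical sample line: pipe-delimited fields, single quotes for quoting.
line = "alpha|'beta|gamma'|delta"

reader = csv.reader([line], delimiter="|", quotechar="'")
fields = next(reader)
print(fields)
# prints ['alpha', 'beta|gamma', 'delta']
```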

--
Jerry
Jul 31 '07 #4
On Jul 31, 3:30 pm, Helmut Jarausch <jarau...@skynet.be> wrote:
> Given a string of substrings separated by white space,
> split this into tuple/list of elements.
> The problem is quoted substrings like
>
> abc "xy z" "1 2 3" "a \" x"
>
> should be split into ('abc','xy z','1 2 3','a " x')

Pyparsing has built-in support for special treatment of quoted
strings. Observe:

from pyparsing import *

data = r'abc "xy z" "1 2 3" "a \" x"'

# Strip the surrounding quotes from each quoted-string match.
quotedString.setParseAction(removeQuotes)
print(OneOrMore(quotedString | Word(printables)).parseString(data))

prints:

['abc', 'xy z', '1 2 3', 'a \\" x']
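
Note that the last element printed above still carries the backslash, unlike the result asked for in the original question. One possible post-processing step is sketched below. The helper `unescape` is hypothetical and not part of pyparsing, and the list is written out by hand so the sketch runs without pyparsing installed.

```python
import re

def unescape(token):
    """Replace each backslash-escaped character with the character itself."""
    return re.sub(r'\\(.)', r'\1', token)

# The parse result shown above, copied by hand.
parsed = ['abc', 'xy z', '1 2 3', 'a \\" x']
print([unescape(token) for token in parsed])
# prints ['abc', 'xy z', '1 2 3', 'a " x']
```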

Or perhaps a bit trickier, do the same while skipping items inside /*
*/ comments:

data = r'abc /* 456 "xy z" */ "1 2 3" "a \" x"'

quotedString.setParseAction(removeQuotes)
print(OneOrMore(quotedString | Word(printables))
      .ignore(cStyleComment).parseString(data))

prints:

['abc', '1 2 3', 'a \\" x']
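
For comparison, a rough standard-library approximation of the comment-skipping variant is sketched here. It is a different technique from the pyparsing grammar: a regular expression removes the comments first, then `shlex.split` does the splitting (and, unlike the output above, also removes the backslash). The names `COMMENT` and `split_skipping_comments` are made up, and the sketch assumes that no quoted substring contains a `/* ... */` sequence.

```python
import re
import shlex

# Non-greedy match of a C-style comment, possibly spanning several lines.
COMMENT = re.compile(r'/\*.*?\*/', re.DOTALL)

def split_skipping_comments(text):
    # Replace each comment with a space so neighbouring tokens stay separate.
    # Caveat: this also strips a /* ... */ sequence inside a quoted substring.
    return shlex.split(COMMENT.sub(' ', text))

print(split_skipping_comments(r'abc /* 456 "xy z" */ "1 2 3" "a \" x"'))
# prints ['abc', '1 2 3', 'a " x']
```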
-- Paul

Aug 1 '07 #5
Many thanks to all of you!
It's amazing how many elegant solutions there are in Python.
--
Helmut Jarausch

Lehrstuhl fuer Numerische Mathematik
RWTH - Aachen University
D 52056 Aachen, Germany
Aug 1 '07 #6
