Bytes IT Community

split a line, respecting double quotes

Jim
Is there some easy way to split a line, keeping together double-quoted
strings?

I'm thinking of
'a b c "d e"' --> ['a','b','c','d e']
I'd also like
'a b c "d \" e"' --> ['a','b','c','d " e']
which rules out any s.split('"')-based construct that I could come up with.

Thank you,
Jim

Jul 7 '06 #1
8 Replies


import re
re.findall('\".*\"|\S+', raw_input())

Jul 7 '06 #2

Jim wrote:
> Is there some easy way to split a line, keeping together double-quoted
> strings?

Using the re module is probably the easiest way I have found, though this
is by no means gospel :)

import re
rex = re.compile(r'(".*?"|\S+)')  # \S+ keeps multi-character words whole
sub = 'a b c "d e"'
res = [x for x in re.split(rex, sub) if not x.isspace()][1:-1]
print res  # -> ['a', 'b', 'c', '"d e"']

Basically: import the re module, compile a pattern, define the string,
build a list comprehension that filters out the whitespace, slice off the
empty strings at both ends, and print the result. I hope this helps.
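
If the surrounding quotes should be dropped and \" should count as an escaped quote, as in the original question, one possible variation is to capture the quoted body in its own group. This is only a minimal Python 3 sketch, not the code posted above; the names TOKEN and split_quoted are made up for illustration.

```python
import re

# Hypothetical pattern: either a double-quoted run (group 1, which may
# contain backslash escapes) or a run of non-whitespace (group 2).
TOKEN = re.compile(r'"((?:\\.|[^"\\])*)"|(\S+)')

def split_quoted(line):
    """Split on whitespace, keeping double-quoted strings together."""
    tokens = []
    for quoted, bare in TOKEN.findall(line):
        if bare:
            tokens.append(bare)
        else:
            # Drop the backslash from escape pairs such as \" and \\.
            tokens.append(re.sub(r'\\(.)', r'\1', quoted))
    return tokens

print(split_quoted('a b c "d e"'))        # ['a', 'b', 'c', 'd e']
print(split_quoted(r'a b c "d \" e"'))    # ['a', 'b', 'c', 'd " e']
```

An unterminated quote is not an error here: the text simply falls through to the \S+ branch and is split on whitespace.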

Jul 7 '06 #3

Sorry, I didn't read all of your post.
def test(s):
    res = ['']
    in_dbl = False
    escaped = False
    for c in s:
        if in_dbl:
            if escaped:
                res[-1] += c
                if c != '\\':
                    escaped = False
            else:
                res[-1] += c
                if c == '\\':
                    escaped = True
                elif c == '"':
                    res.append('')
                    in_dbl = False
        elif c == ' ':
            res.append('')
        elif c == '"':
            res.append('')
            res[-1] += c
            in_dbl = True
        else:
            res[-1] += c
    while '' in res:
        res.remove('')
    return res
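
Note that the function above leaves the double quotes and backslashes in the returned pieces. If they should be stripped, as in the output the question asks for, the same character-by-character idea can be sketched as follows. This is a Python 3 variation under that assumption, not the poster's code, and the name split_line is hypothetical.

```python
def split_line(s):
    """Split on spaces, keeping double-quoted strings together.

    The delimiting quotes and the backslash of an escape inside quotes
    are dropped from the result.
    """
    tokens = []
    current = []
    in_token = False    # True once the current token has started
    in_quotes = False
    escaped = False     # True right after a backslash inside quotes
    for c in s:
        if escaped:
            current.append(c)
            escaped = False
        elif c == '\\' and in_quotes:
            escaped = True
        elif c == '"':
            in_quotes = not in_quotes
            in_token = True     # so that "" yields an empty token
        elif c == ' ' and not in_quotes:
            if in_token:
                tokens.append(''.join(current))
            current = []
            in_token = False
        else:
            current.append(c)
            in_token = True
    if in_token:
        tokens.append(''.join(current))
    return tokens

print(split_line('a b c "d e"'))        # ['a', 'b', 'c', 'd e']
print(split_line(r'a b c "d \" e"'))    # ['a', 'b', 'c', 'd " e']
```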

faulkner wrote:
import re
re.findall('\".*\"|\S+', raw_input())

Jul 7 '06 #4

>>> import shlex
>>> shlex.split('a b c "d e"')
['a', 'b', 'c', 'd e']
>>> shlex.split(r'a b c "d \" e"')
['a', 'b', 'c', 'd " e']

Note that I had to use a raw string in the latter case because otherwise
there's no real backslash in the string::

>>> 'a b c "d \" e"'
'a b c "d " e"'
>>> r'a b c "d \" e"'
'a b c "d \\" e"'

STeVe
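
One caveat worth knowing: shlex.split follows shell rules, so a single quote also opens a quoted string, and a stray apostrophe raises ValueError. If only double quotes should be special, a shlex.shlex instance can probably be configured for that. The following is a minimal Python 3 sketch under that assumption; the helper name split_double_quoted is made up for illustration.

```python
import shlex

def split_double_quoted(line):
    """Split on whitespace, treating only " as a quote character."""
    lexer = shlex.shlex(line, posix=True)
    lexer.whitespace_split = True   # split on whitespace only
    lexer.quotes = '"'              # a lone ' becomes an ordinary character
    lexer.commenters = ''           # do not treat # as a comment start
    return list(lexer)

print(shlex.split(r'a b c "d \" e"'))           # ['a', 'b', 'c', 'd " e']

try:
    shlex.split("don't panic")
except ValueError as exc:
    # The apostrophe opens a single-quoted string that never closes.
    print("shlex.split failed:", exc)

print(split_double_quoted("don't panic"))       # ["don't", 'panic']
print(split_double_quoted(r'a b c "d \" e"'))   # ['a', 'b', 'c', 'd " e']
```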
Jul 7 '06 #5

> Is there some easy way to split a line, keeping together double-quoted
> strings?

import re
rex = re.compile(r'(".*?"|\S+)')  # \S+ keeps multi-character words whole
sub = 'a b c "d e"'
res = [x for x in re.split(rex, sub) if not x.isspace()][1:-1]
print res  # -> ['a', 'b', 'c', '"d e"']

Instead of slicing the result, you can also filter out the empty and
whitespace-only strings directly:
res = [x for x in re.split(rex, sub) if x.strip()]

Jul 7 '06 #6

Jim

Jim wrote:
> Is there some easy way to split a line, keeping together double-quoted
> strings?

Thank you for the replies.

Jim

Jul 8 '06 #7

Jim <jh*******@smcvt.edu> wrote:
> Is there some easy way to split a line, keeping together double-quoted
> strings?
>
> I'm thinking of
> 'a b c "d e"' --> ['a','b','c','d e']
> I'd also like
> 'a b c "d \" e"' --> ['a','b','c','d " e']
> which rules out any s.split('"')-based construct that I could come up with.
>>> import csv, StringIO
>>> csv.reader(StringIO.StringIO('a b c "d e"'), delimiter=' ').next()
['a', 'b', 'c', 'd e']

It can't quite do the second one, but:
>>> csv.reader(StringIO.StringIO('a b c "d "" e"'), delimiter=' ').next()
['a', 'b', 'c', 'd " e']
isn't far off.

On the other hand, it's kind of a stupid solution. I'd really go with
shlex as someone suggested up thread.

--
\S -- si***@chiark.greenend.org.uk -- http://www.chaos.org.uk/~sion/
___ | "Frankly I have no feelings towards penguins one way or the other"
\X/ | -- Arthur C. Clarke
her nu becomeþ se bera eadward ofdun hlæddre heafdes bæce bump bump bump
Jul 10 '06 #8

Sion Arrowsmith wrote:
> >>> csv.reader(StringIO.StringIO('a b c "d "" e"'), delimiter=' ').next()
> ['a', 'b', 'c', 'd " e']
> isn't far off.
>
> On the other hand, it's kind of a stupid solution.

IMO, this solution is on the right track.
FWIW, the StringIO wrapper is unnecessary.
Any iterable of lines will do:
csv.reader(['a b c "d e"'], delimiter=' ')
Raymond
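
For anyone trying this on Python 3, where reader objects no longer have a .next() method, the csv approach might look like the sketch below. The escapechar argument is an addition that was not suggested in the thread; it appears to cover the backslash-escaped case as well, but treat it as an assumption to verify against your own data.

```python
import csv

# Any iterable of strings works as input; a plain list stands in for a file.
lines = ['a b c "d e"', 'a b c "d "" e"']
for row in csv.reader(lines, delimiter=' '):
    print(row)
# ['a', 'b', 'c', 'd e']
# ['a', 'b', 'c', 'd " e']

# Assumption: with escapechar set, a backslash removes the special meaning
# of the quote that follows it.
escaped = next(csv.reader([r'a b c "d \" e"'], delimiter=' ', escapechar='\\'))
print(escaped)
# ['a', 'b', 'c', 'd " e']
```

Keep in mind that csv treats every single space as a delimiter, so two consecutive spaces produce an empty string in the result.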

Jul 10 '06 #9
