Hello,
I'd like to split a string by commas, but only at the "top level" so
to speak. An element can be a comma-less substring, or a
quoted string, or a substring which looks like a function call.
If some element contains commas, I don't want to split it.
Examples:
'foo, bar, baz' => 'foo' 'bar' 'baz'
'foo, "bar, baz", blurf' => 'foo' 'bar, baz' 'blurf'
'foo, bar(baz, blurf), mumble' => 'foo' 'bar(baz, blurf)' 'mumble'
Can someone suggest a suitable regular expression or other
method to split such strings?
Thank you very much for your help.
Robert
On Jun 18, 10:19 am, Robert Dodier <robert.dod...@gmail.com> wrote:
> Can someone suggest a suitable regular expression or other
> method to split such strings?
You might look at the shlex module. It doesn't get you 100% of the way, but
it's close:
>>> import shlex
>>> shlex.split('foo, bar, baz')
['foo,', 'bar,', 'baz']
>>> shlex.split('foo, "bar, baz", blurf')
['foo,', 'bar, baz,', 'blurf']
>>> shlex.split('foo, bar(baz, blurf), mumble')
['foo,', 'bar(baz,', 'blurf),', 'mumble']
Using a RE will be tricky, especially if it is possible to have
recursive nesting (which by definition REs can't handle). For a real
general purpose solution you will need to create a custom parser.
There are a couple modules out there that can help you with that.
pyparsing is one: http://pyparsing.wikispaces.com/
Matt
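To illustrate the point about regular expressions, here is a minimal sketch of a pattern that copes with double-quoted strings and exactly one level of parentheses. The names ITEM and regex_split are hypothetical, made up for this illustration. It handles the three examples from the question but, as expected, falls apart as soon as calls are nested.

```python
import re

# Hypothetical helper names, for illustration only.
# One list item is a run of: double-quoted strings, parenthesised groups
# with no further nesting, or characters that are not , " ( or ).
ITEM = re.compile(r'(?:"[^"]*"|\([^()]*\)|[^,"()])+')

def regex_split(text):
    """Split at top-level commas; supports only ONE level of parentheses."""
    return [m.strip() for m in ITEM.findall(text)]

print(regex_split('foo, bar, baz'))
print(regex_split('foo, "bar, baz", blurf'))
print(regex_split('foo, bar(baz, blurf), mumble'))
# Nested calls are split in the wrong places:
print(regex_split('f(g(a, b), c), d'))
```

Note that this sketch keeps the quotes around quoted items and ignores single quotes entirely.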
Hi,
On Wednesday 18 June 2008 19:19:57, Robert Dodier wrote:
> Can someone suggest a suitable regular expression or other
> method to split such strings?
I'd do something like this (note that it doesn't check for quote/parenthesis
mismatches and removes _all_ the quotes):

def mysplit(string):
    pardepth = 0     # current parenthesis nesting depth
    quote = False    # True while inside a quoted string
    ret = ['']
    for car in string:
        if car == '(':
            pardepth += 1
        elif car == ')':
            pardepth -= 1
        elif car in ('"', "'"):
            quote = not quote
            car = ''  # just if you don't want to keep the quotes
        # split on commas and spaces, but only at the top level; the
        # "car and" guard skips the empty string left by a dropped quote
        if car and car in ', ' and not (pardepth or quote):
            if ret[-1] != '':
                ret.append('')
        else:
            ret[-1] += car
    return ret

# test
for s in ('foo, bar, baz',
          'foo, "bar, baz", blurf',
          'foo, bar(baz, blurf), mumble'):
    print("'%s' => '%s'" % (s, mysplit(s)))

# result
'foo, bar, baz' => '['foo', 'bar', 'baz']'
'foo, "bar, baz", blurf' => '['foo', 'bar, baz', 'blurf']'
'foo, bar(baz, blurf), mumble' => '['foo', 'bar(baz, blurf)', 'mumble']'
--
Cédric Lucantis
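For anyone who does want the mismatch check mentioned above, here is a rough sketch of a stricter variant of the same idea. The name split_top_level and its sep parameter are hypothetical, not part of the code posted above. It assumes that a quote is only closed by the same character that opened it, keeps the quotes in the output, splits on the separator only (not on spaces), and raises ValueError on unbalanced input.

```python
def split_top_level(text, sep=","):
    """Split text on sep, ignoring separators inside quotes or parentheses.

    Hypothetical helper for illustration; raises ValueError on mismatches.
    """
    items, current = [], []
    depth = 0      # current parenthesis nesting depth
    quote = None   # the quote character that is currently open, if any
    for ch in text:
        if quote:
            # inside a quoted string: only the matching quote closes it
            if ch == quote:
                quote = None
            current.append(ch)
        elif ch in "\"'":
            quote = ch
            current.append(ch)
        elif ch == "(":
            depth += 1
            current.append(ch)
        elif ch == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced ')' in %r" % text)
            current.append(ch)
        elif ch == sep and depth == 0:
            # top-level separator: finish the current item
            items.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    if quote or depth:
        raise ValueError("unterminated quote or parenthesis in %r" % text)
    items.append("".join(current).strip())
    return items

print(split_top_level('foo, bar(baz, blurf), mumble'))
print(split_top_level('f(g(a, b), c), d'))
```

Because it only counts depth, nested calls such as f(g(a, b), c) stay in one piece.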
On Jun 18, 12:19 pm, Robert Dodier <robert.dod...@gmail.com> wrote:
> Can someone suggest a suitable regular expression or other
> method to split such strings?
tests = """\
foo, bar, baz
foo, "bar, baz", blurf
foo, bar(baz, blurf), mumble""".splitlines()

from pyparsing import Word, alphas, alphanums, Optional, \
    Group, delimitedList, quotedString

ident = Word(alphas+"_", alphanums+"_")
func_call = Group(ident + "(" + Optional(Group(delimitedList(ident)))
                  + ")")
listItem = func_call | ident | quotedString

for t in tests:
    print(delimitedList(listItem).parseString(t).asList())

Prints:

['foo', 'bar', 'baz']
['foo', '"bar, baz"', 'blurf']
['foo', ['bar', '(', ['baz', 'blurf'], ')'], 'mumble']
-- Paul
On Jun 18, 10:54 am, Matimus <mccre...@gmail.com> wrote:
> Using a RE will be tricky, especially if it is possible to have
> recursive nesting (which by definition REs can't handle). For a real
> general purpose solution you will need to create a custom parser.
> There are a couple modules out there that can help you with that.
> pyparsing is one: http://pyparsing.wikispaces.com/
Following up on my own post, here is a working example that uses the
built-in _ast module. I posted something similar the other day. This
uses Python's own internal parser to do the work for you. It works in this
case because, at least from what you have posted, your syntax doesn't
violate Python syntax.

import _ast

def eval_tuple(text):
    """ Evaluate a string representing a tuple of strings, names and calls,
    returns a tuple of strings.
    """
    ast = compile(text, "<string>", 'eval', _ast.PyCF_ONLY_AST)
    return _traverse(ast.body)

def _traverse(ast):
    """ Traverse the AST returning string representations of tuples,
    strings, names and calls.
    """
    if isinstance(ast, _ast.Tuple):
        return tuple(_traverse(el) for el in ast.elts)
    elif isinstance(ast, _ast.Str):
        return ast.s
    elif isinstance(ast, _ast.Name):
        return ast.id
    elif isinstance(ast, _ast.Call):
        name = ast.func.id
        args = [_traverse(x) for x in ast.args]
        return "%s(%s)" % (name, ", ".join(args))
    raise SyntaxError()

examples = [
    ('foo, bar, baz', ('foo', 'bar', 'baz')),
    ('foo, "bar, baz", blurf', ('foo', 'bar, baz', 'blurf')),
    ('foo, bar(baz, blurf), mumble', ('foo', 'bar(baz, blurf)',
                                      'mumble')),
]

def test():
    for text, expected in examples:
        print("trying %r => %r" % (text, expected))
        result = eval_tuple(text)
        if result == expected:
            print("PASS")
        else:
            print("FAIL, GOT: %r" % (result,))

if __name__ == "__main__":
    test()

Matt
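
The Str node class used above has been deprecated since Python 3.8 in favour of Constant, so the code as posted may not run on a current interpreter. What follows is a rough adaptation of the same approach for Python 3.8 or later, using the public ast module instead of _ast. It is a sketch under that assumption, not a drop-in replacement, and it still only accepts input that happens to be valid Python syntax.

```python
import ast

def eval_tuple(text):
    """Parse text as a Python expression and return its items as strings."""
    tree = ast.parse(text, mode="eval")
    return _traverse(tree.body)

def _traverse(node):
    """Turn tuples, string literals, names and simple calls into strings."""
    if isinstance(node, ast.Tuple):
        return tuple(_traverse(el) for el in node.elts)
    if isinstance(node, ast.Constant) and isinstance(node.value, str):
        # string literal: return its value without the quotes
        return node.value
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
        # rebuild "name(arg, arg, ...)" from the call node
        args = [_traverse(arg) for arg in node.args]
        return "%s(%s)" % (node.func.id, ", ".join(args))
    raise SyntaxError("unsupported syntax: %s" % ast.dump(node))

for text in ('foo, bar, baz',
             'foo, "bar, baz", blurf',
             'foo, bar(baz, blurf), mumble'):
    print(eval_tuple(text))
```

Since the text is only parsed and never evaluated, nothing in the input string is executed.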
I have actually highlighted a small, neat recipe for doing such
unpacking, which I use for parsing arbitrary parameters in Evoque
Templating. I never needed to handle "callable" parameters though, as
you do in your third example string, so the little "unpack_symbol"
recipe I published earlier does not handle them. Anyhow, what I am
referring to are:

Evoque Templating: http://evoque.gizmojo.org/
Code highlight: http://gizmojo.org/code/unpack_symbol/

However, a little variation of the above recipe can do what you are
looking for, in a rather cute way. The difference is to make the
original recipe handle "callable strings", and I achieve this by
modifying the recipe like so:

class callable_str(str):
    def __call__(s, *args):
        return s+str(args)

class _UnpackGlobals(dict):
    def __getitem__(self, name):
        return callable_str(name)

def unpack_symbol(symbol, globals=_UnpackGlobals()):
    """ If compound symbol (list, tuple, nested) unpack to atomic
    symbols """
    return eval(symbol, globals, None)

Now, calling unpack_symbol() on each sample string gives the following
tuple of strings:

>>> unpack_symbol('foo, bar, baz')
('foo', 'bar', 'baz')
>>> unpack_symbol('foo, "bar, baz", blurf')
('foo', 'bar, baz', 'blurf')
>>> unpack_symbol('foo, bar(baz, blurf), mumble')
('foo', "bar('baz', 'blurf')", 'mumble')

Mario Ruggier