How to split a string containing nested commas-separated substrings

Hello,

I'd like to split a string by commas, but only at the "top level" so
to speak. An element can be a comma-less substring, or a
quoted string, or a substring which looks like a function call.
If some element contains commas, I don't want to split it.

Examples:

'foo, bar, baz' => 'foo' 'bar' 'baz'
'foo, "bar, baz", blurf' => 'foo' 'bar, baz' 'blurf'
'foo, bar(baz, blurf), mumble' => 'foo' 'bar(baz, blurf)' 'mumble'

Can someone suggest a suitable regular expression or other
method to split such strings?

Thank you very much for your help.

Robert
Jun 27 '08 #1
You might look at the shlex module. It doesn't get you 100% of the way, but it's
close:
>>> import shlex
>>> shlex.split('foo, bar, baz')
['foo,', 'bar,', 'baz']
>>> shlex.split('foo, "bar, baz", blurf')
['foo,', 'bar, baz,', 'blurf']
>>> shlex.split('foo, bar(baz, blurf), mumble')
['foo,', 'bar(baz,', 'blurf),', 'mumble']
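
One way to get shlex a little closer is to configure a lexer so that only commas act as separators. The following is a minimal sketch, not a complete answer: the helper name split_quoted is made up for illustration, and it still breaks function-call items apart because shlex knows nothing about parentheses.

```python
import shlex

def split_quoted(text):
    """Split on commas, keeping quoted substrings intact (quotes are removed)."""
    lexer = shlex.shlex(text, posix=True)
    lexer.whitespace = ","          # treat only commas as separators
    lexer.whitespace_split = True   # split on separators only
    lexer.commenters = ""           # do not treat '#' as a comment start
    # Blanks around each item survive lexing, so strip them afterwards.
    return [token.strip() for token in lexer]

print(split_quoted('foo, "bar, baz", blurf'))
print(split_quoted('foo, bar(baz, blurf), mumble'))
```

This handles the first two example strings; the third still comes out as four pieces, which is why a parser that tracks parentheses is needed for the general case.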

Using a regular expression will be tricky, especially if recursive nesting is
possible (classic regular expressions cannot match arbitrarily deep nesting).
For a truly general-purpose solution you will need to write a custom parser.
There are a couple of modules out there that can help you with that.

pyparsing is one: http://pyparsing.wikispaces.com/

Matt
Jun 27 '08 #2
Hi,

I'd do something like this (note that it doesn't check for quote/parenthesis
mismatches and removes _all_ the quotes):

def mysplit(string):
    pardepth = 0      # current parenthesis nesting depth
    quote = False     # True while inside a quoted substring
    ret = ['']

    for car in string:

        if car == '(': pardepth += 1
        elif car == ')': pardepth -= 1
        elif car in ('"', "'"):
            quote = not quote
            car = ''  # just if you don't want to keep the quotes

        # 'car and' skips the emptied quote character, since '' in ', ' is True
        if car and car in ', ' and not (pardepth or quote):
            if ret[-1] != '': ret.append('')
        else:
            ret[-1] += car

    return ret

# test
for s in ('foo, bar, baz',
          'foo, "bar, baz", blurf',
          'foo, bar(baz, blurf), mumble'):
    print("'%s' => '%s'" % (s, mysplit(s)))

# result
'foo, bar, baz' => '['foo', 'bar', 'baz']'
'foo, "bar, baz", blurf' => '['foo', 'bar, baz', 'blurf']'
'foo, bar(baz, blurf), mumble' => '['foo', 'bar(baz, blurf)', 'mumble']'
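
The mismatch check that the function above leaves out could be added roughly as follows. This is only a sketch under a few assumptions: the name split_top_level is hypothetical, only commas (not spaces) separate items, a quote must be closed by the same quote character, and quotes are dropped everywhere, including inside parentheses.

```python
def split_top_level(text, sep=","):
    """Split text on sep, ignoring separators inside quotes or parentheses."""
    items, current = [], []
    depth = 0       # current parenthesis nesting depth
    quote = None    # the quote character that is currently open, or None
    for ch in text:
        if quote:
            if ch == quote:
                quote = None          # closing quote, dropped from the output
            else:
                current.append(ch)
        elif ch in "\"'":
            quote = ch                # opening quote, dropped from the output
        elif ch == sep and depth == 0:
            items.append("".join(current).strip())
            current = []
        else:
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
                if depth < 0:
                    raise ValueError("unbalanced ')' in %r" % text)
            current.append(ch)
    if quote or depth:
        raise ValueError("unterminated quote or parenthesis in %r" % text)
    items.append("".join(current).strip())
    return items

for s in ('foo, bar, baz',
          'foo, "bar, baz", blurf',
          'foo, bar(baz, blurf), mumble'):
    print(split_top_level(s))
```

Tracking the opening quote character rather than a boolean means an apostrophe inside a double-quoted item does not end the quoted section.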
--
Cédric Lucantis
Jun 27 '08 #3
tests = """\
foo, bar, baz
foo, "bar, baz", blurf
foo, bar(baz, blurf), mumble""".splitlines()

from pyparsing import Word, alphas, alphanums, Optional, \
    Group, delimitedList, quotedString

# an identifier, and a call such as bar(baz, blurf) with an optional argument list
ident = Word(alphas+"_", alphanums+"_")
func_call = Group(ident + "(" + Optional(Group(delimitedList(ident)))
                  + ")")

listItem = func_call | ident | quotedString

for t in tests:
    print(delimitedList(listItem).parseString(t).asList())

Prints:

['foo', 'bar', 'baz']
['foo', '"bar, baz"', 'blurf']
['foo', ['bar', '(', ['baz', 'blurf'], ')'], 'mumble']
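
If flat strings like those in the original question are wanted, the nested lists printed above can be joined back together afterwards. The sketch below works on the plain lists only, so it does not need pyparsing; the helper names render and flatten are made up for illustration. pyparsing's own originalTextFor helper may be a more direct route.

```python
def render(item):
    """Turn one parsed item (a string or a nested list) back into flat text."""
    if isinstance(item, str):
        # quotedString keeps the surrounding quotes; drop a matching pair.
        if len(item) >= 2 and item[0] == item[-1] and item[0] in "\"'":
            return item[1:-1]
        return item
    parts = []
    for part in item:
        if isinstance(part, list):
            # the grouped argument list of a call
            parts.append(", ".join(render(arg) for arg in part))
        else:
            parts.append(part)
    return "".join(parts)

def flatten(parsed):
    """Apply render() to every top-level item of a parsed line."""
    return [render(item) for item in parsed]

print(flatten(['foo', '"bar, baz"', 'blurf']))
print(flatten(['foo', ['bar', '(', ['baz', 'blurf'], ')'], 'mumble']))
```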
-- Paul
Jun 27 '08 #4
Following up on my own post, here is a working example that uses the
built-in _ast module. I posted something similar the other day. This
uses Python's own internal parser to do the work for you. It works in this
case because, at least from what you have posted, your syntax doesn't
violate Python syntax.

import _ast

def eval_tuple(text):
    """ Evaluate a string representing a tuple of strings, names and
    calls; returns a tuple of strings.
    """

    ast = compile(text, "<string>", 'eval', _ast.PyCF_ONLY_AST)
    return _traverse(ast.body)

def _traverse(ast):
    """ Traverse the AST, returning string representations of tuples,
    strings, names and calls.
    """
    if isinstance(ast, _ast.Tuple):
        return tuple(_traverse(el) for el in ast.elts)
    elif isinstance(ast, _ast.Str):
        return ast.s
    elif isinstance(ast, _ast.Name):
        return ast.id
    elif isinstance(ast, _ast.Call):
        name = ast.func.id
        args = [_traverse(x) for x in ast.args]
        return "%s(%s)"%(name, ", ".join(args))
    raise SyntaxError()

examples = [
    ('foo, bar, baz', ('foo', 'bar', 'baz')),
    ('foo, "bar, baz", blurf', ('foo', 'bar, baz', 'blurf')),
    ('foo, bar(baz, blurf), mumble', ('foo', 'bar(baz, blurf)',
                                      'mumble')),
    ]

def test():
    for text, expected in examples:
        print("trying %r => %r"%(text, expected))
        result = eval_tuple(text)
        if result == expected:
            print("PASS")
        else:
            print("FAIL, GOT: %r"%result)

if __name__ == "__main__":
    test()

Matt
Jun 27 '08 #5
I have actually published a small, neat recipe for this kind of
unpacking, which I use for parsing arbitrary parameters in Evoque
Templating. I never needed to handle "callable" parameters, though, as
you do in your third example string, so the little "unpack_symbol"
recipe I published earlier does not handle them. Anyhow, what I am
referring to are:

Evoque Templating: http://evoque.gizmojo.org/
Code highlight: http://gizmojo.org/code/unpack_symbol/

However, a small variation of the above recipe can do what you are
looking for, in a rather cute way. The difference is to make the
original recipe handle "callable strings", and I achieve this by
modifying the recipe like so:
class callable_str(str):
    def __call__(s, *args):
        return s+str(args)

class _UnpackGlobals(dict):
    def __getitem__(self, name):
        return callable_str(name)

def unpack_symbol(symbol, globals=_UnpackGlobals()):
    """ If compound symbol (list, tuple, nested) unpack to atomic
    symbols """
    return eval(symbol, globals, None)

Now, calling unpack_symbol() on each sample string gives the following
tuple of strings:
>>> unpack_symbol('foo, bar, baz')
('foo', 'bar', 'baz')
>>> unpack_symbol('foo, "bar, baz", blurf')
('foo', 'bar, baz', 'blurf')
>>> unpack_symbol('foo, bar(baz, blurf), mumble')
('foo', "bar('baz', 'blurf')", 'mumble')
>>>

Mario Ruggier
Jun 27 '08 #6
