Bytes IT Community

Regular Expressions to Split Lists Into Sub-Lists

For example, given a string "A, B, C (P, Q, R), D (X, Y [K, L, M ,N],
Z)".

Would like to split into tokens thusly:

a[0] == "A"
a[1] == "B"
a[2] == "C (P, Q, R)"
a[3] == "D (X, Y [K, L, M ,N], Z)"

i.e. do not descend into sub-lists

PHP split() using commas as a delimiter will give 11 tokens.

I can write a routine which checks the input byte by byte and
increments or decrements a counter based on how many opening "( [ {"
or closing ") ] }" brackets it sees. If the counter is greater than 0,
the routine is inside a sub-list and ignores delimiters (i.e. keeps
looking). Guaranteed to work, but to my mind it seems rather clunky.
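
For reference, the counter routine described above might look roughly like the following. This is only a sketch, written in Python rather than PHP for illustration; the name split_by_depth is made up, and it assumes brackets are balanced and does not check that bracket types match.

```python
OPENERS = "([{"
CLOSERS = ")]}"

def split_by_depth(text, delimiter=","):
    """Split text on delimiter, but only where the bracket depth is zero."""
    tokens = []    # finished top-level tokens
    current = []   # characters of the token being built
    depth = 0      # number of brackets currently open
    for ch in text:
        if ch in OPENERS:
            depth += 1
        elif ch in CLOSERS:
            # Clamp at zero so a stray closing bracket does not go negative.
            depth = max(depth - 1, 0)
        if ch == delimiter and depth == 0:
            # A top-level delimiter ends the current token.
            tokens.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    tokens.append("".join(current).strip())
    return tokens

print(split_by_depth("A, B, C (P, Q, R), D (X, Y [K, L, M ,N], Z)"))
```

The whole thing is a single pass over the input with one integer of state, so it is arguably less clunky than it sounds.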

Is it possible to extract the tokens using regular expressions? E.g.
substitute highest level commas with a special delimiter say "~", and
split using that delimiter.

Thanks for reading.

Regards,

YR
Aug 17 '08 #1
2 Replies


Yimin Rong wrote:
> Is it possible to extract the tokens using regular expressions? E.g.
> substitute highest level commas with a special delimiter say "~", and
> split using that delimiter.

Seems like you already have your answer here. If the delimiters for
the top-level are different, it shouldn't be a problem to split on them.

--
Curtis
Aug 17 '08 #2

On Aug 17, 6:29 pm, Curtis <dye...@gmail.com> wrote:
> Seems like you already have your answer here. If the delimiters for
> the top-level are different, it shouldn't be a problem to split on them.

Agreed; however, the step I need is to replace the top-level delimiters
in the input. Do you think regular expression substitution can do this?

YR
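
One way the substitution step could be approached with ordinary (non-recursive) regular expressions is to mask the innermost bracket groups repeatedly until no brackets remain, so that any comma still visible must be top-level. The sketch below is in Python for illustration only. The names mark_top_level and split_via_masking are made up, and it assumes that the control characters \x01 to \x07 and the "~" marker never occur in the input and that brackets are balanced.

```python
import re

# Sentinels standing in for brackets and commas that are already handled.
# Assumption: these control characters never appear in real input.
MASK = str.maketrans("()[]{},", "\x01\x02\x03\x04\x05\x06\x07")
UNMASK = str.maketrans("\x01\x02\x03\x04\x05\x06\x07", "()[]{},")

# An innermost group: opening bracket, no brackets inside, closing bracket.
INNERMOST = re.compile(r"[(\[{][^()\[\]{}]*[)\]}]")

def mark_top_level(text, marker="~"):
    """Replace commas outside all brackets with marker."""
    while True:
        # Mask every innermost group; this exposes the next level out.
        text, count = INNERMOST.subn(lambda m: m.group(0).translate(MASK), text)
        if count == 0:
            break
    # Commas that survived the masking are outside every bracket.
    text = text.replace(",", marker)
    # Restore the original brackets and nested commas.
    return text.translate(UNMASK)

def split_via_masking(text, marker="~"):
    return [tok.strip() for tok in mark_top_level(text, marker).split(marker)]

print(mark_top_level("A, B, C (P, Q, R), D (X, Y [K, L, M ,N], Z)"))
```

Each pass peels off one nesting level, so the loop runs once per level of depth plus a final pass that finds nothing. PHP's preg_* functions are built on PCRE, which supports recursive patterns, so a single-pattern version may be possible there; this sketch deliberately avoids recursion.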
Aug 18 '08 #3

This discussion thread is closed
