Regular Expressions to Split Lists Into Sub-Lists

For example, given a string "A, B, C (P, Q, R), D (X, Y [K, L, M ,N],
Z)".

I would like to split it into tokens like this:

a[0] == "A"
a[1] == "B"
a[2] == "C (P, Q, R)"
a[3] == "D (X, Y [K, L, M ,N], Z)"

i.e. do not descend into sub-lists

PHP split() using commas as a delimiter will give 11 tokens.

I can write a routine which checks the input byte by byte and
increments or decrements a counter based on how many opening "( [ {"
or closing ") ] }" brackets it sees. If the counter is greater than 0,
we are inside a sub-list, so ignore delimiters (i.e. keep looking).
Guaranteed to work, but to my mind it seems rather clunky.
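
For reference, the counter routine described above could look roughly like the sketch below. It is written in Python for illustration only (the thread is about PHP, but the same loop ports directly). The function name split_top_level is made up for this example, and the sketch assumes the brackets in the input are balanced; it does not check that an opening "(" is closed by ")" rather than "]".

```python
def split_top_level(text, delimiter=","):
    """Split text on delimiter, skipping delimiters nested inside brackets."""
    openers, closers = "([{", ")]}"
    tokens, current, depth = [], [], 0
    for ch in text:
        if ch in openers:
            depth += 1          # entering a sub-list
        elif ch in closers:
            depth -= 1          # leaving a sub-list
        if ch == delimiter and depth == 0:
            # Top-level delimiter: close the current token.
            tokens.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    tokens.append("".join(current).strip())
    return tokens


print(split_top_level("A, B, C (P, Q, R), D (X, Y [K, L, M ,N], Z)"))
# ['A', 'B', 'C (P, Q, R)', 'D (X, Y [K, L, M ,N], Z)']
```

It is a single pass over the input with one integer of state, so it is arguably less clunky than it sounds.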

Is it possible to extract the tokens using regular expressions? E.g.
substitute highest level commas with a special delimiter say "~", and
split using that delimiter.

Thanks for reading.

Regards,

YR
Aug 17 '08 #1
Yimin Rong wrote:
> Is it possible to extract the tokens using regular expressions? E.g.
> substitute highest level commas with a special delimiter say "~", and
> split using that delimiter.

Seems like you already have your answer here. If the delimiters for
the top-level are different, it shouldn't be a problem to split on them.

--
Curtis
Aug 17 '08 #2
On Aug 17, 6:29 pm, Curtis <dye...@gmail.com> wrote:
> Seems like you already have your answer here. If the delimiters for
> the top-level are different, it shouldn't be a problem to split on them.

Agreed; however, the step I need is to replace the top-level delimiters in
the input. Do you think regular expression substitution can do this?

YR
Aug 18 '08 #3
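
One possible way to do the substitution step with regular expressions is sketched below, in Python for illustration rather than PHP. Python's standard re module has no recursive patterns, so the sketch loops instead: each pass blanks out the innermost bracketed groups with same-length filler, and once no brackets remain, any comma still visible must be a top-level one. The names INNERMOST and mark_top_level_commas are invented for this example. It assumes balanced brackets and that "~" does not otherwise occur in the input. PHP's preg_* functions are built on PCRE, which does support recursive patterns, so a single-expression version may be possible there; that variant is not shown.

```python
import re

# One innermost group: an opener, then no brackets at all, then a closer.
INNERMOST = re.compile(r"[(\[{][^()\[\]{}]*[)\]}]")


def mark_top_level_commas(text, marker="~"):
    """Return text with only the top-level commas replaced by marker."""
    masked = text
    while True:
        # Blank out each innermost group with same-length filler so that
        # character offsets in `masked` still line up with `text`.
        masked, count = INNERMOST.subn(lambda m: "\x00" * len(m.group()), masked)
        if count == 0:
            break
    # A comma that survived the masking sits outside every bracket pair.
    return "".join(marker if m == "," else ch for ch, m in zip(text, masked))


marked = mark_top_level_commas("A, B, C (P, Q, R), D (X, Y [K, L, M ,N], Z)")
print(marked)
# A~ B~ C (P, Q, R)~ D (X, Y [K, L, M ,N], Z)
print([token.strip() for token in marked.split("~")])
# ['A', 'B', 'C (P, Q, R)', 'D (X, Y [K, L, M ,N], Z)']
```

The loop runs once per nesting level plus a final pass that finds nothing, so three passes for the example string.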