complex regexp problem

Hi,

I need to split/match the following type of (single-line) syntax on all
commas (or on the text in between) that are not between quotes:
A,'B,B',C,,'E',F

The text between quotes can be _any_ text (except for newlines).

The regexp must either match: A | 'B,B' | C | [empty] | 'E' | F
(the pipe serves as the separator here), or split on the first, the third
and the remaining commas.
I can't get it done; I can't figure out how to exclude something from an
expression except for single characters.

I've tried several options, but all of them fail on one or more of the
following cases:
- 'A' only
- B only.

or do not catch the last item (F).

BTW, why do most regexp testing apps not support/offer split
functionality? Should I only use matching?

Nov 17 '05 #1
Try the following code:
string myString = "A,'B,B',C,,'E',F";
foreach (string s in Regex.Split(myString, @",(?![^']*',)(?<!,'[^']*)")) {
    MessageBox.Show(s);
}

It gives: A | 'B,B' | C | [empty] | 'E' | F

Here is an explanation of the regex ,(?![^']*',)(?<!,'[^']*) :
, means that you want to split on a comma
(?![^']*',) means that the comma must not be followed by some non-quote
characters, a closing quote and then a comma
(?<!,'[^']*) means that the comma must not be preceded by a comma, an
opening quote and some non-quote characters
The entire expression means that you want to split on every comma except
the ones that sit between quotes.

Hope this helps

Ludovic SOEUR.

Nov 17 '05 #2
Thanks a lot, it did the trick.
I never thought of opening/closing quotes, doh!
I also added (?<![,]$) to the expression, which skips the last comma if
nothing follows.
BTW: What exactly does the '<' in the second group do?
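
On the '<' question: in .NET regex syntax, (?<! ... ) is a negative lookbehind, so it checks the text to the left of the current position, while (?! ... ) without the '<' is a negative lookahead and checks the text to the right. Neither one consumes characters. A minimal sketch of the difference follows; the class name LookaroundDemo, the method names and the sample strings are made up for illustration and do not come from the thread.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookaroundDemo
{
    // Negative lookbehind: split on a comma only if the character
    // immediately before it is NOT a semicolon.
    public static string[] SplitNotPrecededBySemicolon(string s)
    {
        return Regex.Split(s, @"(?<!;),");
    }

    // Negative lookahead: split on a comma only if the character
    // immediately after it is NOT a semicolon.
    public static string[] SplitNotFollowedBySemicolon(string s)
    {
        return Regex.Split(s, @",(?!;)");
    }

    public static void Main()
    {
        // The comma after ';' is skipped by the lookbehind version.
        Console.WriteLine(string.Join(" | ", SplitNotPrecededBySemicolon("x,y;,z")));
        // The comma before ';' is skipped by the lookahead version.
        Console.WriteLine(string.Join(" | ", SplitNotFollowedBySemicolon("x,;y,z")));
    }
}
```

The first call should print x | y;,z and the second x,;y | z. In the pattern from the previous post, the lookbehind plays the same role for a comma that follows an opening quote.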

Nov 17 '05 #3
Oops: the expression cannot handle leading/trailing commas between
quotes. It ignores the commas around X:

a,'b,b',X,',c,',d,',Y,

My addition to the expression results in 'Y,', which is not desired
('Y' is, i.e. without the last comma).

Can you extend your expression to match the commas around X?

Nov 17 '05 #4
You are right. It does not work for this tricky case.
I can try to correct it, but I have to know exactly what you want.

To me, a,'b,b',X,',c,',d,',Y, can't be split correctly because the
opening and closing quotes don't match.

With a,'b,b',X,',c,',d,',Y,' I would split like this:
a | 'b,b' | X | ',c,' | d | ',Y,'
Or, if you consider ,', to be correct, with the same meaning as '', I would
split like this:
a | 'b,b' | X | ' | c | ' | d | ' | Y
What behavior would you like to have?

Ludovic Soeur.
Nov 17 '05 #5
If you want
a | 'b,b' | X | ' | c | ' | d | ' | Y
you can use this one: ,(?![^']*',)(?<!,'[^']*)|,(?=',)|,(?<=',)
It's a trick to deal with ,',
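
For anyone who wants to check this pattern without a MessageBox, here is a small console sketch. Only the pattern and the input string come from the thread; the class name QuoteAwareSplit and its members are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class QuoteAwareSplit
{
    // Pattern quoted from the post above: the original expression plus two
    // extra alternatives for a comma next to the sequence ',
    public const string Pattern = @",(?![^']*',)(?<!,'[^']*)|,(?=',)|,(?<=',)";

    public static string[] Split(string line)
    {
        return Regex.Split(line, Pattern);
    }

    public static void Main()
    {
        string[] parts = Split("a,'b,b',X,',c,',d,',Y,");
        Console.WriteLine(string.Join(" | ", parts));
    }
}
```

Tracing it by hand, the pieces come out as a | 'b,b' | X | ' | c | ' | d | ' | Y, with the final comma still attached to the last piece. That comma is preceded by ,' and then only non-quote characters, so the lookbehind in the first alternative rejects it, and neither of the added alternatives applies. If a bare Y is wanted, the trailing comma has to be trimmed separately.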

Hope it helps.

Ludovic SOEUR.
Nov 17 '05 #6
I borrowed the pattern from "How can I split a [character] delimited
string except when inside [character]?" in section 4 of the Perl FAQ[*]:
[*] http://xrl.us/ifw4 (perldoc.perl.org)

using System;
using System.Collections;
using System.Text.RegularExpressions;

using NUnit.Framework;

namespace Lib
{
    public class Record
    {
        // Alternatives, in order: a quoted field (backslash escapes allowed),
        // a possibly empty unquoted field followed by a comma, a non-empty
        // unquoted field at the end of the line, or a bare comma.
        // Every hit is collected in the repeated "field" group.
        private static readonly Regex extract =
            new Regex(
                @"^(?:
                      (?<field>'[^\'\\]*(?:\\.[^\'\\]*)*'),?
                    | (?<field>[^,]*),
                    | (?<field>[^,]+),?
                    | ,
                  )+$",
                RegexOptions.IgnorePatternWhitespace);

        // Stays null when the line does not match the pattern.
        string[] fields;

        public Record(string line)
        {
            Match m = extract.Match(line);

            if (m.Success)
            {
                ArrayList hits = new ArrayList();

                // .NET keeps every capture of a repeated group, not just the last one.
                foreach (Capture field in m.Groups["field"].Captures)
                    hits.Add(field.Value);

                fields = (string[]) hits.ToArray(typeof(string));
            }
        }

        public string[] Fields
        {
            get { return fields; }
        }
    }

    [TestFixture]
    public class RecordTest
    {
        [Test]
        public void OnlyAInQuotes()
        {
            string input = "'A'";
            string[] expect = { input };

            Assert.AreEqual(expect, new Record(input).Fields);
        }

        [Test]
        public void OnlyB()
        {
            string input = "B";
            string[] expect = { input };

            Assert.AreEqual(expect, new Record(input).Fields);
        }

        [Test]
        public void CommaInQuotes()
        {
            string input = "A,'B,B',C,,'E',F";
            string[] expect = { "A", "'B,B'", "C", "", "'E'", "F" };

            Assert.AreEqual(expect, new Record(input).Fields);
        }

        [Test]
        public void LeadingAndTrailingCommasInFields()
        {
            // The last quote is unbalanced, so it comes out as a field of its own.
            string input = "a,'b,b',X,',c,',d,',Y,";
            string[] expect = { "a", "'b,b'", "X", "',c,'", "d", "'", "Y" };

            Assert.AreEqual(expect, new Record(input).Fields);
        }
    }
}

Hope this helps,
Greg
--
Under democracy one party always devotes its chief energies to trying to
prove that the other party is unfit to rule -- and both commonly succeed,
and are right.
-- H.L. Mencken
Nov 17 '05 #7
