Bytes | Software Development & Data Engineering Community

regex split

Would like help with what I think is a common regex split example. Thanks in
advance for your example. Cheers!

Source Data Example:
one "two three" four

Optional, but I would also like to ignore pairs of brackets, like:
"one" <tab> "two three" ( four "five six" )

Want fields like this for the first line:
field1:one
field2:two three
field3:four

and like this for the second line:
field1:one
field2:two three
field3:four
field4:five six

Thanks much!!

--
William Stacey, MVP
Nov 16 '05 #1
Should clarify a little. Basically, I want to split a line on whitespace
(space, tab), except where the whitespace is enclosed in quotes. Anything
in a quote pair is one field. An unescaped quote (one not written as \")
that has no closing quote is an error. Same with the "(" paren: if a paren
is not inside quotes, then it is special and needs a closing paren. If the
paren stuff makes this too hard, forget it and please help with the first
requirement. Again thanks!

--
William Stacey, MVP

Nov 16 '05 #2
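The clarified requirements (split on unquoted spaces and tabs, keep quoted text as one field, treat an unclosed quote or paren as an error) can also be met by a small hand-written scanner instead of a regex. This is a minimal sketch under stated assumptions, not code from the thread: it assumes a backslash escapes the next character, that parens only group and are not returned as fields, and that errors are reported with `FormatException`. The names `LineTokenizer` and `Tokenize` are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper (not from the thread): a character-by-character tokenizer.
public static class LineTokenizer
{
    public static List<string> Tokenize(string line)
    {
        if (line == null) throw new ArgumentNullException(nameof(line));
        var fields = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false;
        bool haveField = false;   // true once a field has started, so "" yields an empty field
        int parenDepth = 0;

        for (int i = 0; i < line.Length; i++)
        {
            char c = line[i];
            if (c == '\\' && i + 1 < line.Length)
            {
                // Assumption: a backslash escapes the next char, inside or outside quotes.
                current.Append(line[++i]);
                haveField = true;
            }
            else if (inQuotes)
            {
                if (c == '"') inQuotes = false;   // closing quote
                else current.Append(c);           // spaces, tabs and parens are plain text here
            }
            else if (c == '"')
            {
                inQuotes = true;                  // opening quote
                haveField = true;
            }
            else if (c == ' ' || c == '\t' || c == '(' || c == ')')
            {
                // Unquoted whitespace and parens end the current field.
                if (haveField)
                {
                    fields.Add(current.ToString());
                    current.Clear();
                    haveField = false;
                }
                if (c == '(') parenDepth++;
                else if (c == ')' && --parenDepth < 0)
                    throw new FormatException("Unmatched ')' at position " + i + ".");
            }
            else
            {
                current.Append(c);
                haveField = true;
            }
        }
        if (inQuotes) throw new FormatException("Unterminated quote.");
        if (parenDepth != 0) throw new FormatException("Unclosed '('.");
        if (haveField) fields.Add(current.ToString());
        return fields;
    }
}
```

Under these assumptions the two sample lines give `one`, `two three`, `four` and `one`, `two three`, `four`, `five six`, and `one "two` throws because the quote is never closed.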
I think it's a common match example, but not a common split example.
Split assumes you have a single regex that matches the separating text,
but you don't: the text between *one* and *two* must start and end
with a double quote only because there's one before *one*.

You can, however, match something like

(("\[^"]+")|(\w+))( \<[^>]*\>)*

and iterate over the matches.
HTH

Nov 16 '05 #3
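As the pattern appears here, `\[` in `"\[^"]+"` matches a literal `[` and the following `^` is a start-of-string anchor, so the quoted branch can never match and each word of `"two three"` falls through to `\w+` separately. The sketch below shows the match-and-iterate idea assuming `"[^"]+"` was intended; it leaves out the optional trailing `( \<[^>]*\>)*` part, and the names `MatchFields` and `Extract` are hypothetical.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper illustrating "match the fields instead of splitting".
public static class MatchFields
{
    // Assumed fix of the posted pattern: group 1 is the text inside a
    // quote pair, group 2 is a bare word.
    private static readonly Regex Field = new Regex(@"""([^""]+)""|(\w+)");

    public static List<string> Extract(string line)
    {
        var fields = new List<string>();
        // Iterate over the matches; anything between them is skipped.
        foreach (Match m in Field.Matches(line))
        {
            fields.Add(m.Groups[1].Success ? m.Groups[1].Value : m.Groups[2].Value);
        }
        return fields;
    }
}
```

With this pattern `one "two three" four` yields three fields, and the bracketed second sample yields four, because the tab, spaces and parens simply fall between matches. Note that `\w+` also splits unquoted tokens at punctuation, so `a-b` becomes two fields.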
Thanks Uri. However, I need to preserve an arg like "this is one arg" as one
field rather than four, because after the fact I can't tell that those four
were one argument. I'll probably have to parse this manually, but thought
there might be an easy way using regex. Cheers!

--
William Stacey, MVP
Nov 16 '05 #4
Here is a cool little method that I modified from a VB example. It does exactly
what I wanted: it can split on any delimiter or multiple delimiters, and can
quote using any pair of chars. Very cool. I have not tested all possible
failures, etc., but it appears to work well. Some clever (and generous) pattern
person may want to modify this to allow an *array of quote pairs, so you
could quote on "one two" or {one two} or (one two) in the same call. If you
do, please post an update. Cheers!
==
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

/// <summary>
/// Split a string, dealing correctly with quoted items.
/// The quotes param is the character pair used to quote strings
/// (default is "", the double quote).
/// You can also use a character pair (e.g. "{}") if the opening
/// and closing quotes are different.
///
/// For example, you can split the following string:
/// string[] fields = SplitQuoted("[one,two],three,[four,five]", ",", "[]");
/// into 3 items, because commas inside [] are not taken into account.
/// </summary>
/// <remarks>
/// Consecutive separators are collapsed, so splitting "a,,b" using a comma as
/// the separator returns two fields, not three. To get an empty middle field,
/// you could use ", " (comma and space) as separators with the default quotes,
/// and write the string as ' a, "", b '.
/// You could also use a comma as the *only separator and put a space between
/// the commas to get a space field, as in 'a, ,b'.
/// </remarks>
/// <param name="text">The string to split.</param>
/// <param name="seperators">The separator char(s) as a string.</param>
/// <param name="quotes">The char pair used to quote a string.</param>
/// <returns>The fields, with the quotes removed from quoted items.</returns>
private string[] SplitQuoted(string text, string seperators, string quotes)
{
    // Default separators are space and tab (" \t").
    // Separators outside a quote pair are dropped; inside a pair they are kept as text.
    // Default quote pair is two double quotes ('""').
    if (text == null)
        throw new ArgumentNullException("text", "text is null.");
    if (seperators == null || seperators.Length < 1)
        seperators = " \t";
    if (quotes == null || quotes.Length < 1)
        quotes = "\"\"";
    List<string> res = new List<string>();

    // Get the open and close chars, and escape them for use in regular expressions.
    string openChar = Regex.Escape(quotes[0].ToString());
    string closeChar = Regex.Escape(quotes[quotes.Length - 1].ToString());
    // Build the pattern that searches for both quoted and unquoted elements.
    // The quoted element is captured by group #2
    // and the unquoted element by group #3.
    // Note: seperators is placed in a character class unescaped, so chars
    // such as ] or \ must be escaped by the caller.
    string pattern = @"\s*(" + openChar + "([^" + closeChar + "]*)" +
        closeChar + @"|([^" + seperators + @"]+))\s*";

    // Search the string.
    foreach (Match m in Regex.Matches(text, pattern))
    {
        if (m.Groups[3].Success)
            res.Add(m.Groups[3].Value);
        else
            // Get the quoted string, but without the quotes.
            res.Add(m.Groups[2].Value);
    }
    return res.ToArray();
}
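The post invites an extension that accepts an *array of quote pairs. One way to sketch it, assuming each pair is passed as a two-character string (for example `"{}"`) and that every alternative is tried in the order given, is to build one alternation with a capturing group per pair plus a final group for unquoted text, then take whichever group matched. As a swapped-in choice not in the original, every quote and separator char is written as a `\uXXXX` escape so characters such as `]` or `-` stay literal inside character classes. `QuotedSplitter` and `SplitQuotedPairs` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical multi-pair variant of SplitQuoted (not from the thread).
public static class QuotedSplitter
{
    // Write one char as \uXXXX, which .NET regex treats as a literal
    // both inside and outside a character class.
    private static string Lit(char c) => "\\u" + ((int)c).ToString("X4");

    public static string[] SplitQuotedPairs(string text, string seperators, params string[] quotePairs)
    {
        if (text == null) throw new ArgumentNullException(nameof(text));
        if (string.IsNullOrEmpty(seperators)) seperators = " \t";
        if (quotePairs == null || quotePairs.Length == 0) quotePairs = new[] { "\"\"" };

        var alternatives = new List<string>();
        foreach (string pair in quotePairs)
        {
            if (pair == null || pair.Length != 2)
                throw new ArgumentException("Each quote pair must be exactly two chars.", nameof(quotePairs));
            string open = Lit(pair[0]), close = Lit(pair[1]);
            // Group i (1-based) captures the text inside the i-th pair.
            alternatives.Add(open + "([^" + close + "]*)" + close);
        }
        // The last group (quotePairs.Length + 1) captures an unquoted element.
        string sepClass = string.Concat(seperators.Select(ch => Lit(ch)));
        alternatives.Add("([^" + sepClass + "]+)");

        string pattern = @"\s*(?:" + string.Join("|", alternatives) + @")\s*";

        var res = new List<string>();
        foreach (Match m in Regex.Matches(text, pattern))
        {
            // Exactly one alternative matched; take the first successful group.
            for (int g = 1; g < m.Groups.Count; g++)
            {
                if (m.Groups[g].Success) { res.Add(m.Groups[g].Value); break; }
            }
        }
        return res.ToArray();
    }
}
```

For example, `SplitQuotedPairs("one \"two three\" {four five} (six seven)", " \t", "\"\"", "{}", "()")` returns `one`, `two three`, `four five`, `six seven`. With no pairs given it falls back to double quotes, so it keeps the remarks' behavior for `"a,,b"` and `' a, "", b '`.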

Nov 16 '05 #5
Hi William,

I am glad you got what you wanted. Do you still have any concerns about this
issue?

Please feel free to give feedback. Thanks

Best regards,
Jeffrey Tan
Microsoft Online Partner Support
Get Secure! - www.microsoft.com/security
This posting is provided "as is" with no warranties and confers no rights.

Nov 16 '05 #6
Yes, thanks. Here is a slightly better one that lets a backslash escape any
character, including a quote inside a quote pair:
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static string[] SplitQuoted(string text, string seperators)
{
    // The pattern, shown with whitespace and comma as separators:
    //   "([^"\\]*(?:\\.[^"\\]*)*)"   quoted field; a backslash escapes the next char
    //   |
    //   ([^\s,]+)                    unquoted field
    // Default separators are space and tab (" \t").
    // Separators outside a quote pair are dropped; inside a pair they are kept as text.
    // The quote pair is always two double quotes ('""').
    if (text == null)
        throw new ArgumentNullException("text", "text is null.");
    if (seperators == null || seperators.Length < 1)
        seperators = " \t"; // Default is space and tab.

    List<string> res = new List<string>();

    // Group #1 holds a quoted element without its quotes (the quotes at
    // either end are required); group #2 holds an unquoted element.
    // The inner repetition is non-capturing so the numbering stays 1 and 2.
    string pattern =
        @"""([^""\\]*(?:\\.[^""\\]*)*)""" +
        "|" +
        @"([^" + seperators + @"]+)";

    // Search the string.
    foreach (Match m in Regex.Matches(text, pattern))
    {
        if (m.Groups[2].Success)
            res.Add(m.Groups[2].Value);
        else
            // The quoted string without the quotes; escapes such as \" are kept as written.
            res.Add(m.Groups[1].Value);
    }
    return res.ToArray();
}
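The quoted-field pattern in the comment at the top of this method, `"([^"\\]*(\\.[^"\\]*)*)"`, captures an escaped quote as the two characters `\"`, so a field such as `"say \"hi\""` comes back with its backslashes. If callers want the unescaped text, a small post-processing step could strip one level of escaping. This is a sketch only; `FieldEscapes.UnescapeField` is a hypothetical helper, not part of the posted code.

```csharp
using System.Text.RegularExpressions;

// Hypothetical post-processing helper (not from the thread).
public static class FieldEscapes
{
    // Replace each backslash-plus-char pair with the char alone, so \" becomes "
    // and \\ becomes \. Singleline lets '.' match an escaped newline too.
    public static string UnescapeField(string field) =>
        Regex.Replace(field, @"\\(.)", "$1", RegexOptions.Singleline);
}
```

Because `Regex.Replace` scans left to right without overlapping matches, `\\"` is read as an escaped backslash followed by a plain quote, which matches how the quoted-field pattern consumes escapes.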

--
William Stacey, MVP
Nov 16 '05 #7
Hi William,

Thanks for sharing your information with the community!!

Best regards,
Jeffrey Tan

Nov 16 '05 #8