String.Split(), Regex.Split() - empty String

If the input contains consecutive occurrences of characters from the
given delimiter, String.Split() and Regex.Split() produce an empty string
as the token between such consecutive occurrences. That sounds sensible,
but has anyone ever found it useful? Can this 'feature' be disabled?

After having used StringTokenizer from the J-language that's not to be
named, this annoyed me for hours before I figured out that it was just a
matter of modifying our own Split method to ignore a token whenever its
start index is the same as the current pointer.

Now... I went through the trouble of using NHibernate to get rid of SQL
strings, only to find myself rolling my own tokenizer. It feels weird.

Rico.
Nov 17 '05 #1
In article <pa****************************@yahoo.com>,
Rico <ra*****@yahoo.com> wrote:

: If there are consecutive occurrences of characters from the given
: delimiter, String.Split() and Regex.Split() produce an empty string as
: the token that's between such consecutive occurrences. It sounds like
: making sense, but has anyone ever found this useful? Can this
: 'feature' be disabled?
: [...]

Sure. If you mean a run of multiple separators is semantically
equivalent to a single separator, then say what you mean:

using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main(string[] args)
    {
        string input = "one-two--three---four----five";

        // "-" splits on every hyphen; "-+" treats a run of hyphens
        // as a single separator.
        foreach (string pattern in new string[] { "-", "-+" })
        {
            Console.WriteLine("Splitting on " + pattern + "...");

            Regex separator = new Regex(pattern);
            foreach (string field in separator.Split(input))
                Console.WriteLine(" - [" + field + "]");
        }
    }
}

The above program's output is

Splitting on -...
- [one]
- [two]
- []
- [three]
- []
- []
- [four]
- []
- []
- []
- [five]
Splitting on -+...
- [one]
- [two]
- [three]
- [four]
- [five]
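
For the String.Split() half of the question, here is a minimal sketch, assuming .NET 2.0 or later, where String.Split() has an overload that takes StringSplitOptions. The names SplitDemo and Fields are made up for illustration.

```csharp
using System;

static class SplitDemo
{
    // Hypothetical helper: split on '-' and drop the empty tokens that
    // consecutive, leading, or trailing delimiters would otherwise produce.
    public static string[] Fields(string input)
    {
        return input.Split(new char[] { '-' }, StringSplitOptions.RemoveEmptyEntries);
    }

    static void Main()
    {
        // Prints [one], [two], [three], [four], [five], one per line.
        foreach (string field in Fields("-one-two--three---four----five-"))
            Console.WriteLine("[" + field + "]");
    }
}
```

This only covers delimiters given as characters or literal strings. For a delimiter that needs a pattern, Regex is still the tool.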

Hope this helps,
Greg
Nov 17 '05 #2
On Wed, 22 Jun 2005 17:30:41 +0000, Greg Bacon wrote:

: Sure. If you mean a run of multiple separators is semantically
: equivalent to a single separator, then say what you mean:
: [...]
: The above program's output is

<snipped> output is as expected.

: Hope this helps,


It helped. Thanks a lot.

However, how do I get Regex to handle my intention when the input is
"-one-two--three---four----five-" ?

I don't want the first and last empty strings returned. If the delimiter
is a run of empty spaces, then I can Trim() the input, but what when it's
not?

Rico.
Nov 17 '05 #3
In article <pa****************************@yahoo.com>,
Rico <ra*****@yahoo.com> wrote:

: [...]
: However, how do I get Regex to handle my intention when the input is
: "-one-two--three---four----five-" ?
:
: I don't want the first and last empty strings returned. If the
: delimiter is a run of empty spaces, then I can Trim() the input, but
: what when it's not?

Oh, sorry, my sample code handled a separator, not a delimiter.

This should be more to your liking:

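using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main(string[] args)
    {
        string input = "-one-two--three---four----five---";

        // Each iteration of the outer group consumes either a bare
        // delimiter or a delimiter plus a field that is followed by
        // another delimiter.
        string delimiter = "-+";
        string pattern = String.Format(
            @"^({0}|{0}(?<field>.+?)(?={0}))*$",
            delimiter);

        Regex delimited = new Regex(pattern);
        Match m = delimited.Match(input);
        if (m.Success)
        {
            // A group inside a repetition records one Capture per iteration.
            foreach (Capture c in m.Groups["field"].Captures)
                Console.WriteLine("[" + c + "]");
        }
        else
        {
            Console.WriteLine("no match");
        }
    }
}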

One area to note is the first alternative that matches only the
delimiter. In cases with multiple trailing delimiters, e.g.,
"...-five---", this subpattern disambiguates by telling the matcher
to treat them as a single delimiter and not two.
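
To see what the delimiter-only alternative contributes, here is a small check, offered as a sketch: the same pattern shape with and without that branch, run against a short delimited string. AlternativeDemo, WithBranch and WithoutBranch are illustrative names, not part of the program above. Without the branch, every iteration has to be followed by another delimiter, so nothing can consume the final delimiter before $ and the match fails.

```csharp
using System;
using System.Text.RegularExpressions;

static class AlternativeDemo
{
    // Same shape as the pattern above, with "-+" as the delimiter.
    public static readonly Regex WithBranch =
        new Regex(@"^(-+|-+(?<field>.+?)(?=-+))*$");

    // Delimiter-only alternative removed: the lookahead demands a
    // delimiter after every field, so the last one is never consumed.
    public static readonly Regex WithoutBranch =
        new Regex(@"^(-+(?<field>.+?)(?=-+))*$");

    static void Main()
    {
        string input = "-one-two---";
        Console.WriteLine(WithBranch.IsMatch(input));     // True
        Console.WriteLine(WithoutBranch.IsMatch(input));  // False
    }
}
```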

You could also use \G:

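using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main(string[] args)
    {
        string input = "----one-----two--three---four----five---";

        // \G anchors a match where the previous one ended, so each field
        // after the first starts right after the delimiter already consumed.
        Regex delimited = new Regex(@"(-+|\G)(?<field>.+?)-+");
        Match m = delimited.Match(input);
        while (m.Success)
        {
            Console.WriteLine("[" + m.Groups["field"] + "]");

            m = m.NextMatch();
        }
    }
}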
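
If Regex.Split() is still the preferred tool, another option is to split as before and discard the empty strings afterwards. This is only a sketch: TokenDemo and Tokens are hypothetical names, and it assumes an empty field is never meaningful in the input.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class TokenDemo
{
    // Hypothetical helper: split on a delimiter pattern, then drop the
    // empty strings that leading and trailing delimiters leave behind.
    public static List<string> Tokens(string input, string delimiterPattern)
    {
        List<string> tokens = new List<string>();
        foreach (string field in Regex.Split(input, delimiterPattern))
        {
            if (field.Length > 0)
                tokens.Add(field);
        }
        return tokens;
    }

    static void Main()
    {
        // Prints [one], [two], [three], [four], [five], one per line.
        foreach (string token in Tokens("-one-two--three---four----five-", "-+"))
            Console.WriteLine("[" + token + "]");
    }
}
```

This version keeps the last field whether or not the input ends with a delimiter. Keep the delimiter pattern free of capturing groups, because Regex.Split() includes captured text in the array it returns.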

Hope this helps,
Greg
Nov 17 '05 #4
