String.Split(), Regex.Split() - empty String

If the input contains consecutive delimiter characters, String.Split()
and Regex.Split() return an empty string as the token between each pair
of adjacent delimiters. That sounds logical, but has anyone ever found
it useful? Can this 'feature' be disabled?

After having used StringTokenizer from the J-language that's not to be
named, this annoyed me for hours before I figured out that it was just a
matter of modifying our own Split method to ignore any token whose
startIndex is the same as the current pointer, i.e. a zero-length token.

Now... I went through the trouble of using NHibernate to get rid of SQL
strings.. only to find myself rolling my own Tokenizer... it feels weird.

Rico.
Nov 17 '05 #1
3 Replies


In article <pa****************************@yahoo.com>,
Rico <ra*****@yahoo.com> wrote:

: If there are consecutive occurrences of characters from the given
: delimiter, String.Split() and Regex.Split() produce an empty string as
: the token that's between such consecutive occurrences. It sounds like
: making sense, but has anyone ever found this useful? Can this
: 'feature' be disabled?
: [...]

Sure. If you mean a run of multiple separators is semantically
equivalent to a single separator, then say what you mean:

using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main(string[] args)
    {
        string input = "one-two--three---four----five";

        // "-" splits at every hyphen; "-+" treats a run of hyphens as one separator.
        foreach (string pattern in new string[] { "-", "-+" })
        {
            Console.WriteLine("Splitting on " + pattern + "...");

            Regex separator = new Regex(pattern);
            foreach (string field in separator.Split(input))
                Console.WriteLine(" - [" + field + "]");
        }
    }
}

The above program's output is

Splitting on -...
 - [one]
 - [two]
 - []
 - [three]
 - []
 - []
 - [four]
 - []
 - []
 - []
 - [five]
Splitting on -+...
 - [one]
 - [two]
 - [three]
 - [four]
 - [five]
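
If the separators are plain characters rather than a pattern, String.Split can also drop the empty entries by itself through StringSplitOptions.RemoveEmptyEntries. What follows is only a minimal sketch, assuming .NET 2.0 or later (the overload does not exist in 1.x); the names SplitDemo and Fields are made up for illustration:

```csharp
using System;

static class SplitDemo
{
    // Hypothetical helper: splits on '-' and discards the empty strings
    // produced by consecutive, leading, or trailing separators.
    public static string[] Fields(string input)
    {
        return input.Split(new char[] { '-' }, StringSplitOptions.RemoveEmptyEntries);
    }

    static void Main()
    {
        foreach (string field in Fields("one-two--three---four----five"))
            Console.WriteLine(" - [" + field + "]");
    }
}
```

This only covers literal separator characters; for a separator described by a pattern, the Regex approach above is still the one to use.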

Hope this helps,
Greg
Nov 17 '05 #2

On Wed, 22 Jun 2005 17:30:41 +0000, Greg Bacon wrote:
> Sure. If you mean a run of multiple separators is semantically
> equivalent to a single separator, then say what you mean:
>
> static void Main(string[] args)
> {
>     string input = "one-two--three---four----five";
>
>     foreach (string pattern in new string[] { "-", "-+" })
>     {
>         Console.WriteLine("Splitting on " + pattern + "...");
>
>         Regex separator = new Regex(pattern);
>         foreach (string field in separator.Split(input))
>             Console.WriteLine(" - [" + field + "]");
>     }
> }
>
> The above program's output is

<snipped> output is as expected.

> Hope this helps,


It helped. Thanks a lot.

However, how do I get Regex to handle my intention when the input is
"-one-two--three---four----five-" ?

I don't want the leading and trailing empty strings returned. If the
delimiter is a run of spaces, then I can Trim() the input, but what
about when it's not?

Rico.
Nov 17 '05 #3

In article <pa****************************@yahoo.com>,
Rico <ra*****@yahoo.com> wrote:

: [...]
: However, how do I get Regex to handle my intention when the input is
: "-one-two--three---four----five-" ?
:
: I don't want the first and last empty strings returned. If the
: delimiter is a run of empty spaces, then I can Trim() the input, but
: what when it's not?

Oh, sorry, my sample code handled a separator, not a delimiter.

This should be more to your liking:

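using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main(string[] args)
    {
        string input = "-one-two--three---four----five---";

        // {0} is replaced by the delimiter pattern. Each field must have a
        // delimiter before it and (via the lookahead) a delimiter after it.
        string delimiter = "-+";
        string pattern = String.Format(
            @"^({0}|{0}(?<field>.+?)(?={0}))*$",
            delimiter);

        Regex delimited = new Regex(pattern);
        Match m = delimited.Match(input);
        if (m.Success)
            foreach (Capture c in m.Groups["field"].Captures)
                Console.WriteLine("[" + c + "]");
        else
            Console.WriteLine("no match");
    }
}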

One thing to note is the first alternative, which matches only the
delimiter. In cases with multiple trailing delimiters, e.g.,
"...-five---", this subpattern disambiguates by telling the matcher
to treat the whole run as a single delimiter rather than as two
delimiters with a field in between.

You could also use \G:

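using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main(string[] args)
    {
        string input = "----one-----two--three---four----five---";

        // \G anchors at the point where the previous match ended, so every
        // field after the first starts right behind the delimiter run that
        // the previous match consumed.
        Regex delimited = new Regex(@"(-+|\G)(?<field>.+?)-+");
        Match m = delimited.Match(input);
        while (m.Success)
        {
            Console.WriteLine("[" + m.Groups["field"] + "]");

            m = m.NextMatch();
        }
    }
}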
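
When the delimiter is a single character (or a small set of characters), another way to look at the problem is to match the fields themselves instead of the delimiters. The following is a rough sketch under that assumption, with the literal '-' as the delimiter; FieldDemo and Fields are hypothetical names, not anything from the framework:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class FieldDemo
{
    // Hypothetical helper: collects every maximal run of non-hyphen
    // characters, so leading, trailing, and repeated hyphens never
    // produce an empty field.
    public static List<string> Fields(string input)
    {
        List<string> fields = new List<string>();
        foreach (Match m in Regex.Matches(input, "[^-]+"))
            fields.Add(m.Value);
        return fields;
    }

    static void Main()
    {
        foreach (string field in Fields("----one-----two--three---four----five---"))
            Console.WriteLine("[" + field + "]");
    }
}
```

Unlike the two patterns above, this does not require the input to be delimited on both ends, and it does not generalize to a delimiter that is an arbitrary pattern.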

Hope this helps,
Greg
Nov 17 '05 #4
