We need an additional function in the String class. We need the ability
to suppress empty fields, so that we can more effectively parse. Right
now, multiple whitespace characters create multiple empty strings in the
resulting string array.
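A minimal sketch of the behaviour being described, for readers who have not run into it. The class and method names here (SplitProblemDemo, NaiveSplit) and the sample input are hypothetical and only for illustration; the call itself is the ordinary String.Split(char) usage.

```csharp
using System;

public static class SplitProblemDemo
{
    // Splits on a single space; every extra space in a run adds an empty entry.
    public static string[] NaiveSplit(string input)
    {
        return input.Split(' ');
    }

    public static void Main()
    {
        // Two words separated by three spaces.
        string[] fields = NaiveSplit("abc   def");

        // Three delimiters give four entries: "abc", "", "", "def".
        Console.WriteLine(fields.Length); // 4
        foreach (string field in fields)
        {
            Console.WriteLine("[" + field + "]");
        }
    }
}
```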
If String.Split doesn't fit your needs, you have to create your own split
method, which isn't very complicated. String.Split is designed to meet
the most common application needs.
--
cody
Freeware Tools, Games and Humour http://www.deutronium.de.vu || http://www.deutronium.tk
Which is what I have done. But parsing strings of data with multiple
whitespace characters between fields *is* a very common operation. So I
am disagreeing with the part about "meeting the most common application
needs."
Anyway, I just sent it out in case somebody thought "oh, yea, that would
be a good idea."
David Logan
Have you considered using regular expressions (regex) to split the string? I have used them to accomplish what you describe.
See System.Text.RegularExpressions
Bill O'Neill
Yes, I have considered it, but I prefer not to use a very expensive
regex for an otherwise simple split. String.Split is perfect, save for the
fact that in something like:
"abc def ghi jkl mnop"
(with long runs of spaces between the fields) I get an array of 80 elements instead of 5.
I prefer to save regex for parsing strings when:
1) You don't know what you're going to get next
(in a loop of string processing), or
2) There are various optional pieces in a string
that may or may not occur.
In these instances, simple splitting and checking results is already
pretty expensive, so using regex isn't a stretch.
David Logan
David Logan wrote:
> Which is what I have done. But parsing strings of data with multiple whitespace characters between fields *is* a very common operation. So I am disagreeing with the part about "meeting the most common application needs."
In that case Regex.Split is your friend.
Use @"\s+" as the separator in your case (IIRC), so that a run of whitespace counts as a single delimiter.
--
cody
I have to agree with David on this one. Every time I looked at String.Split
to do simple splitting, I gave up on it because of all the extra empty
strings.
Philippe
David,
In addition to the other comments:
There are three Split functions in .NET:
Use Microsoft.VisualBasic.Strings.Split if you need to split a string based
on a specific word (string). It is the Split function from VB6.
Use System.String.Split if you need to split a string based on a collection
of specific characters. Each individual character is its own delimiter.
Use System.Text.RegularExpressions.Regex.Split to split based
on matching patterns.
In your example I would use Regex.Split, unless it was proven via profiling
to be a performance problem in the routine you are using (remember the 80-20
rule).
Hope this helps
Jay B. Harlow [MVP - Outlook]
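The three Split functions listed above can be sketched side by side as follows. The wrapper names (ThreeSplitsDemo, SplitOnWord, SplitOnChars, SplitOnPattern) and the sample inputs are made up for illustration, and the Strings.Split call assumes the project references the Microsoft.VisualBasic assembly.

```csharp
using System;
using System.Text.RegularExpressions;
using Microsoft.VisualBasic;

public static class ThreeSplitsDemo
{
    // VB6-style split: the whole word (string) is the delimiter.
    public static string[] SplitOnWord(string input, string word)
    {
        return Strings.Split(input, word, -1, CompareMethod.Binary);
    }

    // String.Split: each character in the set is its own delimiter.
    public static string[] SplitOnChars(string input, params char[] delimiters)
    {
        return input.Split(delimiters);
    }

    // Regex.Split: splits wherever the pattern matches.
    public static string[] SplitOnPattern(string input, string pattern)
    {
        return Regex.Split(input, pattern);
    }

    public static void Main()
    {
        Console.WriteLine(string.Join("|", SplitOnWord("a and b and c", " and "))); // a|b|c
        Console.WriteLine(string.Join("|", SplitOnChars("a,b;c", ',', ';')));        // a|b|c
        Console.WriteLine(string.Join("|", SplitOnPattern("a  b \t c", @"\s+")));    // a|b|c
    }
}
```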
I was unaware of the .VisualBasic. namespace routines.
Performance may or may not be a problem depending upon which packets I
would need to parse in this manner. I just try to avoid regex in general
unless I need its flexibility.
What is the "80/20" rule?
David Logan
David,
> Performance may or may not be a problem depending upon which packets I would need to parse in this manner. I just try to avoid regex in general unless I need its flexibility.
Generally, if I am going to be reusing the same Regex, I apply the
RegexOptions.Compiled option and keep the Regex itself in a static member.
> What is the "80/20" rule?
I've heard various variations of it; basically, 80% of the time is spent in
20% of the code.
Basically I write "correct" code first, rather than worry about how well it will
perform. I only go back and optimize routines once those routines have proven
to be a performance problem. By "correct" I primarily mean OOP, plus using
the tools available, such as Regex, to solve a problem, if those tools fit
the requirement. Of course, "correct" is subjective.
Hope this helps
Jay
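The "compiled Regex in a static member" approach mentioned above might look roughly like this. It is only a sketch: the class name WhitespaceSplitter, the field name and the \s+ pattern are assumptions chosen to match the whitespace-splitting problem in this thread.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WhitespaceSplitter
{
    // Constructed once and reused by every call. RegexOptions.Compiled trades
    // a higher one-time construction cost for faster matching afterwards.
    private static readonly Regex Whitespace =
        new Regex(@"\s+", RegexOptions.Compiled);

    // Splits on runs of whitespace using the shared Regex instance.
    public static string[] Split(string input)
    {
        return Whitespace.Split(input);
    }

    public static void Main()
    {
        Console.WriteLine(string.Join("|", Split("abc \t def\nghi"))); // abc|def|ghi
    }
}
```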
David,
It appears that Whidbey (VS.NET 2005) or Longhorn will have an option on
String.Split to omit empty entries. http://longhorn.msdn.microsoft.com/l.../m/split2.aspx
A fourth option may be to use String.Split, then "remove" the array entries
that are blank (rather than parsing the string yourself).
Hope this helps
Jay
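For later readers: the option did ship. .NET Framework 2.0 added String.Split overloads that take a StringSplitOptions value, and StringSplitOptions.RemoveEmptyEntries drops the empty strings. A minimal sketch follows; the class and method names and the choice of space and tab as delimiters are illustrative assumptions.

```csharp
using System;

public static class RemoveEmptyDemo
{
    // Splits on spaces and tabs, dropping the empty entries that
    // consecutive delimiters would otherwise produce.
    public static string[] SplitFields(string input)
    {
        char[] delimiters = { ' ', '\t' };
        return input.Split(delimiters, StringSplitOptions.RemoveEmptyEntries);
    }

    public static void Main()
    {
        string[] fields = SplitFields("abc   def \t ghi");
        Console.WriteLine(fields.Length);            // 3
        Console.WriteLine(string.Join("|", fields)); // abc|def|ghi
    }
}
```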
I am currently using a homegrown method:
    // Requires: using System.Collections;
    // Splits on single spaces and drops the empty fields left by runs of spaces.
    protected String[] SplitNoEmpty(String data)
    {
        ArrayList fieldarray = new ArrayList();
        foreach (string field in data.Split(' '))
        {
            if (field.Length > 0) fieldarray.Add(field);
        }
        // Copy the collected fields into a string array.
        String[] ret = new String[fieldarray.Count];
        for (int x = 0; x < fieldarray.Count; x++)
        {
            ret[x] = (String)fieldarray[x];
        }
        return ret;
    }
I mentioned it mainly because splitting strings over multiple whitespace
is such a common operation I think it would be worthwhile to consider
implementing in the common libraries.
David Logan
David Logan wrote:
> ArrayList fieldarray = new ArrayList(); [...] String[] ret = new String[fieldarray.Count]; for(int x=0;x<fieldarray.Count;x++) ret[x]=(String)fieldarray[x]; return ret;
FYI, there's a more concise way of doing that:
return (string[]) fieldarray.ToArray(typeof(string));
> splitting strings over multiple whitespace is such a common operation I think it would be worthwhile to consider implementing in the common libraries.
I agree. An extra bool parameter to String.Split, indicating whether
to omit zero-length strings from the resulting array, wouldn't hurt.
P.
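Putting the ToArray suggestion together with the homegrown method gives something like the sketch below. The wrapper class SplitHelper is hypothetical, and the method is made public and static here only so the example can run on its own.

```csharp
using System;
using System.Collections;

public static class SplitHelper
{
    // Splits on single spaces and keeps only the non-empty fields.
    public static string[] SplitNoEmpty(string data)
    {
        ArrayList fields = new ArrayList();
        foreach (string field in data.Split(' '))
        {
            if (field.Length > 0) fields.Add(field);
        }
        // ToArray(Type) returns a typed Array, so one cast replaces the copy loop.
        return (string[])fields.ToArray(typeof(string));
    }

    public static void Main()
    {
        string[] fields = SplitNoEmpty("abc   def  ghi");
        Console.WriteLine(fields.Length);            // 3
        Console.WriteLine(string.Join("|", fields)); // abc|def|ghi
    }
}
```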
Hi,
string[] fields = Regex.Split(strInput, "\\s+");
Why bother writing it yourself if it can be done this easily? There is
nothing wrong with regex.
I don't like the argument that it shouldn't be used in simple cases; for one thing,
you shouldn't have to worry about writing an inefficient pattern.
HTH
greetings
BMermuys
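One caveat with the Regex.Split one-liner: if the input starts or ends with whitespace, the pattern matches at the edge and the result still contains an empty string there. A hedged sketch of one way around that is to trim first; the class and method names below are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexSplitDemo
{
    // Trims the ends first so a leading or trailing whitespace run
    // does not produce an empty first or last entry.
    public static string[] SplitOnWhitespace(string input)
    {
        return Regex.Split(input.Trim(), @"\s+");
    }

    public static void Main()
    {
        string input = "  abc  def ";

        // Untrimmed: "", "abc", "def", "".
        Console.WriteLine(Regex.Split(input, @"\s+").Length); // 4

        // Trimmed: "abc", "def".
        Console.WriteLine(SplitOnWhitespace(input).Length);   // 2
    }
}
```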
I completely agree that, for now, Regex is the best solution for most of us.
I wrote a test that split David's string 10,000 times. The String.Split method took 0.143 seconds while Regex took 1.104 seconds. Regex is almost an order of magnitude slower (about 7.7x); however, it is a good solution.
Unless your application performance constraints are very strict, I would use Regex.
If you had used a compiled Regex instead of calling the static Regex.Split()
every time, which compiles the Regex again on each call, I suspect that
Regex.Split() would have been even faster than String.Split().
--
cody
That *is* using a compiled Regex instance, and not the static Split method.
Bill O'Neill wrote:
> I completely agree that, for now, Regex is the best solution for most of us.
> I wrote a test that split David's string 10,000 times. The String.Split method took 0.143 seconds while Regex took 1.104 seconds. Regex is almost an order of magnitude slower; however, it is a good solution.
That's exactly why I reserve regex for cases where it's really useful.
> Unless your application performance constraints are very strict, I would use Regex.
Why use a very inefficient method when there is a perfectly good and
efficient one? And it *could* be supported by the library. :)
David Logan
David,
> I wrote a test that split David's string 10,000 times. The String.Split method took 0.143 seconds while Regex took 1.104 seconds. Regex is almost an order of magnitude slower; however, it is a good solution.
> That's exactly why I reserve regex to cases where it's really useful.
Yes, the Regex took almost 10 times longer; however, what happens when the
Regex is only 1% or even .01% of the total cost of your routine? Is it
really worth worrying about?
By routine I mean what you do with the array after splitting it, for example
placing the values into a DataTable. If the cost of using the DataTable is
significantly more than the cost of the Regex, is it really worth worrying about
avoiding the Regex?
My concern with coding around it is how much memory pressure (work for the
GC) you are creating to avoid the time spent in the Regex. Are you simply robbing
Peter to pay Paul?
Which is why I would not avoid the Regex simply because Regex is slow. I
would use the Regex because it is quicker to code and it's a good fit for
this problem. Once the Regex was proven, via profiling (with the CLR Profiler,
for example), to be too high a cost of the routine, I would take the
extra time to code a quicker solution...
Granted, if we get the String.Split ignore-empties option in Whidbey, that
option would be the better fit in Whidbey...
For info on the 80/20 rule & optimizing only the 20%, see Martin Fowler's
article "Yet Another Optimization Article" at http://martinfowler.com/ieeeSoftware...timization.pdf
For a list of Martin's articles see: http://martinfowler.com/articles.html
Info on the CLR Profiler:
http://msdn.microsoft.com/library/de...nethowto13.asp
http://msdn.microsoft.com/library/de...anagedapps.asp
Hope this helps
Jay
David,
Looking at this closer, the expression you use makes a huge difference!
For example, BMermuys's statement:
string[] fields = Regex.Split(strInput, "\\s+");
is slower than
string[] fields = Regex.Split(strInput, " +");
String.Split:   0.105655048337149
SplitNoEmpty:   0.168633723001108
regex(" +"):    0.286259287144036
regex("\\s+"):  0.713445703294692
If you know your string only has a space as a delimiter, then the Regex time
is only about 2x that of the SplitNoEmpty routine; however, if you can have any white
space character (\s is shorthand for [\f\n\r\t\v\x85\p{Z}]) as a delimiter,
then the time is about 7x that of plain String.Split (roughly 4x SplitNoEmpty)...
Times are in seconds, based on QueryPerformanceCounter and
QueryPerformanceFrequency, using a loop of 10,000 iterations. I compiled the
Regex outside the loop.
Hope this helps
Jay
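A comparison along these lines could be sketched as below. This is not the test code used in the thread: it swaps in System.Diagnostics.Stopwatch as a managed stand-in for the QueryPerformanceCounter calls, the input string is a made-up example, and the helper name TimeSeconds is hypothetical. Absolute numbers will differ by machine and runtime, so no expected output is shown.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

public static class SplitTiming
{
    // Runs the action the given number of times and returns elapsed seconds.
    public static double TimeSeconds(int iterations, Action action)
    {
        Stopwatch watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            action();
        }
        watch.Stop();
        return watch.Elapsed.TotalSeconds;
    }

    public static void Main()
    {
        const int Iterations = 10000;

        // Hypothetical input: five fields separated by runs of spaces.
        string input = "abc      def      ghi      jkl      mnop";

        // Both Regex instances are built once, outside the timed loops.
        Regex spaces = new Regex(" +", RegexOptions.Compiled);
        Regex whitespace = new Regex(@"\s+", RegexOptions.Compiled);

        Console.WriteLine("String.Split:  " + TimeSeconds(Iterations, () => input.Split(' ')));
        Console.WriteLine("regex(\" +\"):   " + TimeSeconds(Iterations, () => spaces.Split(input)));
        Console.WriteLine("regex(\"\\s+\"): " + TimeSeconds(Iterations, () => whitespace.Split(input)));
    }
}
```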
"David Logan" <dj******@comcast.net> wrote in message