string routines and code libraries

Hi Zoro,

I'm not familiar with Delphi, so I may be misinterpreting the meaning of the
functions here. Assuming that when you mention "pattern" you are talking
about a Regular Expression-type pattern, I can see how such functions might
indeed be useful. As I may need such functions in the future as well, I've
taken the liberty of writing a few .Net methods for doing what you're
talking about:

/// <summary>
/// Returns all Indices of Regex Matches in a string
/// </summary>
/// <param name="input">string to evaluate</param>
/// <param name="pattern">pattern to match</param>
/// <returns>Array of indices of all matches</returns>
/// <remarks>If no matches are found, a zero-length array is
/// returned</remarks>
public static int[] IndicesOf(string input, string pattern)
{
    int[] returnVal;
    Regex rx = new Regex(pattern);
    MatchCollection matches = rx.Matches(input);
    returnVal = new int[matches.Count];
    for (int i = 0; i < matches.Count; i++)
        returnVal[i] = matches[i].Index;
    return returnVal;
}

/// <summary>
/// Returns first index of a Regex match in a string
/// </summary>
/// <param name="input">string to evaluate</param>
/// <param name="pattern">pattern to match</param>
/// <returns>Index of first match in input string</returns>
/// <remarks>Returns -1 if no match is found</remarks>
public static int IndexOf(string input, string pattern)
{
    // Match once and check Success, rather than running the pattern twice
    Match m = Regex.Match(input, pattern);
    return m.Success ? m.Index : -1;
}

/// <summary>
/// Returns the index of the last match of a pattern in an input string
/// </summary>
/// <param name="input">string to evaluate</param>
/// <param name="pattern">pattern to match</param>
/// <returns>Index of the last match of a pattern in the input
/// string</returns>
/// <remarks>Returns -1 if no match is found</remarks>
public static int LastIndexOf(string input, string pattern)
{
    int[] vals = IndicesOf(input, pattern);
    if (vals.Length == 0) return -1;
    return vals[vals.Length - 1];
}

/// <summary>
/// Returns a Substring of a string
/// before or after the first occurrence of a pattern in the string
/// </summary>
/// <param name="input">string to evaluate</param>
/// <param name="pattern">pattern to match</param>
/// <param name="before">Get the Substring before the pattern?</param>
/// <returns>Substring of input string, starting from the beginning
/// of the string and ending before the first character of the match,
/// or, if before is false, starting from the end of the match, and ending
/// at the end of the string.</returns>
/// <remarks>If there is no match, returns the input string.</remarks>
public static string Substring(string input, string pattern, bool before)
{
    Match m = Regex.Match(input, pattern);
    if (!m.Success) return input;
    if (before) return input.Substring(0, m.Index);
    // Text following the end of the first match
    return input.Substring(m.Index + m.Length);
}

/// <summary>
/// Finds a substring of an input string between 2 pattern matches
/// </summary>
/// <param name="input">string to evaluate</param>
/// <param name="pattern1">first pattern</param>
/// <param name="pattern2">second pattern</param>
/// <returns>Substring of input string between the 2 patterns</returns>
/// <remarks><para>The order of the patterns is only important if both
/// patterns are found, and are not identical patterns.
/// If the patterns are different, and both patterns are found,
/// the substring returned will be the substring between them
/// regardless of the order in which they appear in the input text</para>
/// <para>If both patterns are found, but their matches overlap, there
/// is nothing between them, and a blank string is returned</para>
/// <para>If both patterns are found, and they are the same pattern,
/// the method will look for a second occurrence of the pattern, and
/// attempt to return the substring between the first and second match
/// of the pattern used. If there is not a second match, the patterns
/// overlap, as they occupy the same space,
/// and there is nothing between.</para>
/// <para>If the first pattern is found, but the second pattern
/// is not found, the substring will be either the substring
/// of the input string after the first match of pattern1,
/// or, if the first pattern matches the end of the string,
/// the substring of the string before the first match of pattern1</para>
/// <para>If the second pattern is found, but the first pattern is not,
/// the substring will be either the substring of the input string
/// before the beginning of the first match of the second pattern,
/// or if the second pattern is the beginning of the string, the
/// substring of the string after the end of the first match of the
/// second pattern.</para>
/// <para>If neither pattern is found, the entire input string will
/// be returned.</para>
/// </remarks>
public static string SubstringBetween(string input,
    string pattern1, string pattern2)
{
    // First match of each pattern (Success is false when not found)
    Match m1 = Regex.Match(input, pattern1);
    Match m2 = Regex.Match(input, pattern2);

    // Indices of the 2 matches, or -1 when not found
    int index1 = m1.Success ? m1.Index : -1;
    int index2 = m2.Success ? m2.Index : -1;
    int len1, len2;

    // If neither is found, return input
    if (index1 == -1 && index2 == -1) return input;

    // Otherwise, at least 1 is found. Return a substring

    // pattern1 not found.
    if (index1 == -1)
    {
        if (index2 > 0)
            return input.Substring(0, index2); // treat as second
        else
            return input.Substring(index2 + m2.Length); // treat as first
    }

    // Used for no pattern2, identical patterns, and overlaps

    // Offset of the end of m1 within input
    len1 = index1 + m1.Length;

    // pattern2 not found.
    if (index2 == -1)
    {
        if (len1 < input.Length)
            return input.Substring(len1); // treat as first
        else
            return input.Substring(0, index1); // treat as second
    }

    // Offset of the end of m2 within input
    len2 = index2 + m2.Length;

    // Test for identical patterns
    if (pattern1 == pattern2)
    {
        int[] indices = IndicesOf(input, pattern1);
        // overlap, as both are the same
        if (indices.Length == 1) return "";
        return input.Substring(len1, indices[1] - len1);
    }

    // Not identical patterns. Test for overlap

    // Test for overlap (index2 falls inside m1)
    if (index2 >= index1 && index2 <= len1) return "";

    // Test for overlap (index1 falls inside m2)
    if (index1 >= index2 && index1 <= len2) return "";

    // No overlap. See which one is first, and get value between

    // m2 is the first match
    if (index2 < index1)
        return input.Substring(len2, index1 - len2);

    // m1 is the first match
    return input.Substring(len1, index2 - len1);
}

/// <summary>
/// Returns a Substring of a string
/// before the first occurrence of a pattern in the string
/// </summary>
/// <param name="input">string to evaluate</param>
/// <param name="pattern">pattern to match</param>
/// <returns>Substring of input string, starting from the beginning
/// of the string and ending before the first character of the
/// pattern</returns>
/// <remarks>If there is no match, returns the input string.</remarks>
public static string Substring(string input, string pattern)
{
    return Substring(input, pattern, true);
}

A couple of notes: You will need to import the
System.Text.RegularExpressions namespace to use these. You may want to
change the names of the methods for clarity. I have them in a class for
doing Regular Expression functions, so the class name is sufficient for my
needs. Also, carefully examine the SubstringBetween method in particular.
The rules for it are fairly complex, and may not conform to the same rules
in Delphi. I have commented it quite a bit for clarity. It is not primarily
concerned about the order of the 2 patterns, unless one of them is not
found. It returns the entire string if neither of them is found. If only
one pattern is found, it attempts first to use the order in which they
would appear, but the rule changes if, for example, the first pattern
matches, but at the end of the string, or the second pattern matches, but
at the beginning of the string. In essence, it treats a non-match as if it
were a blank string.

Of course, these methods could be expanded and extended quite a bit. Some of
them only look for a single match. But they should give you (or anyone) a
good starting point for your own class library.
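
One caution when calling methods like these with literal delimiters: the
pattern argument is interpreted as a regular expression, so a delimiter
such as "." matches any character unless it is escaped. A minimal sketch of
the difference, using only Regex.Match and Regex.Escape (the class name
EscapeDemo, its Before helper and the sample address are illustrative
assumptions, not part of the methods above):

```csharp
using System;
using System.Text.RegularExpressions;

public static class EscapeDemo
{
    // Returns the text before the first match of a regex pattern,
    // or the whole input if there is no match.
    public static string Before(string input, string pattern)
    {
        Match m = Regex.Match(input, pattern);
        return m.Success ? input.Substring(0, m.Index) : input;
    }

    public static void Main()
    {
        string input = "user.name@example.com";

        // Unescaped "." matches any character, so the match is at index 0.
        Console.WriteLine("[" + Before(input, ".") + "]");               // []

        // Regex.Escape turns "." into a literal dot.
        Console.WriteLine("[" + Before(input, Regex.Escape(".")) + "]"); // [user]
    }
}
```

If the delimiters are always plain text, escaping them at the call site (or
inside the helper) keeps the regex-based methods behaving like simple
substring searches.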

--
HTH,

Kevin Spencer
Microsoft MVP
.Net Developer
Ambiguity has a certain quality to it.

"zoro" <il****@gmail.com> wrote in message
news:11**********************@g49g2000cwa.googlegroups.com...

> Hi,
> I am new to C#, coming from Delphi. In Delphi, I am using a 3rd party
> string handling library that includes some very useful string
> functions, in particular I'm interested in BEFORE (return substring
> before a pattern), AFTER (return substring after a pattern), and
> BETWEEN (return substring between 2 patterns).
> My questions are:
> 1. Can anyone tell me how I can implement such functionality in C#?
> 2. Is it possible to add/include function libraries to C#, and if so
> how?
>
> Thank you very much for your help.
>
> Zoro.

Nov 17 '05 #4

zoro

Thank you very much for all the suggestions. It still looks very
complex for the simple functions I wanted:

AFTER returns the substring AFTER the pattern so:
str := AFTER('@', 'b*********@microsoft.com');
str = 'microsoft.com'

BEFORE returns the substring BEFORE the pattern so:
str := BEFORE('@', 'b*********@microsoft.com');
str = 'bill.gates'

BETWEEN returns the substring BETWEEN 2 patterns so:
str := BETWEEN('@', '.', 'b*********@microsoft.com');
str = 'microsoft'

There must be a simpler way to achieve this in C# - surely?
Also, does anyone know of a third party library that will include such
functions?

Thanks again,

ilZoro.

Nov 17 '05 #5

zoro <il****@gmail.com> wrote:

> Thank you very much for all the suggestions. It still looks very
> complex for the simple functions I wanted:
>
> AFTER returns the substring AFTER the pattern so:
> str := AFTER('@', 'b*********@microsoft.com');
> str = 'microsoft.com'
>
> BEFORE returns the substring BEFORE the pattern so:
> str := BEFORE('@', 'b*********@microsoft.com');
> str = 'bill.gates'
>
> BETWEEN returns the substring BETWEEN 2 patterns so:
> str := BETWEEN('@', '.', 'b*********@microsoft.com');
> str = 'microsoft'
>
> There must be a simpler way to achieve this in C# - surely?

All of those can be done with IndexOf very easily.

> Also, does anyone know of a third party library that will include such
> functions?

No, but there may be one around. However, it would be only a matter of
about five minutes to write you one for the above. What else would you
want?

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too

Nov 17 '05 #6

The Crow

public sealed class StringHelper
{
    // Returns the text before the first occurrence of pattern,
    // or null if pattern is not found.
    public static string Before(string pattern, string strLookup)
    {
        int index = strLookup.IndexOf(pattern);
        if (index > -1)
            return strLookup.Substring(0, index);
        else
            return null;
    }

    // Returns the text after the first occurrence of pattern,
    // or null if pattern is not found.
    public static string After(string pattern, string strLookup)
    {
        int index = strLookup.IndexOf(pattern);
        if (index > -1)
            return strLookup.Substring(index + pattern.Length);
        else
            return null;
    }

    // Returns the text between pattern1 and the next pattern2 after it.
    public static string Between(string pattern1, string pattern2,
        string strLookup)
    {
        int index1 = strLookup.IndexOf(pattern1);
        // Look for pattern2 only after the end of pattern1 (if found)
        int start = (index1 > -1) ? index1 + pattern1.Length : 0;
        int index2 = strLookup.IndexOf(pattern2, start);

        if (index1 == -1 && index2 == -1) // if neither is found, return null
            return null;
        else if (index2 == -1) // only first pattern found, return after
            return strLookup.Substring(start);
        else if (index1 == -1) // only second pattern found, return before
            return strLookup.Substring(0, index2);
        else // both found, return between
            return strLookup.Substring(start, index2 - start);
    }
}

If you want to use this class as a library, create a new Class Library
project in Visual Studio, insert this class in your project, compile it,
and then use the output DLL in your other projects by referencing it.

Nov 17 '05 #7

Well, you did say "pattern." I assumed you were talking about a pattern. In
any case, all you have to do is use the functions I wrote for you. Don't
know how I could have made it any easier.

--
HTH,

Kevin Spencer
Microsoft MVP
.Net Developer
Ambiguity has a certain quality to it.


Nov 17 '05 #8

zoro

Thank you all for your help. Sorry, I didn't make myself
clear the first time.
Zoro.

Nov 17 '05 #9

zoro

Thanks crow - your solutions are exactly what I needed. But instead
of compiling this to a dll, shouldn't it be possible/desirable in C#
to add these functions to the system somehow, by expanding the built-in
string class?

Thanks,
Zoro.

Nov 17 '05 #10

The Crow

You may inherit from string and add these methods, but I think this would
be a very bad idea. And you can't add static method definitions to the
String class.

Nov 17 '05 #11

<"The Crow" <q>> wrote:

> You may inherit from string and add these methods, but I think this would
> be a very bad idea. And you can't add static method definitions to the
> String class.

No, you can't derive from string - it's a sealed class.

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too

Nov 17 '05 #12

zoro <il****@gmail.com> wrote:

> Thanks crow - your solutions are exactly what I needed. But instead
> of compiling this to a dll, shouldn't it be possible/desirable in C#
> to add these functions to the system somehow, by expanding the built-in
> string class?

You can't change the string class, but by adding a DLL you effectively
are adding them to "the system" as far as the code which uses it is
concerned - you just need to add a reference to your library in the
same way that you add references to system libraries.
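
For readers on later compilers: C# 3.0 and newer (not available when this
thread was written) support extension methods, which let a static helper be
called with instance syntax on a string without modifying or deriving from
the string class. A minimal sketch, assuming the hypothetical names
StringExtensions, Before and After:

```csharp
using System;

public static class StringExtensions
{
    // Text before the first occurrence of delimiter, or null if absent.
    public static string Before(this string source, string delimiter)
    {
        int index = source.IndexOf(delimiter, StringComparison.Ordinal);
        return index > -1 ? source.Substring(0, index) : null;
    }

    // Text after the first occurrence of delimiter, or null if absent.
    public static string After(this string source, string delimiter)
    {
        int index = source.IndexOf(delimiter, StringComparison.Ordinal);
        return index > -1 ? source.Substring(index + delimiter.Length) : null;
    }
}

public static class Program
{
    public static void Main()
    {
        string address = "user@example.com";
        Console.WriteLine(address.Before("@")); // user
        Console.WriteLine(address.After("@"));  // example.com
    }
}
```

The compiler rewrites address.Before("@") into
StringExtensions.Before(address, "@"), so this is still a static helper in
a referenced library; the string class itself is unchanged.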

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too

Nov 17 '05 #13

Michael S

"zoro" <il****@gmail.com> wrote in message
news:11**********************@g49g2000cwa.googlegroups.com...

> Hi,
> I am new to C#, coming from Delphi.

Welcome!

I am also from the Delphi (and Borland C++) corner.
I think you'll find that the 'Hejlsberg phenomenon' is very much present in
.NET. You'll learn C# in a jiffy! =)
But forget all you knew about strings. They are immutable in .NET and not as
cool as in Delphi.

I'm still waiting for .NET (and Java) to have a string class that has the
by-reference-but-with-copy-on-write semantics as in Delphi. There is
something missing between String and StringBuilder. We sure need strings
like in Delphi for performance....

Does anybody know why we don't get such a class? If I think for 2 seconds
I'd imagine it would screw up the GC as every such string must be pinned to
a memory location. If anyone else could think for like 4 seconds or even a
minute, I would appreciate your input on why and why not.

I sure miss 'em....

Happy Strings
- Michael S

Nov 17 '05 #14

> Does anybody know why we don't get such a class? If I think for 2 seconds
> I'd imagine it would screw up the GC as every such string must be pinned
> to a memory location. If anyone else could think for like 4 seconds or
> even a minute, I would appreciate your input on why and why not.

Ask Anders Hejlsberg. He led the team that created Delphi AND the Microsoft
.Net platform.

--
HTH,

Kevin Spencer
Microsoft MVP
.Net Developer
A watched clock never boils.


Nov 17 '05 #15

Michael S

"Kevin Spencer" <ke***@DIESPAMMERSDIEtakempis.com> wrote in message
news:uW*************@TK2MSFTNGP10.phx.gbl...

> > Does anybody know why we don't get such a class? If I think for 2
> > seconds I'd imagine it would screw up the GC as every such string must
> > be pinned to a memory location. If anyone else could think for like 4
> > seconds or even a minute, I would appreciate your input on why and why
> > not.
>
> Ask Anders Hejlsberg. He led the team that created Delphi AND the
> Microsoft .Net platform.

No shit!
I've been a follower of Hejlsberg since Turbo Pascal..

<joke>But since I showed up in tiger-tanga-lingerie, he sorta stopped
calling me</joke>

But this is not the question. My question is why we don't have a
mutable copy-on-write string in the .NET Framework.

Happy Coding
- Michael S

Nov 17 '05 #16

Are you imagining some sort of reference-counting scheme where a string
will only be copied if there is more than one reference to the string?
That doesn't play well with threading.

Nov 17 '05 #17

In a word, no, it's not desirable. You end up with a huge number of
simple functions that are relatively useless, because it's rare that
one of them will do the entire job. And once you need more than one
function call, you might as well use a regular expression, which often
can do the whole job.
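
As a sketch of what "the whole job" in one expression might look like for
the BEFORE/BETWEEN/AFTER examples earlier in the thread (the class name
OnePassDemo, the group names user, host and rest, and the sample address
are illustrative assumptions):

```csharp
using System;
using System.Text.RegularExpressions;

public static class OnePassDemo
{
    // user: text before '@'; host: text between '@' and the next '.';
    // rest: everything after that dot.
    private static readonly Regex Address =
        new Regex(@"^(?<user>[^@]*)@(?<host>[^.]*)\.(?<rest>.*)$");

    // Returns { user, host, rest }, or null if the input does not match.
    public static string[] Split(string input)
    {
        Match m = Address.Match(input);
        if (!m.Success) return null;
        return new string[]
        {
            m.Groups["user"].Value,
            m.Groups["host"].Value,
            m.Groups["rest"].Value
        };
    }

    public static void Main()
    {
        string[] parts = Split("user.name@example.com");
        Console.WriteLine(string.Join(" | ", parts)); // user.name | example | com
    }
}
```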

Nov 17 '05 #18

Michael S

"kevin cline" <ke*********@gmail.com> wrote in message
news:11**********************@f14g2000cwb.googlegroups.com...

> Are you imagining some sort of reference-counting scheme where a string
> will only be copied if there is more than one reference to the string?
> That doesn't play well with threading.

No, it doesn't play well with threading at all.
But I'm not dreaming. It is all there. And I don't take credit for it as it
has been in Delphi since 2.0. =)

Have a look how strings are done in Delphi and you'll see something neat.
Or don't. I'll do it for you...

I'm not saying that System.String should be replaced, but that a sorta
System.StringBuffer would be desirable.
I just picked the name from Java, just to make sure Javaites would get
really really confused...

// o1 points to virtual memory at address 1000 and has a refcount of 1.
StringBuffer o1 = "Hello World!";
// o2 now also points to address 1000, which keeps a refcount of 2.
// No chars hurt!
StringBuffer o2 = o1;
// Now a new string gets copied to the heap at address 2000 and
// contains "Hallo World!".
o2.CharAt[1] = 'a';
// o3 is simply a reference to address 2000. No chars were copied.
StringBuffer o3 = o2;

But there is more to strings in Delphi. A string in Delphi also keeps its
length.

// o1 still points to the memory at address 1000, containing "OKllo World!"
o1 = "OK";
// The allocated space of o1 cannot hold the string. It is copied to
// address 3000.
o1 = "Now this is really cool";

There is (somewhat) no magic. This is how the structure works.

[32-bit refcount][32-bit allocated][32-bit length[0 deprecated]][1][2][3][4]...[N] ASCII characters.

// o1 does not reallocate. It stays at 3000 and contains
// "Get it?s is really cool"
o1 = "Get it?";

Hence the reference of o1 would point to address 3000:
1, 23, 7 [points here]Get it?s is really cool

This is also why Length(o1) in Delphi is actually nothing more than a single
fetch of the address with a -3 offset. No strlen needed at all.

Happy Strings
- Michael S

Nov 17 '05 #19

kevin cline <ke*********@gmail.com> wrote:

> In a word, no, it's not desirable. You end up with a huge number of
> simple functions that are relatively useless, because it's rare that
> one of them will do the entire job. And once you need more than one
> function call, you might as well use a regular expression, which often
> can do the whole job.

Do you use regular expressions every time you need to do more than one
operation on a string then? I certainly don't. I'd rather see a few
simple operations than one regular expression which could take a while
to understand or even to write properly in the first place.

Regular expressions are great when they take the place of *complicated*
string processing, but when you've just got a few operations to
perform, I'll take the simplicity of straight string operations any
day.

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too

Nov 17 '05 #20

The Crow

What do you think about the performance comparison between regular string
operations and regular expressions? Can we use regular expressions without
worrying about performance (is the slowdown very small)?

Nov 17 '05 #21

<"The Crow" <q>> wrote:

> What do you think about the performance comparison between regular string
> operations and regular expressions? Can we use regular expressions without
> worrying about performance (is the slowdown very small)?

It entirely depends what you're doing. In some situations, compiled
regular expressions will be faster than the same kind of operations
done just with String methods - at least without significant work.

In most cases, however, regular expressions are slower, sometimes quite
significantly.
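
A rough way to check this for a particular case is to time both approaches
on your own data. The sketch below is not a rigorous benchmark (no warm-up,
no statistical treatment), and the names TimingSketch, BeforeWithIndexOf
and BeforeWithRegex are made up for illustration; results will vary with
the input, the pattern and the runtime.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

public static class TimingSketch
{
    // Compiled once and reused, as a compiled regex is costly to construct.
    private static readonly Regex At = new Regex("@", RegexOptions.Compiled);

    // Plain string operations: text before the first '@'.
    public static string BeforeWithIndexOf(string input)
    {
        int i = input.IndexOf('@');
        return i > -1 ? input.Substring(0, i) : input;
    }

    // Same result via a regular expression match.
    public static string BeforeWithRegex(string input)
    {
        Match m = At.Match(input);
        return m.Success ? input.Substring(0, m.Index) : input;
    }

    public static void Main()
    {
        const int iterations = 1000000;
        string input = "user.name@example.com";

        Stopwatch sw = Stopwatch.StartNew();
        for (int n = 0; n < iterations; n++) BeforeWithIndexOf(input);
        sw.Stop();
        Console.WriteLine("IndexOf: " + sw.ElapsedMilliseconds + " ms");

        sw = Stopwatch.StartNew();
        for (int n = 0; n < iterations; n++) BeforeWithRegex(input);
        sw.Stop();
        Console.WriteLine("Regex:   " + sw.ElapsedMilliseconds + " ms");
    }
}
```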

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too

Nov 17 '05 #22

It's a well-known technique, and doesn't play well with threading
because the reference count has to be updated atomically.
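
To make the point concrete, here is a minimal sketch of a reference-counted
copy-on-write buffer. CowString and SharedBuffer are hypothetical types
invented for this illustration (they are not .NET or Delphi types), and the
sketch assumes each handle is used from one thread at a time; only the
shared count is updated atomically.

```csharp
using System;
using System.Threading;

// Character buffer shared between handles, with a reference count.
internal sealed class SharedBuffer
{
    public readonly char[] Chars;
    public int RefCount = 1;

    public SharedBuffer(char[] chars) { Chars = chars; }
}

// Hypothetical copy-on-write string handle.
public sealed class CowString
{
    private SharedBuffer buffer;

    public CowString(string value)
    {
        buffer = new SharedBuffer(value.ToCharArray());
    }

    private CowString(SharedBuffer shared)
    {
        // Another handle now refers to the buffer. The increment must be
        // atomic because other threads may be sharing or releasing it too.
        Interlocked.Increment(ref shared.RefCount);
        buffer = shared;
    }

    // Cheap "copy": shares the buffer, copies no characters.
    public CowString Share() { return new CowString(buffer); }

    public void SetChar(int index, char c)
    {
        // If the buffer is shared, detach onto a private copy first.
        if (Volatile.Read(ref buffer.RefCount) > 1)
        {
            SharedBuffer old = buffer;
            buffer = new SharedBuffer((char[])old.Chars.Clone());
            Interlocked.Decrement(ref old.RefCount);
        }
        buffer.Chars[index] = c;
    }

    public override string ToString() { return new string(buffer.Chars); }
}

public static class Program
{
    public static void Main()
    {
        CowString o1 = new CowString("Hello World!");
        CowString o2 = o1.Share();   // refcount 2, no characters copied
        o2.SetChar(1, 'a');          // o2 detaches onto its own copy
        Console.WriteLine(o1);       // Hello World!
        Console.WriteLine(o2);       // Hallo World!
    }
}
```

Every share and every detach pays for an interlocked operation, which is
the cost being referred to; an immutable string needs no count at all.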

Nov 17 '05 #23

Jon wrote:

> kevin cline <ke*********@gmail.com> wrote:
> > In a word, no, it's not desirable. You end up with a huge number of
> > simple functions that are relatively useless, because it's rare that
> > one of them will do the entire job. And once you need more than one
> > function call, you might as well use a regular expression, which often
> > can do the whole job.
>
> Do you use regular expressions every time you need to do more than one
> operation on a string then?

Mostly, yes, I do. I've been using regular expressions for a long time
and it's easier for me to read and verify one regular expression than
to understand multiple calls to index and substring. Also, that sort
of string manipulation is very easy to get wrong.

> I certainly don't. I'd rather see a few simple operations than one
> regular expression which could take a while to understand or even to
> write properly in the first place.

With practice, you'll find that regular expressions are easy to
understand.

> Regular expressions are great when they take the place of *complicated*
> string processing, but when you've just got a few operations to
> perform, I'll take the simplicity of straight string operations any
> day.

As soon as you get to 'a few' operations, it's no longer simple. Such
code is quite prone to off-by-one errors, index out of range
exceptions, invalid argument exceptions, etc. It also tends to be
slower than a single regular expression match.

Nov 17 '05 #24

kevin cline <ke*********@gmail.com> wrote:

> > Do you use regular expressions every time you need to do more than one
> > operation on a string then?
>
> Mostly, yes, I do. I've been using regular expressions for a long time
> and it's easier for me to read and verify one regular expression than
> to understand multiple calls to index and substring.

Have all the other engineers who might read your code also been using
regular expressions for that long?

> Also, that sort of string manipulation is very easy to get wrong.

Whereas no-one ever gets regular expressions wrong, I suppose? ;)

> > I certainly don't. I'd rather see a few
> > simple operations than one regular expression which could take a while
> > to understand or even to write properly in the first place.
>
> With practice, you'll find that regular expressions are easy to
> understand.

Without practice, simple string calls are easy to understand, IME. Why
should anyone who has to read my code also have to have years of
experience with regular expressions?

> > Regular expressions are great when they take the place of *complicated*
> > string processing, but when you've just got a few operations to
> > perform, I'll take the simplicity of straight string operations any
> > day.
>
> As soon as you get to 'a few' operations, it's no longer simple.

If it genuinely is "a few" (as opposed to several including a couple of
loops), it can still be very simple IMO.

> Such code is quite prone to off-by-one errors, index out of range
> exceptions, invalid argument exceptions, etc.

Likewise regular expressions are prone to forgetting to escape certain
characters, forgetting just which bits need matching, etc. They're also
prone to assumptions in terms of portability - not all regular
expression environments are the same, so you either have to limit
yourself to a basic core, or learn the extensions in each and remember
which platform you're dealing with. Of course, not all string-handling
libraries are the same either - but I've got the compiler and
intellisense to help me there.

> It also tends to be slower than a single regular expression match.

That's not my experience in the benchmarks I've done on various
operations over the years (in response to newsgroup questions). It
depends what exactly is being done, but often "hard-coded" string
operations are significantly faster. That makes sense, as they're
(each) less generalised.

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too

Nov 17 '05 #25

The Crow

"Jon Skeet [C# MVP]" <sk***@pobox.com> wrote in message
news:MP************************@msnews.microsoft.com...

> Likewise regular expressions are prone to forgetting to escape certain
> characters, forgetting just which bits need matching, etc. They're also
> prone to assumptions in terms of portability - not all regular
> expression environments are the same, so you either have to limit
> yourself to a basic core, or learn the extensions in each and remember
> which platform you're dealing with. Of course, not all string-handling
> libraries are the same either - but I've got the compiler and
> intellisense to help me there.

"IntelliSense" is available only on the .NET platform.

> > It also tends to be slower than a single regular expression match.
>
> That's not my experience in the benchmarks I've done on various
> operations over the years (in response to newsgroup questions). It
> depends what exactly is being done, but often "hard-coded" string
> operations are significantly faster. That makes sense, as they're
> (each) less generalised.

In my opinion, someone who has a little knowledge of regular expressions
and software engineering can sense where to use regular string operations
and where to use regular expressions... If you ask me, I'll choose
expressing the work rather than doing the work. Doing the work is always
more error-prone.

Nov 17 '05 #26

<"The Crow" <q>> wrote:

> > Likewise regular expressions are prone to forgetting to escape certain
> > characters, forgetting just which bits need matching, etc. They're also
> > prone to assumptions in terms of portability - not all regular
> > expression environments are the same, so you either have to limit
> > yourself to a basic core, or learn the extensions in each and remember
> > which platform you're dealing with. Of course, not all string-handling
> > libraries are the same either - but I've got the compiler and
> > intellisense to help me there.
>
> "IntelliSense" is available only on the .NET platform.

Call it what you like, many IDEs have the same sort of auto-completion
and prompting with documentation that VS.NET has. Eclipse's version is
actually rather better than VS.NET 2003's, in fact.

I believe that most developers on most platforms use an IDE which can
help them with basic string handling.

I believe that very few developers use an IDE which can help them
(without having to go to a different view/window/whatever) get regular
expressions right first time.

> > > It also tends to be slower than a single regular expression match.
> >
> > That's not my experience in the benchmarks I've done on various
> > operations over the years (in response to newsgroup questions). It
> > depends what exactly is being done, but often "hard-coded" string
> > operations are significantly faster. That makes sense, as they're
> > (each) less generalised.
>
> In my opinion, someone who has a little knowledge of regular expressions
> and software engineering can sense where to use regular string operations
> and where to use regular expressions... If you ask me, I'll choose
> expressing the work rather than doing the work. Doing the work is always
> more error-prone.

Of course, everyone in this thread probably thinks they can sense where
to use regular string operations and where to use regular expressions -
but come out with completely different answers.

And if you think that using a regular expression means you aren't doing
work, you're kidding yourself. There's a reason I see more questions
about regular expressions on the newsgroups than string operations -
and that reason is that regular expressions are relatively complex to
both read and write.

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too

Nov 17 '05 #27

Jon wrote:

kevin cline <ke*********@gmail.com> wrote:
Do you use regular expressions every time you need to do more than one
operation on a string then?

Mostly, yes, I do. I've been using regular expressions for a long time
and it's easier for me to read and verify one regular expression than
to understand multiple calls to index and substring.

Have all the other engineers who might read your code also been using
regular expressions for that long?

Also, that sort of string manipulation is very easy to get wrong.

Whereas no-one ever gets regular expressions wrong, I suppose? ;)

It's easier to get regular expressions right because they are usually
closer to the requirement. All I know is that I've seen a lot of buggy
string manipulation functions that could be easily performed with a
single regular expression.

With practice, you'll find that regular expressions are easy to
understand.

Without practice, simple string calls are easy to understand, IME.

Individually, they are trivial to understand. But it's not so easy to
understand the purpose of five or six of them in a row, and usually not
at all easy to verify that the code is doing what it is supposed to do.
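To make the trade-off concrete, here is a small sketch of one task written both ways: pulling the value out of a "key = value" line. The task, the class name KeyValueParsing and both method names are assumptions chosen for illustration; neither version comes from this thread.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical example class, for illustration only.
public static class KeyValueParsing
{
    // Regex version: a word-character key, '=', then the value,
    // with optional whitespace around each part. Returns null if no match.
    public static string ValueViaRegex(string line)
    {
        Match m = Regex.Match(line, @"^\s*\w+\s*=\s*(.*?)\s*$");
        return m.Success ? m.Groups[1].Value : null;
    }

    // String-call version: everything after the first '=', trimmed.
    // Returns null if there is no '=' at all. The key is not validated.
    public static string ValueViaString(string line)
    {
        int eq = line.IndexOf('=');
        if (eq < 0) return null;
        return line.Substring(eq + 1).Trim();
    }
}
```

Both return "Zoro" for the input "name = Zoro ", but they are not equivalent: for "=x" the regex version returns null because it requires a key, while the string version returns "x". Which behaviour is "closer to the requirement" depends on what the requirement actually says.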

Why should anyone who has to read my code also have to have years of
experience with regular expressions?

I generally assume the other programmers on my team are competent
enough to read the documentation of library functions. It's not rocket
science, just basic computer science. An hour of study will save you
hundreds of hours of programming and debugging in the future.

Nov 17 '05 #28

kevin cline <ke*********@gmail.com> wrote:

Also, that sort of string manipulation is very easy to get wrong.

Whereas no-one ever gets regular expressions wrong, I suppose? ;)

It's easier to get regular expressions right because they are usually
closer to the requirement. All I know is that I've seen a lot of buggy
string manipulation functions that could be easily performed with a
single regular expression.

And I've seen people going out of their way to use regular expressions
(often needing to ask for help because they can't get it right on their
own) when the code can be significantly simpler with just a few string
operations.

With practice, you'll find that regular expressions are easy to
understand.

Without practice, simple string calls are easy to understand, IME.

Individually, they are trivial to understand. But it's not so easy to
understand the purpose of five or six of them in a row, and usually not
at all easy to verify that the code is doing what it is supposed to do.

I see it's gone up from "more than one" to "five or six"...

Verification is necessary with either technique, and should involve
enough test cases to give confidence. I'd be a lot happier

Why should anyone who has to read my code also have to have years of
experience with regular expressions?

I generally assume the other programmers on my team are competent
enough to read the documentation of library functions.

I think it's far more likely that people will know the *basic* library
functions (including string manipulations) than that they'll know the
details of the regular expression dialect used on every platform they
happen to come across.

Even when you know regular expressions, when they become even slightly
non-trivial they take a while to understand, IMO.

It's not rocket science, just basic computer science. An hour of
study will save you hundreds of hours of programming and debugging in
the future.

I think we'll have to agree to disagree. Regular expressions certainly
have their place, but for me the bar for their use is much higher than
it is for you. I believe it's much easier to make a mistake -
particularly when changing the behaviour of a working regular
expression in a way which appears trivial at first sight, but where you
need to be careful about escaping, grouping etc.
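As one small illustration of the escaping point, consider checking whether a string contains the literal text "file.txt". This is only a sketch; the class name EscapingPitfall and its method names are hypothetical. Regex.IsMatch, Regex.Escape and String.IndexOf are the standard .NET calls.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical example class, for illustration only.
public static class EscapingPitfall
{
    // Passes the literal straight in as a pattern. Any regex metacharacter
    // in it (such as '.') keeps its special meaning.
    public static bool NaiveContains(string input, string literal)
    {
        return Regex.IsMatch(input, literal);
    }

    // Escapes the literal first, so '.' only matches an actual dot.
    public static bool EscapedContains(string input, string literal)
    {
        return Regex.IsMatch(input, Regex.Escape(literal));
    }

    // Plain string search: no pattern syntax is involved at all.
    public static bool PlainContains(string input, string literal)
    {
        return input.IndexOf(literal, StringComparison.Ordinal) >= 0;
    }
}
```

With the input "fileXtxt" and the literal "file.txt", NaiveContains returns true because the unescaped dot matches the 'X', while EscapedContains and PlainContains both return false. All three agree on ordinary inputs, which is why a slip like this can go unnoticed.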

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too

Nov 17 '05 #29