Ignoring spaces in regular expression matching

P: n/a
Hi,

I'm trying to construct a RegEx pattern which will validate a string so that
it can contain:

- only the numerical characters from 0 to 9, i.e. no decimal points, negative
  signs, exponentials etc.
- only the 26 letters of the standard Western alphabet in either upper or
  lower case
- spaces, i.e. ASCII character 32

I seem to be doing OK with the first two criteria, but am having trouble
with the space character.

E.g. the following works perfectly:

Regex.IsMatch("ThisIsThe2ndString", @"[^0-9][^a-z][^A-Z]")

However, this doesn't work:

Regex.IsMatch("This Is The 2nd String", @"[^0-9][^a-z][^A-Z]")

I've tried various combinations of [\s] and [^\s] but with little success.

However, the following works, though I don't really understand why:

Regex.IsMatch("This is the 2nd string", @"[^0-9][^a-z][^A-Z]",
RegexOptions.IgnoreCase)

Any assistance gratefully received.

Mark
May 21 '06 #1
15 Replies


P: n/a
Mark Rae wrote:
> I'm trying to construct a RegEx pattern which will
> validate a string so that it can contain [only digits,
> letters and spaces]


I think you want something like this:
^[a-zA-Z0-9 ]*$
i.e. every character between ^ start and $ end must be in the [group],
and there can be * zero or more of them (you'd use + if you want at
least one character in there). Be aware that "\s" would match some
things that aren't spaces (like tabs and newlines).
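
For anyone who wants to try the anchored pattern, here is a minimal sketch. The class and method names are made up for illustration. One detail worth knowing: in .NET, "$" also matches just before a trailing newline, so the second method uses \A and \z as a stricter variant, on the assumption that a trailing newline should be rejected too.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AnchoredCheck
{
    // The pattern from the post: every character from start to end must be in the class.
    public static bool IsValidLoose(string s)
    {
        return Regex.IsMatch(s, @"^[a-zA-Z0-9 ]*$");
    }

    // Stricter variant (an assumption about what is wanted): \z matches only at the
    // very end of the string, whereas $ also matches before a final "\n".
    public static bool IsValidStrict(string s)
    {
        return Regex.IsMatch(s, @"\A[a-zA-Z0-9 ]*\z");
    }

    public static void Main()
    {
        Console.WriteLine(IsValidLoose("This Is The 2nd String")); // True
        Console.WriteLine(IsValidLoose("1.5"));                    // False
        Console.WriteLine(IsValidLoose("abc\n"));                  // True
        Console.WriteLine(IsValidStrict("abc\n"));                 // False
    }
}
```

Note that both versions accept the empty string because of the "*"; switch to "+" if at least one character is required.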

Of course, if you're having special trouble with spaces, you could do
s.Replace(" ", "") first to get rid of them in your validator.

Finally, I'm not convinced that regexes are ideal in .NET for this
kind of trivial check (as opposed to something complicated like nested
expressions and optional segments), because they're a special library
call and not a native operator as in Perl, which I suspect you might
have come from. I expect a loop like this would be more efficient:

// s is the string being validated
bool valid = true;
for (int i = 0; i < s.Length; i++)
{
    if (!((s[i] >= 'A' && s[i] <= 'Z') || (s[i] >= 'a' && s[i] <= 'z')
        || (s[i] >= '0' && s[i] <= '9') || s[i] == ' '))
    {
        valid = false; break;
    }
}

Eq.
May 21 '06 #2

P: n/a
string[] strs = new string[] { "ABC123", "ABC1.1", "ABC 123", "ABC 123 .." };

string srx = @"[^\.]+|[\w\s\d]+";
Regex rx = new Regex(srx, RegexOptions.ECMAScript);

foreach (string str in strs)
{
    Console.WriteLine("{0} {1}", str, rx.Match(str).Length == str.Length);
}

This works (if I understood correctly your problem). IsMatch returns
true for any match in the string so I don't think this is the one you
want.
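
A quick sketch of how that pattern behaves on inputs outside the posted test set. The class name and the extra test strings are hypothetical. The first alternative, [^\.]+, accepts any run of characters that are not dots, so the length comparison only rejects strings that contain a dot.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AlternationCheck
{
    // Same pattern and option as in the post above.
    private static readonly Regex Rx =
        new Regex(@"[^\.]+|[\w\s\d]+", RegexOptions.ECMAScript);

    // True when the first match found covers the whole input.
    public static bool WholeStringMatched(string s)
    {
        return Rx.Match(s).Length == s.Length;
    }

    public static void Main()
    {
        Console.WriteLine(WholeStringMatched("ABC 123")); // True
        Console.WriteLine(WholeStringMatched("ABC1.1"));  // False (match stops at the dot)
        Console.WriteLine(WholeStringMatched("ABC#123")); // True, although '#' is not allowed
        Console.WriteLine(WholeStringMatched("!?*"));     // True, although nothing here is allowed
    }
}
```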

Regards,
Tasos

May 21 '06 #3

P: n/a
You can use a literal space in your character set:

(?i)[^a-z 0-9]

The "(?i)" indicates case-insensitivity. Note the literal space between
"a-z" and "0-9". This excludes the space character as well.

The "\s" indicates *any* white-space character, including such things as
tabs. If that is what you want, use:

(?i)[^a-z\s0-9]
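
A minimal sketch of using the first pattern for validation, assuming the aim is to reject any string that contains a disallowed character. The pattern looks for a character that is *not* allowed, so the string is valid when IsMatch returns false. The class and method names are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NegatedClassCheck
{
    // Valid when no character outside a-z, A-Z, 0-9 and the space is found.
    public static bool IsValid(string s)
    {
        return !Regex.IsMatch(s, @"(?i)[^a-z 0-9]");
    }

    public static void Main()
    {
        Console.WriteLine(IsValid("This Is The 2nd String")); // True
        Console.WriteLine(IsValid("ThisIsThe2ndString"));     // True
        Console.WriteLine(IsValid("Hell#o"));                 // False
        Console.WriteLine(IsValid("tab\there"));              // False (a tab is not a literal space)
    }
}
```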

--
HTH,

Kevin Spencer
Microsoft MVP
Professional Numbskull

The man who questions opinions is wise.
The man who quarrels with facts is a fool.

May 21 '06 #4

P: n/a
"Kevin Spencer" <ke***@DIESPAMMERSDIEtakempis.com> wrote in message
news:eF**************@TK2MSFTNGP03.phx.gbl...
> You can use a literal space in your character set:
>
> (?i)[^a-z 0-9]
>
> The "(?i)" indicates case-insensitivity. Note the literal space between
> "a-z" and "0-9". This excludes the space character as well.
>
> The "\s" indicates *any* white-space character, including such things as
> tabs. If that is what you want, use:
>
> (?i)[^a-z\s0-9]


Excellent! Thanks very much.
May 21 '06 #5

P: n/a
"Tasos Vogiatzoglou" <tv*****@gmail.com> wrote in message
news:11**********************@j33g2000cwa.googlegroups.com...
> This works (if I understood correctly your problem).

It doesn't.

> IsMatch returns true for any match in the string so I don't think this is
> the one you want.


There you go, then... :-)
May 21 '06 #6

P: n/a
"Paul E Collins" <fi******************@CL4.org> wrote in message
news:CI******************************@bt.com...
> I think you want something like this:
> ^[a-zA-Z0-9 ]*$
> i.e. every character between ^ start and $ end must be in the [group], and
> there can be * zero or more of them (you'd use + if you want at least one
> character in there).

Doesn't work...

> Of course, if you're having special trouble with spaces, you could do
> s.Replace(" ", "") first to get rid of them in your validator.

I could do that, or even not do any validation at all...

> Finally, I'm not convinced that regexes are ideal in .NET for this kind of
> trivial check (as opposed to something complicated like nested expressions
> and optional segments), because they're a special library call and not a
> native operator as in Perl, which I suspect you might have come from.

I've never written a line of Perl in my life...

> I expect a loop like this would be more efficient:


I wouldn't know...
May 21 '06 #7

P: n/a
Mark Rae <ma**@markN-O-S-P-A-M.co.uk> wrote:
> "Tasos Vogiatzoglou" <tv*****@gmail.com> wrote in message
> news:11**********************@j33g2000cwa.googlegroups.com...
>> This works (if I understood correctly your problem).
>
> It doesn't.


When a proposed solution doesn't work, could you explain in what way?
It makes life a lot easier for people who want to make further
suggestions.

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
May 21 '06 #8

P: n/a
"Jon Skeet [C# MVP]" <sk***@pobox.com> wrote in message
news:MP************************@msnews.microsoft.com...
> When a proposed solution doesn't work, could you explain in what way?

I'm afraid I can't in this case, other than to say it always seems to find a
match no matter what string I pass into it...

I simply don't know enough about regular expressions to make a valuable
response - I don't mind confessing that it remains one area of coding which
I find very difficult to get my head around, to the extent where I still
find it difficult to look at even the simplest of patterns and understand
instinctively what it's trying to do...

> It makes life a lot easier for people who want to make further
> suggestions.


I couldn't agree more! However, in this case, Kevin Spencer has solved my
problem completely.
May 21 '06 #9

P: n/a
Mark Rae <ma**@markN-O-S-P-A-M.co.uk> wrote:
> "Jon Skeet [C# MVP]" <sk***@pobox.com> wrote in message
> news:MP************************@msnews.microsoft.com...
>> When a proposed solution doesn't work, could you explain in what way?
>
> I'm afraid I can't in this case, other than to say it always seems to find a
> match no matter what string I pass into it...

That's enough - just an example of something which should fail but
passes would be good.

> I simply don't know enough about regular expressions to make a valuable
> response

A sample which doesn't do what you want to is the most valuable
response you can make in this case :)

>> It makes life a lot easier for people who want to make further
>> suggestions.
>
> I couldn't agree more! However, in this case, Kevin Spencer has solved my
> problem completely.


Right. I'd still be interested in an example which should fail but
passes, so I can try to beef up my own regex experience.

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
May 21 '06 #10

P: n/a
"Jon Skeet [C# MVP]" <sk***@pobox.com> wrote in message
news:MP************************@msnews.microsoft.com...
>> I'm afraid I can't in this case, other than to say it always seems to
>> find a match no matter what string I pass into it...
>
> That's enough - just an example of something which should fail but
> passes would be good.
>
>> I simply don't know enough about regular expressions to make a valuable
>> response
>
> A sample which doesn't do what you want to is the most valuable
> response you can make in this case :)

See the reply I'm referring to:

> IsMatch returns true for any match in the string so I don't think this is
> the one you want.


That's correct - no matter what string I pass into it, it always returns
true...
May 21 '06 #11

P: n/a
Hi Mark,

I may be able to help you there. It helps to understand how the Regular
Expressions Engine works. First, it evaluates a character at a time, and it
is procedural in nature. A regular expression is like a series of
instructions, rather than a real single pattern. In your case:

Regex.IsMatch("This is the 2nd string", @"[^0-9][^a-z][^A-Z]",
RegexOptions.IgnoreCase)

Basically, this is using character classes. A character class is a series of
tokens inside square brackets, and it can be translated as "this type of
character or this type of character or this type of character..." In other
words, multiple character types or literals are joined with an implicit "or"
operator:

[\dA!] literally means "any single digit or an 'A' or an '!' character".
Note that it also implies a singular value, that is, one character.
Quantifiers are used to indicate that anything in the character class is
repeated 0, 1 or more times, as in:

[\dA!] (any of these characters 1 time)
[\dA!]* (any of these characters 0 or more times)
[\dA!]+ (any of these characters 1 or more times)
etc.
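
A small sketch of those three forms. The patterns are anchored with \A and \z (my addition) so that each result reflects the whole input, not just some part of it. The class, method names and test strings are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class QuantifierDemo
{
    // Exactly one character from the class.
    public static bool One(string s) { return Regex.IsMatch(s, @"\A[\dA!]\z"); }

    // Zero or more characters from the class.
    public static bool ZeroOrMore(string s) { return Regex.IsMatch(s, @"\A[\dA!]*\z"); }

    // One or more characters from the class.
    public static bool OneOrMore(string s) { return Regex.IsMatch(s, @"\A[\dA!]+\z"); }

    public static void Main()
    {
        foreach (string s in new[] { "", "A", "7A!", "B" })
        {
            Console.WriteLine("'{0}': one={1} zeroOrMore={2} oneOrMore={3}",
                s, One(s), ZeroOrMore(s), OneOrMore(s));
        }
        // ''   : one=False zeroOrMore=True  oneOrMore=False
        // 'A'  : one=True  zeroOrMore=True  oneOrMore=True
        // '7A!': one=False zeroOrMore=True  oneOrMore=True
        // 'B'  : one=False zeroOrMore=False oneOrMore=False
    }
}
```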

A '^' as the first character inside the square brackets is the logical "Not"
operator, which means "Not any of these characters."

So, you had at first "[^0-9]" (Not a digit between 0 and 9)
followed by "[^a-z]" (Not a character between a and z)
and followed by "[^A-Z]" (Not a character between A and Z)

Now, remember that it's looking for a match. Each character class consumes
exactly one character, and classes written one after another must match
consecutive characters. So the pattern as a whole reads:

"A character that is not a digit, followed by a character that is not between
a and z, followed by a character that is not between A and Z."

Note that a space satisfies every one of those classes, and so does *any*
other character that is NOT in any of those 3 character sets. Also, IsMatch
succeeds if such a run of 3 characters occurs anywhere in the string, e.g.
the "sIs" in "ThisIsThe2ndString", which is why it returns true for almost
anything you pass in. Using negation is tricky.
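
One way to see what the original pattern is doing is to print the text it actually matched. A rough sketch follows; the class and method names are hypothetical. Each bracketed class consumes one character, so the three classes together match a run of three consecutive characters somewhere in the input.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ThreeClassDemo
{
    private const string Pattern = @"[^0-9][^a-z][^A-Z]";

    // Returns the first run of three characters the pattern matches, if any.
    public static string FirstMatch(string s)
    {
        Match m = Regex.Match(s, Pattern);
        return m.Success ? m.Value : "(no match)";
    }

    public static void Main()
    {
        Console.WriteLine("[{0}]", FirstMatch("ThisIsThe2ndString"));     // [sIs]
        Console.WriteLine("[{0}]", FirstMatch("This Is The 2nd String")); // [ Is]
        Console.WriteLine("[{0}]", FirstMatch("abc"));                    // [(no match)]
    }
}
```

So a true result from IsMatch here says nothing about whether the whole string is made of allowed characters.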

The character class is used to apply the same rules to a set of characters.
The only time you need to separate them into groups is when the rules
(specifically logical Not or quantifiers) do not apply the same to all of
the characters.

Also, as a regular expression is basically procedural (although it does
employ backtracking), you should be careful about the order of the matches.
The following 2 sets are NOT the same:

[\dA!][0X]
[0X][\dA!]

In the first case, "X3" would *not* match. In the second case it would.
This is because the string and the pattern are evaluated in sequence. One
term for this is "consumption" - a regular expression "consumes" a string as
it evaluates it.
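
The effect of the order can be checked with a short sketch like the one below. The class name, method names and the two test strings ("X3" and "3X") are my own choices for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class OrderDemo
{
    // A digit, 'A' or '!' followed by a '0' or 'X'.
    public static bool DigitClassFirst(string s)
    {
        return Regex.IsMatch(s, @"[\dA!][0X]");
    }

    // A '0' or 'X' followed by a digit, 'A' or '!'.
    public static bool ZeroXClassFirst(string s)
    {
        return Regex.IsMatch(s, @"[0X][\dA!]");
    }

    public static void Main()
    {
        Console.WriteLine(DigitClassFirst("X3")); // False
        Console.WriteLine(ZeroXClassFirst("X3")); // True
        Console.WriteLine(DigitClassFirst("3X")); // True
        Console.WriteLine(ZeroXClassFirst("3X")); // False
    }
}
```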

--
HTH,

Kevin Spencer
Microsoft MVP
Professional Numbskull

The man who questions opinions is wise.
The man who quarrels with facts is a fool.

May 21 '06 #12

P: n/a
"Kevin Spencer" <ke***@DIESPAMMERSDIEtakempis.com> wrote in message
news:OS**************@TK2MSFTNGP05.phx.gbl...
> I may be able to help you there.


Very interesting - thanks.

I still find it really hard to get my head round it, though...
May 22 '06 #13

P: n/a
Mark Rae <ma**@markN-O-S-P-A-M.co.uk> wrote:
>> IsMatch returns true for any match in the string so I don't think this is
>> the one you want.
>
> That's correct - no matter what string I pass into it, it always returns
> true...


Well, I've only tried the version that Paul Collins gave (which you
replied to with the same "doesn't work" answer), and that seems to
work:

using System;
using System.Text.RegularExpressions;

class Test
{
    static void Main()
    {
        Regex r = new Regex("^[a-zA-Z0-9 ]*$");
        Console.WriteLine (r.IsMatch ("Hello"));
        Console.WriteLine (r.IsMatch ("Hello there"));
        Console.WriteLine (r.IsMatch ("Hell#o"));
    }
}

Produces:
True
True
False

This is why it's important to give a specific example of something that
fails - preferably with a short but complete program which
demonstrates what you've been trying it with.

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
May 22 '06 #14

P: n/a
Hi Mark,

You may find the following article informative:

http://www.codeproject.com/csharp/regex.asp

--
HTH,

Kevin Spencer
Microsoft MVP
Professional Numbskull

The man who questions opinions is wise.
The man who quarrels with facts is a fool.

May 22 '06 #15

P: n/a
"Kevin Spencer" <ke***@DIESPAMMERSDIEtakempis.com> wrote in message
news:us**************@TK2MSFTNGP03.phx.gbl...
> You may find the following article informative:
>
> http://www.codeproject.com/csharp/regex.asp


I love it - it's almost "RegEx for Dummies"... :-)

Just what I need!
May 22 '06 #16
