Ignoring spaces in regular expression matching

Mark Rae

Hi,

I'm trying to construct a RegEx pattern which will validate a string so that
it can contain:

only the numerical characters from 0 to 9 i.e. no decimal points, negative
signs, exponentials etc
only the 26 letters of the standard Western alphabet in either upper or
lower case
spaces i.e. ASCII character 32

I seem to be doing OK with the first two criteria, but am having trouble
with the space character.

E.g. the following works perfectly:

Regex.IsMatch("ThisIsThe2ndString", @"[^0-9][^a-z][^A-Z]")

However, this doesn't work:

Regex.IsMatch("This Is The 2nd String", @"[^0-9][^a-z][^A-Z]")

I've tried various combinations of [\s] and [^\s] but with little success.

However, the following works, though I don't really understand why:

Regex.IsMatch("This is the 2nd string", @"[^0-9][^a-z][^A-Z]",
RegexOptions.IgnoreCase)

Any assistance gratefully received.

Mark

May 21 '06 #1

Paul E Collins

Mark Rae wrote:

I'm trying to construct a RegEx pattern which will
validate a string so that it can contain [only digits,
letters and spaces]

I think you want something like this:
^[a-zA-Z0-9 ]*$
i.e. every character between ^ start and $ end must be in the [group],
and there can be * zero or more of them (you'd use + if you want at
least one character in there). Be aware that "\s" would match some
things that aren't spaces (like tabs and newlines).
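
A minimal sketch of the three variants mentioned above (literal space, "\s", and + instead of *). The class and method names are made up for illustration and are not from the post:

```csharp
using System.Text.RegularExpressions;

public static class WhitelistCheck
{
    // Literal space in the class: the only whitespace accepted is ' '.
    public static bool SpaceOnly(string s)
    {
        return Regex.IsMatch(s, @"^[a-zA-Z0-9 ]*$");
    }

    // \s in the class: tabs, newlines and other whitespace are accepted too.
    public static bool AnyWhitespace(string s)
    {
        return Regex.IsMatch(s, @"^[a-zA-Z0-9\s]*$");
    }

    // + instead of *: at least one character is required.
    public static bool NonEmpty(string s)
    {
        return Regex.IsMatch(s, @"^[a-zA-Z0-9 ]+$");
    }
}
```

With these, SpaceOnly("a\tb") is false while AnyWhitespace("a\tb") is true, and NonEmpty("") is false while SpaceOnly("") is true. One caveat: in .NET, $ also matches just before a final newline, so SpaceOnly("abc\n") returns true; using \z in place of $ should close that gap if it matters.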

Of course, if you're having special trouble with spaces, you could do
s.Replace(" ", "") first to get rid of them in your validator.

Finally, I'm not convinced that regexes are ideal in .NET for this
kind of trivial check (as opposed to something complicated like nested
expressions and optional segments), because they're a special library
call and not a native operator as in Perl, which I suspect you might
have come from. I expect a loop like this would be more efficient:

bool valid = true;
for (int i = 0; i < s.Length; i++)
{
    if (!((s[i] >= 'A' && s[i] <= 'Z') || (s[i] >= 'a' && s[i] <= 'z')
        || (s[i] >= '0' && s[i] <= '9') || s[i] == ' '))
    {
        valid = false;
        break;
    }
}

Eq.

May 21 '06 #2

Tasos Vogiatzoglou

string[] strs = new string[] { "ABC123", "ABC1.1", "ABC 123", "ABC 123 .." };

string srx = @"[^\.]+|[\w\s\d]+";
Regex rx = new Regex(srx, RegexOptions.ECMAScript);

foreach (string str in strs)
{
    Console.WriteLine("{0} {1}", str, rx.Match(str).Length == str.Length);
}

This works (if I understood correctly your problem). IsMatch returns
true for any match in the string so I don't think this is the one you
want.
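
For anyone trying this, here is a small sketch of that full-length comparison wrapped in a helper. The class and method names are hypothetical; the pattern and the ECMAScript option are the ones posted above:

```csharp
using System.Text.RegularExpressions;

public static class FullMatchCheck
{
    // The pattern exactly as posted, with the same option.
    private static readonly Regex Posted =
        new Regex(@"[^\.]+|[\w\s\d]+", RegexOptions.ECMAScript);

    // True when the first match covers the entire input,
    // which is the comparison the posted loop prints.
    public static bool CoversWholeString(string s)
    {
        return Posted.Match(s).Length == s.Length;
    }
}
```

Note that the first alternative, [^\.]+, accepts any run of characters other than a dot. CoversWholeString("ABC1.1") is false as intended, but CoversWholeString("ABC#123") is true, so as far as I can tell this pattern only rejects dots rather than everything outside letters, digits and spaces.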

Regards,
Tasos

May 21 '06 #3

Kevin Spencer

You can use a literal space in your character set:

(?i)[^a-z 0-9]

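The "(?i)" indicates case-insensitivity. Note the literal space between
"a-z" and "0-9": the negated set then excludes the space character as well,
so the pattern only matches a character that is not a letter, a digit or a
space. If IsMatch returns true, the string contains a disallowed character.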

The "\s" indicates *any* white-space character, including such things as
tabs. If that is what you want, use:

(?i)[^a-z\s0-9]
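
Because the set is negated, the result of IsMatch has to be inverted to get a "valid" flag. A minimal sketch, assuming a hypothetical IsValid helper (the names are illustrative only):

```csharp
using System.Text.RegularExpressions;

public static class InvalidCharCheck
{
    // Matches one character that is NOT a letter, a digit or a space.
    private static readonly Regex Invalid = new Regex(@"(?i)[^a-z 0-9]");

    // The string is valid when no such character can be found anywhere in it.
    public static bool IsValid(string s)
    {
        return !Invalid.IsMatch(s);
    }
}
```

IsValid("This Is The 2nd String") is true and IsValid("Hell#o") is false. No ^ or $ anchors are needed here, since a single offending character anywhere in the string is enough to produce a match.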

--
HTH,

Kevin Spencer
Microsoft MVP
Professional Numbskull

The man who questions opinions is wise.
The man who quarrels with facts is a fool.


May 21 '06 #4

Mark Rae

"Kevin Spencer" <ke***@DIESPAMMERSDIEtakempis.com> wrote in message
news:eF**************@TK2MSFTNGP03.phx.gbl...

You can use a literal space in your character set:

(?i)[^a-z 0-9]

The "(?i)" indicates case-insensitivity. Note the literal space between
"a-z" and "0-9". This excludes the space character as well.

The "\s" indicates *any* white-space character, including such things as
tabs. If that is what you want, use:

(?i)[^a-z\s0-9]

Excellent! Thanks very much.

May 21 '06 #5

Mark Rae

"Tasos Vogiatzoglou" <tv*****@gmail.com> wrote in message
news:11**********************@j33g2000cwa.googlegroups.com...

This works (if I understood correctly your problem).

It doesn't.

IsMatch returns true for any match in the string so I don't think this is
the one you want.

There you go, then... :-)

May 21 '06 #6

Mark Rae

"Paul E Collins" <fi******************@CL4.org> wrote in message
news:CI******************************@bt.com...

I think you want something like this:
^[a-zA-Z0-9 ]*$
i.e. every character between ^ start and $ end must be in the [group], and
there can be * zero or more of them (you'd use + if you want at least one
character in there).

Doesn't work...

Of course, if you're having special trouble with spaces, you could do
s.Replace(" ", "") first to get rid of them in your validator.

I could do that, or even not do any validation at all...

Finally, I'm not convinced that regexes are ideal in .NET for this kind of
trivial check (as opposed to something complicated like nested expressions
and optional segments), because they're a special library call and not a
native operator as in Perl, which I suspect you might have come from.

I've never written a line of Perl in my life...

I expect a loop like this would be more efficient:

I wouldn't know...

May 21 '06 #7

Jon Skeet [C# MVP]

Mark Rae <ma**@markN-O-S-P-A-M.co.uk> wrote:

"Tasos Vogiatzoglou" <tv*****@gmail.com> wrote in message
news:11**********************@j33g2000cwa.googlegroups.com...
This works (if I understood correctly your problem).

It doesn't.

When a proposed solution doesn't work, could you explain in what way?
It makes life a lot easier for people who want to make further
suggestions.

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too

May 21 '06 #8

Mark Rae

"Jon Skeet [C# MVP]" <sk***@pobox.com> wrote in message
news:MP************************@msnews.microsoft.com...

When a proposed solution doesn't work, could you explain in what way?

I'm afraid I can't in this case, other than to say it always seems to find a
match no matter what string I pass into it...

I simply don't know enough about regular expressions to make a valuable
response - I don't mind confessing that it remains one area of coding which
I find very difficult to get my head around, to the extent where I still
find it difficult to look at even the simplest of patterns and understand
instinctively what it's trying to do...

It makes life a lot easier for people who want to make further
suggestions.

I couldn't agree more! However, in this case, Kevin Spencer has solved my
problem completely.

May 21 '06 #9

Jon Skeet [C# MVP]

Mark Rae <ma**@markN-O-S-P-A-M.co.uk> wrote:

"Jon Skeet [C# MVP]" <sk***@pobox.com> wrote in message
news:MP************************@msnews.microsoft.com...

When a proposed solution doesn't work, could you explain in what way?

I'm afraid I can't in this case, other than to say it always seems to find a
match no matter what string I pass into it...

That's enough - just an example of something which should fail but
passes would be good.

I simply don't know enough about regular expressions to make a valuable
response

A sample which doesn't do what you want to is the most valuable
response you can make in this case :)

It makes life a lot easier for people who want to make further
suggestions.

I couldn't agree more! However, in this case, Kevin Spencer has solved my
problem completely.

Right. I'd still be interested in an example which should fail but
passes, so I can try to beef up my own regex experience.

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too

May 21 '06 #10

Mark Rae

"Jon Skeet [C# MVP]" <sk***@pobox.com> wrote in message
news:MP************************@msnews.microsoft.com...

I'm afraid I can't in this case, other than to say it always seems to find a
match no matter what string I pass into it...

That's enough - just an example of something which should fail but
passes would be good.

I simply don't know enough about regular expressions to make a valuable
response

A sample which doesn't do what you want to is the most valuable
response you can make in this case :)

See the reply I'm referring to:

IsMatch returns true for any match in the string so I don't think this is
the one you want.

That's correct - no matter what string I pass into it, it always returns
true...

May 21 '06 #11

Kevin Spencer

Hi Mark,

I may be able to help you there. It helps to understand how the Regular
Expressions Engine works. First, it evaluates a character at a time, and it
is procedural in nature. A regular expression is like a series of
instructions, rather than a real single pattern. In your case:

Regex.IsMatch("This is the 2nd string", @"[^0-9][^a-z][^A-Z]",
RegexOptions.IgnoreCase)

Basically, this is using character classes. A character class is a series of
tokens inside square brackets, and it can be translated as "this type of
character or this type of character or this type of character..." In other
words, multiple character types or literals are joined with an implicit "or"
operator:

[\dA!] literally means "any single digit or an 'A' or an '!' character".
Note that it also implies a singular value, that is, one character.
Quantifiers indicate how many times a character from the class may be
repeated, as in:

[\dA!] (any of these characters 1 time)
[\dA!]* (any of these characters 0 or more times)
[\dA!]+ (any of these characters 1 or more times)
etc.
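
To see the difference between the three forms, it can help to look at the matched text rather than just true/false. A small sketch; FirstMatch is a made-up helper, not a framework method:

```csharp
using System.Text.RegularExpressions;

public static class QuantifierDemo
{
    // Returns the text of the first match, or null when nothing matches.
    public static string FirstMatch(string input, string pattern)
    {
        Match m = Regex.Match(input, pattern);
        return m.Success ? m.Value : null;
    }
}
```

Against the input "x12A!y", [\dA!] gives "1" and [\dA!]+ gives "12A!". The * form gives an empty string, because "zero times" already succeeds at the very first position, before the "x". That is one reason an unanchored pattern ending in * tends to match everything.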

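At the start of a character class, the '^' is the logical "Not" operator,
which means "Not any of these characters." (Outside a character class, '^'
anchors the match to the start of the string instead.)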

So, you had at first "[^0-9]" (Not a digit between 0 and 9)
followed by "[^a-z]" (Not a character between a and z)
and followed by "[^A-Z]" (Not a character between A and Z)

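Now, remember that it's looking for a match, and each character class
consumes exactly one character. Three classes in a row therefore describe
three *consecutive* characters:

"A character that is not a digit, followed by a character that is not
between a and z, followed by a character that is not between A and Z."

IsMatch returns true as soon as any run of three characters anywhere in
the string fits that description, for example "sIs" in
"ThisIsThe2ndString". Using negation is tricky. To say that a *single*
character is none of those things, put all of the ranges into one negated
class, as in [^0-9a-zA-Z]. Note that the space character is not in any of
those ranges, so that class matches it, which is why the space has to be
listed inside the class as well.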

The character class is used to apply the same rules to a set of characters.
The only time you need to separate them into groups is when the rules
(specifically logical Not or quantifiers) do not apply the same to all of
the characters.
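
A quick way to check what the original three-class pattern picks up is to print the matched substring. This is only a sketch, and the helper name is hypothetical:

```csharp
using System.Text.RegularExpressions;

public static class SequenceDemo
{
    // Returns the text of the first match, or null when nothing matches.
    public static string FirstMatch(string input, string pattern)
    {
        Match m = Regex.Match(input, pattern);
        return m.Success ? m.Value : null;
    }
}
```

With the pattern [^0-9][^a-z][^A-Z], the input "ThisIsThe2ndString" yields "sIs" and "This Is The 2nd String" yields " Is", so IsMatch is true for both. A single combined class behaves differently: [^0-9a-zA-Z ] yields "#" for "Hell#o" and finds nothing in "Hello there".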

Also, as a regular expression is basically procedural (although it does
employ backtracking), you should be careful about the order of the matches.
The following 2 sets are NOT the same:

[\dA!][0X]
[0X][\dA!]

Tested against the start of "0X3A", the first one matches ("0" is a digit
and "X" is in the second class). The second one does not match there,
because "X" is not in [\dA!]; an unanchored search only succeeds later, at
"X3". This is because the string and the pattern are evaluated in sequence.
One term for this is "consumption" - a regular expression "consumes" a
string as it evaluates it.
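
Running both patterns and looking at where each match starts makes the ordering visible. A minimal sketch, with a made-up helper name:

```csharp
using System.Text.RegularExpressions;

public static class OrderDemo
{
    // Returns the index at which the first match starts, or -1 for no match.
    public static int MatchStart(string input, string pattern)
    {
        Match m = Regex.Match(input, pattern);
        return m.Success ? m.Index : -1;
    }
}
```

For the input "0X3A", [\dA!][0X] starts matching at index 0 ("0X"), while [0X][\dA!] only finds a match at index 1 ("X3"). Adding a leading ^ to the second pattern makes it fail outright (-1), because the engine is then not allowed to move on and retry from a later position.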

--
HTH,

Kevin Spencer
Microsoft MVP
Professional Numbskull

The man who questions opinions is wise.
The man who quarrels with facts is a fool.

May 21 '06 #12

Mark Rae

"Kevin Spencer" <ke***@DIESPAMMERSDIEtakempis.com> wrote in message
news:OS**************@TK2MSFTNGP05.phx.gbl...

I may be able to help you there.

Very interesting - thanks.

I still find it really hard to get my head round it, though...

May 22 '06 #13

Jon Skeet [C# MVP]

Mark Rae <ma**@markN-O-S-P-A-M.co.uk> wrote:

IsMatch returns true for any match in the string so I don't think this is
the one you want.

That's correct - no matter what string I pass into it, it always returns
true...

Well, I've only tried the version that Paul Collins gave (which you
replied to with the same "doesn't work" answer), and that seems to
work:

using System;
using System.Text.RegularExpressions;

class Test
{
    static void Main()
    {
        Regex r = new Regex("^[a-zA-Z0-9 ]*$");
        Console.WriteLine(r.IsMatch("Hello"));
        Console.WriteLine(r.IsMatch("Hello there"));
        Console.WriteLine(r.IsMatch("Hell#o"));
    }
}

Produces:
True
True
False

This is why it's important to give a specific example of something that
fails - preferably with a short but complete program which
demonstrates what you've been trying it with.

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too

May 22 '06 #14

Kevin Spencer

Hi Mark,

You may find the following article informative:

http://www.codeproject.com/csharp/regex.asp

--
HTH,

Kevin Spencer
Microsoft MVP
Professional Numbskull

The man who questions opinions is wise.
The man who quarrels with facts is a fool.

May 22 '06 #15

Mark Rae

"Kevin Spencer" <ke***@DIESPAMMERSDIEtakempis.com> wrote in message
news:us**************@TK2MSFTNGP03.phx.gbl...

You may find the following article informative:

http://www.codeproject.com/csharp/regex.asp

I love it - it's almost "RegEx for Dummies"... :-)

Just what I need!

May 22 '06 #16
