Regex - escaping the dot

Hi,

I am using the following pattern:
"\\b" + MyString + "\\b"

If MyString is "one", this should pick up only whole words like "one".

The problem is that it also picks up the word "one.two". How should I modify
the pattern to pick up only "one"?

Thanks,

Lubomir
Sep 29 '06 #1

Lubomir wrote:
> I am using the following pattern:
> "\\b" + MyString + "\\b"

Bizarre, looking for a backspace (\x08) followed by "one" followed by a
backspace (\x08)?

Maybe you want to be searching for whitespace before and after MyString?

Maybe I'm misunderstanding what you're trying to do?

~Jason

--
Sep 29 '06 #2

The "\\b" should be the sign for a word boundary, or am I wrong?

So an expression like @"\bone\b" should pick up only the whole word "one" from
the input text, not the word "one.two".

Here is what I have:
reg = new Regex("\\bInPutString\\b", RegexOptions.ExplicitCapture);
reg.IsMatch(MyString); // returns true for the string "one.two"

It seems it did not pick up only the whole word but also searched for a
substring.

Thanks,
Lubomir
Sep 29 '06 #3

Jim Hollenhorst wrote a great utility for writing & debugging regular
expressions.

http://www.codeproject.com/dotnet/expresso.asp

-- Mark

Sep 30 '06 #4

P.S. The latest version is here.

http://www.ultrapico.com/ExpressoDownload.htm

-- Mark

Sep 30 '06 #5

Hi Lubomir,

First, yes, "\b" (I'm assuming you escaped the backslash for C# syntax)
means "word boundary." However, I believe your understanding of "word
boundary" is the source of your confusion.

Remember that Regular Expressions deal with characters, not language. A
Regular Expression "word" character is a letter, a digit, or the underscore.
The shorthand character class "\w" matches all word characters.
All other characters are "non-word" characters. The shorthand character
class "\W" will match all non-word characters. The anchor "\b" matches a
position rather than a character: the position between a word character and
a non-word character, or between a word character and the beginning or end
of the string being evaluated. A match sequence either begins or ends there,
depending upon the position of the anchor in the regular expression.
Therefore a period, being a "non-word" character, creates a word boundary
next to any word character beside it, and a match can begin or end there.
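
To make the word-boundary behaviour concrete, here is a minimal sketch. The class and method names (WordBoundaryDemo, FindOne) are made up for illustration and are not from Lubomir's code; only Regex.Match from System.Text.RegularExpressions is used.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WordBoundaryDemo
{
    // Hypothetical helper: returns the text matched by \bone\b, or null if nothing matches.
    public static string FindOne(string input)
    {
        Match m = Regex.Match(input, @"\bone\b");
        return m.Success ? m.Value : null;
    }

    public static void Main()
    {
        // The period is a non-word character, so there is a boundary between 'e' and '.'.
        Console.WriteLine(FindOne("one.two") ?? "(none)");  // one
        // 'e' and 's' are both word characters, so there is no boundary after "one".
        Console.WriteLine(FindOne("ones") ?? "(none)");     // (none)
        // The underscore is a word character too, so again there is no boundary.
        Console.WriteLine(FindOne("one_two") ?? "(none)");  // (none)
    }
}
```

Note that for "one.two" IsMatch returns true, but the matched text is only "one", not the whole input.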

Now, I'm trying to understand what you meant by what you said, because the
character sequence "one.two" is *not* a match. Only the character sequence
(or substring, if you will) "one" in that character sequence is a match.
From the period on is not part of the match. What confuses me is your
assertion that the regular expression "picks up...the word 'one.two'". I
*think* you are saying that the character sequence "one" in the sequence
"one.two" is matched, and that you object because these look like 2 words
separated by a period to you. But a string is simply a sequence of
characters, with no significance other than their individual values and/or
character classes. I
hope you understand what I'm trying to explain.

So, going out on a limb just a bit, I'm guessing that you only want
character sequences such as "xyz" (to make the example look less like a
human word) to match if each side of the sequence is either an end of the
string (its beginning or its end) or a white-space character (space,
newline, tab, etc).

If so, your regular expression string would be:

(?m)(?<=^|\s+)xyz(?=$|\s)

Let me explain this:

The "(?m)" modifier indicates that the '^' (beginning of string) and '$'
(end of string) also match at the beginning and end of lines within a
string. I included that because I do not know whether your text will have
multiple lines in it.
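
As a small illustration of what "(?m)" changes, the sketch below counts matches of ^xyz$ with and without the modifier. MultilineDemo and Count are hypothetical names chosen for this example, and the input is assumed to use "\n" line endings.

```csharp
using System;
using System.Text.RegularExpressions;

public static class MultilineDemo
{
    // Hypothetical helper: counts matches of ^xyz$ with or without the inline (?m) modifier.
    public static int Count(string input, bool multiline)
    {
        string pattern = (multiline ? "(?m)" : "") + "^xyz$";
        return Regex.Matches(input, pattern).Count;
    }

    public static void Main()
    {
        string text = "xyz\nabc\nxyz";
        // Without (?m), ^ and $ anchor to the whole string, which is not just "xyz".
        Console.WriteLine(Count(text, false)); // 0
        // With (?m), ^ and $ also anchor to each line, so lines 1 and 3 match.
        Console.WriteLine(Count(text, true));  // 2
    }
}
```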

I replaced your anchors with a positive look-ahead expression and a positive
look-behind expression. The reason for the look-arounds is that an anchor
defines a position, not a match. In other words, an anchor is a zero-width
expression: it consumes no characters. Look-arounds are also zero-width,
which means that anything which matches the expression in the look-around is
not part of the match, but defines the beginning or end of a match.

(?<=^|\s+) is a positive look-behind. It defines the start of a match as
being one of 2 possible alternatives - the beginning of a line, or a
white-space character.

(?=$|\s) is a positive look-ahead. It defines the end of a match as being
one of 2 possible alternatives - the end of a line, or a white-space
character.

Therefore, for example, consider the following multi-line string:

xyz
xyz.two
xyz two three.xyz
three two xyz zero
zero xyz

In the above string, there are 4 matches using the above regular expression.
In line 1, the entire sequence "xyz" is a match. There is no match in line
2, because the sequence "xyz" is at the beginning of a line, but is not
followed by a white-space character (both look-arounds must be satisfied). In line
3, the first "xyz" is a match, because it is at the beginning of the line,
and followed by a white-space character. The second "xyz" is *not* a match,
because it is preceded by a non-white-space character. In line 4, "xyz" is a
match because it has a white-space character both before and after it. And
in line 5, "xyz" is a match because it is preceded by a white-space
character, and is at the end of the line.
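
The walk-through above can be checked with a short sketch. WholeTokenDemo and Build are hypothetical names, not part of any posted code. The look-behind uses \s rather than \s+, which is equivalent here because a look-behind only needs one white-space character immediately before the match. Regex.Escape is added on the assumption that the search text may itself contain characters such as '.', which would otherwise be treated as regex operators.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WholeTokenDemo
{
    // Hypothetical helper: builds the look-around pattern for an arbitrary literal token.
    // Regex.Escape makes characters such as '.' match literally.
    public static Regex Build(string token)
    {
        return new Regex(@"(?m)(?<=^|\s)" + Regex.Escape(token) + @"(?=$|\s)");
    }

    public static void Main()
    {
        // The five-line sample string from the post, with "\n" line endings assumed.
        string text = "xyz\nxyz.two\nxyz two three.xyz\nthree two xyz zero\nzero xyz";

        MatchCollection matches = Build("xyz").Matches(text);
        Console.WriteLine(matches.Count); // 4

        // A token containing a dot is matched literally once it has been escaped.
        Console.WriteLine(Build("one.two").IsMatch("a one.two b")); // True
        Console.WriteLine(Build("one.two").IsMatch("a oneXtwo b")); // False
    }
}
```

Without Regex.Escape, the '.' in "one.two" would match any character, so "oneXtwo" would also be accepted.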

--
HTH,

Kevin Spencer
Microsoft MVP
Software Composer
http://unclechutney.blogspot.com

A watched clock never boils.

Oct 1 '06 #6

Thanks for the detailed answers.
Lubomir
Oct 2 '06 #7
