C# regex word boundaries

Hi All,
Being a bit of a newbie with regex, I am confused when using word boundaries.

For instance, I want to replace every stand-alone '.5k' that occurs in an
input string with 500. In other words:

"this is a .5k example" goes to "this is a 500 example"

The replace should not touch '.5k' that occurs inside a word. For example:

"this 30.5k is not an example" should be unchanged.

So, I put together the regex below, thinking that the \b would match word
boundaries, and only replace stand-alone occurrences of '.5k'.

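// Requires: using System.Text; using System.Text.RegularExpressions;
Regex r = new Regex(@"\b\.5k\b");
MatchCollection mColl = r.Matches(txtInput.Text);
StringBuilder sb = new StringBuilder(txtInput.Text);
foreach (Match m in mColl)
{
    // "500" is as long as ".5k", so the indexes of later matches stay valid.
    sb.Remove(m.Index, m.Length);
    sb.Insert(m.Index, "500");
}
txtResults.Text = sb.ToString();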

(I used the StringBuilder Remove method, rather than the Regex.Replace
method, since the documentation states that \b matches a backspace when used
in a replace operation.)

If the txtInput.Text is this

.5k is an example, not like this 30.5k but this .5k or this .5k

the txtResults.Text is this

.5k is an example, not like this 30500 but this .5k or this .5k

which is the complete opposite of what I would expect.

Also, if I replace the regex with this @"\B\.5k", the output from the above
code is
500 is an example, not like this 30.5k but this 500 or this 500

But doesn't \B mean that the match must not occur on a word boundary? (Is
not the change from a [space] to [5] a word boundary?)

I am more than willing to believe that the fault is my comprehension of this
stuff, but I am a bit stuck to see where I am going wrong at the moment.

So, any pointers as to where to look would be most appreciated. Many thanks
regards,
Gary
Nov 13 '06 #1
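
On the parenthetical in the question: in a .NET pattern, \b outside a character class is the word-boundary anchor; it means backspace inside a character class such as [\b]. The replacement text here is the plain string "500", which contains no \b at all, so Regex.Replace should be usable. A minimal sketch, assuming a console program instead of the original form; the names ReplaceSketch and ReplaceHalfK are made up for illustration. It keeps the pattern from the question unchanged, so it reproduces the puzzling result rather than fixing it:

```csharp
using System;
using System.Text.RegularExpressions;

public static class ReplaceSketch
{
    // Hypothetical helper: applies the pattern from the question via Regex.Replace.
    public static string ReplaceHalfK(string input)
    {
        // In the pattern argument, \b is a word-boundary anchor, not a backspace.
        return Regex.Replace(input, @"\b\.5k\b", "500");
    }

    public static void Main()
    {
        Console.WriteLine(ReplaceHalfK("this 30.5k is not an example"));
        Console.WriteLine(ReplaceHalfK("this is a .5k example"));
    }
}
```

With this pattern the first line comes back as "this 30500 is not an example" and the second is unchanged, which is the same behaviour as the StringBuilder loop.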
.NET appears to use these symbols backwards. It appears .NET thinks \B is on
a word boundary and \b is not.

Very strange.

Ciaran O'Donnell
Nov 13 '06 #2
Hi Ciaran,

Thanks for the quick reply.
I just tried this regex

Regex r = new Regex(@"\B\.5k\B");

in the code, thinking that \B might then work as the word boundary. But the
input string

.5k is an example, not like this 30.5k but this .5k or this .5k

is now totally untouched. So the above regex does not match anything. (and
in fact if I debug the variable mColl it has a count of zero - showing there
were no matches).

So, I don't quite understand that either. Thanks anyway for the help, but I
am still a bit confused - no change there then 8-)
cheers,
Gary
Nov 13 '06 #3
Hi Gary,

Try the following:

(?<!\d)\.\d+k

This uses a negative look-behind. The rules can be explained as:

A match is a dot followed by one or more digits, followed by the letter 'k',
but ONLY if it is NOT immediately preceded by a digit (the negative
look-behind).

--
HTH,

Kevin Spencer
Microsoft MVP
Ministry of Software Development
http://unclechutney.blogspot.com

If you have little, is that your lot?
Nov 13 '06 #4
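
The look-behind from the reply above can be passed straight to Regex.Replace. The sketch below is only an illustration under a few assumptions: it is a console program, and the class and method names are invented. ReplaceAsPosted uses the pattern exactly as posted. ReplaceStrict is an assumed variant that is not from the thread: it matches only ".5k", rejects any word character (not just a digit) before the dot, and requires a word boundary after the 'k'.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookbehindSketch
{
    // The pattern exactly as posted: dot, one or more digits, 'k',
    // not immediately preceded by a digit.
    public static string ReplaceAsPosted(string input)
    {
        return Regex.Replace(input, @"(?<!\d)\.\d+k", "500");
    }

    // Assumed stricter variant (not from the thread): only ".5k", no word
    // character before the dot, and a word boundary after the 'k'.
    public static string ReplaceStrict(string input)
    {
        return Regex.Replace(input, @"(?<!\w)\.5k\b", "500");
    }

    public static void Main()
    {
        string input = ".5k is an example, not like this 30.5k but this .5k or this .5k";
        Console.WriteLine(ReplaceAsPosted(input));
        Console.WriteLine(ReplaceStrict(input));

        // The two patterns differ on input like this.
        Console.WriteLine(ReplaceAsPosted("this a.5k and .25k"));
        Console.WriteLine(ReplaceStrict("this a.5k and .25k"));
    }
}
```

On the sample string from the question both versions give "500 is an example, not like this 30.5k but this 500 or this 500". They differ on "this a.5k and .25k": the posted pattern rewrites both tokens to 500, while the stricter variant leaves the string alone. Which behaviour is wanted depends on what counts as "stand alone".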
Gary Bond wrote:
> Being a bit of a newbie with regex, I am confused when using word boundaries.

A word boundary appears between a word character (\w) and a non-word
character (\W).

> For instance, I want to replace all the stand alone '.5k' that occur in an
> input string, with 500.

'.' is a non-word character, while '5' and 'k' are word characters, so there is
one word boundary in there, between '.' and '5'.

--

Martin Honnen --- MVP XML
http://JavaScript.FAQTs.com/
Nov 13 '06 #5
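
The rule in the reply above can be checked by listing the positions where the zero-width anchor \b actually matches. This is a small sketch, assuming a console program; BoundarySketch and BoundaryIndexes are names invented for the illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class BoundarySketch
{
    // Returns every index at which the zero-width anchor \b matches.
    public static List<int> BoundaryIndexes(string input)
    {
        var indexes = new List<int>();
        foreach (Match m in Regex.Matches(input, @"\b"))
        {
            indexes.Add(m.Index);
        }
        return indexes;
    }

    public static void Main()
    {
        // Stand-alone token: boundaries only around "5k", none before the dot.
        Console.WriteLine(string.Join(",", BoundaryIndexes(" .5k ")));  // 2,4
        // Inside a number: there is a boundary between '0' and '.'.
        Console.WriteLine(string.Join(",", BoundaryIndexes("30.5k")));  // 0,2,3,5
    }
}
```

In " .5k " there is no boundary at index 1 (between the space and the dot), so a pattern that starts with \b\. cannot match there. In "30.5k" index 2 sits between '0' and '.', which is why that is the one place the original pattern matched.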
Hi Martin,

Many thanks for the help.

I think I see what you mean: the match does not work because the transition
from a space to a '.' is not a word boundary, since both are \W (non-word)
characters. Therefore my match cannot work, since a stand-alone '.5k' never
has a word boundary in front of it.

Just to check that out, I tried the same regex, (@"\b\.5k\b"), on this string:

.5k is an example, not like this 30.5k but this a.5k or this .5k

and sure enough the answer was

.5k is an example, not like this 30500 but this a500 or this .5k

which makes sense now: the only \b word boundaries we are interested in are
between '0' and '.' in '30.5k', and between 'a' and '.' in 'a.5k'.

brilliant - thanks again,
cheers,
Gary.
Nov 13 '06 #6
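
The three patterns tried in this thread can be compared side by side by counting their matches on the sample string from the question. The thread never spells out why \B\.5k\B found nothing; the likely reason is that the trailing \B needs another word character after the 'k', and in the sample every '.5k' is followed by a space or the end of the string. A sketch under those assumptions, with invented names (PatternComparison, CountMatches) and one made-up extra input, "a .5kg bag":

```csharp
using System;
using System.Text.RegularExpressions;

public static class PatternComparison
{
    // Hypothetical helper: number of matches of pattern in input.
    public static int CountMatches(string pattern, string input)
    {
        return Regex.Matches(input, pattern).Count;
    }

    public static void Main()
    {
        string input = ".5k is an example, not like this 30.5k but this .5k or this .5k";

        // Needs a word character right before the dot: only "30.5k" qualifies.
        Console.WriteLine(CountMatches(@"\b\.5k\b", input));         // 1
        // Needs a non-boundary before the dot: start of string or a space.
        Console.WriteLine(CountMatches(@"\B\.5k", input));           // 3
        // Also needs a non-boundary after 'k', i.e. another word character.
        Console.WriteLine(CountMatches(@"\B\.5k\B", input));         // 0
        // Made-up input where 'k' is followed by a letter.
        Console.WriteLine(CountMatches(@"\B\.5k\B", "a .5kg bag"));  // 1
    }
}
```

The counts line up with the outputs reported in the thread: one replacement for \b\.5k\b, three for \B\.5k, and none for \B\.5k\B.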
Hi Kevin,

Brilliant, that seems to work fine, so that gets the problem sorted.

I also got my misunderstanding cleared up, I think - see the answer from
Martin.

Thanks again,
cheers,
Gary.
Nov 13 '06 #7
