Replacing whole word using regex in C#

Hi All

Really hoping someone can help me out here with my deficient regex skills :)

I have a function which takes a string of HTML and replaces a term (word or phrase) with a link. The purpose is to seek out terms which are in a glossary on our site and automatically link them to their definitions. It's slightly complex because certain elements have to be ignored: for example, I don't want to add links within existing links, or link terms contained in e.g. <h1></h1> tags.
Anyway, I didn't actually write this function, hence the problem I have in modifying it.

The problem is that I don't want it to replace terms which are not WHOLE words. So, for example, if I am searching for the term "fund", it currently does the replace if it finds the word "funds".

I have tried using the regex \b escape, but this doesn't seem to work.

Any help would be much appreciated.

Thanks in advance
// Replaces all instances of text match in HTML string, ignoring instances in HTML
#region public static string PlainTextReplace(string html, string oldString, string newString, string Definition)

// Regex matches for PlainTextReplace
static Regex rxPlainText = new Regex(@"^[^\<]+", RegexOptions.IgnoreCase);
static Regex rxTag = new Regex(@"</?\s*(?'tagname'[^>\s]+).*?>", RegexOptions.Compiled);
static Regex[] rxForbiddenTags = new Regex[]{
    new Regex(@"^h\d$", RegexOptions.Compiled), // Matches <h?>
    new Regex("^a$", RegexOptions.Compiled)    // Matches <a>
};

public static string PlainTextReplace(string html, string oldString, string urlString, string Definition)
{
    int iStringPos=0;
    Stack tagStack = new Stack();
    StringBuilder sbResult = new StringBuilder();
    Match match;
    while (iStringPos < html.Length)
    {
        bool bContainsForbiddenTag = false;
        IEnumerator enumTags = tagStack.GetEnumerator();

        while (enumTags.MoveNext())
        {
            string sCurrentTag = (string) enumTags.Current;
            foreach (Regex rxForbiddenTag in rxForbiddenTags) // loop through all enclosing tags and check for forbidden ones.
            {
                match = rxForbiddenTag.Match(sCurrentTag);
                if (match.Success)
                {
                    bContainsForbiddenTag = true;
                    break;
                }
            }
            if (bContainsForbiddenTag)
                break;
        }

        //if (tagStack.Count == 0) // only perform replacement at tag depth 0.
        if (!bContainsForbiddenTag) // Ignores tag depth. Skips all text enclosed in one or more forbidden tags.
        {
            match = rxPlainText.Match(html, iStringPos, html.Length - iStringPos);
            if (match.Success)
            {
                string searchString = match.Value;
                int index = searchString.ToLower().IndexOf(oldString.ToLower());
                if (index != -1)
                {
                    searchString = "<a href=\"/" + Globals.SiteAlias + "/jargon-" + urlString + ".aspx\" class=\"jargon\" title=\"" + Definition + "\">" + searchString.Substring(index, oldString.Length) + "</a>";
                }

                // Do the replace and move on.
                sbResult.Append( Regex.Replace(match.Value, oldString, searchString, RegexOptions.IgnoreCase) );
                // THIS DOESNT WORK
                //sbResult.Append( Regex.Replace(match.Value, @"\b" + oldString + "\b", searchString, RegexOptions.IgnoreCase) );
                iStringPos = match.Index + match.Length;
            }
        }

        match = rxTag.Match(html, iStringPos, html.Length - iStringPos);
        if (match.Success)
        {
            if (match.Value.StartsWith("</"))
            {
                try
                {
                    if(match.Groups["tagname"].Value.ToLower().Trim().Equals(((string) tagStack.Peek())))
                        tagStack.Pop();
                }
                catch
                {

                }
            }
            else if (match.Value.EndsWith("/>") || match.Value.StartsWith("<!--"))
            {
                // ignore
            }
            else
            {
                tagStack.Push(match.Groups["tagname"].Value.ToLower().Trim());
            }
            sbResult.Append( html.Substring(iStringPos, match.Index + match.Length - iStringPos));
            iStringPos = match.Index + match.Length;
        }

    }
    return sbResult.ToString();
}
#endregion
Oct 2 '07 #1

sbResult.Append(Regex.Replace(match.Value, "\\b" + oldString + "\\b", searchString, RegexOptions.IgnoreCase));

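The \\b will work :) In the commented-out line, the trailing "\b" is a regular (non-verbatim) string, so C# compiles it to a backspace character instead of passing a word boundary to the regex. Writing "\\b", or @"\b" on both sides, fixes that.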

Gaurav Bhatt
Apr 27 '10 #2
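
For anyone landing here later, below is a minimal standalone sketch of the whole-word replace discussed in this thread. It is not the original poster's function: the class name WholeWordDemo, the helper LinkWholeWord, and the URL in Main are hypothetical names made up for illustration. It assumes the term starts and ends with a word character (a letter, digit, or underscore), since \b only matches next to one. It also adds two things the thread does not mention: Regex.Escape, in case a glossary term contains regex metacharacters, and a MatchEvaluator, so that the link text keeps the casing found in the page and a "$" in the replacement is not read as a substitution.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WholeWordDemo
{
    // Hypothetical helper (not from the original post): wraps every whole-word
    // occurrence of `term` in an anchor tag, keeping the matched text's casing.
    public static string LinkWholeWord(string text, string term, string href)
    {
        // Regex.Escape guards against terms containing regex metacharacters.
        // The verbatim strings (@"...") keep \b as a regex word boundary
        // instead of letting C# turn it into a backspace character.
        string pattern = @"\b" + Regex.Escape(term) + @"\b";

        // The MatchEvaluator builds the link from the text exactly as it
        // appeared in the input, so "Fund" stays "Fund".
        return Regex.Replace(
            text,
            pattern,
            m => "<a href=\"" + href + "\">" + m.Value + "</a>",
            RegexOptions.IgnoreCase);
    }

    public static void Main()
    {
        string input = "Our funds differ from a Fund of funds.";
        // Only the standalone "Fund" is linked; both "funds" are left alone.
        Console.WriteLine(LinkWholeWord(input, "fund", "/jargon-fund.aspx"));
    }
}
```

In the function from the first post, the same pattern would go into the Regex.Replace call that runs on each plain-text segment, so the forbidden-tag handling stays as it is.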
