RegEx : Match and replace term within HTML tags

I have a search app that searches local HTML files for a specified
term. I then display the pages that contain the term.

I would like to highlight the search term within the HTML when it is
viewed.

I have the following regular expression code:

string searchTerm = "(?<STARTTAG>(<[^>]*>.*))(?<MATCHTERM>(" +
lastSearchTerm + "))(?<ENDTAG>(.*<[^>]*>))";

string replaceString = "${STARTTAG}<span style=\"background-color:#FFFFCC\">" +
"${MATCHTERM}</span>${ENDTAG}";

Regex.Replace(htmlBody, searchTerm, replaceString,
RegexOptions.IgnoreCase);

I am trying to match the search term within HTML tags, i.e.

<htmltag>search term</htmltag>

and then replace the search term with a span tag to color it, like so:

<htmltag><span style="background-color:#FFFFCC">searchterm</span></htmltag>

This works, but inconsistently (and without a discernible pattern
when it fails).

So, does anyone see anything obviously wrong with my Regular
Expressions? I am pretty new to regular expressions, although I
usually know enough to get stuff done.

mike c
Nov 16 '05 #1
Hi,
inline

"mike c" <m@foo.com> wrote in message
news:r1********************************@4ax.com...
> I have a search app that searches local HTML files for a specified
> term. I then display the pages that contain the term.
>
> I would like to highlight the search term within the HTML when it is
> viewed.
>
> I have the following regular expression code:
>
> string searchTerm = "(?<STARTTAG>(<[^>]*>.*))(?<MATCHTERM>(" +
> lastSearchTerm + "))(?<ENDTAG>(.*<[^>]*>))";
>
> string replaceString = "${STARTTAG}<span style=\"background-color:#FFFFCC\">" +
> "${MATCHTERM}</span>${ENDTAG}";
>
> Regex.Replace(htmlBody, searchTerm, replaceString,
> RegexOptions.IgnoreCase);
>
> I am trying to match the search term within HTML tags. i.e.
>
> <htmltag>search term</htmltag>

Because the .* in ENDTAG is greedy, it will match all the way to the last
tag. Even if you replace it with .*? (non-greedy), there are still some problems:

<h1> searchterm <b> searchterm </b> </h1>
<h1> searchterm <br> searchterm </h1>
<h1> searchterm searchterm </h1>

In all cases only one searchterm will be replaced.
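
To see this for yourself, here is a minimal sketch. It is not code from the question: the class GreedyDemo and the method CountHighlights are made-up names for illustration. It runs the pattern from the question, with either greedy or non-greedy quantifiers, and counts how many span wrappers end up in the result.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GreedyDemo
{
    // Hypothetical helper: applies the pattern from the question and returns
    // how many <span> wrappers the replacement produced.
    public static int CountHighlights(string html, string term, bool lazy)
    {
        // Switch between the original greedy .* and the non-greedy .*?
        string any = lazy ? ".*?" : ".*";

        string pattern = "(?<STARTTAG>(<[^>]*>" + any + "))(?<MATCHTERM>(" + term +
                         "))(?<ENDTAG>(" + any + "<[^>]*>))";
        string replacement = "${STARTTAG}<span style=\"background-color:#FFFFCC\">" +
                             "${MATCHTERM}</span>${ENDTAG}";

        // Regex.Replace returns a new string; the input is not modified.
        string result = Regex.Replace(html, pattern, replacement, RegexOptions.IgnoreCase);

        return Regex.Matches(result, "<span ").Count;
    }
}
```

For the first sample line, <h1> searchterm <b> searchterm </b> </h1>, both CountHighlights(line, "searchterm", false) and CountHighlights(line, "searchterm", true) return 1. The greedy version swallows the whole line in a single match, and the non-greedy version consumes the <b> tag that the second occurrence would need as its STARTTAG.
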
If you have valid HTML, then you can say that a word is outside any tag if
the first angle bracket that follows it is a < and not a >. Expressed as a
positive lookahead, this becomes:

string searchTerm = lastSearchTerm + "(?=[^>]*<)";

string replaceString = "<span style=\"background-color:#FFFFCC\">" +
lastSearchTerm + "</span>";

It may still go wrong inside <title> and <script> elements, where the term
is not inside a tag but should not be wrapped in a span either.
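
Put together, the lookahead approach might look like the sketch below. The class SearchHighlighter and the method Highlight are hypothetical names, and two details are additions of mine rather than part of the suggestion above: Regex.Escape, so that a search term containing regex metacharacters is matched literally, and the $& substitution, which re-inserts the text exactly as it was matched so the original casing survives an IgnoreCase match.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SearchHighlighter
{
    // Hypothetical helper: wraps every occurrence of 'term' that lies outside
    // a tag in a highlighting <span>. Assumes reasonably valid HTML.
    public static string Highlight(string htmlBody, string term)
    {
        // Match the term only if the next angle bracket after it is '<',
        // which means the term sits in text content, not inside a tag.
        string pattern = Regex.Escape(term) + "(?=[^>]*<)";

        // $& stands for the whole match, so "SearchTerm" stays "SearchTerm".
        string replacement = "<span style=\"background-color:#FFFFCC\">$&</span>";

        // Regex.Replace returns the new string; remember to use the result.
        return Regex.Replace(htmlBody, pattern, replacement, RegexOptions.IgnoreCase);
    }
}
```

Usage would be something like htmlBody = SearchHighlighter.Highlight(htmlBody, lastSearchTerm);. A term inside an attribute value, as in <a title="searchterm">, is left alone because the first bracket after it is a >. Text after the very last tag in the document is not highlighted either, since no < follows it.
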

hth,
greetings



Nov 16 '05 #2
