
HTML parsing suggestions welcome

I need to parse HTML output, find all instances of a word or phrase, and
convert each one to a link.

We have a reasonably large product catalogue. If a particular product page
contains the name of another product, I want to convert that name into a link
pointing to the page of that product. As the pages use mixed static and
dynamic content, it makes sense to parse and convert the HTML stream just
before it is rendered.

I would like certain pages in the website hierarchy to be exempt from the
parsing process.

My questions are: How could I implement something like this across the
website hierarchy, exempting certain pages as I go? Global.asax? Also, should I
be using regular expressions for parsing HTML, or are there too many pitfalls?
Can you suggest a better way of doing this?

Thanks in advance.
Dec 8 '05 #1
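
One possible shape for this, as a rough sketch rather than a definitive answer: run the finished HTML through a real tokenizer and apply the regular expression only to text nodes, never to the markup itself. That avoids the main pitfalls of regex-over-HTML (matching inside attributes, scripts, or existing links). The sketch is in Python for brevity; in ASP.NET the same logic would more likely sit in an HTTP module or a response filter, and that wiring is not shown. The catalogue `PRODUCTS`, the exempt list `EXEMPT_PREFIXES`, and the `linkify` function are all hypothetical names invented for illustration.

```python
import re
from html import escape
from html.parser import HTMLParser

# Hypothetical catalogue: product name -> product page URL.
PRODUCTS = {
    "Acme Widget": "/products/acme-widget",
    "Acme Gadget": "/products/acme-gadget",
}
# Hypothetical path prefixes that are exempt from processing.
EXEMPT_PREFIXES = ("/admin/", "/checkout/")
# Text inside these elements is passed through untouched.
SKIP_TAGS = {"a", "script", "style", "textarea", "title"}


class Linkifier(HTMLParser):
    """Re-emits the document, linking product names found in text nodes."""

    def __init__(self, products):
        # Keep entities as written so the output stays close to the input.
        super().__init__(convert_charrefs=False)
        self.products = products
        # Longest names first so a longer name wins over a shorter prefix.
        names = sorted(products, key=len, reverse=True)
        self.pattern = re.compile(r"\b(" + "|".join(map(re.escape, names)) + r")\b")
        self.out = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tag: emit as written, no nesting to track.
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Only text outside skipped elements is searched for product names.
        if self.skip_depth == 0:
            data = self.pattern.sub(self._link, data)
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")

    def _link(self, match):
        name = match.group(1)
        return f'<a href="{escape(self.products[name])}">{name}</a>'


def linkify(html, path, products=PRODUCTS):
    """Return html with product names linked, unless path is exempt."""
    if path.startswith(EXEMPT_PREFIXES):
        return html
    parser = Linkifier(products)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


page = '<p>Pairs well with the Acme Widget. <a href="/x">Acme Gadget</a></p>'
print(linkify(page, "/products/some-page"))
print(linkify(page, "/admin/report"))
```

In this sketch the first call links "Acme Widget" but leaves "Acme Gadget" alone because it already sits inside a link, and the second call returns the page unchanged because the path is exempt. Matching is case-sensitive and assumes a non-empty catalogue; badly malformed HTML may not round-trip exactly.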

