Regex Validator - detect all but certain HTML tags


Hi all... hope someone can help out.

Not a unique situation, but my search for a solution has not yielded
what I need yet.

I'm trying to come up with a regular expression for a
RegularExpressionValidator that will allow certain HTML tags:

<a>, <b>, <blockquote>, <br>, <i>, <img>, <li>, <ol>, <p>, <quote>,
<ul>

but block others. So basically I'd like to detect "<" then look for
certain sequences (the tags above). But I also of course have to
account for any number of possible attributes. And then some of the
tags above have closing tags, others do not.

I don't fully understand regular expressions, yet would like to learn;
however, I also want to find a way to do this soon.

I don't want to reinvent this particular "wheel" if it has already been
done before, if you know what I mean. :)

Any help from anyone out there would be greatly appreciated. Thanks
much!

Barry L. Camp
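
A minimal sketch of the kind of whitelist pattern the question describes, written in Python only so that it can be run standalone. The pattern uses nothing beyond lookahead assertions and character classes, which the .NET regex engine also supports, but it has not been tried inside a RegularExpressionValidator here. The names ALLOWED_TAGS, WHITELIST and is_allowed are made up for illustration, and the sketch assumes the validator has to match the whole input. Any "<" that does not begin one of the listed tags counts as invalid, so a stray "<" in ordinary text is rejected too.

```python
import re

# Tags from the original post; everything else is treated as forbidden.
ALLOWED_TAGS = ("a", "b", "blockquote", "br", "i", "img",
                "li", "ol", "p", "quote", "ul")

# Each character is either not "<", or a "<" that opens or closes an
# allowed tag. The inner lookahead (?=[\s/>]) keeps "<b" from also
# accepting "<blink>"; [^<>]* skips over any attributes.
WHITELIST = re.compile(
    r"(?:[^<]|<(?=/?(?:" + "|".join(ALLOWED_TAGS) + r")(?=[\s/>])[^<>]*>))*",
    re.IGNORECASE,
)


def is_allowed(text: str) -> bool:
    """Return True when text contains no tag outside ALLOWED_TAGS."""
    return WHITELIST.fullmatch(text) is not None


print(is_allowed('See <a href="http://example.com">this</a><br/>'))  # True
print(is_allowed("<script>alert(1)</script>"))                       # False
```

The two alternatives inside the group start with different characters, so the pattern cannot backtrack heavily on long input. It checks tag names only; it does not check that tags are closed or properly nested.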

Jan 22 '07 #1
Barry,

If you do not want to reinvent the wheel, then why not use the wheel
that has already been invented for this?

MSHTML.

http://www.vb-tips.com/dbpages.aspx?...f-56dbb63fdf1c

I hope this helps,

Cor

Jan 22 '07 #2

I need to find a regex for a RegularExpressionValidator to perform what
I am describing.

I'm not interested in parsing a web page. I'm trying to parse the
contents of a textbox - I have a DetailsView control, for which one
textbox is for content that may be displayed in an .aspx page. I want
to allow certain tags, but block all others. The DetailsView is bound
to an ObjectDataSource, so naturally it would be nice to have the
Validator catch everything for me if at all possible.

Any ideas (anyone)?

Thanks,

Barry
Jan 22 '07 #3
"Barry L. Camp" <bl****@gmail.com> wrote in
news:1169470557.654043.70840@a75g2000cwd.googlegroups.com:
> I'm not interested in parsing a web page. I'm trying to parse the
> contents of a textbox - I have a DetailsView control, for which one
> textbox is for content that may be displayed in an .aspx page. I want
> to allow certain tags, but block all others. The DetailsView is bound
> to an ObjectDataSource, so naturally it would be nice to have the
> Validator catch everything for me if at all possible.

You can load the textbox contents into MSHTML.

Finding a regular expression to handle all the cases of HTML will be
challenging to say the least - I suggest you take a look at MSHTML again
and see if it'll work.
Jan 22 '07 #4

That example is not what I am looking for. I'm not trying to grab an
entire web page, or rebuild the content that may be in the textbox. All
I am trying to do is detect whether "forbidden" HTML tags are in the
text, and prevent further processing (or in my case, prevent the user
from saving a record in the DetailsView) until the user has edited the
contents of the textbox such that they are acceptable (i.e. don't have
"forbidden" HTML tags).

I'll grant that finding a suitable regex is not easy. Ideally that
would be the best solution, though, as I have my DetailsView hooked
into an ObjectDataSource, and would like to have the validator catch
everything, in-stream. I don't want to have to instantiate MSHTML just
to parse one single textbox.
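
The check described in this post (find out whether any "forbidden" tag is present, and which one, so the user can be told what to remove) can be sketched without a regex. The sketch below uses Python's standard html.parser module rather than MSHTML or a validator control. The names ALLOWED_TAGS, TagCollector and forbidden_tags are hypothetical and exist only for this illustration.

```python
from html.parser import HTMLParser

# Tags from the original post; everything else is treated as forbidden.
ALLOWED_TAGS = frozenset({"a", "b", "blockquote", "br", "i", "img",
                          "li", "ol", "p", "quote", "ul"})


class TagCollector(HTMLParser):
    """Collects the names of tags that are not in ALLOWED_TAGS."""

    def __init__(self):
        super().__init__()
        self.forbidden = []

    def _check(self, tag):
        # HTMLParser passes tag names already lower-cased.
        if tag not in ALLOWED_TAGS and tag not in self.forbidden:
            self.forbidden.append(tag)

    def handle_starttag(self, tag, attrs):
        self._check(tag)

    def handle_endtag(self, tag):
        self._check(tag)


def forbidden_tags(text):
    """Return forbidden tag names in order of first appearance."""
    collector = TagCollector()
    collector.feed(text)
    collector.close()
    return collector.forbidden


print(forbidden_tags("<b>ok</b> <font color='red'>x</font> <TABLE></TABLE>"))
```

An empty list means the text could be saved. A non-empty list gives the names to show in the error message.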
Jan 22 '07 #5
"Barry L. Camp" <bl****@gmail.com> wrote in
news:1169492660.650762.3150@s34g2000cwa.googlegroups.com:
> I'll grant that finding a suitable regex is not easy. Ideally that
> would be the best solution, though, as I have my DetailsView hooked
> into an ObjectDataSource, and would like to have the validator catch
> everything, in-stream. I don't want to have to instantiate MSHTML just
> to parse one single textbox.

Regular expressions are not well suited to parsing complex XML-type documents,
due to the nested nature of such documents. There are better tools for the
job.

Perhaps loading the HTML into an XML Doc and using XPath to search for
unwanted tags?
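
A rough sketch of that XML-plus-XPath idea, using Python's xml.etree.ElementTree in place of a .NET XML document. ALLOWED_TAGS and unwanted_tags are hypothetical names for this illustration. It assumes the textbox content is well-formed XML once wrapped in a root element, so an unclosed <br> or an entity such as &nbsp; raises a parse error instead of being checked.

```python
import xml.etree.ElementTree as ET

# Tags from the original post; everything else is treated as unwanted.
ALLOWED_TAGS = frozenset({"a", "b", "blockquote", "br", "i", "img",
                          "li", "ol", "p", "quote", "ul"})


def unwanted_tags(fragment):
    """Return the sorted names of elements that are not in ALLOWED_TAGS.

    Raises xml.etree.ElementTree.ParseError when the fragment is not
    well-formed XML.
    """
    # Wrap the fragment so that several top-level nodes share one root.
    root = ET.fromstring("<root>" + fragment + "</root>")
    # ".//*" is the XPath expression for "every descendant element".
    found = {element.tag.lower() for element in root.findall(".//*")}
    return sorted(found - ALLOWED_TAGS)


print(unwanted_tags("<p>Hi <b>there</b><br/></p><script>x</script>"))
```

Because ordinary HTML written by users is often not well-formed XML, a real page would need to treat the parse error as a validation failure of its own.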

If you're set on using Regular Expressions, take a look at Community
Server's source code. I recall it had a set of regular expressions to parse
out unwanted tags.

http://communityserver.org/
Jan 22 '07 #6
Barry,

Then why did you write this in your first message?
> I don't want to reinvent this particular "wheel" if it has already been
> done before, if you know what I mean. :)
You are definitely showing that you want to reinvent a wheel that already exists.

Cor

Jan 23 '07 #7

Cor Ligthert [MVP] wrote:
> Barry,
>
> Then why did you write this in your first message?
>> I don't want to reinvent this particular "wheel" if it has already been
>> done before, if you know what I mean. :)
>
> You are definitely showing that you want to reinvent a wheel that already exists.
>
> Cor

You still don't understand what I am trying to do.

I'm not trying to read HTML documents, XML, XHTML or any *TML.

The best way to explain what I am doing would be to go on any
discussion forum or web-based e-mail. There's a big, huge textbox
(textarea, or whatever). I've got one of these in a .NET 2.0
DetailsView control. I've got it bound to an ObjectDataSource. They're
all in TemplateFields, and I've got Validator controls hooked to all of
the textbox inputs. The smaller ones are tied to regex validators, with
simple expressions to allow text only, because that's all I need.

But this big textbox... I want to accept a small subset of HTML tags,
because this data... bound into a database... I want to mash together
with a MasterPage and some content, and render it back. It's not an
HTML document already. It's just text. I want to take the data and make
it PART of a web page.

Jan 23 '07 #8

Spam Catcher wrote:
> "Barry L. Camp" <bl****@gmail.com> wrote in
> news:1169492660.650762.3150@s34g2000cwa.googlegroups.com:
>> I'll grant that finding a suitable regex is not easy. Ideally that
>> would be the best solution, though, as I have my DetailsView hooked
>> into an ObjectDataSource, and would like to have the validator catch
>> everything, in-stream. I don't want to have to instantiate MSHTML just
>> to parse one single textbox.
>
> Regular expressions are not well suited to parsing complex XML-type documents,
> due to the nested nature of such documents. There are better tools for the
> job.
>
> Perhaps loading the HTML into an XML Doc and using XPath to search for
> unwanted tags?
>
> If you're set on using Regular Expressions, take a look at Community
> Server's source code. I recall it had a set of regular expressions to parse
> out unwanted tags.
>
> http://communityserver.org/

Well, I'm not exactly working with entire HTML pages. But I think
you've made a great suggestion.

I am essentially building a home-grown CMS, not on the order of a
CommunityServer or DNN, but just something in which I can easily add/edit
content later on. It's just to suit what I need. It's also for me to
tinker with, and help educate myself on .NET 2.0, to help prepare for
the new Cert exams. (I've already got 70-431 done).

Looks like I have a lot more reading to do, but that's fine. I
appreciate the idea. Thanks much!

Barry

Jan 23 '07 #9
Barry,

You still have not looked at the sample I gave you or at what is written here.
The first part of the sample fetches an HTML page; it is nearly impossible
to give a sample without some underlying data. The second part
shows how you can take a page apart into different pieces according to
its tags.

However, what does Regex have to do with this?
> But this big textbox... I want to accept a small subset of HTML tags,
> because this data... bound into a database... I want to mash together
> with a MasterPage and some content, and render it back. It's not an
> HTML document already. It's just text. I want to take the data and make
> it PART of a web page.
For that, too, a wheel already exists. Why do you so strongly want to invent
your own wheels?

Cor

Jan 23 '07 #10
