Regex Validator - detect all but certain HTML tags


Hi all... hope someone can help out.

Not a unique situation, but my search for a solution has not yielded
what I need yet.

I'm trying to come up with a regular expression for a
RegularExpressionValidator that will allow certain HTML tags:

<a>, <b>, <blockquote>, <br>, <i>, <img>, <li>, <ol>, <p>, <quote>,
<ul>

but block others. So basically I'd like to detect "<" then look for
certain sequences (the tags above). But I also of course have to
account for any number of possible attributes. And then some of the
tags above have closing tags, others do not.

I don't fully understand regular expressions, yet would like to learn;
however, I also want to find a way to do this soon.

I don't want to reinvent this particular "wheel" if it has already been
done before, if you know what I mean. :)

Any help from anyone out there would be greatly appreciated. Thanks
much!

Barry L. Camp

Jan 22 '07 #1
Barry,

If you do not want to reinvent the wheel, then why not use the wheel
that has already been invented for this?

MSHTML.

http://www.vb-tips.com/dbpages.aspx?...f-56dbb63fdf1c

I hope this helps,

Cor

Jan 22 '07 #2

I need to find a regex for a RegularExpressionValidator that does what
I am describing.

I'm not interested in parsing a web page. I'm trying to parse the
contents of a textbox - I have a DetailsView control, for which one
textbox is for content that may be displayed in an .aspx page. I want
to allow certain tags, but block all others. The DetailsView is bound
to an ObjectDataSource, so naturally it would be nice to have the
Validator catch everything for me if at all possible.

Any ideas (anyone)?

Thanks,

Barry
Jan 22 '07 #3
"Barry L. Camp" <bl****@gmail.com> wrote in
news:1169470557.654043.70840@a75g2000cwd.googlegroups.com:

> I'm not interested in parsing a web page. I'm trying to parse the
> contents of a textbox - I have a DetailsView control, for which one
> textbox is for content that may be displayed in an .aspx page. I want
> to allow certain tags, but block all others. The DetailsView is bound
> to an ObjectDataSource, so naturally it would be nice to have the
> Validator catch everything for me if at all possible.

You can load the textbox contents into MSHTML.

Finding a regular expression to handle all the cases of HTML will be
challenging to say the least - I suggest you take a look at MSHTML again
and see if it'll work.
Jan 22 '07 #4

That example is not what I am looking for. I'm not trying to grab an
entire web page, or rebuild the content that may be in the textbox. All
I am trying to do is detect whether "forbidden" HTML tags are in the
text, and prevent further processing (or in my case, prevent the user
from saving a record in the DetailsView) until the user has edited the
contents of the textbox such that they are acceptable (i.e. don't have
"forbidden" HTML tags).

I'll grant that finding a suitable regex is not easy. Ideally that
would be the best solution, though, as I have my DetailsView hooked
into an ObjectDataSource, and would like to have the validator catch
everything, in-stream. I don't want to have to instantiate MSHTML just
to parse one single textbox.
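
As a rough illustration of the detection described in this post, the sketch below lists every "<" that does not start one of the allowed tags. It is written in Python only to keep it short and testable. The names ALLOWED, FORBIDDEN and forbidden_fragments are made up for this example and do not come from the thread, and the negative-lookahead idea is assumed (not verified here) to carry over to .NET regular expressions.

```python
import re

# Tag names taken from the list in the original question.
ALLOWED = "a|b|blockquote|br|i|img|li|ol|p|quote|ul"

# A "<" that is NOT followed by an allowed tag name (optionally preceded by
# "/") and a whitespace, "/" or ">" character. The rest of the fragment up to
# the next ">" or "<" is captured only so it can be shown to the user.
FORBIDDEN = re.compile(
    r"<(?!/?(?:" + ALLOWED + r")[\s/>])[^<>]*>?",
    re.IGNORECASE,
)


def forbidden_fragments(text):
    """Return the offending fragments found in text, in order of appearance."""
    return [match.group(0) for match in FORBIDDEN.finditer(text)]


if __name__ == "__main__":
    print(forbidden_fragments("<p>ok</p><script>x</script>"))
```

An empty list means nothing forbidden was found. A bare "<" in ordinary text (for example "2 < 3") is also reported, which is deliberately strict.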
Jan 22 '07 #5
"Barry L. Camp" <bl****@gmail.com> wrote in
news:1169492660.650762.3150@s34g2000cwa.googlegroups.com:

> I'll grant that finding a suitable regex is not easy. Ideally that
> would be the best solution, though, as I have my DetailsView hooked
> into an ObjectDataSource, and would like to have the validator catch
> everything, in-stream. I don't want to have to instantiate MSHTML just
> to parse one single textbox.

Regular Expressions are not well suited to parse complex XML type documents
due to the nested nature of such documents. There are better tools for the
job.

Perhaps loading the HTML into an XML Doc and using XPath to search for
unwanted tags?
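
The parser-based route suggested above could be sketched roughly as follows. This is not the XML Doc and XPath approach itself: it swaps in Python's standard html.parser module as a stand-in for a real parser, and ALLOWED_TAGS, TagCollector and find_forbidden_tags are hypothetical names chosen for the illustration.

```python
from html.parser import HTMLParser

# Tags from the original question; everything else is treated as forbidden.
ALLOWED_TAGS = {"a", "b", "blockquote", "br", "i", "img",
                "li", "ol", "p", "quote", "ul"}


class TagCollector(HTMLParser):
    """Hypothetical helper: remembers every tag name not in ALLOWED_TAGS."""

    def __init__(self):
        super().__init__()
        self.forbidden = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lower case.
        if tag not in ALLOWED_TAGS:
            self.forbidden.add(tag)

    def handle_endtag(self, tag):
        if tag not in ALLOWED_TAGS:
            self.forbidden.add(tag)


def find_forbidden_tags(text):
    """Return the set of disallowed tag names that appear in text."""
    collector = TagCollector()
    collector.feed(text)
    collector.close()
    return collector.forbidden


if __name__ == "__main__":
    print(find_forbidden_tags("<p>Hi</p><script>alert(1)</script>"))
```

An empty set means the text only uses allowed tags. Attributes are not inspected in this sketch.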

If you're set on using Regular Expressions, take a look at Community
Server's source code. I recall it had a set of regular expressions to parse
out unwanted tags.

http://communityserver.org/
Jan 22 '07 #6
Barry,

Then why did you write this in your starting message?

> I don't want to reinvent this particular "wheel" if it has already been
> done before, if you know what I mean. :)

You definitely show that you want to reinvent a wheel that already exists.

Cor

Jan 23 '07 #7

You still don't understand what I am trying to do.

I'm not trying to read HTML documents, XML, XHTML or any *TML.

The best way to explain what I am doing would be to go on any
discussion forum or web-based e-mail. There's a big, huge textbox
(textarea, or whatever). I've got one of these in a .NET 2.0
DetailsView control. I've got it bound to an ObjectDataSource. They're
all in TemplateFields, and I've got Validator controls hooked to all of
the textbox inputs. The smaller ones are tied to regex validators, with
simple expressions to allow text only, because that's all I need.

But this big textbox... I want to accept a small subset of HTML tags,
because this data... bound into a database... I want to mash together
with a MasterPage and some content, and render it back. It's not an
HTML document already. It's just text. I want to take the data and make
it PART of a web page.

Jan 23 '07 #8

Spam Catcher wrote:

> "Barry L. Camp" <bl****@gmail.com> wrote in
> news:1169492660.650762.3150@s34g2000cwa.googlegroups.com:
>
>> I'll grant that finding a suitable regex is not easy. Ideally that
>> would be the best solution, though, as I have my DetailsView hooked
>> into an ObjectDataSource, and would like to have the validator catch
>> everything, in-stream. I don't want to have to instantiate MSHTML just
>> to parse one single textbox.
>
> Regular Expressions are not well suited to parse complex XML type documents
> due to the nested nature of such documents. There are better tools for the
> job.
>
> Perhaps loading the HTML into an XML Doc and using XPath to search for
> unwanted tags?
>
> If you're set on using Regular Expressions, take a look at Community
> Server's source code. I recall it had a set of regular expressions to parse
> out unwanted tags.
>
> http://communityserver.org/

Well, I'm not exactly working with entire HTML pages. But I think
you've made a great suggestion.

I am building essentially a home-grown CMS, not on the order of a
CommunityServer or DNN, but just something that I can easily add/edit
content later on. It's just to suit what I need. It's also for me to
tinker with, and help educate myself on .NET 2.0, to help prepare for
the new Cert exams. (I've already got 70-431 done).

Looks like I have a lot more reading to do, but that's fine. I
appreciate the idea. Thanks much!

Barry

Jan 23 '07 #9
Barry,

You still did not look at the sample I gave you or at what is written here.
The first part of the sample is to get an HTML page. It is mostly impossible
to give a sample without having underlying data. The second part is to show
how you can tear a page apart into different parts according to its tags.

However, what does Regex have to do with this?

> But this big textbox... I want to accept a small subset of HTML tags,
> because this data... bound into a database... I want to mash together
> with a MasterPage and some content, and render it back. It's not an
> HTML document already. It's just text. I want to take the data and make
> it PART of a web page.

For that there is already a wheel as well. Why do you want so strongly to
invent your own wheels?

Cor

Jan 23 '07 #10

Cor Ligthert [MVP] wrote:

> Barry,
>
> You still did not look at the sample I gave you or at what is written here.

Yes, I did. And it has nothing to do with what I am doing.

> The first part of the sample is to get an HTML page.

I don't care about that. I've said several times that I am not
interested in parsing an HTML page, so why would I want to even *get*
one?

> It is mostly impossible to give a sample without having underlying data.
> The second part is to show how you can tear a page apart into different
> parts according to its tags.

I don't want to get a page, or tear it apart, or look at a page.

As I have said repeatedly, I'm taking input from a textbox, which is
*not* an HTML page, but may contain HTML tags. Parsing an entire HTML
page (which is what I AM NOT DOING) is a totally different concept than
the mere *detection* of a small number of tags in simple text (which is
what I AM DOING).

> However, what does Regex have to do with this?

Because as I have said repeatedly, I am trying to use the
RegularExpressionValidator to enforce validation rules on my textbox.
Was I not clear enough?

How about this example - perhaps this illustrates it better.

<asp:TemplateField HeaderText="Author">
    <!-- Other ItemTemplate tags here. -->
    <EditItemTemplate>
        <asp:TextBox ID="AuthorTextBox" runat="server"
            Text='<%# Bind("Author") %>' />
        <asp:RegularExpressionValidator
            ID="RegularExpressionValidator1" runat="server"
            ControlToValidate="AuthorTextBox"
            ValidationExpression="^[\w\s\.\-']{1,128}$"
            Text="<br />An Author's name may only contain letters, numbers, spaces, apostrophes, hyphens or periods."
            Display="Dynamic" SetFocusOnError="true" />
    </EditItemTemplate>
</asp:TemplateField>

>> But this big textbox... I want to accept a small subset of HTML tags,
>> because this data... bound into a database... I want to mash together
>> with a MasterPage and some content, and render it back. It's not an
>> HTML document already. It's just text. I want to take the data and make
>> it PART of a web page.
>
> For that there is already a wheel as well. Why do you want so strongly to
> invent your own wheels?

You know what... forget it. Someone else understood what I was trying
to do, and gave me a helpful suggestion. You seem to be stuck in the
belief that I am working on something completely different from what I
am really doing, and that's not helpful in the slightest.

Thanks for your time anyway.
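
For the larger textbox, a whitelist pattern in the same spirit as the Author expression above might look like the sketch below. It accepts the text only if every "<" starts one of the allowed tags from the original post, and it is anchored with ^ and $ because the RegularExpressionValidator passes only when the whole input matches. The sketch is in Python so it can be run and checked. ALLOWED_TAGS, WHITELIST_PATTERN and is_acceptable are names invented here, the pattern has not been tried inside an actual validator, and case-insensitive matching comes from a Python flag that a ValidationExpression would have to handle some other way.

```python
import re

# Tag names taken from the list in the original question.
ALLOWED_TAGS = ["a", "b", "blockquote", "br", "i", "img",
                "li", "ol", "p", "quote", "ul"]

# Each character must be either something other than "<", or a "<" that is
# immediately followed by an allowed tag name (optionally preceded by "/")
# and then whitespace, "/" or ">". Attributes are consumed by [^<].
WHITELIST_PATTERN = (
    r"^(?:[^<]|<(?=/?(?:" + "|".join(ALLOWED_TAGS) + r")[\s/>]))*$"
)
WHITELIST = re.compile(WHITELIST_PATTERN, re.IGNORECASE)


def is_acceptable(text):
    """True when text contains no "<" other than the start of an allowed tag."""
    return WHITELIST.match(text) is not None


if __name__ == "__main__":
    print(WHITELIST_PATTERN)
    print(is_acceptable('<p>Hello <b>world</b><br /></p>'))
    print(is_acceptable('<p>Hi</p><script>alert(1)</script>'))
```

The check only looks at tag names. Attributes such as onclick or an href with a javascript: URL pass through untouched, so this is a first filter on the input and not a complete sanitizer.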

Jan 23 '07 #11
