Regex Validator - detect all but certain HTML tags


Hi all... hope someone can help out.

Not a unique situation, but my search for a solution has not yielded
what I need yet.

I'm trying to come up with a regular expression for a
RegularExpressionValidator that will allow certain HTML tags:

<a>, <b>, <blockquote>, <br>, <i>, <img>, <li>, <ol>, <p>, <quote>,
<ul>

but block others. So basically I'd like to detect "<" then look for
certain sequences (the tags above). But I also of course have to
account for any number of possible attributes. And then some of the
tags above have closing tags, others do not.

I don't fully understand regular expressions, yet would like to learn;
however, I also want to find a way to do this soon.

I don't want to reinvent this particular "wheel" if it has already been
done before, if you know what I mean. :)

Any help from anyone out there would be greatly appreciated. Thanks
much!

Barry L. Camp

Jan 22 '07 #1
10 Replies


Barry,

If you do not want to reinvent the wheel, then why not use the wheel
that has already been invented for this?

MSHTML.

http://www.vb-tips.com/dbpages.aspx?...f-56dbb63fdf1c

I hope this helps,

Cor

Jan 22 '07 #2


I need to find a regex for a RegularExpressionValidator that does what
I am describing.

I'm not interested in parsing a web page. I'm trying to parse the
contents of a textbox - I have a DetailsView control, for which one
textbox is for content that may be displayed in an .aspx page. I want
to allow certain tags, but block all others. The DetailsView is bound
to an ObjectDataSource, so naturally it would be nice to have the
Validator catch everything for me if at all possible.

Any ideas (anyone)?

Thanks,

Barry
Jan 22 '07 #3

"Barry L. Camp" <bl****@gmail.com> wrote in
news:1169470557.654043.70840@a75g2000cwd.googlegroups.com:

> I'm not interested in parsing a web page. I'm trying to parse the
> contents of a textbox - I have a DetailsView control, for which one
> textbox is for content that may be displayed in an .aspx page. I want
> to allow certain tags, but block all others. The DetailsView is bound
> to an ObjectDataSource, so naturally it would be nice to have the
> Validator catch everything for me if at all possible.

You can load the textbox contents into MSHTML.

Finding a regular expression to handle all the cases of HTML will be
challenging to say the least - I suggest you take a look at MSHTML again
and see if it'll work.
Jan 22 '07 #4


That example is not what I am looking for. I'm not trying to grab an
entire web page, or rebuild the content that may be in the textbox. All
I am trying to do is detect whether "forbidden" HTML tags are in the
text, and prevent further processing (or in my case, prevent the user
from saving a record in the DetailsView) until the user has edited the
contents of the textbox such that they are acceptable (i.e. don't have
"forbidden" HTML tags).

I'll grant that finding a suitable regex is not easy. Ideally that
would be the best solution, though, as I have my DetailsView hooked
into an ObjectDataSource, and would like to have the validator catch
everything, in-stream. I don't want to have to instantiate MSHTML just
to parse one single textbox.
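
For readers who want to try the "detect forbidden tags" idea described above, here is a minimal sketch of one possible allow-list pattern. It is not a pattern anyone in this thread posted. It uses Python only to exercise the expression; the names `ALLOWED_TAGS`, `ALLOWED_HTML` and `is_allowed` are hypothetical. The pattern sticks to constructs (non-capturing groups, negated character classes) that the .NET, JavaScript and Python regex engines should all accept, on the assumption that the validator evaluates the same expression on both client and server.

```python
import re

# Tag names taken from the original question; treated here as the allow-list.
ALLOWED_TAGS = ["a", "b", "blockquote", "br", "i", "img", "li", "ol", "p", "quote", "ul"]
TAG_NAMES = "|".join(ALLOWED_TAGS)

# The whole input must consist of:
#   - any character other than "<", or
#   - an opening or closing tag whose name is on the allow-list, optionally
#     followed by whitespace plus attributes, and an optional "/" before ">".
# Requiring whitespace, "/" or ">" right after the name keeps "<b" from
# matching the start of "<body>".
ALLOWED_HTML = r"^(?:[^<]|</?(?:" + TAG_NAMES + r")(?:\s[^<>]*)?/?>)*$"


def is_allowed(text: str) -> bool:
    """Return True when every "<" in text starts an allow-listed tag."""
    return re.match(ALLOWED_HTML, text) is not None


if __name__ == "__main__":
    for sample in ["Hello <b>world</b><br />", "<script>alert(1)</script>", "a < b"]:
        print(repr(sample), is_allowed(sample))
```

A few caveats on this sketch: it is case-sensitive (so `<B>` is rejected), it rejects any bare `<` in ordinary text, and it does not look inside attributes, so something like an `onclick` attribute on an allowed tag would still pass. It only answers the narrow question of whether a forbidden tag name appears.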
Jan 22 '07 #5

"Barry L. Camp" <bl****@gmail.com> wrote in
news:1169492660.650762.3150@s34g2000cwa.googlegroups.com:

> I'll grant that finding a suitable regex is not easy. Ideally that
> would be the best solution, though, as I have my DetailsView hooked
> into an ObjectDataSource, and would like to have the validator catch
> everything, in-stream. I don't want to have to instantiate MSHTML just
> to parse one single textbox.

Regular Expressions are not well suited to parse complex XML type documents
due to the nested nature of such documents. There are better tools for the
job.

Perhaps loading the HTML into an XML Doc and using XPath to search for
unwanted tags?

If you're set on using Regular Expressions, take a look at Community
Server's source code. I recall it had a set of regular expressions to parse
out unwanted tags.

http://communityserver.org/
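
To illustrate the parser-based alternative suggested above, here is a rough sketch. Text containing tags such as an unclosed `<br>` is not well-formed XML, so this example swaps in Python's tolerant `html.parser` module rather than an XML document with XPath or MSHTML. The names `TagCollector` and `forbidden_tags` are hypothetical, and how such a check would be wired into the page (for instance from a server-side custom validation handler) is an assumption, not something stated in this thread.

```python
from html.parser import HTMLParser

# Tag names taken from the original question; treated here as the allow-list.
ALLOWED_TAGS = {"a", "b", "blockquote", "br", "i", "img", "li", "ol", "p", "quote", "ul"}


class TagCollector(HTMLParser):
    """Records the name of every tag that is not on the allow-list."""

    def __init__(self):
        super().__init__()
        self.forbidden = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lower case.
        if tag not in ALLOWED_TAGS:
            self.forbidden.append(tag)

    def handle_endtag(self, tag):
        if tag not in ALLOWED_TAGS:
            self.forbidden.append(tag)


def forbidden_tags(text):
    """Return a sorted list of distinct forbidden tag names found in text."""
    parser = TagCollector()
    parser.feed(text)
    parser.close()
    return sorted(set(parser.forbidden))


if __name__ == "__main__":
    print(forbidden_tags("<p>Hi <b>there</b><br></p>"))
    print(forbidden_tags("<p>x</p><script>alert(1)</script><HR>"))
```

Compared with a single regular expression, this approach can report which tags were rejected, which may make for a friendlier validation message.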
Jan 22 '07 #6

Barry,

Then why did you write this in your first message?

> I don't want to reinvent this particular "wheel" if it has already been
> done before, if you know what I mean. :)

You are definitely showing that you want to reinvent a wheel that already exists.

Cor

Jan 23 '07 #7


Cor Ligthert [MVP] wrote:
> Barry,
>
> Then why did you write this in your first message?
>
>> I don't want to reinvent this particular "wheel" if it has already been
>> done before, if you know what I mean. :)
>
> You are definitely showing that you want to reinvent a wheel that already exists.
>
> Cor

You still don't understand what I am trying to do.

I'm not trying to read HTML documents, XML, XHTML or any *TML.

The best way to explain what I am doing would be to go on any
discussion forum or web-based e-mail. There's a big, huge textbox
(textarea, or whatever). I've got one of these in a .NET 2.0
DetailsView control. I've got it bound to an ObjectDataSource. They're
all in TemplateFields, and I've got Validator controls hooked to all of
the textbox inputs. The smaller ones are tied to regex validators, with
simple expressions to allow text only, because that's all I need.

But this big textbox... I want to accept a small subset of HTML tags,
because this data... bound into a database... I want to mash together
with a MasterPage and some content, and render it back. It's not an
HTML document already. It's just text. I want to take the data and make
it PART of a web page.

Jan 23 '07 #8


Spam Catcher wrote:
> "Barry L. Camp" <bl****@gmail.com> wrote in
> news:1169492660.650762.3150@s34g2000cwa.googlegroups.com:
>
>> I'll grant that finding a suitable regex is not easy. Ideally that
>> would be the best solution, though, as I have my DetailsView hooked
>> into an ObjectDataSource, and would like to have the validator catch
>> everything, in-stream. I don't want to have to instantiate MSHTML just
>> to parse one single textbox.
>
> Regular Expressions are not well suited to parse complex XML type documents
> due to the nested nature of such documents. There are better tools for the
> job.
>
> Perhaps loading the HTML into an XML Doc and using XPath to search for
> unwanted tags?
>
> If you're set on using Regular Expressions, take a look at Community
> Server's source code. I recall it had a set of regular expressions to parse
> out unwanted tags.
>
> http://communityserver.org/

Well, I'm not exactly working with entire HTML pages. But I think
you've made a great suggestion.

I am building essentially a home-grown CMS, not on the order of a
CommunityServer or DNN, but just something that I can easily add/edit
content later on. It's just to suit what I need. It's also for me to
tinker with, and help educate myself on .NET 2.0, to help prepare for
the new Cert exams. (I've already got 70-431 done).

Looks like I have a lot more reading to do, but that's fine. I
appreciate the idea. Thanks much!

Barry

Jan 23 '07 #9

Barry,

You still have not looked at the sample I gave you, or at what is written here.
The first part of the sample is to get an HTML page. It is nearly impossible
to give a sample without some underlying data. The second part shows how you
can tear a page apart into its different parts according to its tags.

However, what does Regex have to do with this?

> But this big textbox... I want to accept a small subset of HTML tags,
> because this data... bound into a database... I want to mash together
> with a MasterPage and some content, and render it back. It's not an
> HTML document already. It's just text. I want to take the data and make
> it PART of a web page.

There is already a wheel for that as well. Why do you want so strongly to
invent your own wheels?

Cor

Jan 23 '07 #10


Cor Ligthert [MVP] wrote:
> Barry,
>
> You still have not looked at the sample I gave you, or at what is written here.

Yes, I did. And it has nothing to do with what I am doing.

> The first part of the sample is to get an HTML page.

I don't care about that. I've said several times that I am not
interested in parsing an HTML page, so why would I want to even *get*
one?

> It is nearly impossible to give a sample without some underlying data.
> The second part shows how you can tear a page apart into its different
> parts according to its tags.

I don't want to get a page, or tear it apart, or look at a page.

As I have said repeatedly, I'm taking input from a textbox, which is
*not* an HTML page, but may contain HTML tags. Parsing an entire HTML
page (which is what I AM NOT DOING) is a totally different concept than
the mere *detection* of a small number of tags in simple text (which is
what I AM DOING).

> However, what does Regex have to do with this?

Because as I have said repeatedly, I am trying to use the
RegularExpressionValidator to enforce validation rules on my textbox.
Was I not clear enough?

How about this example - perhaps this illustrates it better.

<asp:TemplateField HeaderText="Author">
    <!-- Other ItemTemplate tags here. -->
    <EditItemTemplate>
        <asp:TextBox ID="AuthorTextBox" runat="server"
            Text='<%# Bind("Author") %>' />
        <asp:RegularExpressionValidator
            ID="RegularExpressionValidator1" runat="server"
            ControlToValidate="AuthorTextBox"
            ValidationExpression="^[\w\s\.\-']{1,128}$"
            Text="<br />An Author's name may only contain letters, numbers, spaces, apostrophes, hyphens or periods."
            Display="Dynamic" SetFocusOnError="true" />
    </EditItemTemplate>
</asp:TemplateField>

>> But this big textbox... I want to accept a small subset of HTML tags,
>> because this data... bound into a database... I want to mash together
>> with a MasterPage and some content, and render it back. It's not an
>> HTML document already. It's just text. I want to take the data and make
>> it PART of a web page.
>
> There is already a wheel for that as well. Why do you want so strongly to
> invent your own wheels?

You know what... forget it. Someone else understood what I was trying
to do, and gave me a helpful suggestion. You seem to be stuck in the
belief that I am working on something completely different from what I
am really doing, and that's not helpful in the slightest.

Thanks for your time anyway.
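
As a side note on the Author validator quoted in this post, the sketch below exercises its `ValidationExpression` in Python to show what it accepts and rejects. The helper name `is_valid_author` is hypothetical, and Python is used only as a convenient way to run the expression; it is assumed that the character classes behave similarly in the engines the validator uses.

```python
import re

# The expression from the Author validator in the post above, unchanged.
AUTHOR_PATTERN = r"^[\w\s\.\-']{1,128}$"


def is_valid_author(name: str) -> bool:
    """Return True when name matches the Author validation expression."""
    return re.match(AUTHOR_PATTERN, name) is not None


if __name__ == "__main__":
    for sample in ["Barry L. Camp", "O'Brien-Smith", "<b>Barry</b>", "under_score"]:
        print(repr(sample), is_valid_author(sample))
```

Two things stand out. Because `\w` also matches the underscore, the expression accepts a character that the error message does not list. And since no `<` or `>` can ever match, this kind of simple expression blocks every tag, which is why the larger textbox needs a different approach. As I understand it, a RegularExpressionValidator also skips empty input, so a separate required-field check would be needed if the field must not be blank.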

Jan 23 '07 #11

This discussion thread is closed

Replies have been disabled for this discussion.