Re: How can I programmatically validate html ?

On 31 Jul 2008 16:22:51 GMT, "mark4asp" <ma******@gmail.com> wrote:

>I am importing text from a column of a database table to display as
>part of a web page in asp.net. There are about 7000 rows in the table.

It's hard to programmatically validate HTML. You need to use Jade,
because Mere Mortals don't get to go near the code that does it
otherwise. Even that's not easy.

An easier way is to make valid code, then check it, just the once, by
loading it into a browser that supports validation, such as Firefox with
Marc Gueury's plugin. Provided that your code is actually valid (within
a few attempts), this is workable.

>About 10% of the columns have their content as html and about 10% of
>those columns have badly broken html. When broken it generally uses <tr>
>and <td> content with no enclosing <table>.

First define what the DB can contain. Match one of the HTML productions,
such as %block;, TR or (TR)+

Your content won't be HTML (according to the DTD) unless it uses <html>
as the one and only root element. It won't do this. It _can't_ do this,
not if you want to join rows together. So if you have to use a fragment,
then make it a well-defined fragment.
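
To check a fragment with any validator, you first have to turn it back into a whole document. A minimal sketch in Python, assuming the fragment matches %block; and that you validate against HTML 4.01 Strict (the function name, the default title and the choice of DTD are my assumptions, not anything your CMS defines):

```python
# Wrap a %block; fragment in the smallest complete HTML 4.01 Strict document,
# so that a validator is handed a real document with <html> as its root.
DOCTYPE = ('<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" '
           '"http://www.w3.org/TR/html4/strict.dtd">')


def as_document(block_fragment: str, title: str = "fragment check") -> str:
    """Return a full document whose body is the given fragment (hypothetical helper)."""
    return "\n".join([
        DOCTYPE,
        "<html>",
        "<head><title>" + title + "</title></head>",
        "<body>",
        block_fragment,
        "</body>",
        "</html>",
    ])
```

Save the result to a file and load that into the validating browser. Any errors reported inside <body> then belong to the fragment, not to the wrapper.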

If the content is always one thing (e.g. %block;) then that's easy. If
it isn't always the same, then work out what it is. %block; | (TR)+
is quite workable - your current content might even be valid already!
(just not valid HTML...). If you have to work with one of two entities
like this, then it makes it a little hard to assemble on reading it, but
not impossible.
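
The assembly step might look something like this. It's only a sketch in Python, and the labels "block" and "rows" are names I've made up for the two content models:

```python
def assemble(fragment: str, model: str) -> str:
    """Turn a stored fragment into markup that is safe to drop into a page body.

    model is a hypothetical label: "block" for %block; content,
    "rows" for (TR)+ content.
    """
    if model == "rows":
        # Bare rows are only legal inside a table, so supply the missing one.
        return "<table>\n" + fragment + "\n</table>"
    if model == "block":
        # Block-level content can go into the page as it stands.
        return fragment
    raise ValueError("unknown content model: " + repr(model))
```

Rejecting unknown labels outright, instead of guessing, means a new kind of broken content shows up as an error and not as a silly layout.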

I'd suggest adding another column to indicate just which content model
it follows. Querying that would be easy. If you can't do this, then
find a way to tell what the content model is, such as a regex to look
for bare <tr> start tags at the front. This will be slower than
retrieving a value you calculated earlier, but still workable. The
rest is just a coding exercise.
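
For the regex route, a rough sketch in Python (the function name and the two labels are hypothetical, and anything that doesn't start with a <tr> is simply assumed to be %block; content without being checked):

```python
import re

# A bare <tr> start tag at the very front of the fragment: optional leading
# whitespace, then "<tr", a word boundary (so <track> does not match),
# optional attributes, then ">". Case-insensitive, since old HTML often
# uses upper-case tags.
_BARE_TR = re.compile(r"\s*<tr\b[^>]*>", re.IGNORECASE)


def classify_fragment(fragment: str) -> str:
    """Return "rows" for (TR)+ content, otherwise "block" (hypothetical labels)."""
    return "rows" if _BARE_TR.match(fragment) else "block"
```

Run it once over the 7000 rows and store the answer in the new column, then you never need the regex at display time.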
--
Cats have nine lives, which is why they rarely post to Usenet.

Jul 31 '08 #1

mark4asp

Andy Dingley wrote:

On 31 Jul 2008 16:22:51 GMT, "mark4asp" <ma******@gmail.com> wrote:

I am importing text from a column of a database table to display as
part of a web page in asp.net. There are about 7000 rows in the
table.

It's hard to programmatically validate HTML. You need to use Jade,
because Mere Mortals don't get to go near the code that does it
otherwise. Even that's not easy.

An easier way is to make valid code, then check it, just the once, by
loading it into a browser that supports validation, such as Firefox
with Marc Gueury's plugin. Provided that your code is actually valid
(within a few attempts), this is workable.

About 10% of the columns have their content as html and about 10% of
those columns have badly broken html. When broken it generally uses <tr>
and <td> content with no enclosing <table>.

First define what the DB can contain. Match one of the HTML
productions, such as %block;, TR or (TR)+

Your content won't be HTML (according to the DTD) unless it uses
<html> as the one and only root element. It won't do this. It can't
do this, not if you want to join rows together. So if you have to use
a fragment, then make it a well-defined fragment.

If the content is always one thing (e.g. %block;) then that's easy. If
it isn't always the same, then work out what it is. %block; | (TR)+
is quite workable - your current content might even be valid already!
(just not valid HTML...). If you have to work with one of two
entities like this, then it makes it a little hard to assemble on
reading it, but not impossible.

I'd suggest adding another column to indicate just which content model
it follows. Querying that would be easy. If you can't do this, then
find a way to tell what the content model is, such as a regex to look
for bare <tr> start tags at the front. This will be slower than
retrieving a value you calculated earlier, but still workable. The
rest is just a coding exercise.

Thanks Andy,

Your post gave me a few ideas to chew over.

It doesn't solve the immediate problem, which may be unsolvable. I'll
write an SQL script to fix the very badly broken code.

I already decided to add extra validation for the CMS.

By the way, this code will never validate against a DTD because it's
created by users who don't know the various definitions and are not
very technical. Some of it is copied by them from a public site which
seems to be written by a (broken) robot.

I had hopes once upon a time of having this website produce perfectly
valid xhtml, now I just want to ensure the html is not really badly
broken (i.e. that the layout is not very silly).
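
A check for "not really badly broken" could be as crude as the Python sketch below, which uses the standard library's html.parser. The class and function names are hypothetical, and this is nowhere near DTD validation: it only flags table parts used outside a <table> and end tags that match nothing. Unclosed tags are deliberately ignored, because HTML lets you leave off many end tags.

```python
from html.parser import HTMLParser

# Elements with no end tag in HTML 4, so they never go on the stack.
VOID = {"area", "base", "br", "col", "hr", "img", "input", "link", "meta", "param"}
# Elements that are only allowed somewhere inside a <table>.
TABLE_ONLY = {"tr", "td", "th", "tbody", "thead", "tfoot"}


class FragmentCheck(HTMLParser):
    """Collects the two kinds of breakage most likely to wreck a layout."""

    def __init__(self):
        super().__init__()
        self.stack = []      # currently open elements, outermost first
        self.problems = []   # human-readable complaints, in document order

    def handle_starttag(self, tag, attrs):
        if tag in TABLE_ONLY and "table" not in self.stack:
            self.problems.append("<%s> outside <table>" % tag)
        if tag not in VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID:
            return
        if tag in self.stack:
            # Close everything opened since the matching start tag.
            while self.stack.pop() != tag:
                pass
        else:
            self.problems.append("stray </%s>" % tag)


def find_problems(fragment):
    """Return a list of complaints; an empty list means nothing obvious was found."""
    checker = FragmentCheck()
    checker.feed(fragment)
    checker.close()
    return checker.problems
```

Running find_problems over each row gives a list of candidates for the clean-up script, and the same function could back the extra CMS validation.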

Aug 1 '08 #2
