Re: How can I programmatically validate HTML?

On 31 Jul 2008 16:22:51 GMT, "mark4asp" <ma******@gmail.com> wrote:

>I am importing text from a column of a database table to display as
>part of a web page in asp.net. There are about 7000 rows in the table.

It's hard to programmatically validate HTML. You need to use Jade,
because Mere Mortals don't get to go near the code that does it
otherwise. Even that's not easy.

An easier way is to make valid code, then check it, just the once, by
loading it into a browser that supports validation, such as Firefox with
Marc Gueury's plugin. Provided that your code is actually valid (within
not many attempts) then this is workable.

>About 10% of the columns have their content as html and about 10% of
>those columns have badly broken html. When broke it generally uses <tr>
>and <td> content with no enclosing <table>.

First define what the DB can contain. Match one of the HTML productions,
such as %block;, TR or (TR)+

Your content won't be HTML (according to the DTD) unless it uses <html>
as the one and only root element. It won't do this. It _can't_ do this,
not if you want to join rows together. So if you have to use a fragment,
then make it a well-defined fragment.

If the content is always one thing (e.g. %block;) then that's easy. If
it isn't always the same, then work out what it is. %block; | (TR)+
is quite workable - your current content might even be valid already!
(just not valid HTML...). If you have to work with one of two entities
like this, then it makes it a little hard to assemble on reading it, but
not impossible.
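
To illustrate the assembly step, here is a minimal sketch in Python (the
thread itself is about asp.net, so treat the language as an assumption).
The labels "block" and "rows" and the function name render_fragment are
hypothetical, made up here to stand for %block; and (TR)+.

```python
def render_fragment(fragment: str, model: str) -> str:
    """Return markup that is safe to drop into a block-level context.

    `model` is a hypothetical label for the fragment's content model:
    "block" stands for %block; content and "rows" for (TR)+ content.
    """
    if model == "block":
        # Already legal wherever block content is allowed.
        return fragment
    if model == "rows":
        # Bare rows are only legal inside a table, so supply one.
        return "<table>\n" + fragment + "\n</table>"
    raise ValueError(f"unknown content model: {model!r}")
```

The point is only that once each row's content model is known, joining
rows together is a small dispatch rather than a parsing problem.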

I'd suggest adding another column to indicate just which content model
it follows. Querying that would be easy. If you can't do this, then
find a way to tell what the content model is, such as a regex to look
for bare <tr> start tags at the front. This will be slower than
retrieving a value you calculated earlier, but still workable. The
rest is just a coding exercise.
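
As a rough sketch of the regex idea, assuming Python and two made-up
labels ("rows" for (TR)+ and "block" for everything else), something
like this would do. It only looks at the first tag, so it says nothing
about whether the rest of the fragment is well formed.

```python
import re

# A <tr> start tag (attributes allowed) at the very front of the fragment,
# ignoring leading whitespace. The \b stops it matching e.g. <track>.
_BARE_TR = re.compile(r"\s*<tr\b[^>]*>", re.IGNORECASE)


def content_model(fragment: str) -> str:
    """Guess the content model from the first tag only.

    The return values "rows" and "block" are hypothetical labels,
    not anything defined by the HTML DTD.
    """
    return "rows" if _BARE_TR.match(fragment) else "block"
```

Run once over the 7000 rows, the result could be stored in the extra
column so that the regex is not needed at page-render time.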
--
Cats have nine lives, which is why they rarely post to Usenet.
Jul 31 '08 #1
Andy Dingley wrote:
>On 31 Jul 2008 16:22:51 GMT, "mark4asp" <ma******@gmail.com> wrote:
>>I am importing text from a column of a database table to display as
>>part of a web page in asp.net. There are about 7000 rows in the
>>table.
>
>It's hard to programmatically validate HTML. You need to use Jade,
>because Mere Mortals don't get to go near the code that does it
>otherwise. Even that's not easy.
>
>An easier way is to make valid code, then check it, just the once, by
>loading it into a browser that supports validation, such as Firefox
>with Marc Gueury's plugin. Provided that your code is actually valid
>(within not many attempts) then this is workable.
>
>>About 10% of the columns have their content as html and about 10% of
>>those columns have badly broken html. When broke it generally uses
>><tr> and <td> content with no enclosing <table>.
>
>First define what the DB can contain. Match one of the HTML
>productions, such as %block;, TR or (TR)+
>
>Your content won't be HTML (according to the DTD) unless it uses
><html> as the one and only root element. It won't do this. It can't
>do this, not if you want to join rows together. So if you have to use
>a fragment, then make it a well-defined fragment.
>
>If the content is always one thing (e.g. %block;) then that's easy. If
>it isn't always the same, then work out what it is. %block; | (TR)+
>is quite workable - your current content might even be valid already!
>(just not valid HTML...). If you have to work with one of two
>entities like this, then it makes it a little hard to assemble on
>reading it, but not impossible.
>
>I'd suggest adding another column to indicate just which content model
>it follows. Querying that would be easy. If you can't do this, then
>find a way to tell what the content model is, such as a regex to look
>for bare <tr> start tags at the front. This will be slower than
>retrieving a value you calculated earlier, but still workable. The
>rest is just a coding exercise.

Thanks Andy,

Your post gave me a few ideas to chew over.

It doesn't solve the immediate problem, which may be unsolvable. I'll
write a SQL script to fix the very badly broken code.

I already decided to add extra validation for the CMS.

By the way, this code will never validate against a DTD because it's
created by users who don't know the various definitions and are not
very technical. Some of it is copied by them from a public site which
seems to be written by a (broken) robot.

I had hopes once upon a time of having this website produce perfectly
valid xhtml, now I just want to ensure the html is not really badly
broken (i.e. that the layout is not very silly).
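
For what it's worth, a "not really badly broken" check could be sketched
along these lines, here in Python using the standard html.parser module
rather than anything asp.net specific. The names BalanceChecker and
find_problems are made up for the example. It is not DTD validation: it
only reports start and end tags that don't pair up, it treats HTML's
legally omitted end tags (</p>, </td> and so on) as problems, and it
would not notice a <tr> that simply lacks an enclosing <table>.

```python
from html.parser import HTMLParser

# Elements that never take an end tag.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class BalanceChecker(HTMLParser):
    """Track open tags on a stack and record end tags that don't match."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.problems = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID:
            return
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()
        else:
            self.problems.append(f"unexpected </{tag}>")


def find_problems(fragment):
    """Return a list of human-readable problems; empty means balanced."""
    checker = BalanceChecker()
    checker.feed(fragment)
    checker.close()
    # Anything still on the stack was opened but never closed.
    return checker.problems + [f"unclosed <{t}>" for t in checker.stack]
```

Rows where find_problems returns a non-empty list are the candidates for
the fix-up script or for a manual look.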

Aug 1 '08 #2
