>I am importing text from a column of a database table to display as
>part of a web page in asp.net. There are about 7000 rows in the table.

It's hard to programmatically validate HTML. You need to use Jade,
because Mere Mortals don't get to go near the code that does it
otherwise. Even that's not easy.
An easier way is to make valid code, then check it, just the once, by
loading it into a browser that supports validation, such as Firefox with
Marc Gueury's plugin. Provided that your code actually is valid (within
not too many attempts), this is workable.

>About 10% of the columns have their content as html and about 10% of
>those columns have badly broken html. When broke it generally uses <tr>
>and <td> content with no enclosing <table>.

First define what the DB can contain. Match one of the HTML productions,
such as %block;, TR or (TR)+
Your content won't be HTML (according to the DTD) unless it uses <html>
as the one and only root element. It won't do this. It _can't_ do this,
not if you want to join rows together. So if you have to use a fragment,
then make it a well-defined fragment.
If the content is always one thing (e.g. %block;) then that's easy. If
it isn't always the same, then work out what it is. %block; | (TR)+
is quite workable - your current content might even be valid already!
(just not valid HTML...). If you have to work with one of two entities
like this, then it makes it a little hard to assemble on reading it, but
not impossible.
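
For the one-off check in a browser, a fragment needs a whole document
around it. Here's a rough sketch in Python (nothing to do with your
asp.net code, and the function name and the "block" / "rows" labels are
my own inventions) that wraps a stored fragment in a minimal HTML 4.01
Strict page, assuming you already know which content model it follows:

```python
# Minimal HTML 4.01 Strict shell for checking a stored fragment by hand.
DOCTYPE = ('<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" '
           '"http://www.w3.org/TR/html4/strict.dtd">')


def wrap_for_checking(fragment: str, model: str) -> str:
    """Return a complete document around a fragment.

    model is a hypothetical label: "block" for %block; content,
    "rows" for (TR)+ content.
    """
    if model == "rows":
        # Bare rows need an enclosing <table>; the TBODY tags are optional.
        body = "<table>\n" + fragment + "\n</table>"
    elif model == "block":
        # %block; content is already allowed directly inside <body>.
        body = fragment
    else:
        raise ValueError("unknown content model: " + repr(model))
    return "\n".join([
        DOCTYPE,
        "<html>",
        "<head><title>Fragment check</title></head>",
        "<body>",
        body,
        "</body>",
        "</html>",
    ])
```

Write the result to a file, load it into the validating browser, and
you're checking the fragment against the DTD rather than against hope.
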
I'd suggest adding another column to indicate just which content model
it follows. Querying that would be easy. If you can't do this, then
find a way to tell what the content model is, such as a regex to look
for bare <tr> start tags at the front. This will be slower than
retrieving a value you calculated earlier, but still workable. The
rest is just a coding exercise.
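
As a sketch of that regex approach (Python again, purely for
illustration; the names are made up, and wrapping bare rows in a
<table> on output is my assumption about how you'd want to assemble
them):

```python
import re

# A <tr> start tag, with or without attributes, at the very front of the
# fragment (leading whitespace ignored). Does not match <track> etc.
BARE_TR = re.compile(r"\s*<tr(\s[^>]*)?>", re.IGNORECASE)


def content_model(fragment: str) -> str:
    """Guess the content model: "rows" for (TR)+, else "block"."""
    return "rows" if BARE_TR.match(fragment) else "block"


def assemble(fragment: str) -> str:
    """Make the fragment safe to drop into a page body."""
    if content_model(fragment) == "rows":
        # Assumption: bare rows get an enclosing table on output.
        return "<table>" + fragment + "</table>"
    return fragment
```

This only looks at the front of the fragment, so it tells you which
model the content claims to follow, not whether it actually does.
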
--
Cats have nine lives, which is why they rarely post to Usenet.