Beta testers for an XHTML syntax checker sought


Hi all,

I have written an XHTML syntax checker, called 'Tidybot'. It is
built on top of the well-known "HTML Tidy" library.

I wrote it because I needed some very specific functionality I
couldn't easily find elsewhere. Now that it exists, I'd like to
make it available to the world, just in case others find it
useful, too.

What it does:

- Traverses one or more source directories on your hard disk
recursively, and runs all .html/.htm files it finds through
TidyLib, collecting all warnings and errors it encounters
and presenting them nicely in an XHTML report.

- You can specify files/directories to exclude from the checks,
you can specify warnings/errors to suppress in the generated
report, and you can specify 'key:value' options to pass
directly to the underlying Tidy engine. You can also tell the
generated report to use a different CSS stylesheet if you want
it to have your own look & feel.

- Comes in both a command-line version (for easy automated
scheduling) and a (functionally equivalent, but more
user-friendly) GUI version.

- Is cross-platform, running on both Unix/Linux and MS Windows
(and I daresay it will run on MacOS as well -- certainly the
command-line version should -- but I haven't been able to test
that). A one-file Installer application is available for
Windows. (On Unix, you will also need to install a number of
prerequisites.)
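
The traversal and filtering steps in the list above can be sketched roughly as follows. This is not Tidybot's actual code: the function names, the glob-style exclusion patterns, and the `check` callable (a stand-in for whatever calls TidyLib and returns a list of message strings) are all assumptions made for illustration.

```python
import fnmatch
import os
from typing import Callable, Iterable, Iterator, List, Tuple

HTML_EXTENSIONS = (".html", ".htm")


def find_html_files(roots: Iterable[str], exclude: Iterable[str] = ()) -> Iterator[str]:
    """Yield .html/.htm files under each root, skipping excluded paths."""
    patterns = list(exclude)

    def excluded(path: str) -> bool:
        # A pattern may match either the full path or just the last component.
        name = os.path.basename(path)
        return any(fnmatch.fnmatch(path, p) or fnmatch.fnmatch(name, p) for p in patterns)

    for root in roots:
        for dirpath, dirnames, filenames in os.walk(root):
            # Prune excluded directories in place so os.walk never descends into them.
            dirnames[:] = sorted(d for d in dirnames if not excluded(os.path.join(dirpath, d)))
            for filename in sorted(filenames):
                path = os.path.join(dirpath, filename)
                if filename.lower().endswith(HTML_EXTENSIONS) and not excluded(path):
                    yield path


def collect_messages(
    paths: Iterable[str],
    check: Callable[[str], List[str]],  # hypothetical wrapper around the Tidy engine
    suppress: Iterable[str] = (),
) -> List[Tuple[str, str]]:
    """Run `check` on every path and keep messages that contain no suppressed substring."""
    suppressed = list(suppress)
    report = []
    for path in paths:
        for message in check(path):
            if not any(s in message for s in suppressed):
                report.append((path, message))
    return report
```

Passing the checker in as a callable keeps the directory walk independent of Tidy itself, so the same loop could serve both a command-line and a GUI front end.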

What it (by design) doesn't do:

- No conversion or editing of files -- it just checks files,
helping you to *keep* things tidy, rather than tidying them for
you.

- Doesn't get pages from a web server -- only static pages
available on the local file system are supported.

What I am looking for:

- People willing to give Tidybot 1.5b2 (the current beta version)
a run on their system, and then send me test reports and
feedback as detailed as they have the time and inclination for.

To clarify: Tidybot may have a rather limited functionality (when
compared to what Tidy is capable of) but it is not a quick hack,
and before I officially release it to the world I really want to
make sure it runs as flawlessly as possible. This is why all
feedback is welcome.

The Tidybot Home Page is:

<http://www.kronto.org/tidybot/>

and you can see daily updated report pages in action at:

<http://library.lspace.org/tidybot/>

Tidybot and its source code are released as free software under
the MIT License.

Many thanks in advance to anybody willing to help me out with
this.

--
Leo Breebaart <le*@kronto.org>
Jul 24 '05 #1
On 23 Jun 2005 12:49:56 GMT, Leo Breebaart <le*@kronto.org> wrote:
I have written an XHTML syntax checker, called 'Tidybot'. It is
built on top of the well-known "HTML Tidy" library.


Poor choice, IMHO. Tidy is built on HTML and isn't a good basis for an
XML tool. What's its behaviour depending on the content-type returned ?
Does it correctly handle XHTML _as_XML_ ?
Jul 24 '05 #2
Andy Dingley <di*****@codesmiths.com> writes:
On 23 Jun 2005 12:49:56 GMT, Leo Breebaart <le*@kronto.org> wrote:
I have written an XHTML syntax checker, called 'Tidybot'. It is
built on top of the well-known "HTML Tidy" library.
Poor choice, IMHO.


Entirely possible -- as I said, I was hesitant about going public
with this utility, because I initially felt it was "just" a
wrapper around a tool that wasn't really created to be an XHTML
validator in the first place.

On the other hand, the TidyLib was *there*, I could actually use
it without too much hassle, and the result has certainly served
its purpose: I run our files through it, it flags things as
errors or warnings, I fix those, our XHTML files become neater
(and the real validators agree with that).

This is a net win for us any which way I look at it.

Tidy is built on HTML and isn't a good basis for an XML tool.
What's its behaviour depending on the content-type returned ?


I'm not sure I understand your question. Tidy (and Tidybot) work
on local files, not on HTML pages retrieved from a server, so
there is no content-type "returned" as I understand the phrase.

Also I would never claim that Tidybot was an XML tool -- I see it
more as a kind of 'lint' for XHTML files. Nothing more, nothing
less.

Does it correctly handle XHTML _as_XML_ ?


I think specifying the "input-xml:yes" to the underlying TidyLib
takes care of that, yes, but perhaps you can give me a specific
example of a situation that might not be handled correctly?
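
For illustration only, 'key:value' strings such as "input-xml:yes" could be split into an options mapping along these lines. How Tidybot really parses them is not stated in this thread, so the function name and the error handling are assumptions.

```python
def parse_tidy_options(pairs):
    """Turn ['input-xml:yes', ...] into {'input-xml': 'yes', ...}."""
    options = {}
    for pair in pairs:
        # Split on the first colon only, so a value may itself contain colons.
        key, sep, value = pair.partition(":")
        if not sep or not key.strip():
            raise ValueError(f"expected 'key:value', got {pair!r}")
        options[key.strip()] = value.strip()
    return options


options = parse_tidy_options(["input-xml:yes"])
```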

--
Leo Breebaart <le*@kronto.org>
Jul 24 '05 #3
On 23 Jun 2005 16:42:14 GMT, Leo Breebaart <le*@kronto.org> wrote:
Does it correctly handle XHTML _as_XML_ ?


I think specifying the "input-xml:yes" to the underlying TidyLib
takes care of that, yes, but perhaps you can give me a specific
example of a situation that might not be handled correctly?


Namespacing wasn't supported last time I looked. As this is one of the
few reasons for going XML over HTML, that's significant IMHO.

Jul 24 '05 #4
Andy Dingley <di*****@codesmiths.com> writes:
On 23 Jun 2005 16:42:14 GMT, Leo Breebaart <le*@kronto.org> wrote:
Does it correctly handle XHTML _as_XML_ ?


I think specifying the "input-xml:yes" to the underlying
TidyLib takes care of that, yes, but perhaps you can give me a
specific example of a situation that might not be handled
correctly?


Namespacing wasn't supported last time I looked. As this is one
of the few reasons for going XML over HTML, that's significant
IMHO.


If TidyLib does not support XML namespaces, then obviously
Tidybot won't support them either.

I can't shake the feeling that you're finding fault with Tidybot
for it not being the utility you feel it ought to be, rather than
for any deficiency in what it actually is.

I've already described the scenario in which I find Tidybot
helpful to have around, and I am really not trying to make any
claims beyond that.

--
Leo Breebaart <le*@lspace.org>
Jul 24 '05 #5
On 25 Jun 2005 10:21:58 GMT, Leo Breebaart <le*@lspace.org> wrote:
I can't shake the feeling that you're finding fault with Tidybot
for it not being the utility you feel it ought to be,


Like an XHTML syntax checker ?
Jul 24 '05 #6
Andy Dingley <di*****@codesmiths.com> writes:
On 25 Jun 2005 10:21:58 GMT, Leo Breebaart <le*@lspace.org>
wrote:
I can't shake the feeling that you're finding fault with
Tidybot for it not being the utility you feel it ought to be,


Like an XHTML syntax checker ?


Can you perhaps give me an example of an XHTML document that you
think contains one or more syntax errors which Tidy/Tidybot will
not catch?

I think we've already established to everyone's satisfaction that
I'm not the world's foremost authority on XHTML/XML matters. I am
perfectly willing to be educated (or be told where to go to
educate myself, even), but your snarky one-liners aren't exactly
the most constructive criticism I can think of.

--
Leo Breebaart <le*@lspace.org>
Jul 24 '05 #7
Leo Breebaart wrote:
Can you perhaps give me an example of an XHTML document that you
think contains one or more syntax errors which Tidy/Tidybot will
not catch?


No, but here's a good document which it will find bogus errors in:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<h:html xmlns:h="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<h:head>
<h:title>Example</h:title>
</h:head>
<h:body>
<h:h1>Example</h:h1>
<h:p>Some text.</h:p>
</h:body>
</h:html>
Jul 24 '05 #8
Leif K-Brooks <eu*****@ecritters.biz> writes:
Leo Breebaart wrote:
Can you perhaps give me an example of an XHTML document that you
think contains one or more syntax errors which Tidy/Tidybot will
not catch?
No, but here's a good document which it will find bogus errors in:


Thanks, having an actual example to work with is very helpful.

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<h:html xmlns:h="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<h:head>
<h:title>Example</h:title>
</h:head>
<h:body>
<h:h1>Example</h:h1>
<h:p>Some text.</h:p>
</h:body>
</h:html>


Tidy (and therefore Tidybot) finds zero errors in this file if
you specify the "input-xml:true" flag.

Conversely, both the W3C and the WDG validators choke on this
example file just as badly as Tidy/Tidybot in normal mode does.

If I change the body in the above snippet to

<h:h1>Example
<h:p>Some text.</h:p>
</h:h1>

then Tidy in XML mode will still not complain, but in XHTML mode
the equivalent

<h1>Example
<p>Some text.</p>
</h1>

would yield a "Warning: missing </h1> before <p>".

Is that the issue you guys are getting at? XML mode only checking
for well-formedness, and not actually doing any validating?
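
The distinction can be reproduced without Tidy. The sketch below uses Python's standard xml.etree.ElementTree parser, which is non-validating, as a stand-in for "XML mode"; it is not what Tidy does internally. The `is_well_formed` helper and the `TEMPLATE` string are made up for this example.

```python
import xml.etree.ElementTree as ET

TEMPLATE = """<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<h:html xmlns:h="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<h:head>
<h:title>Example</h:title>
</h:head>
<h:body>
{body}
</h:body>
</h:html>
"""


def is_well_formed(document: str) -> bool:
    """Return True if the text parses as XML; the DTD is never consulted."""
    try:
        ET.fromstring(document.encode("utf-8"))
    except ET.ParseError:
        return False
    return True


original_body = "<h:h1>Example</h:h1>\n<h:p>Some text.</h:p>"
nested_body = "<h:h1>Example\n<h:p>Some text.</h:p>\n</h:h1>"   # <p> inside <h1>
unclosed_body = "<h:h1>Example\n<h:p>Some text.</h:p>"          # </h:h1> missing

results = {
    "original": is_well_formed(TEMPLATE.format(body=original_body)),
    "nested": is_well_formed(TEMPLATE.format(body=nested_body)),
    "unclosed": is_well_formed(TEMPLATE.format(body=unclosed_body)),
}
```

Only the unclosed variant is rejected. The nested variant passes because a well-formedness check knows nothing about which elements XHTML allows inside an h1; that rule lives in the DTD or schema.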

--
Leo Breebaart <le*@lspace.org>
Jul 24 '05 #9
On 25 Jun 2005 16:24:33 GMT, Leo Breebaart <le*@lspace.org> wrote:
Can you perhaps give me an example of an XHTML document that you
think contains one or more syntax errors which Tidy/Tidybot will
not catch?


Not offhand. I'm not thinking of catching errors that are hard to catch,
so much as valid XHTML documents that are incorrectly flagged as invalid
- because they use XML features such as namespacing that aren't part of
HTML.

Jul 24 '05 #10
In article <3i************@individual.net>,
Leo Breebaart <le*@lspace.org> wrote:
Conversely, both the W3C and the WDG validators choke on this
example file just as badly as Tidy/Tidybot in normal mode does.


Those validators use an SGML parser--not an XML parser. Furthermore,
they are DTD-validators and DTDs do not support namespaces, which makes
DTD-validation inappropriate for languages that are layered on top of
Namespaces in XML. RELAX NG validation is better suited for namespaced
XML.
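
A small sketch of why the prefix matters to a DTD but not to a namespace-aware tool: a DTD matches the literal element name ("h:p" versus "p"), while a namespace-aware parser resolves both spellings to the same expanded name. The snippet uses Python's standard xml.etree.ElementTree purely as an example of such a parser; the `expanded_name` helper is made up for this illustration.

```python
import xml.etree.ElementTree as ET

prefixed = '<h:p xmlns:h="http://www.w3.org/1999/xhtml">Some text.</h:p>'
unprefixed = '<p xmlns="http://www.w3.org/1999/xhtml">Some text.</p>'


def expanded_name(document: str) -> str:
    """Return the root element's name in '{namespace-uri}local-name' form."""
    return ET.fromstring(document).tag


same_element = expanded_name(prefixed) == expanded_name(unprefixed)
```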

Online DTD-validation using an XML parser (Xerces from the popup):
http://valet.webthing.com/page/
Online RELAX NG validation:
http://hsivonen.iki.fi/validator/

--
Henri Sivonen
hs******@iki.fi
http://hsivonen.iki.fi/
Mozilla Web Author FAQ: http://mozilla.org/docs/web-developer/faq.html
Jul 24 '05 #11
