Beta testers for an XHTML syntax checker sought


Hi all,

I have written an XHTML syntax checker, called 'Tidybot'. It is
built on top of the well-known "HTML Tidy" library.

I wrote it because I needed some very specific functionality I
couldn't easily find elsewhere. Now that it exists, I'd like to
make it available to the world, just in case others find it
useful, too.

What it does:

- Traverses one or more source directories on your hard disk
recursively, and runs all .html/.htm files it finds through
TidyLib, collecting all warnings and errors it encounters
and presenting them nicely in an XHTML report.

- You can specify files/directories to exclude from the checks,
you can specify warnings/errors to suppress in the generated
report, and you can specify 'key:value' options to pass
directly to the underlying Tidy engine. You can also tell the
generated report to use a different CSS stylesheet if you want
it to have your own look & feel.

- Comes in both a command-line version (for easy automated
scheduling) and a (functionally equivalent, but more
user-friendly) GUI version.

- Is cross-platform, running on both Unix/Linux and MS Windows
(and I daresay it will run on MacOS as well -- certainly the
command-line version should -- but I haven't been able to test
that). A one-file Installer application is available for
Windows. (On Unix, you will also need to install a number of
prerequisites.)
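
The first two items above (recursive traversal with exclusions, and 'key:value' options) can be sketched roughly as follows. This is a minimal illustration only, not Tidybot's actual code: the function names `find_html_files` and `parse_option` are hypothetical, and the step that hands each file to TidyLib is left out.

```python
import os


def find_html_files(roots, excluded=()):
    """Yield .html/.htm files under each root, skipping excluded paths.

    Hypothetical helper: names and behaviour are assumptions for illustration.
    """
    excluded = {os.path.normpath(p) for p in excluded}
    for root in roots:
        for dirpath, dirnames, filenames in os.walk(root):
            # Prune excluded directories in place so os.walk never descends into them.
            dirnames[:] = sorted(
                d for d in dirnames
                if os.path.normpath(os.path.join(dirpath, d)) not in excluded
            )
            for name in sorted(filenames):
                path = os.path.normpath(os.path.join(dirpath, name))
                if name.lower().endswith((".html", ".htm")) and path not in excluded:
                    yield path


def parse_option(text):
    """Split a 'key:value' string into a (key, value) pair."""
    key, sep, value = text.partition(":")
    if not sep or not key.strip():
        raise ValueError(f"expected 'key:value', got {text!r}")
    return key.strip(), value.strip()
```

Pruning `dirnames` in place is what keeps an excluded directory from being visited at all, rather than walking it and filtering afterwards.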

What it (by design) doesn't do:

- No conversion or editing of files -- it just checks files,
helping you to *keep* things tidy, rather than tidying them for
you.

- Doesn't get pages from a web server -- only static pages
available on the local file system are supported.

What I am looking for:

- People willing to give Tidybot 1.5b2 (the current beta version)
a run on their system, and then send me test reports and
feedback as detailed as they have the time and inclination for.

To clarify: Tidybot may have rather limited functionality (when
compared to what Tidy is capable of) but it is not a quick hack,
and before I officially release it to the world I really want to
make sure it runs as flawlessly as possible. This is why all
feedback is welcome.

The Tidybot Home Page is:

<http://www.kronto.org/tidybot/>

and you can see daily updated report pages in action at:

<http://library.lspace.org/tidybot/>

Tidybot and its source code are released as free software under
the MIT License.

Many thanks in advance to anybody willing to help me out with
this.

--
Leo Breebaart <le*@kronto.org>
Jul 24 '05 #1
On 23 Jun 2005 12:49:56 GMT, Leo Breebaart <le*@kronto.org> wrote:
I have written an XHTML syntax checker, called 'Tidybot'. It is
built on top of the well-known "HTML Tidy" library.


Poor choice, IMHO. Tidy is built on HTML and isn't a good basis for an
XML tool. What's its behaviour depending on the content-type returned ?
Does it correctly handle XHTML _as_XML_ ?
Jul 24 '05 #2
Andy Dingley <di*****@codesmiths.com> writes:
On 23 Jun 2005 12:49:56 GMT, Leo Breebaart <le*@kronto.org> wrote:
I have written an XHTML syntax checker, called 'Tidybot'. It is
built on top of the well-known "HTML Tidy" library.
Poor choice, IMHO.


Entirely possible -- as I said, I was hesitant about going public
with this utility, because I initially felt it was "just" a
wrapper around a tool that wasn't really created to be an XHTML
validator in the first place.

On the other hand, the TidyLib was *there*, I could actually use
it without too much hassle, and the result has certainly served
its purpose: I run our files through it, it flags things as
errors or warnings, I fix those, our XHTML files become neater
(and the real validators agree with that).

This is a net win for us any which way I look at it.

Tidy is built on HTML and isn't a good basis for an XML tool.
What's its behaviour depending on the content-type returned ?
I'm not sure I understand your question. Tidy (and Tidybot) work
on local files, not on HTML pages retrieved from a server, so
there is no content-type "returned" as I understand the phrase.

Also I would never claim that Tidybot was an XML tool -- I see it
more as a kind of 'lint' for XHTML files. Nothing more, nothing
less.

Does it correctly handle XHTML _as_XML_ ?


I think specifying "input-xml:yes" to the underlying TidyLib
takes care of that, yes, but perhaps you can give me a specific
example of a situation that might not be handled correctly?
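
For readers who want to try the same setting outside Tidybot, here is a rough sketch that drives the standalone `tidy` command-line program from Python. It assumes a `tidy` executable is installed and on the PATH; the helper names `build_tidy_command` and `run_tidy` are hypothetical, and this is not how Tidybot itself talks to TidyLib.

```python
import subprocess


def build_tidy_command(path, options=None, executable="tidy"):
    """Return an argv list asking Tidy to report errors and warnings only."""
    argv = [executable, "-e", "-q"]  # -e: diagnostics only, -q: suppress extra output
    for key, value in (options or {}).items():
        argv += ["--" + key, str(value)]  # e.g. --input-xml yes
    argv.append(path)
    return argv


def run_tidy(path, options=None):
    """Run Tidy and return (exit_status, diagnostics). Needs tidy installed."""
    result = subprocess.run(
        build_tidy_command(path, options), capture_output=True, text=True
    )
    # Tidy writes its diagnostics to stderr; status 0 = clean, 1 = warnings, 2 = errors.
    return result.returncode, result.stderr
```

For example, `run_tidy("page.html", {"input-xml": "yes"})` would check a (hypothetical) `page.html` in XML mode.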

--
Leo Breebaart <le*@kronto.org>
Jul 24 '05 #3
On 23 Jun 2005 16:42:14 GMT, Leo Breebaart <le*@kronto.org> wrote:
Does it correctly handle XHTML _as_XML_ ?


I think specifying the "input-xml:yes" to the underlying TidyLib
takes care of that, yes, but perhaps you can give me a specific
example of a situation that might not be handled correctly?


Namespacing wasn't supported last time I looked. As this is one of the
few reasons for going XML over HTML, that's significant IMHO.

Jul 24 '05 #4
Andy Dingley <di*****@codesmiths.com> writes:
On 23 Jun 2005 16:42:14 GMT, Leo Breebaart <le*@kronto.org> wrote:
Does it correctly handle XHTML _as_XML_ ?


I think specifying the "input-xml:yes" to the underlying
TidyLib takes care of that, yes, but perhaps you can give me a
specific example of a situation that might not be handled
correctly?


Namespacing wasn't supported last time I looked. As this is one
of the few reasons for going XML over HTML, that's significant
IMHO.


If TidyLib does not support XML namespaces, then obviously
Tidybot won't support them either.

I can't shake the feeling that you're finding fault with Tidybot
for it not being the utility you feel it ought to be, rather than
for any deficiency in what it actually is.

I've already described the scenario in which I find Tidybot
helpful to have around, and I am really not trying to make any
claims beyond that.

--
Leo Breebaart <le*@lspace.org>
Jul 24 '05 #5
On 25 Jun 2005 10:21:58 GMT, Leo Breebaart <le*@lspace.org> wrote:
I can't shake the feeling that you're finding fault with Tidybot
for it not being the utility you feel it ought to be,


Like an XHTML syntax checker ?
Jul 24 '05 #6
Andy Dingley <di*****@codesmiths.com> writes:
On 25 Jun 2005 10:21:58 GMT, Leo Breebaart <le*@lspace.org>
wrote:
I can't shake the feeling that you're finding fault with
Tidybot for it not being the utility you feel it ought to be,


Like an XHTML syntax checker ?


Can you perhaps give me an example of an XHTML document that you
think contains one or more syntax errors which Tidy/Tidybot will
not catch?

I think we've already established to everyone's satisfaction that
I'm not the world's foremost authority on XHTML/XML matters. I am
perfectly willing to be educated (or be told where to go to
educate myself, even), but your snarky one-liners aren't exactly
the most constructive criticism I can think of.

--
Leo Breebaart <le*@lspace.org>
Jul 24 '05 #7
Leo Breebaart wrote:
Can you perhaps give me an example of an XHTML document that you
think contains one or more syntax errors which Tidy/Tidybot will
not catch?


No, but here's a good document which it will find bogus errors in:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<h:html xmlns:h="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<h:head>
<h:title>Example</h:title>
</h:head>
<h:body>
<h:h1>Example</h:h1>
<h:p>Some text.</h:p>
</h:body>
</h:html>
Jul 24 '05 #8
Leif K-Brooks <eu*****@ecritters.biz> writes:
Leo Breebaart wrote:
Can you perhaps give me an example of an XHTML document that you
think contains one or more syntax errors which Tidy/Tidybot will
not catch?
No, but here's a good document which it will find bogus errors in:


Thanks, having an actual example to work with is very helpful.

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<h:html xmlns:h="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<h:head>
<h:title>Example</h:title>
</h:head>
<h:body>
<h:h1>Example</h:h1>
<h:p>Some text.</h:p>
</h:body>
</h:html>


Tidy (and therefore Tidybot) finds zero errors in this file if
you specify the "input-xml:true" flag.

Conversely, both the W3C and the WDG validators choke on this
example file just as badly as Tidy/Tidybot in normal mode does.

If I change the body in the above snippet to

<h:h1>Example
<h:p>Some text.</h:p>
</h:h1>

then Tidy in XML mode will still not complain, but in XHTML mode
the equivalent

<h1>Example
<p>Some text.</p>
</h1>

would yield a "Warning: missing </h1> before <p>".

Is that the issue you guys are getting at? XML mode only checking
for well-formedness, and not actually doing any validating?
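
The distinction can be reproduced with any non-validating XML parser. The sketch below uses Python's standard `xml.etree.ElementTree` rather than Tidy, so it only illustrates the general point and says nothing about Tidy's internals. The XML declaration and DOCTYPE are left out for brevity, and the helper names `page` and `is_well_formed` are hypothetical.

```python
import xml.etree.ElementTree as ET


def page(body):
    """Wrap a body fragment in the namespaced skeleton from the example above."""
    return (
        '<h:html xmlns:h="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">'
        "<h:head><h:title>Example</h:title></h:head>"
        "<h:body>" + body + "</h:body></h:html>"
    )


def is_well_formed(text):
    """True if the text parses as XML; no DTD or content-model checks are made."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


GOOD = page("<h:h1>Example</h:h1><h:p>Some text.</h:p>")
NESTED = page("<h:h1>Example<h:p>Some text.</h:p></h:h1>")  # p inside h1: invalid XHTML
UNCLOSED = page("<h:h1>Example</h:h1><h:p>Some text.")      # missing </h:p>
```

Here `GOOD` and `NESTED` both pass the well-formedness check, even though `NESTED` violates the XHTML content model; only `UNCLOSED` is rejected. Catching the nested paragraph takes a validating step against the DTD.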

--
Leo Breebaart <le*@lspace.org>
Jul 24 '05 #9
On 25 Jun 2005 16:24:33 GMT, Leo Breebaart <le*@lspace.org> wrote:
Can you perhaps give me an example of an XHTML document that you
think contains one or more syntax errors which Tidy/Tidybot will
not catch?


Not offhand. I'm not thinking of catching errors that are hard to catch,
so much as valid XHTML documents that are incorrectly flagged as invalid
- because they use XML features such as namespacing that aren't part of
HTML.
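
To make the namespacing point concrete: to a namespace-aware XML parser, the prefixed form used earlier in the thread and the usual unprefixed form describe the same elements. A small sketch with Python's standard `xml.etree.ElementTree` (not Tidy; the helper name `expanded_names` is hypothetical):

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Same document twice: once with an explicit prefix, once with a default namespace.
PREFIXED = (
    '<h:html xmlns:h="http://www.w3.org/1999/xhtml">'
    "<h:body><h:p>Some text.</h:p></h:body></h:html>"
)
DEFAULT = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<body><p>Some text.</p></body></html>"
)


def expanded_names(text):
    """Return each element's '{namespace}local-name' in document order."""
    return [element.tag for element in ET.fromstring(text).iter()]
```

Both documents yield the same list of expanded names, which is why a checker that matches on the literal tag text `html` or `p` will misreport the prefixed version.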

Jul 24 '05 #10
