Bytes IT Community

Tidy trimming empty tags

Hi.

(this is somewhat similar to yesterday's thread about empty links)

I noticed that Tidy [0] issues warnings whenever it encounters empty
tags, and strips those tags if cleanup was requested. This is okay in
some cases (such as <tbody>), but problematic for other tags (such as
<option>). Some tags (td, th, ...) do not produce warnings when they are
empty.

The warnings are also issued for documents that are considered valid
"XHTML 1.0 Strict" by the W3C Validator. If any of these empty-tag
warnings were issued, Tidy recommends "XHTML 1.0 Transitional" instead.
A few examples:
1) empty <select> options
-----------------------------------------------------------------
<select name="x">
<option value=""></option>
<option value="foo">a foo</option>
<option value="bar">a bar</option>
</select>

[tidy] Warning: "trimming empty <option>"
I don't know how I could avoid this warning. I don't want to display any
text in the first option, and hacks like <option>&nbsp;</option> are not
acceptable.
2) Empty p, div, span, h1, a[href], ... tags
-----------------------------------------------------------------
<td class="c1"><span class="c2"></span> Some Text</td>
Constructs like this are sometimes used by our webdesigner in
combination with CSS. The span tag is technically empty, but the
stylesheet will cause an image to be displayed.

<div id="ibox"></div>
This could happen, for example, if "ibox" was a box containing additional
information for the main content, but there is no additional information
for the current page. The box itself should still be displayed, so the
tag is left empty.

<p class="notes"><?= $notes ?></p>
Empty tags can also occur as an artifact of server-side scripting; if
$notes is empty, so is the <p> tag (this case can be avoided, I know).
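The avoidable case mentioned above can be sketched as follows. This is only an illustration in Python rather than the PHP used on the original site; the function name `render_notes` is hypothetical, and the HTML escaping is an addition that the original one-liner does not perform.

```python
from html import escape


def render_notes(notes):
    """Return the notes paragraph, or an empty string when there is nothing to show."""
    text = (notes or "").strip()
    if not text:
        # Skip the element entirely instead of emitting <p class="notes"></p>.
        return ""
    # Escaping is an assumption added for safety; the original template prints $notes raw.
    return '<p class="notes">' + escape(text) + "</p>"
```

With this approach the template never produces an empty <p>, so Tidy has nothing to trim for this particular case.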
3) Empty td, th tags; script tags
-----------------------------------------------------------------
<tr><th></th></tr>
<tr><td></td></tr>

Tidy ignores empty table cells and does not try to strip them.
<script type="text/javascript" src="xxx.js"></script>
Same goes for empty script tags with src attributes.
4) Empty thead, tfoot tags
-----------------------------------------------------------------
<table>
<thead><tr><td>head</td></tr></thead>
<tfoot></tfoot>
<tbody><tr><td>body</td></tr></tbody>
</table>

Tidy issues a warning for the empty tfoot element, which is expected
because <tfoot> must never be empty. In this case I would actually
rather get an error instead of a warning, because the document does not
qualify as valid XHTML anymore.
I would like to get rid of the warnings in 1) + 2), to simplify
automated validation and to ease my mind. Is Tidy correct in issuing
warnings and stripping the tags? Should I always try to avoid empty
tags? If so, how?
Thanks in advance,
Stefan
[0] http://tidy.sourceforge.net/
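To see in advance which elements would trigger the "trimming empty" warnings described in 1) and 2), something like the following rough sketch could be used. It relies only on Python's standard `html.parser`; the `IGNORED` set reflects what this post reports about Tidy (td, th, script are left alone) and is an assumption, not Tidy's actual rule set. The class and function names are hypothetical.

```python
from html.parser import HTMLParser

# Elements this post reports Tidy leaves alone when empty (assumption, not exhaustive).
IGNORED = {"td", "th", "script"}
# Void elements never have content, so they are not "empty" in the sense discussed here.
VOID = {"area", "base", "br", "col", "hr", "img", "input", "link", "meta", "param"}


class EmptyElementFinder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.stack = []  # one [tag, has_content] entry per open element
        self.empty = []  # tags that were closed without any content

    def handle_starttag(self, tag, attrs):
        if self.stack:
            self.stack[-1][1] = True  # a child element counts as content
        if tag not in VOID:
            self.stack.append([tag, False])

    def handle_data(self, data):
        # Whitespace-only text does not count as content (an assumption).
        if self.stack and data.strip():
            self.stack[-1][1] = True

    def handle_endtag(self, tag):
        # Only handle end tags that match the innermost open element.
        if self.stack and self.stack[-1][0] == tag:
            _, has_content = self.stack.pop()
            if not has_content and tag not in IGNORED:
                self.empty.append(tag)


def find_empty_elements(markup):
    """Return the tag names of empty elements, in document order."""
    finder = EmptyElementFinder()
    finder.feed(markup)
    finder.close()
    return finder.empty
```

For example, `find_empty_elements('<div id="ibox"></div>')` returns `["div"]`. The sketch assumes reasonably well-formed markup and does not try to reproduce Tidy's own repair logic.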
Aug 29 '05 #1
12 Replies


Stefan Weiss <sp******@foo.at> wrote:
The warnings are also issued for documents that are considered valid
"XHTML 1.0 Strict" by the W3C Validator.
Tidy is a linter that has its own rules about how markup should be, no
relation to a validator that checks against the DTD.
A few examples:

1) empty <select> options
-----------------------------------------------------------------
<select name="x">
<option value=""></option>
<option value="foo">a foo</option>
<option value="bar">a bar</option>
</select>

[tidy] Warning: "trimming empty <option>"
That seems reasonable, the empty option should be removed.
I don't know how I could avoid this warning. I don't want to display any
text in the first option
Stop wanting that.
, and hacks like <option>&nbsp;</option> are not
acceptable.
Good.
<td class="c1"><span class="c2"></span> Some Text</td>
Constructs like this are sometimes used by our webdesigner in
combination with CSS. The span tag is technically empty, but the
stylesheet will cause an image to be displayed.
There is no way to prevent Tidy from trimming these empty elements.

Ideally the empty elements that Tidy strips should not be in the markup,
but in the real world there are no arguments why the above type of code
shouldn't be used if used by a skilled coder.

Tidy is therefore overzealous in removing them; the solution is to do
to Tidy what it does to empty spans: get rid of it.
Tidy ignores empty table cells and does not try to strip them.
Doing so could screw up tables.
<script type="text/javascript" src="xxx.js"></script>
Same goes for empty script tags with src attributes.
Same. Tidy is stupid, but not that stupid.
4) Empty thead, tfoot tags
-----------------------------------------------------------------
<table>
<thead><tr><td>head</td></tr></thead>
<tfoot></tfoot>
<tbody><tr><td>body</td></tr></tbody>
</table>

Tidy issues a warning for the empty tfoot element, which is expected
because <tfoot> must never be empty. In this case I would actually
rather get an error instead of a warning, because the document does not
qualify as valid XHTML anymore.


Use a validator if you want to test for validity, don't use Tidy.

--
Spartanicus
Aug 29 '05 #2

Stefan Weiss wrote:

The warnings are also issued for documents that are considered valid
"XHTML 1.0 Strict" by the W3C Validator. If any of these empty-tag
warnings were issued, Tidy recommends "XHTML 1.0 Transitional" instead.

The actual phrasing is:
Info: Doctype given is "-//W3C//DTD XHTML 1.0 Strict//EN"
Info: Document content looks like XHTML 1.0 Transitional
That means there is some non-strict syntax used in the document
somewhere (tidy wouldn't want to say where, of course). The W3C validator
is much better at identifying specific problems.

--
jmm dash list (at) sohnen-moe (dot) com
(Remove .AXSPAMGN for email)
Aug 29 '05 #3

Jim Moe wrote:
Stefan Weiss wrote:

The warnings are also issued for documents that are considered valid
"XHTML 1.0 Strict" by the W3C Validator. If any of these empty-tag
warnings were issued, Tidy recommends "XHTML 1.0 Transitional" instead.
The actual phrasing is:
Info: Doctype given is "-//W3C//DTD XHTML 1.0 Strict//EN"
Info: Document content looks like XHTML 1.0 Transitional
That means there is some non-strict syntax used in the document


Um, it means no such thing. Tidy has a bad habit of thinking
valid, strict markup is "transitional".
somewhere (tidy wouldn't want to say where, of course). The W3C
validator is much better at identifying specific problems.


Indeed, a validator will tell you exactly what is allowed.
If you want the Strict/Legacy distinction highlighted more
clearly, AccessValet will do that using the 'trafficlight'
metaphor (green=good, amber=deprecated, red=invalid markup).
The amber then represents the difference between strict and
"transitional".

(Note that neither Tidy nor AccessValet is a validator.
The same is true of some tools that are marketed as "validator"s).

--
Nick Kew
Aug 29 '05 #4

Nick Kew wrote:

The actual phrasing is:
Info: Doctype given is "-//W3C//DTD XHTML 1.0 Strict//EN"
Info: Document content looks like XHTML 1.0 Transitional
That means there is some non-strict syntax used in the document


Um, it means no such thing. Tidy has a bad habit of thinking
valid, strict markup is "transitional".

Hmm. Whenever I cleaned up the warnings indicated by a validator, tidy
then thought the document looked strict. Guess I've been lucky.

--
jmm dash list (at) sohnen-moe (dot) com
(Remove .AXSPAMGN for email)
Aug 30 '05 #5

Spartanicus wrote:
There is no way to prevent Tidy from trimming these empty elements.

Ideally the empty elements that Tidy strips should not be in the markup,
There's no "ideally" about it. They're valid, Tidy should keep its
thieving paws off.

- As a general rule, "linters" should not take a dislike to things
simply because they don't understand them. It's a big world out there,
they should limit themselves to things we _know_ to be bad, not extend
this to things they simply haven't thought of a use for. Potentiality
is bigger than current actuality - the web (indeed software in general)
is always thinking of new ways to use things that are possible, even if
not intended for that. This is not necessarily a bad thing (even if it
does sometimes lead to 1x1.gif and friends).

As concrete examples, Tidy will strip an empty <div> that's later
intended for use from DHTML. It will also strip empty <div>s that are
intended as hooks for CSS background images (commonplace rounded box
code). Both of these are valid, useful and generally commendable
practices which Tidy will break.

but in the real world there are no arguments why the above type of code
shouldn't be used if used by a skilled coder.


Skilled? It's not Tidy's place to be making value judgements like
that. There's enough cruft out there in the provably invalid, without
it needing to start on the simply non-mainstream.

Aug 30 '05 #6

In our last episode,
<KM*****************************************@giganews.com>,
the lovely and talented Jim Moe
broadcast on comp.infosystems.www.authoring.html:
Stefan Weiss wrote:

The warnings are also issued for documents that are considered valid
"XHTML 1.0 Strict" by the W3C Validator. If any of these empty-tag
warnings were issued, Tidy recommends "XHTML 1.0 Transitional" instead.

The actual phrasing is:
Info: Doctype given is "-//W3C//DTD XHTML 1.0 Strict//EN"
Info: Document content looks like XHTML 1.0 Transitional
That means there is some non-strict syntax used in the document
somewhere (tidy wouldn't want to say where, of course). The W3C validator
is much better at identifying specific problems.


In point of fact, Tidy lies. And it will change the Doctype to
something wrong without asking permission. It often identifies
documents as "proprietary" when in fact onsgmls says they
validate with the advertised standard Doctype. If you use tidy,
I'd advise you to run documents through it without a Doctype,
stream the output to an untidy script to stick the correct
Doctype on it, and break lines in a way to cater to broken
browsers like IE, then stream it through onsgmls or a similar
real validator.

Here is one of the untidies I use (this one is for 4.01 loose;
change the path on Linux):

***********************
#!/usr/local/bin/perl

$| = 1;    # Autoflush STDOUT.

while (<STDIN>) {
    # Put the correct Doctype in front of the <html> start tag.
    s#<html#<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"\n "http://www.w3.org/TR/html4/loose.dtd">\n\n<html#;
    s/^\n//g;              # Drop blank lines.
    s/li>\s*/li\n>/g;      # Mostly to keep IE from breaking
                           # because IE can't do lists right.
    s/ul>\s*/ul\n>/g;
    s#/ul>#/ul\n>#g;
    s/>&nbsp\;</></g;      # Takes out nbsp used to keep empty
                           # elements
    s/<ul><li>/<ul\n><li\n>/g;
    s#</li><li>#</li\n><li\n>#g;
    print STDOUT;
}
**************************

--
Lars Eighner ei*****@io.com http://www.larseighner.com/
"Fascism should more properly be called corporatism, since it is the
merger of state and corporate power."-Benito Mussolini * When you write the
check to pay your taxes, remember there are two l's in "Halliburton."
Aug 30 '05 #7

On 2005-08-29 19:50, Spartanicus wrote:
Tidy is a linter that has its own rules about how markup should be, no
relation to a validator that checks against the DTD.


Point taken. I was not aware that Tidy is not a validator, but you are
right of course. I mainly use it because it is convenient for console
use, and because there is a nifty Firefox extension called "HTML
Validator (based on Tidy)" that will automatically check each page as it
is loaded, and display the results in the status bar. Tidy also offers
some accessibility hints.

The console aspect was easily remedied by a short Perl script that
connects to validator.w3.org, and I guess I'll have to live with Tidy's
quirks. At least the Firefox extension has an option to ignore certain
warnings.
<select name="x">
<option value=""></option>
<option value="foo">a foo</option>
<option value="bar">a bar</option>
</select>

[tidy] Warning: "trimming empty <option>"


That seems reasonable, the empty option should be removed.


No it shouldn't, that would change the contents of the form:
<option value="42" selected="selected"></option>
I don't know how I could avoid this warning. I don't want to display
any text in the first option


Stop wanting that.


Why? Empty options are valid and useful. I won't run into any problems
with them unless I let Tidy "clean up" my HTML.
cheers,
stefan
Aug 31 '05 #8

Stefan Weiss <sp******@foo.at> wrote:
<select name="x">
<option value=""></option>
<option value="foo">a foo</option>
<option value="bar">a bar</option>
</select>

[tidy] Warning: "trimming empty <option>"
That seems reasonable, the empty option should be removed.


No it shouldn't, that would change the contents of the form:
<option value="42" selected="selected"></option>


That's not the example you provided.
I don't know how I could avoid this warning. I don't want to display
any text in the first option


Stop wanting that.


Why? Empty options are valid


Amongst many other non-sensible code constructs.
and useful.


Explain the purpose of <option value=""></option>

Btw, my copy of Tidy does not remove the above construct.

--
Spartanicus
Aug 31 '05 #9

On 2005-08-31 16:58, Spartanicus wrote:
<option value="42" selected="selected"></option>
That's not the example you provided.


No, the previous example was quoted above that. This line was intended
to demonstrate how the contents of the form would be changed by removing
empty <option> tags. Tidy strips the tag in both cases.
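The effect on the form can be illustrated with a small sketch. It models a single-choice <select> as a list of (value, text, selected) tuples; the helper names are hypothetical, and the fallback to the first option when nothing is marked selected is an assumption about typical browser behaviour for drop-down lists.

```python
def submitted_value(options):
    """Return the value a single-choice <select> would submit.

    options is a list of (value, text, selected) tuples in document order.
    """
    if not options:
        return None
    for value, _text, selected in options:
        if selected:
            return value
    # Assumption: with no option marked selected, the first one is used.
    return options[0][0]


def strip_empty_options(options):
    """Mimic the trimming described in this thread: drop options with no text."""
    return [opt for opt in options if opt[1].strip()]


original = [
    ("42", "", True),        # <option value="42" selected="selected"></option>
    ("foo", "a foo", False),
    ("bar", "a bar", False),
]
```

Here `submitted_value(original)` is `"42"`, while `submitted_value(strip_empty_options(original))` is `"foo"`: removing the empty option silently changes what the form submits.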
Explain the purpose of <option value=""></option>
Any optional <select> field that you wish to leave blank (because you
don't have the information yet, or because none of the options apply).
You could have the first option contain "(none)" or "N/A", but that
doesn't add anything except noise.

If you mean that value="" is redundant if the tag does not contain any
text, I agree, but that doesn't have anything to do with Tidy removing
the tag completely.
Btw, my copy of Tidy does not remove the above construct.


Interesting. Mine does, and it's relatively recent ("HTML Tidy for
Linux/x86 released on 12 April 2005").
cheers,
stefan
Aug 31 '05 #10

Stefan Weiss <sp******@foo.at> wrote:
Explain the purpose of <option value=""></option>


Any optional <select> field that you wish to leave blank (because you
don't have the information yet, or because none of the options apply).


Leave it out, it serves no purpose to the user.

--
Spartanicus
Aug 31 '05 #11

On Wed, 31 Aug 2005, Spartanicus wrote:
Stefan Weiss <sp******@foo.at> wrote:
I don't know how I could avoid this warning. I don't want to display
any text in the first option

Stop wanting that.
Why? Empty options are valid

[...] Explain the purpose of <option value=""></option>


I'd rather *you* explained to us (preferably citing some peer-reviewed
guideline etc.) just what you reckon is wrong with what the poster
wanted.

It looks to me like an empty option. Myself, I suppose there are
quite a number of situations where an empty option would make sense,
and I'm still puzzled why you are so vociferous about rejecting it,
apparently "out of hand".

For example, a journey planner, for an outward journey and optional
return journey, has SELECT widgets to choose the return time (the hour
and the minute, in increments of 15 minutes): leaving the time blank
(which is the form's default) means that one does not require a return
journey to be planned. Seems reasonable to me.

I presume you wanted the empty value replaced by some text - whatever
might be appropriate in context - stating that this is an empty or
do-nothing option, with corresponding code in the server-side process
to deal with it and treat it as an empty input. But I'd like to know
why you are so insistent on this, if an empty input is meaningful in
its context, and can be supplied in such an obvious way.
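As a rough sketch of the server-side handling in the journey planner example, assuming the two SELECT widgets arrive as plain strings and that an empty string is the blank default (the function name and the half-filled error case are hypothetical):

```python
from datetime import time


def parse_return_time(hour, minute):
    """Turn the two SELECT values into a time, or None when no return journey is wanted."""
    if hour == "" and minute == "":
        # Both widgets left at the empty default: no return journey requested.
        return None
    if hour == "" or minute == "":
        # Only one widget filled in: treat as a user error (an assumption).
        raise ValueError("return time is only half filled in")
    return time(int(hour), int(minute))
```

The empty input is handled the same way whether the blank option shows no text or a label such as "(none)"; only what the user sees differs.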

thanks
Aug 31 '05 #12

"Alan J. Flavell" <fl*****@ph.gla.ac.uk> wrote:
and I'm still puzzled why you are so vociferous about rejecting it,
apparently "out of hand".
I suggest that you add me to your kill file given that you increasingly
seem to have trouble objectively reacting to the content of my posts.
For example, a journey planner, for an outward journey and optional
return journey, has SELECT widgets to choose the return time (the hour
and the minute, in increments of 15 minutes): leaving the time blank
(which is the form's default) means that one does not require a return
journey to be planned. Seems reasonable to me.


Using the above context it should read "not applicable", "single
journey" or some such; leaving it empty fails to convey this to the
user.

--
Spartanicus
Aug 31 '05 #13
