Tidy trimming empty tags

Hi.

(this is somewhat similar to yesterday's thread about empty links)

I noticed that Tidy [0] issues warnings whenever it encounters empty
tags, and strips those tags if cleanup was requested. This is okay in
some cases (such as <tbody>), but problematic for other tags (such as
<option>). Some tags (td, th, ...) do not produce warnings when they are
empty.
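
To see in advance which elements in a page are empty, and so are candidates for these warnings, something like the following could be used. It is only a sketch built on Python's standard `html.parser`; the names `EmptyElementFinder` and `find_empty_elements` are made up for illustration, and it makes no attempt to reproduce Tidy's own rules (it also reports td, th and script, which Tidy leaves alone).

```python
from html.parser import HTMLParser

# Elements that never have content; they are not reported.
VOID = {"area", "base", "br", "col", "hr", "img", "input", "link", "meta", "param"}


class EmptyElementFinder(HTMLParser):
    """Collect elements that contain no text and no child elements."""

    def __init__(self):
        super().__init__()
        self.stack = []   # one [tag, has_content, line] entry per open element
        self.empty = []   # (tag, line) for every empty element found

    def handle_starttag(self, tag, attrs):
        if self.stack:
            self.stack[-1][1] = True          # a child counts as content
        if tag not in VOID:
            self.stack.append([tag, False, self.getpos()[0]])

    def handle_data(self, data):
        # Only ASCII whitespace is ignored, so "&nbsp;" counts as content.
        if self.stack and data.strip(" \t\r\n\f"):
            self.stack[-1][1] = True

    def handle_endtag(self, tag):
        # Simplification: only well-nested end tags are handled.
        if self.stack and self.stack[-1][0] == tag:
            _, has_content, line = self.stack.pop()
            if not has_content:
                self.empty.append((tag, line))


def find_empty_elements(markup):
    """Return a list of (tag, line) pairs for the empty elements in markup."""
    finder = EmptyElementFinder()
    finder.feed(markup)
    finder.close()
    return finder.empty


if __name__ == "__main__":
    print(find_empty_elements('<td class="c1"><span class="c2"></span> Some Text</td>'))
```

The list can then be reviewed by hand before deciding whether to let Tidy clean up the document.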

The warnings are also issued for documents that are considered valid
"XHTML 1.0 Strict" by the W3C Validator. If any of these empty-tag
warnings were issued, Tidy recommends "XHTML 1.0 Transitional" instead.
A few examples:
1) empty <select> options
-----------------------------------------------------------------
<select name="x">
<option value=""></option>
<option value="foo">a foo</option>
<option value="bar">a bar</option>
</select>

[tidy] Warning: "trimming empty <option>"
I don't know how I could avoid this warning. I don't want to display any
text in the first option, and hacks like <option>&nbsp;</option> are not
acceptable.
2) Empty p, div, span, h1, a[href], ... tags
-----------------------------------------------------------------
<td class="c1"><span class="c2"></span> Some Text</td>
Constructs like this are sometimes used by our webdesigner in
combination with CSS. The span tag is technically empty, but the
stylesheet will cause an image to be displayed.

<div id="ibox"></div>
This could happen, for example, if "ibox" were a box containing additional
information for the main content, but there is no additional information for
the current page. The box itself should still be displayed, so the tag is
left empty.

<p class="notes"><?= $notes ?></p>
Empty tags can also occur as an artifact of server-side scripting; if
$notes is empty, so is the <p> tag (this case can be avoided, I know).
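
One way to avoid that case is to emit the element only when there is something to put in it. The sketch below shows the idea in Python rather than PHP; `render_notes` is a hypothetical helper, and the escaping step is an addition that the one-line template above does not perform.

```python
from html import escape


def render_notes(notes):
    """Return a <p class="notes"> element, or "" when there are no notes."""
    text = (notes or "").strip()
    if not text:
        return ""                      # no notes: emit no tag at all
    return '<p class="notes">{}</p>'.format(escape(text))


if __name__ == "__main__":
    print(repr(render_notes("")))
    print(render_notes("Call back on Monday"))
```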
3) Empty td, th tags; script tags
-----------------------------------------------------------------
<tr><th></th></tr>
<tr><td></td></tr>

Tidy ignores empty table cells and does not try to strip them.
<script type="text/javascript" src="xxx.js"></script>
Same goes for empty script tags with src attributes.
4) Empty thead, tfoot tags
-----------------------------------------------------------------
<table>
<thead><tr><td>head</td></tr></thead>
<tfoot></tfoot>
<tbody><tr><td>body</td></tr></tbody>
</table>

Tidy issues a warning for the empty tfoot element, which is expected
because <tfoot> must never be empty. In this case I would actually
rather get an error instead of a warning, because the document does not
qualify as valid XHTML anymore.
I would like to get rid of the warnings in 1) + 2), to simplify
automated validation and to ease my mind. Is Tidy correct in issuing
warnings and stripping the tags? Should I always try to avoid empty
tags? If so, how?
Thanks in advance,
Stefan
[0] http://tidy.sourceforge.net/
Aug 29 '05 #1
Stefan Weiss <sp******@foo.at> wrote:
The warnings are also issued for documents that are considered valid
"XHTML 1.0 Strict" by the W3C Validator.
Tidy is a linter that has its own rules about how markup should be; it bears
no relation to a validator that checks against the DTD.
A few examples:

1) empty <select> options
-----------------------------------------------------------------
<select name="x">
<option value=""></option>
<option value="foo">a foo</option>
<option value="bar">a bar</option>
</select>

[tidy] Warning: "trimming empty <option>"
That seems reasonable, the empty option should be removed.
I don't know how I could avoid this warning. I don't want to display any
text in the first option
Stop wanting that.
, and hacks like <option>&nbsp;</option> are not
acceptable.
Good.
<td class="c1"><span class="c2"></span> Some Text</td>
Constructs like this are sometimes used by our webdesigner in
combination with CSS. The span tag is technically empty, but the
stylesheet will cause an image to be displayed.
There is no way to prevent Tidy from trimming these empty elements.

Ideally the empty elements that Tidy strips should not be in the markup,
but in the real world there are no arguments why the above type of code
shouldn't be used if used by a skilled coder.

Tidy is therefore overzealous in removing them. The solution is to do
to Tidy what it does to empty spans: get rid of it.
Tidy ignores empty table cells and does not try to strip them.
Doing so could screw up tables.
<script type="text/javascript" src="xxx.js"></script>
Same goes for empty script tags with src attributes.
Same. Tidy is stupid, but not that stupid.
4) Empty thead, tfoot tags
-----------------------------------------------------------------
<table>
<thead><tr><td>head</td></tr></thead>
<tfoot></tfoot>
<tbody><tr><td>body</td></tr></tbody>
</table>

Tidy issues a warning for the empty tfoot element, which is expected
because <tfoot> must never be empty. In this case I would actually
rather get an error instead of a warning, because the document does not
qualify as valid XHTML anymore.


Use a validator if you want to test for validity, don't use Tidy.

--
Spartanicus
Aug 29 '05 #2
Stefan Weiss wrote:

The warnings are also issued for documents that are considered valid
"XHTML 1.0 Strict" by the W3C Validator. If any of these empty-tag
warnings were issued, Tidy recommends "XHTML 1.0 Transitional" instead.

The actual phrasing is:
Info: Doctype given is "-//W3C//DTD XHTML 1.0 Strict//EN"
Info: Document content looks like XHTML 1.0 Transitional
That means there is some non-strict syntax used in the document
somewhere (tidy wouldn't want to say where, of course). The W3C validator
is much better at identifying specific problems.

--
jmm dash list (at) sohnen-moe (dot) com
(Remove .AXSPAMGN for email)
Aug 29 '05 #3
Jim Moe wrote:
Stefan Weiss wrote:

The warnings are also issued for documents that are considered valid
"XHTML 1.0 Strict" by the W3C Validator. If any of these empty-tag
warnings were issued, Tidy recommends "XHTML 1.0 Transitional" instead.
The actual phrasing is:
Info: Doctype given is "-//W3C//DTD XHTML 1.0 Strict//EN"
Info: Document content looks like XHTML 1.0 Transitional
That means there is some non-strict syntax used in the document


Um, it means no such thing. Tidy has a bad habit of thinking
valid, strict markup is "transitional".
somewhere (tidy wouldn't want to say where, of course). The W3C
validator is much better at identifying specific problems.


Indeed, a validator will tell you exactly what is allowed.
If you want the Strict/Legacy distinction highlighted more
clearly, AccessValet will do that using the 'trafficlight'
metaphor (green=good, amber=deprecated, red=invalid markup).
The amber then represents the difference between strict and
"transitional".

(Note that neither Tidy nor AccessValet is a validator.
The same is true of some tools that are marketed as "validator"s).

--
Nick Kew
Aug 29 '05 #4
Nick Kew wrote:

The actual phrasing is:
Info: Doctype given is "-//W3C//DTD XHTML 1.0 Strict//EN"
Info: Document content looks like XHTML 1.0 Transitional
That means there is some non-strict syntax used in the document


Um, it means no such thing. Tidy has a bad habit of thinking
valid, strict markup is "transitional".

Hmm. Whenever I cleaned up the warnings indicated by a validator, tidy
then thought the document looked strict. Guess I've been lucky.

--
jmm dash list (at) sohnen-moe (dot) com
(Remove .AXSPAMGN for email)
Aug 30 '05 #5
Spartanicus wrote:
There is no way to prevent Tidy from trimming these empty elements.

Ideally the empty elements that Tidy strips should not be in the markup,
There's no "ideally" about it. They're valid; Tidy should keep its
thieving paws off.

- As a general rule, "linters" should not take a dislike to things
simply because they don't understand them. It's a big world out there,
they should limit themselves to things we _know_ to be bad, not extend
this to things they simply haven't thought of a use for. Potentiality
is bigger than current actuality - the web (indeed software in general)
is always thinking of new ways to use things that are possible, even if
not intended for that. This is not necessarily a bad thing (even if it
does sometimes lead to 1x1.gif and friends).

As concrete examples, Tidy will strip an empty <div> that's later
intended for use from DHTML. It will also strip empty <div>s that are
intended as hooks for CSS background images (commonplace rounded box
code). Both of these are valid, useful and generally commendable
practices which Tidy will break.

but in the real world there are no arguments why the above type of code
shouldn't be used if used by a skilled coder.


Skilled? It's not Tidy's place to be making value judgements like
that. There's enough cruft out there in the provably invalid, without
it needing to start on the simply non-mainstream.

Aug 30 '05 #6
In our last episode,
<KM*****************************************@giganews.com>,
the lovely and talented Jim Moe
broadcast on comp.infosystems.www.authoring.html:
Stefan Weiss wrote:

The warnings are also issued for documents that are considered valid
"XHTML 1.0 Strict" by the W3C Validator. If any of these empty-tag
warnings were issued, Tidy recommends "XHTML 1.0 Transitional" instead.

The actual phrasing is:
Info: Doctype given is "-//W3C//DTD XHTML 1.0 Strict//EN"
Info: Document content looks like XHTML 1.0 Transitional
That means there is some non-strict syntax used in the document
somewhere (tidy wouldn't want to say where, of course). The W3C validator
is much better at identifying specific problems.


In point of fact, Tidy lies. And it will change the Doctype to
something wrong without asking permission. It often identifies
documents as "proprietary" when in fact onsgmls says they
validate with the advertised standard Doctype. If you use tidy,
I'd advise you to run documents through it without a Doctype,
stream the output to an untidy script to stick the correct
Doctype on it, and break lines in a way to cater to broken
browsers like IE, then stream it through onsgmls or a similar
real validator.

Here is one of the untidies I use (this one is for 4.01 loose;
change the path on Linux):

***********************
#!/usr/local/bin/perl
use strict;
use warnings;

$| = 1;   # autoflush STDOUT (needs a non-zero number; "flush" counts as 0)

while (<STDIN>) {
    # Put the correct Doctype in front of the opening <html> tag.
    s#<html#<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"\n "http://www.w3.org/TR/html4/loose.dtd">\n\n<html#;
    s/^\n//;                      # Drops blank lines
    s/li>\s*/li\n>/g;             # Mostly to keep IE from breaking
                                  # because IE can't do lists right.
    s/ul>\s*/ul\n>/g;
    s#/ul>#/ul\n>#g;
    s/>&nbsp;</></g;              # Takes out nbsp used to keep empty
                                  # elements
    s/<ul><li>/<ul\n><li\n>/g;
    s#</li><li>#</li\n><li\n>#g;
    print STDOUT;
}
**************************
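
For readers who would rather not use Perl, the Doctype part of that workflow (strip whatever Doctype is there before running Tidy, put the intended one back afterwards) might look roughly like this in Python. It is a sketch only: `strip_doctype` and `add_doctype` are made-up names, the regular expression assumes a Doctype without an internal subset, and the Tidy and onsgmls steps themselves are left out.

```python
import re

# The Doctype the Perl script above inserts (HTML 4.01 loose).
DOCTYPE_401_LOOSE = (
    '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"\n'
    ' "http://www.w3.org/TR/html4/loose.dtd">'
)

# Assumption: the Doctype declaration contains no ">" before its end.
_DOCTYPE_RE = re.compile(r"<!DOCTYPE[^>]*>\s*", re.IGNORECASE)


def strip_doctype(markup):
    """Remove the first Doctype declaration, if there is one."""
    return _DOCTYPE_RE.sub("", markup, count=1)


def add_doctype(markup, doctype=DOCTYPE_401_LOOSE):
    """Replace any existing Doctype with the given one."""
    return doctype + "\n\n" + strip_doctype(markup).lstrip()


if __name__ == "__main__":
    print(add_doctype("<html><head><title>t</title></head><body></body></html>"))
```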

--
Lars Eighner ei*****@io.com http://www.larseighner.com/
"Fascism should more properly be called corporatism, since it is the
merger of state and corporate power."-Benito Mussolini * When you write the
check to pay your taxes, remember there are two l's in "Halliburton."
Aug 30 '05 #7
On 2005-08-29 19:50, Spartanicus wrote:
Tidy is a linter that has its own rules about how markup should be; it bears
no relation to a validator that checks against the DTD.


Point taken. I was not aware that Tidy is not a validator, but you are
right of course. I mainly use it because it is convenient for console
use, and because there is a nifty Firefox extension called "HTML
Validator (based on Tidy)" that will automatically check each page as it
is loaded, and display the results in the status bar. Tidy also offers
some accessibility hints.

The console aspect was easily remedied by a short Perl script that
connects to validator.w3.org, and I guess I'll have to live with Tidy's
quirks. At least the Firefox extension has an option to ignore certain
warnings.
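
The Perl script itself is not shown here. As a rough idea of what such a console check involves, the sketch below prepares a request for the W3C's current Nu HTML Checker, which accepts a document as the body of a POST. The endpoint and the `out=json` parameter belong to that newer service, not to the 2005 validator; `build_validation_request` and the User-Agent string are hypothetical.

```python
import urllib.request

# Assumption: the W3C Nu HTML Checker endpoint, asking for JSON output.
CHECKER_URL = "https://validator.w3.org/nu/?out=json"


def build_validation_request(markup):
    """Prepare (but do not send) a POST request carrying the document."""
    return urllib.request.Request(
        CHECKER_URL,
        data=markup.encode("utf-8"),
        headers={
            "Content-Type": "text/html; charset=utf-8",
            "User-Agent": "console-validate-sketch/0.1",  # placeholder name
        },
    )


if __name__ == "__main__":
    req = build_validation_request("<!DOCTYPE html><title>t</title><p>x</p>")
    print(req.get_method(), req.full_url)
```

Sending the request with `urllib.request.urlopen(req)` and decoding the JSON response is left out, so that nothing here touches the network until you decide it should.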
<select name="x">
<option value=""></option>
<option value="foo">a foo</option>
<option value="bar">a bar</option>
</select>

[tidy] Warning: "trimming empty <option>"


That seems reasonable, the empty option should be removed.


No it shouldn't, that would change the contents of the form:
<option value="42" selected="selected"></option>
I don't know how I could avoid this warning. I don't want to display
any text in the first option


Stop wanting that.


Why? Empty options are valid and useful. I won't run into any problems
with them unless I let Tidy "clean up" my HTML.
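
To make the point concrete, here is a small Python model of what a single-choice select submits. It is a simplification: it assumes at most one option is marked as selected and that a browser falls back to the first option when none is. `submitted_value` is a made-up name.

```python
def submitted_value(options):
    """options: list of (value, selected) pairs for a single-choice <select>."""
    if not options:
        return None                    # nothing to submit
    for value, selected in options:
        if selected:
            return value
    return options[0][0]               # assumed browser default: first option


if __name__ == "__main__":
    original = [("42", True), ("foo", False), ("bar", False)]
    trimmed = original[1:]             # the empty option has been stripped
    print(submitted_value(original), submitted_value(trimmed))
```

With the empty option in place the form submits "42"; once it is stripped, the same form silently submits "foo".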
cheers,
stefan
Aug 31 '05 #8
Stefan Weiss <sp******@foo.at> wrote:
<select name="x">
<option value=""></option>
<option value="foo">a foo</option>
<option value="bar">a bar</option>
</select>

[tidy] Warning: "trimming empty <option>"
That seems reasonable, the empty option should be removed.


No it shouldn't, that would change the contents of the form:
<option value="42" selected="selected"></option>


That's not the example you provided.
I don't know how I could avoid this warning. I don't want to display
any text in the first option


Stop wanting that.


Why? Empty options are valid


Amongst many other nonsensical code constructs.
and useful.


Explain the purpose of <option value=""></option>

Btw, my copy of Tidy does not remove the above construct.

--
Spartanicus
Aug 31 '05 #9
On 2005-08-31 16:58, Spartanicus wrote:
<option value="42" selected="selected"></option>

That's not the example you provided.


No, the previous example was quoted above that. This line was intended
to demonstrate how the contents of the form would be changed by removing
empty <option> tags. Tidy strips the tag in both cases.
Explain the purpose of <option value=""></option>
Any optional <select> field that you wish to leave blank (because you
don't have the information yet, or because none of the options apply).
You could have the first option contain "(none)" or "N/A", but that
doesn't add anything except noise.

If you mean that value="" is redundant if the tag does not contain any
text, I agree, but that doesn't have anything to do with Tidy removing
the tag completely.
Btw, my copy of Tidy does not remove the above construct.


Interesting. Mine does, and it's relatively recent ("HTML Tidy for
Linux/x86 released on 12 April 2005").
cheers,
stefan
Aug 31 '05 #10
Stefan Weiss <sp******@foo.at> wrote:
Explain the purpose of <option value=""></option>


Any optional <select> field that you wish to leave blank (because you
don't have the information yet, or because none of the options apply).


Leave it out, it serves no purpose to the user.

--
Spartanicus
Aug 31 '05 #11
On Wed, 31 Aug 2005, Spartanicus wrote:
Stefan Weiss <sp******@foo.at> wrote:
I don't know how I could avoid this warning. I don't want to display
any text in the first option

Stop wanting that.
Why? Empty options are valid

[...] Explain the purpose of <option value=""></option>


I'd rather *you* explained to us (preferably citing some peer-reviewed
guideline etc.) just what you reckon is wrong with what the poster
wanted.

It looks to me like an empty option. Myself, I suppose there are
quite a number of situations where an empty option would make sense,
and I'm still puzzled why you are so vociferous about rejecting it,
apparently "out of hand".

For example, a journey planner, for an outward journey and optional
return journey, has SELECT widgets to choose the return time (the hour
and the minute, in increments of 15 minutes): leaving the time blank
(which is the form's default) means that one does not require a return
journey to be planned. Seems reasonable to me.

I presume you wanted the empty value replaced by some text - whatever
might be appropriate in context - stating that this is an empty or
do-nothing option, with corresponding code in the server-side process
to deal with it and treat it as an empty input. But I'd like to know
why you are so insistent on this, if an empty input is meaningful in
its context, and can be supplied in such an obvious way.

thanks
Aug 31 '05 #12
"Alan J. Flavell" <fl*****@ph.gla.ac.uk> wrote:
and I'm still puzzled why you are so vociferous about rejecting it,
apparently "out of hand".
I suggest that you add me to your kill file given that you increasingly
seem to have trouble objectively reacting to the content of my posts.
For example, a journey planner, for an outward journey and optional
return journey, has SELECT widgets to choose the return time (the hour
and the minute, in increments of 15 minutes): leaving the time blank
(which is the form's default) means that one does not require a return
journey to be planned. Seems reasonable to me.


Using the above context it should read "not applicable", "single
journey" or some such, leaving it empty fails to convey this to the
user.

--
Spartanicus
Aug 31 '05 #13
