Bytes | Software Development & Data Engineering Community

Tidy trimming empty tags

Hi.

(this is somewhat similar to yesterday's thread about empty links)

I noticed that Tidy [0] issues warnings whenever it encounters empty
tags, and strips those tags if cleanup was requested. This is okay in
some cases (such as <tbody>), but problematic for other tags (such as
<option>). Some tags (td, th, ...) do not produce warnings when they are
empty.

The warnings are also issued for documents that are considered valid
"XHTML 1.0 Strict" by the W3C Validator. If any of these empty-tag
warnings were issued, Tidy recommends "XHTML 1.0 Transitional" instead.
A few examples:
1) empty <select> options
-----------------------------------------------------------------
<select name="x">
<option value=""></option>
<option value="foo">a foo</option>
<option value="bar">a bar</option>
</select>

[tidy] Warning: "trimming empty <option>"
I don't know how I could avoid this warning. I don't want to display any
text in the first option, and hacks like <option>&nbsp;</option> are not
acceptable.
2) Empty p, div, span, h1, a[href], ... tags
-----------------------------------------------------------------
<td class="c1"><span class="c2"></span> Some Text</td>
Constructs like this are sometimes used by our webdesigner in
combination with CSS. The span tag is technically empty, but the
stylesheet will cause an image to be displayed.

<div id="ibox"></div>
This could happen, for example, if "ibox" is a box containing additional
information for the main content, but there is no additional information
for the current page. The box itself should still be displayed, so the tag
is left empty.

<p class="notes"><?= $notes ?></p>
Empty tags can also occur as an artifact of server-side scripting; if
$notes is empty, so is the <p> tag (this case can be avoided, I know).
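One way to avoid that case is to emit the wrapper only when there is something to put in it. A minimal sketch, written in Python rather than PHP, with a hypothetical `render_notes` helper standing in for the template line; in the PHP template the same idea is simply a conditional around the `<p>`.

```python
from html import escape


def render_notes(notes):
    """Hypothetical helper: a <p class="notes"> element, or "" when there are no notes."""
    if notes is None or not notes.strip():
        return ""  # no wrapper at all, so no empty <p> reaches the page
    # Escape the text so notes containing < or & cannot break the markup.
    return '<p class="notes">' + escape(notes) + "</p>"


print(repr(render_notes("")))
print(render_notes("Closed on Mondays"))
```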
3) Empty td, th tags; script tags
-----------------------------------------------------------------
<tr><th></th></tr>
<tr><td></td></tr>

Tidy ignores empty table cells and does not try to strip them.
<script type="text/javascript" src="xxx.js"></script>
Same goes for empty script tags with src attributes.
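To make the pattern in 1) to 3) concrete, here is a small sketch using Python's standard `html.parser` (not Tidy itself) that reports elements with neither child elements nor non-blank text. The exemption list (`td`, `th`, and `script` with a `src` attribute) is an assumption drawn only from the observations above; it is not Tidy's actual rule set.

```python
from html.parser import HTMLParser

# Assumed exemptions, based only on the behaviour described above:
# empty <td>/<th> are left alone, and so is an empty <script> with a src.
EXEMPT_TAGS = {"td", "th"}
# Elements that are empty by definition and never have an end tag.
VOID_TAGS = {"area", "base", "br", "col", "hr", "img", "input", "link", "meta", "param"}


class EmptyElementFinder(HTMLParser):
    """Collects elements that have neither child elements nor non-blank text."""

    def __init__(self):
        super().__init__()
        self.stack = []   # entries: [tag, attrs, has_content]
        self.empty = []   # tag names of empty elements, in closing order

    def handle_starttag(self, tag, attrs):
        if self.stack:
            self.stack[-1][2] = True   # any child element counts as content
        if tag not in VOID_TAGS:
            self.stack.append([tag, dict(attrs), False])

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.stack or self.stack[-1][0] != tag:
            return                     # ignore void and mismatched end tags
        name, attrs, has_content = self.stack.pop()
        exempt = name in EXEMPT_TAGS or (name == "script" and "src" in attrs)
        if not has_content and not exempt:
            self.empty.append(name)

    def handle_data(self, data):
        # Only ASCII whitespace counts as blank, so &nbsp; still counts as text.
        if self.stack and data.strip(" \t\r\n\f"):
            self.stack[-1][2] = True


def find_empty(markup):
    finder = EmptyElementFinder()
    finder.feed(markup)
    finder.close()
    return finder.empty


sample = (
    '<select name="x"><option value=""></option>'
    '<option value="foo">a foo</option></select>'
    '<td class="c1"><span class="c2"></span> Some Text</td>'
    '<div id="ibox"></div>'
    '<table><tr><th></th></tr><tr><td></td></tr></table>'
    '<script type="text/javascript" src="xxx.js"></script>'
)
print(find_empty(sample))  # ['option', 'span', 'div']
```

Note that the `<option>&nbsp;</option>` hack counts as non-empty here, which is exactly why it silences this kind of warning.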
4) Empty thead, tfoot tags
-----------------------------------------------------------------
<table>
<thead><tr><td>head</td></tr></thead>
<tfoot></tfoot>
<tbody><tr><td>body</td></tr></tbody>
</table>

Tidy issues a warning for the empty tfoot element, which is expected
because <tfoot> must never be empty. In this case I would actually
rather get an error instead of a warning, because the document does not
qualify as valid XHTML anymore.
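Because the XHTML 1.0 DTDs declare `thead`, `tfoot` and `tbody` as `(tr)+`, this particular case can be checked mechanically. A rough sketch using Python's `html.parser`, assuming the tables are not nested; it is no substitute for a real validator.

```python
from html.parser import HTMLParser

# In the XHTML 1.0 DTDs, thead, tfoot and tbody are each declared as (tr)+,
# so an empty section makes the document invalid.
SECTIONS = {"thead", "tfoot", "tbody"}


class TableSectionChecker(HTMLParser):
    """Reports thead/tfoot/tbody elements that contain no <tr>.

    Simplifying assumption: tables are not nested.
    """

    def __init__(self):
        super().__init__()
        self.open_sections = []   # stack of [section_name, row_count]
        self.problems = []

    def handle_starttag(self, tag, attrs):
        if tag in SECTIONS:
            self.open_sections.append([tag, 0])
        elif tag == "tr" and self.open_sections:
            self.open_sections[-1][1] += 1

    def handle_endtag(self, tag):
        if tag in SECTIONS and self.open_sections and self.open_sections[-1][0] == tag:
            name, rows = self.open_sections.pop()
            if rows == 0:
                self.problems.append(f"<{name}> contains no <tr>")


def check_table_sections(markup):
    checker = TableSectionChecker()
    checker.feed(markup)
    checker.close()
    return checker.problems


example = """<table>
<thead><tr><td>head</td></tr></thead>
<tfoot></tfoot>
<tbody><tr><td>body</td></tr></tbody>
</table>"""
print(check_table_sections(example))  # ['<tfoot> contains no <tr>']
```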
I would like to get rid of the warnings in 1) + 2), to simplify
automated validation and to ease my mind. Is Tidy correct in issuing
warnings and stripping the tags? Should I always try to avoid empty
tags? If so, how?
Thanks in advance,
Stefan
[0] http://tidy.sourceforge.net/
Aug 29 '05 #1
Stefan Weiss <sp******@foo.at> wrote:
The warnings are also issued for documents that are considered valid
"XHTML 1.0 Strict" by the W3C Validator.
Tidy is a linter that has its own rules about how markup should be, and
has no relation to a validator that checks against the DTD.
A few examples:

1) empty <select> options
-----------------------------------------------------------------
<select name="x">
<option value=""></option>
<option value="foo">a foo</option>
<option value="bar">a bar</option>
</select>

[tidy] Warning: "trimming empty <option>"
That seems reasonable, the empty option should be removed.
I don't know how I could avoid this warning. I don't want to display any
text in the first option
Stop wanting that.
, and hacks like <option>&nbsp;</option> are not
acceptable.
Good.
<td class="c1"><span class="c2"></span> Some Text</td>
Constructs like this are sometimes used by our webdesigner in
combination with CSS. The span tag is technically empty, but the
stylesheet will cause an image to be displayed.
There is no way to prevent Tidy from trimming these empty elements.

Ideally the empty elements that Tidy strips should not be in the markup,
but in the real world there is no reason why the above type of code
shouldn't be used by a skilled coder.

Tidy is therefore overzealous in removing them; the solution is to do
to Tidy what it does to empty spans: get rid of it.
Tidy ignores empty table cells and does not try to strip them.
Doing so could screw up tables.
<script type="text/javascript" src="xxx.js"></script>
Same goes for empty script tags with src attributes.
Same. Tidy is stupid, but not that stupid.
4) Empty thead, tfoot tags
-----------------------------------------------------------------
<table>
<thead><tr><td>head</td></tr></thead>
<tfoot></tfoot>
<tbody><tr><td>body</td></tr></tbody>
</table>

Tidy issues a warning for the empty tfoot element, which is expected
because <tfoot> must never be empty. In this case I would actually
rather get an error instead of a warning, because the document does not
qualify as valid XHTML anymore.


Use a validator if you want to test for validity, don't use Tidy.

--
Spartanicus
Aug 29 '05 #2
Stefan Weiss wrote:

The warnings are also issued for documents that are considered valid
"XHTML 1.0 Strict" by the W3C Validator. If any of these empty-tag
warnings were issued, Tidy recommends "XHTML 1.0 Transitional" instead.

The actual phrasing is:
Info: Doctype given is "-//W3C//DTD XHTML 1.0 Strict//EN"
Info: Document content looks like XHTML 1.0 Transitional
That means there is some non-strict syntax used in the document
somewhere (tidy wouldn't want to say where, of course). The W3C validator
is much better at identifying specific problems.
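For the Strict/Transitional question specifically, a crude check is to look for elements that exist only in the Transitional DTD and report the line where each occurs. A sketch in Python; it covers elements only, not Transitional-only attributes such as `target` or `align`, and it says nothing about how Tidy itself decides what "looks like" Transitional.

```python
from html.parser import HTMLParser

# Elements declared in the XHTML 1.0 Transitional DTD but not in Strict.
TRANSITIONAL_ONLY = {
    "applet", "basefont", "center", "dir", "font", "iframe",
    "isindex", "menu", "noframes", "s", "strike", "u",
}


class TransitionalElementFinder(HTMLParser):
    """Records (tag, line) for every Transitional-only start tag."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        if tag in TRANSITIONAL_ONLY:
            line, _offset = self.getpos()   # position of the start tag
            self.found.append((tag, line))


def transitional_elements(markup):
    finder = TransitionalElementFinder()
    finder.feed(markup)
    finder.close()
    return finder.found


page = "<p>ok</p>\n<center>old</center>\n<p><u>underlined</u></p>"
for tag, line in transitional_elements(page):
    print(f"line {line}: <{tag}> is not allowed in XHTML 1.0 Strict")
```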

--
jmm dash list (at) sohnen-moe (dot) com
(Remove .AXSPAMGN for email)
Aug 29 '05 #3
Jim Moe wrote:
Stefan Weiss wrote:

The warnings are also issued for documents that are considered valid
"XHTML 1.0 Strict" by the W3C Validator. If any of these empty-tag
warnings were issued, Tidy recommends "XHTML 1.0 Transitional" instead.
The actual phrasing is:
Info: Doctype given is "-//W3C//DTD XHTML 1.0 Strict//EN"
Info: Document content looks like XHTML 1.0 Transitional
That means there is some non-strict syntax used in the document


Um, it means no such thing. Tidy has a bad habit of thinking
valid, strict markup is "transitional".
somewhere (tidy wouldn't want to say where, of course). The W3C
validator is much better at identifying specific problems.


Indeed, a validator will tell you exactly what is allowed.
If you want the Strict/Legacy distinction highlighted more
clearly, AccessValet will do that using the 'trafficlight'
metaphor (green=good, amber=deprecated, red=invalid markup).
The amber then represents the difference between strict and
"transitional".

(Note that neither Tidy nor AccessValet is a validator.
The same is true of some tools that are marketed as "validator"s).

--
Nick Kew
Aug 29 '05 #4
Nick Kew wrote:

The actual phrasing is:
Info: Doctype given is "-//W3C//DTD XHTML 1.0 Strict//EN"
Info: Document content looks like XHTML 1.0 Transitional
That means there is some non-strict syntax used in the document


Um, it means no such thing. Tidy has a bad habit of thinking
valid, strict markup is "transitional".

Hmm. Whenever I cleaned up the warnings indicated by a validator, tidy
then thought the document looked strict. Guess I've been lucky.

--
jmm dash list (at) sohnen-moe (dot) com
(Remove .AXSPAMGN for email)
Aug 30 '05 #5
Spartanicus wrote:
There is no way to prevent Tidy from trimming these empty elements.

Ideally the empty elements that Tidy strips should not be in the markup,
There's no "ideally" about it. They're valid; Tidy should keep its
thieving paws off.

- As a general rule, "linters" should not take a dislike to things
simply because they don't understand them. It's a big world out there,
they should limit themselves to things we _know_ to be bad, not extend
this to things they simply haven't thought of a use for. Potentiality
is bigger than current actuality - the web (indeed software in general)
is always thinking of new ways to use things that are possible, even if
not intended for that. This is not necessarily a bad thing (even if it
does sometimes lead to 1x1.gif and friends).

As concrete examples, Tidy will strip an empty <div> that's later
intended for use from DHTML. It will also strip empty <div>s that are
intended as hooks for CSS background images (commonplace rounded box
code). Both of these are valid, useful and generally commendable
practices which Tidy will break.

but in the real world there is no reason why the above type of code
shouldn't be used by a skilled coder.


Skilled? It's not Tidy's place to be making value judgements like
that. There's enough cruft out there in the provably invalid, without
it needing to start on the simply non-mainstream.

Aug 30 '05 #6
In our last episode,
<KM*****************************************@giganews.com>,
the lovely and talented Jim Moe
broadcast on comp.infosystems.www.authoring.html:
Stefan Weiss wrote:

The warnings are also issued for documents that are considered valid
"XHTML 1.0 Strict" by the W3C Validator. If any of these empty-tag
warnings were issued, Tidy recommends "XHTML 1.0 Transitional" instead.

The actual phrasing is:
Info: Doctype given is "-//W3C//DTD XHTML 1.0 Strict//EN"
Info: Document content looks like XHTML 1.0 Transitional
That means there is some non-strict syntax used in the document
somewhere (tidy wouldn't want to say where, of course). The W3C validator
is much better at identifying specific problems.


In point of fact, Tidy lies. And it will change the Doctype to
something wrong without asking permission. It often identifies
documents as "proprietary" when in fact onsgmls says they
validate with the advertised standard Doctype. If you use tidy,
I'd advise you to run documents through it without a Doctype,
stream the output to an "untidy" script that sticks the correct
Doctype on it and breaks lines in a way that caters to broken
browsers like IE, then stream it through onsgmls or a similar
real validator.

Here is one of the untidy scripts I use (this one is for HTML 4.01
Loose; change the path to perl on Linux):

***********************
#!/usr/local/bin/perl
# Sticks an HTML 4.01 Transitional Doctype onto Tidy output that was
# produced without one, and breaks lines inside list tags for IE.
use strict;
use warnings;

$| = 1;    # autoflush STDOUT

while (my $line = <STDIN>) {
    $line =~ s#<html#<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"\n "http://www.w3.org/TR/html4/loose.dtd">\n\n<html#;
    $line =~ s/^\n//;    # drop blank lines
    # Mostly to keep IE from breaking, because IE can't do lists right:
    # move the ">" of every <li>, </li>, <ul> and </ul> to a new line.
    # This also splits runs such as <ul><li> and </li><li>.
    $line =~ s/li>\s*/li\n>/g;
    $line =~ s/ul>\s*/ul\n>/g;
    $line =~ s/>&nbsp;</></g;    # takes out &nbsp; used to keep empty elements
    print STDOUT $line;
}
**************************

--
Lars Eighner ei*****@io.com http://www.larseighner.com/
"Fascism should more properly be called corporatism, since it is the
merger of state and corporate power."-Benito Mussolini * When you write the
check to pay your taxes, remember there are two l's in "Halliburton."
Aug 30 '05 #7
On 2005-08-29 19:50, Spartanicus wrote:
Tidy is a linter that has its own rules about how markup should be, and
has no relation to a validator that checks against the DTD.


Point taken. I was not aware that Tidy is not a validator, but you are
right of course. I mainly use it because it is convenient for console
use, and because there is a nifty Firefox extension called "HTML
Validator (based on Tidy)" that will automatically check each page as it
is loaded, and display the results in the status bar. Tidy also offers
some accessibility hints.

The console aspect was easily remedied by a short Perl script that
connects to validator.w3.org, and I guess I'll have to live with Tidy's
quirks. At least the Firefox extension has an option to ignore certain
warnings.
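Ignoring selected warnings can also be done on the console. A sketch in Python, assuming Tidy's diagnostics follow the `line N column M - Warning: ...` pattern; the ignore list is hypothetical, and only warnings (never errors) can be suppressed.

```python
import re

# Assumed shape of a Tidy diagnostic line, e.g.
#   line 3 column 1 - Warning: trimming empty <option>
MESSAGE = re.compile(r"^line (\d+) column (\d+) - (Warning|Error): (.*)$")

# Hypothetical ignore list: substrings of warning texts the user accepts.
IGNORED = ("trimming empty <option>", "trimming empty <span>", "trimming empty <div>")


def filter_tidy_output(lines, ignored=IGNORED):
    """Yield (line, column, kind, text) for every message not ignored."""
    for raw in lines:
        match = MESSAGE.match(raw.strip())
        if match is None:
            continue  # not a positional message, e.g. an "Info:" line
        line, column, kind, text = match.groups()
        if kind == "Warning" and any(s in text for s in ignored):
            continue
        yield int(line), int(column), kind, text


# Sample input for illustration only.
report = [
    "line 3 column 1 - Warning: trimming empty <option>",
    "line 9 column 1 - Warning: trimming empty <tfoot>",
    "Info: Document content looks like XHTML 1.0 Transitional",
]
for item in filter_tidy_output(report):
    print(item)
```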
<select name="x">
<option value=""></option>
<option value="foo">a foo</option>
<option value="bar">a bar</option>
</select>

[tidy] Warning: "trimming empty <option>"


That seems reasonable, the empty option should be removed.


No it shouldn't, that would change the contents of the form:
<option value="42" selected="selected"></option>
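The effect on submitted data can be sketched as follows. This assumes the behaviour HTML5 specifies and browsers commonly follow for a single-choice `<select>`: the last option marked `selected` wins, otherwise the first option is preselected (disabled options are ignored here). The `default_submission` function is hypothetical.

```python
def default_submission(options):
    """Value a single-choice <select> submits if the user never changes it.

    options: list of (value, selected) pairs in document order.
    Assumes HTML5 behaviour: the last option marked selected wins,
    otherwise the first option is preselected.
    """
    preselected = [value for value, selected in options if selected]
    if preselected:
        return preselected[-1]
    if options:
        return options[0][0]
    return None  # a <select> without options submits nothing


original = [("", False), ("foo", False), ("bar", False)]
trimmed = original[1:]  # what is left after the empty <option> is trimmed
print(repr(default_submission(original)), repr(default_submission(trimmed)))
```

Either way the form silently changes from submitting an empty value (or "42") to submitting "foo".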
I don't know how I could avoid this warning. I don't want to display
any text in the first option


Stop wanting that.


Why? Empty options are valid and useful. I won't run into any problems
with them unless I let Tidy "clean up" my HTML.
cheers,
stefan
Aug 31 '05 #8
Stefan Weiss <sp******@foo.at> wrote:
<select name="x">
<option value=""></option>
<option value="foo">a foo</option>
<option value="bar">a bar</option>
</select>

[tidy] Warning: "trimming empty <option>"
That seems reasonable, the empty option should be removed.


No it shouldn't, that would change the contents of the form:
<option value="42" selected="selected"></option>


That's not the example you provided.
I don't know how I could avoid this warning. I don't want to display
any text in the first option


Stop wanting that.


Why? Empty options are valid


Amongst many other nonsensical code constructs.
and useful.


Explain the purpose of <option value=""></option>

Btw, my copy of Tidy does not remove the above construct.

--
Spartanicus
Aug 31 '05 #9
On 2005-08-31 16:58, Spartanicus wrote:
<option value="42" selected="selected"></option>
That's not the example you provided.


No, the previous example was quoted above that. This line was intended
to demonstrate how the contents of the form would be changed by removing
empty <option> tags. Tidy strips the tag in both cases.
Explain the purpose of <option value=""></option>
Any optional <select> field that you wish to leave blank (because you
don't have the information yet, or because none of the options apply).
You could have the first option contain "(none)" or "N/A", but that
doesn't add anything except noise.

If you mean that value="" is redundant if the tag does not contain any
text, I agree, but that doesn't have anything to do with Tidy removing
the tag completely.
Btw, my copy of Tidy does not remove the above construct.


Interesting. Mine does, and it's relatively recent ("HTML Tidy for
Linux/x86 released on 12 April 2005").
cheers,
stefan
Aug 31 '05 #10
Stefan Weiss <sp******@foo.at> wrote:
Explain the purpose of <option value=""></option>


Any optional <select> field that you wish to leave blank (because you
don't have the information yet, or because none of the options apply).


Leave it out, it serves no purpose to the user.

--
Spartanicus
Aug 31 '05 #11
On Wed, 31 Aug 2005, Spartanicus wrote:
Stefan Weiss <sp******@foo.at> wrote:
I don't know how I could avoid this warning. I don't want to display
any text in the first option

Stop wanting that.
Why? Empty options are valid

[...] Explain the purpose of <option value=""></option>


I'd rather *you* explained to us (preferably citing some peer-reviewed
guideline etc.) just what you reckon is wrong with what the poster
wanted.

It looks to me like an empty option. Myself, I suppose there are
quite a number of situations where an empty option would make sense,
and I'm still puzzled why you are so vociferous about rejecting it,
apparently "out of hand".

For example, a journey planner, for an outward journey and optional
return journey, has SELECT widgets to choose the return time (the hour
and the minute, in increments of 15 minutes): leaving the time blank
(which is the form's default) means that one does not require a return
journey to be planned. Seems reasonable to me.

I presume you wanted the empty value replaced by some text - whatever
might be appropriate in context - stating that this is an empty or
do-nothing option, with corresponding code in the server-side process
to deal with it and treat it as an empty input. But I'd like to know
why you are so insistent on this, if an empty input is meaningful in
its context, and can be supplied in such an obvious way.
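On the server, the empty option in the journey-planner example might be handled like this. The field names `return_hour` and `return_minute` are hypothetical; the point is that `value=""` arrives as an empty string, which maps naturally to "no return journey".

```python
from urllib.parse import parse_qs


def parse_return_time(query):
    """Return (hour, minute) for the return journey, or None if none was requested.

    Hypothetical fields return_hour and return_minute; their first option is
    <option value=""></option>, which submits an empty string.
    """
    form = parse_qs(query, keep_blank_values=True)
    hour = form.get("return_hour", [""])[0]
    minute = form.get("return_minute", [""])[0]
    if hour == "" and minute == "":
        return None  # both left blank: single journey
    if hour == "" or minute == "":
        raise ValueError("return time is incomplete")
    h, m = int(hour), int(minute)
    if not (0 <= h <= 23 and m in (0, 15, 30, 45)):
        raise ValueError("return time out of range")
    return h, m


print(parse_return_time("return_hour=&return_minute="))
print(parse_return_time("return_hour=17&return_minute=45"))
```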

thanks
Aug 31 '05 #12
"Alan J. Flavell" <fl*****@ph.gla.ac.uk> wrote:
and I'm still puzzled why you are so vociferous about rejecting it,
apparently "out of hand".
I suggest that you add me to your kill file given that you increasingly
seem to have trouble objectively reacting to the content of my posts.
For example, a journey planner, for an outward journey and optional
return journey, has SELECT widgets to choose the return time (the hour
and the minute, in increments of 15 minutes): leaving the time blank
(which is the form's default) means that one does not require a return
journey to be planned. Seems reasonable to me.


Using the above context it should read "not applicable", "single
journey" or some such, leaving it empty fails to convey this to the
user.
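The two positions differ only in the option's visible text, not in what is submitted. A sketch of a hypothetical `render_select` helper in Python that produces either form; with a label such as "single journey" the first option still submits an empty value but is no longer an empty element.

```python
from html import escape


def render_select(name, choices, empty_label=None):
    """Build a <select> whose first option submits an empty value.

    empty_label=None yields <option value=""></option>; a string such as
    "single journey" yields a labelled option that still submits "".
    """
    label = "" if empty_label is None else escape(empty_label)
    parts = [f'<select name="{escape(name)}">', f'<option value="">{label}</option>']
    for value, text in choices:
        parts.append(f'<option value="{escape(value)}">{escape(text)}</option>')
    parts.append("</select>")
    return "\n".join(parts)


hours = [("17", "17:00"), ("18", "18:00")]
print(render_select("return_hour", hours, empty_label="single journey"))
```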

--
Spartanicus
Aug 31 '05 #13
