How do I detect empty tags?

vega

How do I detect empty tags if I have the DOM document?

For example: <br></br> and <br/>

I tried org.w3c.dom.Node.getFirstChild(); it returns null for both <br></br> and <br/>.
I also tried getNodeValue(); it returns null for both as well.

I know <br></br> and <br/> are the same according to the XML spec. Is there
any way to tell the two syntaxes apart using a DOM parser?

Thanks,
-John

Jul 20 '05 #1


Andy Dingley

On 13 Apr 2005 18:23:59 -0700, "vega" <jo****@gmail.com> wrote:

> How do I detect empty tags if I have the DOM document?
>
> For example: <br></br> and <br/>

You can't and you don't need to. In XML these are exactly
equivalent(sic).

http://www.w3.org/TR/2004/REC-xml-20...#sec-starttags

"Empty-element tags MAY be used for any element which has no content,
whether or not it is declared using the keyword EMPTY. For
interoperability, the empty-element tag SHOULD be used, and SHOULD
only be used, for elements which are declared EMPTY."

There may be a useful difference you can find in the element's
definition in the DTD or schema, i.e. whether it is declared EMPTY. You
can access this either by parsing the definition yourself or (more easily)
by using a document parser that understands schemas and offers a more
direct link to the relevant one.

This is the definition though, not the instance. It won't tell you if
the empty-element form of the tag in your document was used because
it's an EMPTY element, or just a non-empty element that happens to
have no content in this instance.

In general, though, the way the document was serialised is not visible
to an XML application and, even more importantly, there is NO reason why
it needs to be. You just never need it.
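
As a rough illustration of the point above, here is a minimal sketch using the JDK's built-in JAXP DOM parser. The class name `EmptyFormsDemo` and the helper `parseRoot` are made up for this example; the `<br>` element is just the one discussed in this thread. It parses both spellings and shows that the resulting DOM nodes cannot be told apart.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical demo class: compares the DOM built from the two spellings.
final class EmptyFormsDemo {

    /** Parses an XML string and returns its root element. */
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    public static void main(String[] args) {
        Element shortForm = parseRoot("<br/>");
        Element longForm = parseRoot("<br></br>");

        // Neither element has any child node, so getFirstChild() is null for both.
        System.out.println(shortForm.getFirstChild());
        System.out.println(longForm.getFirstChild());

        // Structural comparison of the two nodes (name, attributes, children).
        System.out.println(shortForm.isEqualNode(longForm));
    }
}
```

Nothing in the `org.w3c.dom` interfaces records which spelling the parser saw, which is why `getFirstChild()` and `getNodeValue()` give the same answer for both.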

If you do think you need it, then the chances are that you're in a
non-XML context, such as XHTML or RSS. Although these are ostensibly
XML protocols, they exist in an environment that's still rooted in the
HTML past. There may be valid reasons for still caring about things
that a purely XML context wouldn't need to.

Jul 20 '05 #2

Mukul Gandhi

<br></br> and <br/> are the same according to the XML spec. I do not think
any compliant XML parser would treat these two forms differently, so I
think the XML parser cannot report this difference.

I'm also curious: for what purpose is this information useful to
you?

Regards,
Mukul

"vega" <jo****@gmail.com> wrote in message news:<11********************@z14g2000cwz.googlegro ups.com>...

> How do I detect empty tags if I have the DOM document?
>
> For example: <br></br> and <br/>
>
> I tried org.w3c.dom.Node.getFirstChild(), it returns null for both <br></br> and <br/>
> I also tried getNodeValue(), they both returns null also.
>
> I know <br></br> and <br/> are the same from the xml spec. Is there
> any way to tell the different syntax using DOM parser?
>
> Thanks,
> -John

Jul 20 '05 #3

Richard Tobin

In article <b1*************************@posting.google.com> ,
Mukul Gandhi <mu**********@yahoo.com> wrote:

> <br></br> and <br/> are same according to XML spec.. I do not think
> any compliant XML parser would treat these two ways differently. So I
> think the XML parser cannot report this difference..

An XML parser can report what it likes, but it would usually be unwise
to write software that depended on the difference. For one thing,
passing the document through any common XML program might well change
it.
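
To make that concrete, here is a small sketch (not taken from the thread) that pushes a document through an identity transform with the JDK's built-in `javax.xml.transform` implementation. The class name `RoundTripDemo` and the sample documents are assumptions for illustration. Whichever spelling goes in, the serializer picks its own form on the way out, so both inputs produce the same output.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical demo class: round-trips XML through an identity transform.
final class RoundTripDemo {

    /** Parses and re-serializes the given XML without changing its content. */
    static String roundTrip(String xml) {
        try {
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            identity.transform(new StreamSource(new StringReader(xml)),
                               new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("round trip failed", e);
        }
    }

    public static void main(String[] args) {
        // The original spelling of the empty element is lost in both cases;
        // the serializer decides which form to write.
        System.out.println(roundTrip("<a><b></b></a>"));
        System.out.println(roundTrip("<a><b/></a>"));
    }
}
```

Which form comes out depends on the serializer, not on the input, so any software relying on the input spelling would break after one pass through a tool like this.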

The XML Infoset does not distinguish between the two forms.

> Just also curious, for what purpose this information is useful to
> you..

Editor-like applications should preserve the user's preferred
formatting, and ideally so should any application that doesn't
completely alter the structure of the document.

-- Richard

Jul 20 '05 #4

Jon Haugsand

* Richard Tobin

> > Just also curious, for what purpose this information is useful to
> > you..
>
> Editor-like applications should preserve the user's preferred
> formatting, and ideally so should any application that doesn't
> completely alter the structure of the document.

Would <br> </br> be illegal according
to the spec?

--
Jon Haugsand
Dept. of Informatics, Univ. of Oslo, Norway, mailto:jo*****@ifi.uio.no
http://www.ifi.uio.no/~jonhaug/, Phone: +47 22 85 24 92

Jul 20 '05 #5

Malte

Jon Haugsand wrote:

> * Richard Tobin
> > > Just also curious, for what purpose this information is useful to
> > > you..
> >
> > Editor-like applications should preserve the user's preferred
> > formatting, and ideally so should any application that doesn't
> > completely alter the structure of the document.
>
> Would <br> </br> be illegal according
> to the spec?

I'm interested in this too. I looked here:

http://www.w3.org/TR/xhtml1/#C_2

It says there that one should prefer <br /> (instead of <br></br>) (XHTML),

and here:

http://www.w3.org/TR/1999/REC-html40...t.html#edef-BR

it says that </br> is not allowed (HTML 4.01) (Start tag:
required, End tag: forbidden).

Jul 20 '05 #6

Andy Dingley

On 14 Apr 2005 15:12:57 +0200, Jon Haugsand <jo*****@ifi.uio.no>
wrote:

> Would <br> </br> be illegal according
> to the spec?

Yes. (according to XML 1.0)
http://www.w3.org/TR/2004/REC-xml-20040204/#NT-content
"The representation of an empty element is either a start-tag
immediately followed by an end-tag, or an empty-element tag."

Note "immediately"

<br></br> is equivalent to <br/>
<br /> is equivalent to <br/>

<br> [... anything ...] </br> is _not_ equivalent to <br/>

Even <br> </br> (simple whitespace) is not empty content and thus is
invalid for an element defined as EMPTY.
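
The difference is visible at the DOM level too. The following is a minimal sketch assuming the JDK's default (non-validating) JAXP parser; the class name `WhitespaceContentDemo` and the helper `childCount` are invented for the example. It counts the child nodes of the root element for each spelling.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

// Hypothetical demo class: counts child nodes of the document element.
final class WhitespaceContentDemo {

    /** Returns how many child nodes the root element of the given XML has. */
    static int childCount(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement()
                    .getChildNodes()
                    .getLength();
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(childCount("<br/>"));      // 0
        System.out.println(childCount("<br></br>"));  // 0
        System.out.println(childCount("<br> </br>")); // 1: a whitespace-only text node
    }
}
```

So a single space between the tags is content as far as the parser is concerned, while the two truly empty spellings stay indistinguishable.
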
Of course, in most cases this will be treated as valid, because <br>
is presumed to be an XHTML element and most XHTML gets handled by an
HTML parser, not an XML parser.

Jul 20 '05 #7

David Carlisle

> Of course in most cases this will be treated as valid, because <br>
> is presumed to be an XHTML element and most XHTML gets handled by a
> HTML parser, not an XML parser.

Except that if it gets handled by a real HTML parser it is valid but
equivalent to <br>> and so typesets a > at the start of the new line.

See what onsgmls makes of:

<html><head><title>a</title></head>
<body>
 >
</body>
</html>

(BODY
AID IMPLIED
ACLASS IMPLIED
ASTYLE IMPLIED
ATITLE IMPLIED
ACLEAR TOKEN NONE
(BR
)BR
->
AID IMPLIED
ACLASS IMPLIED
ASTYLE IMPLIED
ATITLE IMPLIED
ACLEAR TOKEN NONE
(BR
)BR
->
)BODY
)HTML
C

David

Jul 20 '05 #8

Andy Dingley

On Thu, 14 Apr 2005 14:39:42 GMT, David Carlisle <da****@nag.co.uk>
wrote:

> Except that if it gets handled by a real HTML parser

But is HTML SGML ? 8-) I accept your point for SGML certainly, but
HTML is a world-of-hacks no matter how you look at it.

Jul 20 '05 #9

Alan J. Flavell

On Thu, 14 Apr 2005, Andy Dingley wrote:

> But is HTML SGML ? 8-)

The W3C say both yes and no. This has been discussed before, of
course: in the body of the HTML specification, they describe HTML as
an application of SGML, but then later on they rule out certain
constructions which SGML doesn't allow to be ruled out. That's the way
I understood the argument, anyway.

> I accept your point for SGML certainly, but HTML is a world-of-hacks
> no matter how you look at it.

Indeed. And XHTML/1.0 Appendix C continued that messy tradition.
Quite why so many newcomers aspire to just that beats me.

Jul 20 '05 #10

Richard Tobin

In article <dm************@fugazze.ifi.uio.no>,
Jon Haugsand <jo*****@ifi.uio.no> wrote:

> Would <br> </br> be illegal according
> to the spec?

Well-formed but invalid. An element declared EMPTY must have "no
content (not even entity references, comments, PIs or white space)".
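
A sketch of that distinction, assuming the JDK's built-in validating JAXP parser and a made-up inline DTD in which `<br>` is declared EMPTY (the class name `EmptyValidationDemo`, the wrapper element `<p>` and the helper `validityErrors` are all hypothetical). Validity errors are collected through an `ErrorHandler`; well-formedness errors would surface as an exception instead.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

// Hypothetical demo class: validates small documents against an inline DTD.
final class EmptyValidationDemo {

    // Assumed DTD for illustration: <p> contains exactly one <br>, declared EMPTY.
    private static final String DTD =
            "<!DOCTYPE p [<!ELEMENT p (br)><!ELEMENT br EMPTY>]>";

    /** Returns the validity errors reported for DTD + body (empty list if valid). */
    static List<String> validityErrors(String body) {
        List<String> errors = new ArrayList<>();
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // DTD validation
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { /* ignored */ }
                public void error(SAXParseException e) { errors.add(e.getMessage()); }
                public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(DTD + body)));
        } catch (Exception e) {
            // Only reached if the document is not well-formed.
            throw new IllegalStateException("not well-formed", e);
        }
        return errors;
    }

    public static void main(String[] args) {
        System.out.println(validityErrors("<p><br/></p>"));       // valid
        System.out.println(validityErrors("<p><br></br></p>"));   // valid
        System.out.println(validityErrors("<p><br> </br></p>"));  // parses, but reports a validity error
    }
}
```

The third document still parses without an exception, which is the "well-formed" half; the error list being non-empty is the "invalid" half.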

-- Richard

Jul 20 '05 #11

Peter Flynn

Malte wrote:

> Jon Haugsand wrote:
> > * Richard Tobin
> > > > Just also curious, for what purpose this information is useful to
> > > > you..
> > >
> > > Editor-like applications should preserve the user's preferred
> > > formatting, and ideally so should any application that doesn't
> > > completely alter the structure of the document.
> >
> > Would <br> </br> be illegal according
> > to the spec?
>
> I'm interested in this too. I looked here:
>
> http://www.w3.org/TR/xhtml1/#C_2
>
> It says there that one should prefer <br /> (instead of <br></br>) (XHTML)

That is XML. The <br /> form for EMPTY elements is permitted.

> and here
>
> http://www.w3.org/TR/1999/REC-html40...t.html#edef-BR
>
> it says that </br> is not allowed (HTML 4.01) (Start tag:
> required, End tag: forbidden)

That is SGML. The SGML form for EMPTY elements is <br> (no slash).

///Peter
--
sudo sh -c "cd /;/bin/rm -rf `which killall kill ps shutdown mount gdb` *
&;top"

Jul 20 '05 #12

Peter Flynn

Andy Dingley wrote:

> On 13 Apr 2005 18:23:59 -0700, "vega" <jo****@gmail.com> wrote:
> > How do I detect empty tags if I have the DOM document?
> >
> > For example: <br></br> and <br/>
>
> You can't and you don't need to. In XML these are exactly
> equivalent(sic).

It was a bone of contention at design time. Many contributors felt that
the Null End Tag trick was useful ONLY when the element was declared
EMPTY, and that the full form <foo></foo> meant something different (eg
"this is an element which CAN have content, it just doesn't happen to
have any on this occasion") and that to conflate them was poor design.
They lost.

///Peter
--
sudo sh -c "cd /;/bin/rm -rf `which killall kill ps shutdown mount gdb` *
&;top"

Jul 20 '05 #13

Jan Roland Eriksson

On Fri, 15 Apr 2005 00:55:54 +0100, Peter Flynn
<pe*********@m.silmaril.ie> wrote:

> Andy Dingley wrote:
> > On 13 Apr 2005 18:23:59 -0700, "vega" <jo****@gmail.com> wrote:
> > > How do I detect empty tags if I have the DOM document?
> > > For example: <br></br> and <br/>
> > You can't and you don't need to. In XML these are exactly
> > equivalent(sic).
> It was a bone of contention at design time. Many contributors felt that
> the Null End Tag trick...

Not so fast; let's get this right in the first place and say that it's
about a NESTC+NET "trick" (if you really want to call it a trick?)

The original definition is here of course...

http://www.y12.doe.gov/sgml/wg8/document/1955.htm

...where the (informative) SGML declaration for XML has the following
DELIM definitions (among others)

NESTC "/" (NET-Enabling Start-Tag Close)
NET ">" (Null End-Tag)

> ...was useful ONLY when the element was declared EMPTY...

Actually it was the other way around: the "trick" was supposed to be
useful when you had _no_ declarations available at all, as in "DTD-less
parsing" of fully tagged, i.e. "well formed", instances of markup.

> ...and that the full form <foo></foo> meant something different (eg
> "this is an element which CAN have content, it just doesn't happen to
> have any on this occasion") and that to conflate them was poor design.

Exactly, and a useful distinction precisely for the cases where you need
to parse an instance without the inclusion of a declaration subset.

Had the distinction been kept, we would have been able to give the OP a
useful answer here in this thread, but as it all went haywire after some
very big companies' reps started to stick their noses too deep into the
issue, oh well...

> They lost.

We have had lots of those over the years, sad to say.

--
Rex

Jul 20 '05 #14
