How to differentiate between <XX></XX> and <XX/> with SAX

Is there a simple and deterministic way to distinguish
between these two sequences:

<XX></XX>

and

<XX/>

The EndElement callback does not provide this information.

Thanks,
Pascal.
Jul 20 '05 #1
dp*****@yahoo.fr wrote:
> Is there a simple and deterministic way to distinguish
> between these two sequences:
>
> <XX></XX>
>
> and
>
> <XX/>

No. Their meaning is exactly the same. Why do you think you need that?

> The EndElement callback does not provide this information.

Jul 20 '05 #2
Rolf Magnus wrote:
> dp*****@yahoo.fr wrote:
>
>> Is there a simple and deterministic way to distinguish
>> between these two sequences:
>>
>> <XX></XX>
>>
>> and
>>
>> <XX/>
>
> No. Their meaning is exactly the same. Why do you think you need that?

Doesn't the first sample have an empty text() node as its first child, and
the second doesn't?

Franck,e-

>> The EndElement callback does not provide this information.


Jul 20 '05 #3
In article <41***********************@news.free.fr>,
Franck Guillaud <f_**************@free.fr> wrote:
> <XX></XX>
>
> <XX/>
>
> Doesn't the first sample have an empty text() node as its first child, and
> the second doesn't?


No.

(XML itself doesn't define any such thing as a "text node". The
Infoset has character information items, and there aren't any of them
in either case. The XPath data model doesn't have a text node in either
case, and SAX parsers do not call the characters method.)
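
This is easy to check for yourself. Below is a minimal sketch, assuming Python's standard xml.sax module (the language choice and the names EventRecorder and sax_events are only for illustration, not anything from this thread): it records every callback for both spellings so the two event streams can be compared.

```python
import xml.sax


class EventRecorder(xml.sax.ContentHandler):
    """Collects every SAX callback it receives as a (kind, value) tuple."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def endElement(self, name):
        self.events.append(("end", name))

    def characters(self, content):
        # Would be recorded if the parser reported any character data.
        self.events.append(("chars", content))


def sax_events(xml_text):
    """Parse xml_text and return the list of recorded SAX events."""
    recorder = EventRecorder()
    xml.sax.parseString(xml_text.encode("utf-8"), recorder)
    return recorder.events


print(sax_events("<XX></XX>"))
print(sax_events("<XX/>"))
```

Both calls should report only a start event and an end event for XX, with no characters event in between, which is why the callbacks alone cannot tell the two forms apart.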

-- Richard
Jul 20 '05 #4
Rolf Magnus wrote:

> No. Their meaning is exactly the same. Why do you think you need that?


OK, everywhere I read that they are the same.
But this is only true for XML, not for HTML, and even if it were
true for HTML, it would still not hold in practice because of the way
browsers interpret it.

What I need is to parse manually written HTML.
In HTML, <BR/> is interpreted differently than <BR></BR>.

So, I have two basic reasons to do this:

- I need it: the parser must distinguish the two forms, because
it must output the tags that it does not process exactly as they were
entered, so that the output is interpreted correctly.

- Even if it were not needed for a technical reason, if the
developer who wrote the HTML page decided that it is <XX/>, I
prefer to output <XX/> rather than the other form, so that the
developer can easily read the output of my program and does not have
to wonder about some "strange" conversion.

Summary:

We do not live in a perfect world, with perfect standards perfectly
implemented by perfect developers. So we need a "stable" way to
tell the difference. I like standards very much (I have a networking
background, you know: ISO, IETF, IEEE, ATM FORUM, FR FORUM, EIA, etc.),
but I live in a non-standard world. I must adapt to survive :-)

Thanks for your help.
Pascal.

Jul 20 '05 #5

> OK, everywhere I read that they are the same.
> But this is only true for XML, not for HTML, and even if it were
> true for HTML, it would still not hold in practice because of the way
> browsers interpret it.

Well, for HTML (but this is, after all, an XML newsgroup) the situation is
completely different.
<BR/> and <BR></BR>
are _both_ syntax errors (/> is always a syntax error in HTML, and BR
has no end tag as it is declared EMPTY in the HTML DTD, so </BR> is also
an error).

Of course a browser may or may not have some lax silent error recovery
from either of these situations, but in any case the behaviour will be
browser-specific.

> - Even if it were not needed for a technical reason, if the
> developer who wrote the HTML page decided that it is <XX/>, I
> prefer to output <XX/> rather than the other form.


So long as you are clearly writing HTML rather than XML, there's nothing
wrong with you doing that. XSLT, for example, when writing HTML cannot
distinguish the inputs <BR/> and <BR></BR>, as the input is XML and
these are the same, but in either case an "identity" transform will
produce the HTML syntax
<BR>
if the html output method is being used (which it is by default if the
top-level output element is <html>).
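
The same effect can be seen outside XSLT. As a rough sketch, assuming Python's xml.etree.ElementTree as a stand-in (it is not an XSLT processor, and the helper name to_html is made up for this example): both spellings parse to the same tree, and serializing that tree with the "html" method writes the br element with neither a slash nor an end tag.

```python
import xml.etree.ElementTree as ET


def to_html(xml_text):
    """Parse XML text, then serialize the tree with the 'html' output method."""
    root = ET.fromstring(xml_text)
    return ET.tostring(root, encoding="unicode", method="html")


# The two inputs differ only in how the empty br element is spelled.
print(to_html("<p>one<br/>two</p>"))
print(to_html("<p>one<br></br>two</p>"))
```

Both lines should print <p>one<br>two</p>: once the input has been parsed as XML, the original spelling is gone and only the serializer decides the output form.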

David

Jul 20 '05 #6
Pascal Dufour wrote:
> Rolf Magnus wrote:
>
>> No. Their meaning is exactly the same. Why do you think you need that?
>
> OK, everywhere I read that they are the same.
> But this is only true for XML, not for HTML, and even if it were
> true for HTML, it would still not hold in practice because of the way
> browsers interpret it.
>
> What I need is to parse manually written HTML.
> In HTML, <BR/> is interpreted differently than <BR></BR>.


You can't parse HTML with an XML parser; however, you can parse HTML
with an SGML parser. Additionally, you can use a tool that converts
HTML to XML (on a best-effort basis), like Cyber Neko HTML Parser
http://www.apache.org/%7Eandyc/neko/doc/html/
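
To illustrate what a dedicated HTML parser can offer (this is not the Neko tool linked above), here is a small sketch assuming Python's standard html.parser module. Its HTMLParser class has a separate handle_startendtag callback for tags written as <tag/>, so the spelling can be recorded. The names TagFormRecorder and tag_forms are invented for this example.

```python
from html.parser import HTMLParser


class TagFormRecorder(HTMLParser):
    """Records how each tag was spelled in the source text."""

    def __init__(self):
        super().__init__()
        self.forms = []

    def handle_starttag(self, tag, attrs):
        self.forms.append(("start", tag))

    def handle_endtag(self, tag):
        self.forms.append(("end", tag))

    def handle_startendtag(self, tag, attrs):
        # Called for XHTML-style tags written as <tag/>. Overriding it stops
        # the default implementation from reporting a start plus an end tag.
        self.forms.append(("startend", tag))


def tag_forms(html_text):
    """Return the recorded tag forms for html_text (tag names are lowercased)."""
    recorder = TagFormRecorder()
    recorder.feed(html_text)
    recorder.close()
    return recorder.forms


print(tag_forms("<BR/>"))
print(tag_forms("<BR></BR>"))
```

The first call should report a single "startend" entry and the second a "start" followed by an "end", which is the distinction the SAX callbacks do not expose.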

> So, I have two basic reasons to do this:
>
> - I need it: the parser must distinguish the two forms, because
> it must output the tags that it does not process exactly as they were
> entered, so that the output is interpreted correctly.

There's something quite confusing here: you're talking about parsing as if
it were outputting. These two processes are opposites: parsing gives
access to a data model, and serializing (I prefer this term) renders
this data model in XML character form (a file, a character stream...).

You can't act on the XML data model because it is governed by a set of
stable specifications, but you can act on the serialization. For this
purpose, formatter tools often provide a set of options that let you
tune the output; you can also write your own formatter.
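
As one example of such an option, here is a minimal sketch assuming Python's xml.sax.saxutils.XMLGenerator, whose short_empty_elements parameter selects how content-free elements are written. The helper name reserialize is hypothetical. Note that the choice is made by the serializer for every empty element, not recovered from the input.

```python
import io
import xml.sax
from xml.sax.saxutils import XMLGenerator


def reserialize(xml_text, short_empty_elements):
    """Parse xml_text with SAX and write it back out through XMLGenerator."""
    out = io.StringIO()
    writer = XMLGenerator(out, encoding="utf-8",
                          short_empty_elements=short_empty_elements)
    xml.sax.parseString(xml_text.encode("utf-8"), writer)
    return out.getvalue()


# The output starts with an XML declaration written by XMLGenerator,
# followed by the element in the form selected by the option.
print(reserialize("<XX></XX>", short_empty_elements=True))
print(reserialize("<XX/>", short_empty_elements=False))
```

With the option set to True the element comes out as <XX/> even though the input used two tags; with False it comes out as <XX></XX> even though the input used the short form.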

> - Even if it were not needed for a technical reason, if the
> developer who wrote the HTML page decided that it is <XX/>, I
> prefer to output <XX/> rather than the other form, so that the
> developer can easily read the output of my program and does not have
> to wonder about some "strange" conversion.
>
> Summary:
>
> We do not live in a perfect world, with perfect standards perfectly
> implemented by perfect developers. So we need a "stable" way to
> tell the difference. I like standards very much (I have a networking
> background, you know: ISO, IETF, IEEE, ATM FORUM, FR FORUM, EIA, etc.),
> but I live in a non-standard world. I must adapt to survive :-)
>
> Thanks for your help.
> Pascal.

--
Regards,

///
(. .)
-----ooO--(_)--Ooo-----
| Philippe Poulard |
-----------------------
Jul 20 '05 #7
David Carlisle <da****@nag.co.uk> writes:
> XSLT for example, if writing html can not
> distinguish the inputs of <BR/> and <BR></BR> as the input is XML and
> these are the same,


Actually, in XML, the notion "element" is not an abstract one,
but a concrete non-terminal symbol of the syntax.

Therefore, as elements, the element "<br/>" and the element
"<br></br>" are two /different/ elements, just as "<br/>" also
is a different element than "<br />".

You might say that they have the same element type, the same
content, and the same number, names and values of attributes
(here: none). Or, possibly, that they have the same
"infoset", but the infoset specification is not part of the
XML specification.
Jul 20 '05 #8
In article <yg*************@penguin.nag.co.uk>,
David Carlisle <da****@nag.co.uk> wrote:

% are _both_ syntax errors ( /> is always a syntax error in HTML, and BR

Actually, it's not, although its meaning is not the same as in XML. <br />
means the same as <br>>.
--

Patrick TJ McPhee
East York Canada
pt**@interlog.com
Jul 20 '05 #9
> Actually, it's not, although its meaning is not the same as in XML. <br />
> means the same as <br>>.


Oops, sorry, I was thinking that was turned off in HTML's SGML declaration,
but apparently not. Still, (most :-) of my point holds; in fact, it means
that the situation is worse than I indicated: if you rely on <br/>
working in the browser after sending the file with an HTML MIME type, you
are not just relying on lax error recovery, you are relying on
non-conformant HTML parsing.
David
Jul 20 '05 #10
