Bytes | Software Development & Data Engineering Community

?? Firefox 1.0.7 Chokes on &nbsp; ??

Hi all,

I have a very simple page that Firefox has problems with:
www.absolutejava.com/testing.htm

First of all, this page seems to be perfectly valid XHTML Strict. Both
the W3C validator as well as Page Valet indicate it is valid. The page
is being served with the proper MIME type of "application/xhtml+xml".

Unfortunately, Firefox will not display the page due to a non-breaking
space (&nbsp;) in the XHTML. I am using a local, custom version of the
XHTML DTD as well.

Let me explain the custom DTD issue quickly -- the DTD I'm using is
*EXACTLY* the same as the XHTML Strict DTD found at
http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd, except that I
allowed for a 'target' attribute on the anchor (<a>) element. But
that's not what's causing the problem. The problem is coming from the
"Character mnemonic entities" that are imported by my custom DTD. But
these are the same mnemonic entities that are imported by the real
XHTML Strict DTD, so it should be okay.

For some reason, it seems Firefox is ignoring these mnemonic entities,
such as &nbsp;. Does anyone have an idea why Firefox will not respect
the character entities imported by my DTD? Please look at the XHTML
code as well as the DTD, if you can.

There are three "Character mnemonic entities" imported by the DTD
(xhtml-lat1.ent, xhtml-symbol.ent and xhtml-special.ent), each of which
I have copied into my own DTD directory:

<!ENTITY % HTMLlat1 PUBLIC
"-//W3C//ENTITIES Latin 1 for XHTML//EN"
"xhtml-lat1.ent">
%HTMLlat1;

<!ENTITY % HTMLsymbol PUBLIC
"-//W3C//ENTITIES Symbols for XHTML//EN"
"xhtml-symbol.ent">
%HTMLsymbol;

<!ENTITY % HTMLspecial PUBLIC
"-//W3C//ENTITIES Special for XHTML//EN"
"xhtml-special.ent">
%HTMLspecial;

Oct 14 '05 #1
rbronson1976 wrote:
a non-breaking space (&nbsp;) in the XHTML.
You obviously know what's wrong here, so why are you asking us?

First of all, this page seems to be perfectly valid XHTML Strict.
It isn't.
Both
the W3C validator as well as Page Valet indicate it is valid.
They're not XHTML validators, they're SGML validators. So they give
misleading results in this case.
The page
is being served with the proper MIME type of "application/xhtml+xml".
That's a bad idea - serve it as text/html if you actually want it to
work on the web. The web is ready for XHTML, but it's not yet ready for
XHTML served as application/xhtml+xml, even if that is more "proper".

Unfortunately, Firefox will not display the page due to
That's because it's not a well-formed page.

I am using a local, custom version of the XHTML DTD as well.


One would wonder why? And why you'd expect this to stand any chance of
working on the web?

Oct 14 '05 #2
rbronson1976 wrote:
I have a very simple page that Firefox has problems with:
www.absolutejava.com/testing.htm

First of all, this page seems to be perfectly valid XHTML Strict.
It isn't XHTML. The XHTML specification requires the use of one of three
DOCTYPE declarations, and yours isn't one of them.

It is a valid XML document, though.
Both
the W3C validator as well as Page Valet indicate it is valid.
The W3C validator says:
"This Page Is Valid -//ABSJAVA//DTD XHTML 1.0 Strict With Target//EN!"
That's pointless and misleading babble, since it just picks up a string
from the DOCTYPE declaration. It _should_ simply say that the document
is a valid XML document.
The page
is being served with the proper MIME type of "application/xhtml+xml".
It isn't a proper type, since the document isn't XHTML.
Unfortunately, Firefox will not display the page due to a non-breaking
space (&nbsp;) in the XHTML. I am using a local, custom version of the
XHTML DTD as well. - - There are three "Character mnemonic entities" imported by the DTD
(xhtml-lat1.ent, xhtml-symbol.ent and xhtml-special.ent), each of which
I have copied into my own DTD directory:


I'm afraid there are limitations in the way Firefox handles external
declarations. Besides, XML browsers aren't really required to read such
declarations.

But instead of wondering what goes wrong and how it might be fixed, why
don't you simply stop using entity references? Since you are using
UTF-8, you don't need the entities, except perhaps due to flaws in the
authoring software. You can simply write the no-break space as such. It
even saves space (two octets vs. six). If you really can't do that, you
can use the character entity reference &#xa0;.
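The difference is easy to check with any non-validating XML parser; here is a sketch using Python's expat-backed ElementTree (the markup strings are illustrative only):

```python
import xml.etree.ElementTree as ET

# A numeric character reference needs no DTD: every XML parser accepts it.
ok = ET.fromstring('<p>&#xa0;</p>')
print(repr(ok.text))  # '\xa0' -- a literal no-break space

# The named entity, with no declaration anywhere, is a fatal error.
try:
    ET.fromstring('<p>&nbsp;</p>')
except ET.ParseError as err:
    print('rejected:', err)
```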
Oct 14 '05 #3
Andy Dingley wrote:
rbronson1976 wrote:
a non-breaking space (&nbsp;) in the XHTML.
You obviously know what's wrong here, so why are you asking us?


Because using a &nbsp; *should* work. As should &copy;.
First of all, this page seems to be perfectly valid XHTML Strict.
It isn't.


Okay, technically, it's not XHTML Strict because I am referencing a
custom DTD that has been tweaked (I added a 'target' attribute to the
<a> element). But the page does not even make use of the tweak I made.
For all intents and purposes I could have copied the XHTML Strict DTD
to my local server verbatim (without the tweak) and I would have
gotten the same error.

The problem seems to be rooted in the fact that Firefox does not
resolve the external entities correctly, or at all, when it encounters
a custom DTD.
Both
the W3C validator as well as Page Valet indicate it is valid.


They're not XHTML validators, they're SGML validators. So they give
misleading results in this case.


Well, my understanding is that they will validate in XML mode when
served with an XML content-type, as I have (application/xhtml+xml).
The page
is being served with the proper MIME type of "application/xhtml+xml".


That's a bad idea - serve it as text/html if you actually want it to
work on the web. The web is ready for XHTML, but it's not yet ready for
XHTML served as application/xhtml+xml, even if that is more "proper".


I serve it as 'application/xhtml+xml' to get the validators into XML
mode. Also, Firefox handles the more proper content-type of
'application/xhtml+xml' perfectly fine. Eventually, I will switch back
to 'text/html', as a concession to IE 6.
Unfortunately, Firefox will not display the page due to


That's because it's not a well-formed page.


Sure it is. Tell me what's not well formed about it.
I am using a local, custom version of the XHTML DTD as well.


One would wonder why? And why you'd expect this to stand any chance of
working on the web?


Yes, of course I would. The content type ('application/xhtml+xml') is
telling Firefox how to render it.

Oct 14 '05 #4
Jukka K. Korpela wrote:
rbronson1976 wrote:
Unfortunately, Firefox will not display the page due to a non-breaking
space (&nbsp;) in the XHTML. I am using a local, custom version of the
XHTML DTD as well.

The problem is that Gecko doesn't actually read the DTDs for XML
languages because it uses a non-validating XML parser. The reason
entity references actually work for XHTML 1.0 and 1.1, is because,
rather than reading the actual DTDs, it recognises the DOCTYPE
declarations and includes a pseudo-DTD catalog containing just the
entity declarations.

Because it doesn't recognise your DOCTYPE declaration, it doesn't know
that it should include that pseudo-DTD catalog and thus does not
recognise any of the entity references.

In text/html, browsers also don't read the DTD, but in the spirit of
accepting any rubbish you throw at them, they will understand entity
references regardless of the DOCTYPE you include or omit.
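That leniency is easy to demonstrate; as a sketch, Python's HTML parser resolves &nbsp; without any DOCTYPE at all (the class name is just illustrative):

```python
from html.parser import HTMLParser

class TextGrabber(HTMLParser):
    """Collects character data; convert_charrefs resolves &nbsp; for us."""
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)

grabber = TextGrabber()
grabber.feed('<p>&nbsp;</p>')
grabber.close()
print(repr(''.join(grabber.chunks)))  # '\xa0' -- no DTD consulted
```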

From the Mozilla Web Developer FAQ:
http://www.mozilla.org/docs/web-deve...html#xhtmldiff

In older versions of Mozilla as well as in old Mozilla-based
products, there is no pseudo-DTD catalog and the use of
externally defined character entities (other than the five
pre-defined ones) leads to an XML parsing error. There are
also other XHTML user agents that do not support externally
defined character entities (other than the five pre-defined
ones). Since non-validating XML processors are not required
to support externally defined character entities (other than
the five pre-defined ones), the use of externally defined
character entities (other than the five pre-defined ones)
is inherently unsafe in XML documents intended for the Web.
The best practice is to use straight UTF-8 instead of entities.
(Numeric character references are safe, too.)
Since you are using UTF-8, you don't need the entities, except perhaps due to flaws in the
authoring software. You can simply write the no-break space as such. It
even saves space (two octets vs. six). If you really can't do that, you
can use the character entity reference &#xa0;.


For most characters, encoding them in UTF-8 is usually the best option.
However, personally, I find it convenient to encode characters like
no-break space using numeric or hex character references since it helps
to distinguish it from a regular space in the source code.

Also, if you use a web-based CMS that accepts (X)HTML as input through a
textarea, then that may be the only way to include no-break spaces.
Firefox, at least, will convert any no-break spaces to regular spaces,
so a character reference is the only way to get it through in such cases.

--
Lachlan Hunt
http://lachy.id.au/
http://GetFirefox.com/ Rediscover the Web
http://GetThunderbird.com/ Reclaim your Inbox
Oct 15 '05 #5
Lachlan Hunt <sp***********@gmail.com> wrote:
The problem is that Gecko doesn't actually read the DTDs for XML
languages because it uses a non-validating XML parser.
And this implies a significant difference between an XHTML document that
conforms to the XHTML specification (including the requirement to use a
specific DOCTYPE, rather than just a specific DTD) and an XML document that
might seem to be XHTML for all practical purposes but isn't.
For most characters, encoding them in UTF-8 is usually the best option.
However, personally, I find it convenient to encode characters like
no-break space using numeric or hex character references since it helps
to distinguish it from a regular space in the source code.
I don't find &#xa0; or &#160; particularly readable. This is really an
authoring tool issue: the tools should have a function like MS Word for
showing a space as different from a no-break space. Mostly when reading
HTML (or XML source) and working with it, it's best to have no-break spaces
shown as spaces, optionally changeable to suitable indicator symbols.
Also, if you use a web-based CMS that accepts (X)HTML as input through
a textarea, then that may be the only way to include no-break spaces.
Firefox, at least, will convert any no-break spaces to regular spaces,
so a character reference is the only way to get it through in such
cases.


That sounds bad. Isn't it a direct violation of the specifications to
change input data that way?

--
Yucca, http://www.cs.tut.fi/~jkorpela/
Pages about Web authoring: http://www.cs.tut.fi/~jkorpela/www.html

Oct 15 '05 #6
rbronson1976 wrote:
The problem seems to be rooted in the fact that Firefox does not
resolve the external entities correctly, or at all, when it encounters
a custom DTD.


Indeed -- but it's not just Firefox. To "resolve the external entities
correctly" you'd need to use a validating XML parser: no mainstream
browsers do.

Indeed, mainstream browsers can only handle entities like '&nbsp;' in
normal XHTML through a nasty hack or hard-coding parts of the DTD into the
browser (or browser config files).

--
Toby A Inkster BSc (Hons) ARCS
Contact Me ~ http://tobyinkster.co.uk/contact

Oct 16 '05 #7
On 14 Oct 2005 10:22:50 -0700, "rbronson1976" <rb**********@yahoo.com>
wrote:
Okay, technically, it's not XHTML Strict because I am referencing a
custom DTD that has been tweaked
The crux of this question can be reduced to,
"Is this on topic for c.i.w.a.h?"

You're doing something that is legit XML, but it's not good practice for
HTML on the web. It's debatable as to whether it's legit HTML to do this
at all. If you do it, is it still "HTML" ? (I would think not).

In practical terms, for today, it doesn't work. It doesn't work
because the browser's notion of "what is HTML" is based on a hard-coded
notion, not on supporting a dynamic DTD.

Now is this a good thing or a bad thing? IMHO, it's a good thing. We
saw "DTD chaos" in the mid-late '90s, when the doctype disappeared from
usefulness because every new editor in town thought to "extend" the DTD
(no-one was quite sure why). The simple fact was that as browsers had no
idea how to render these new attributes, then they simply might as well
not have been there. And any notion of downloading the DTDs on demand
and using them to drive further parsing was farcical - the web just
never worked that way.

Now with CSS, things are a bit better. In fact it's possible (just not
useful) for a browser to take any XML document and some CSS and render
it perfectly competently, even if there isn't a single element in there
in common with (X)HTML. However this still isn't useful - to make a
useful "web page" the browser must know how to render things beyond
CSS' capabilities. Most obviously it must understand how to generate a
link from an <a> element.

Now by this same _capacity_ of CSS to render any content we like, it
reduces the need for it. Why invent an unknown <foo> element when <div
class="foo" > does the job just as well?
(I added a 'target' attribute to the <a> element).
In the simple case, then why not simply use the Transitional DTD
instead? It supports target, it's there for legacy support, it's there
for exactly the purpose you require.

Target isn't evil. Target wasn't removed from HTML, it was merely
deferred to a better thought-out module that we're still waiting for.

Extensible HTML will happen (one day). We've already seen it done well
and usefully with Ruby. This needs browser support though, not just
stretching the DTD and hoping CSS will take care of it. As such, it is
limited to a fairly small volume of well-known modules with some control
process for their invention - it's not a free-for-all for every page
designer. I see a situation about as fluid as Firefox extensions being
the ideal - build an extension, extend the markup, and publicise both in
a well-known place. But don't do it on a whim, per page or site.
(I added a 'target' attribute to the <a> element).
There's also the question of what you expect a browser to _do_ with a
target attribute on an <a> element. Suppose you define a "destination"
attribute on the <a> element in your expanded DTD and claim it to have
equivalent meaning to target. What should a browser do with that ? -
nothing! Because it has no semantic knowledge of what "destination"
means, then there's nothing it possibly should be expected to do with
it. Of course it might parse and render it correctly, applying CSS as
appropriate, but that's the limit.

Just the same thing applies to a target attribute added to the <a>
element. This is _NOT_ the target attribute from the Transitional DTD
(which would trigger some navigational behaviour) it's merely another
unknown attribute that you have invented yourself. Arguably it should
again be permitted, and have CSS applied to it according to the generic
rules, but it would be very much a _wrong_ action for any browser to
start making assumptions that <a target="..." > in Extended Strict had
the same meaning as <a target="..." > in Transitional.

I confess that I write exactly the same code myself, with HTML 4.01. I
use target to invoke convenient "tag soup processing" in just the way
you're after, but I don't kid myself that my document is still valid
afterwards. But "validity" is still a bit of a goal with optimistic
rewards on the larger web.
Well, my understanding is that they will validate in XML mode when
served with an XML content-type, as I have (application/xhtml+xml).
My understanding is that they should, but don't. See past threads on the
usefulness of namespaces in XHTML and the issues of getting them
validated.
Also, Firefox handles the more proper content-type of
'application/xhtml+xml' perfectly fine. Eventually, I will switch back
to 'text/html', as a concession to IE 6.


Which means that the web still needs text/html. You could browser
sniff (and people do - read James Pickering's posts hereabouts) but why
bother if you're also being forced to still support the legacy ?
That's because it's not a well-formed page.


Sure it is. Tell me what's not well formed about it.


Interesting terminology point here, and I really don't know the answer
(Jukka?)

Is an XML document with &nbsp; in it a failure to be well-formed, or
just a failure to be valid ?

If it's merely a validity failure, then perhaps FF _is_ being a little
trigger happy on rejecting it outright? But then is it possible to
build a "useful" "entity soup" browser that can make a useful attempt at
rendering a page containing undefined entity references? _We_ know what
they meant, and we also know that it's "trivial" enough to replace with
a "?" charcetr if needs be, but could a browser depend on this with any
degree of reliability?
--
Cats have nine lives, which is why they rarely post to Usenet.
Oct 16 '05 #8
Andy Dingley wrote:
On 14 Oct 2005 10:22:50 -0700, "rbronson1976"
<rb**********@yahoo.com> wrote:
That's because it's not a well-formed page.
Sure it is. Tell me what's not well formed about it.


Is an XML document with &nbsp; in it a failure to be well-formed, or
just a failure to be valid ?


That's an interesting question, and the answer seems to be contrary to
what I believed before looking it up in the XML recommendation [1].

In a document without any DTD, a document with only an internal
DTD subset which contains no parameter entity references, or a
document with "standalone='yes'", for an entity reference that
does not occur within the external subset or a parameter entity,
the Name given in the entity reference MUST match that in an
entity declaration that does not occur within the external subset
or a parameter entity, except that well-formed documents need not
declare any of the following entities: amp, lt, gt, apos, quot.
The declaration of a general entity MUST precede any reference to
it which appears in a default value in an attribute-list
declaration.

Note that non-validating processors are not obligated to read
and process entity declarations occurring in parameter entities
or in the external subset; for such documents, the rule that an
entity must be declared is a well-formedness constraint only if
standalone='yes'.

AIUI, a document with standalone="no" and a DOCTYPE declaration
referencing an external subset, that contains an undeclared entity
reference is well formed. Thus, if I have understood correctly, I
believe this document is well-formed:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE p SYSTEM "http://www.example.com/p.dtd">
<p>&nbsp;</p>

(Since a non-validating parser won't even attempt to retrieve the DTD,
it doesn't really matter whether it exists or not. Also, note that (I
believe) the default value of standalone is "no", so it could be omitted
with the same result)
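A related point: an entity declared in the *internal* subset must be expanded even by non-validating processors. A sketch with Python's expat-backed parser (the one-entity DTD is illustrative only):

```python
import xml.etree.ElementTree as ET

# &nbsp; declared in the internal subset: every conforming XML
# processor, validating or not, must expand it.
doc = ET.fromstring(
    '<?xml version="1.0" encoding="UTF-8"?>'
    '<!DOCTYPE p [ <!ENTITY nbsp "&#160;"> ]>'
    '<p>&nbsp;</p>'
)
print(repr(doc.text))  # '\xa0'
```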
If it's merely a validity failure, then perhaps FF _is_ being a
little trigger happy on rejecting it outright?
It would be a validity error if the entity were not declared in the
external subset or parameter entity, but, AIUI, is still well-formed.
But then is it possible to build a "useful" "entity soup" browser
that can make a useful attempt at rendering a page containing
undefined entity references? _We_ know what they meant, and we also
know that it's "trivial" enough to replace with a "?" character if
needs be, but could a browser depend on this with any degree of
reliability?


I couldn't find any definition of what should be used in its place,
though I would assume that it could be replaced with U+FFFD Replacement
Character.

[1] http://www.w3.org/TR/REC-xml/#wf-entdeclared

--
Lachlan Hunt
http://lachy.id.au/
http://GetFirefox.com/ Rediscover the Web
http://GetThunderbird.com/ Reclaim your Inbox
Oct 17 '05 #9
Lachlan Hunt <sp***********@gmail.com> wrote:

[ Regarding undefined entity references ]
I couldn't find any definition of what should be used in its place,
though I would assume that it could be replaced with U+FFFD Replacement
Character.


By the Unicode Standard, U+FFFD is a correct character to be used if you
have had data in another encoding and conversion to Unicode has encountered
the problem that a character has no Unicode equivalent. On the other hand,
it might be more suitable in practice to use Private Use character, since
then you could use _different_ codes for different unconvertible
characters.

However, there are no rules in XML or HTML specifications for handling
U+FFFD. Anything may happen if you feed a browser with it. Contrary to
popular (?) misconception, XML and HTML specs do not require conformance to
the Unicode standard; they just define character references in terms of
Unicode (or the equivalent ISO 10646). Besides, conformance to Unicode
would not require any particular processing of U+FFFD, except that _if_ you
recognize it, you must treat it by its Unicode semantics.

There's a more fundamental problem in using U+FFFD even in theory. An
undefined entity reference is simply undefined. We cannot know that it was
meant to expand to a _single_ character. Although all predefined entities
in HTML expand that way, it's just a simple special case, as far as SGML
and XML are concerned. Using U+FFFD would reflect the assumption that only
one character is involved.

Thus, if you would like to replace an undefined entity reference by some
character that indicates the situation, I would suggest
U+FFFC OBJECT REPLACEMENT CHARACTER
But for it, too, browser behavior is undefined. Besides, the expansion of
an entity reference need not be an "object" in any normal sense. It could
be part of a name, for example.

--
Yucca, http://www.cs.tut.fi/~jkorpela/
Pages about Web authoring: http://www.cs.tut.fi/~jkorpela/www.html

Oct 17 '05 #10
On Sat, 15 Oct 2005, Jukka K. Korpela wrote:
I don't find &#a0; or   particularly readable. This is really an
authoring tool issue: the tools should have a function like MS Word for
showing a space as different from a no-break space.


Or just use MS Word as authoring tool. ;-)

--
Netscape 3.04 does everything I need, and it's utterly reliable.
Why should I switch? Peter T. Daniels in <news:sci.lang>

Oct 17 '05 #11
In article <8o************@ophelia.g5n.co.uk>,
Toby Inkster <us**********@tobyinkster.co.uk> wrote:
To "resolve the external entities
correctly" you'd need to use a validating XML parser: no mainstream
browsers do.


A non-validating XML processor may resolve external entities. Resolving
external entities in a Web browser is still an extremely bad idea from
the performance point of view.

--
Henri Sivonen
hs******@iki.fi
http://hsivonen.iki.fi/
Mozilla Web Author FAQ: http://mozilla.org/docs/web-developer/faq.html
Oct 23 '05 #12
In article <w0*******************@news-server.bigpond.net.au>,
Lachlan Hunt <sp***********@gmail.com> wrote:
Andy Dingley wrote:
On 14 Oct 2005 10:22:50 -0700, "rbronson1976"
<rb**********@yahoo.com> wrote:
That's because it's not a well-formed page.
Sure it is. Tell me what's not well formed about it.
Is an XML document with &nbsp; in it a failure to be well-formed, or
just a failure to be valid ?


That's an interesting question, and the answer seems to be contrary to
what I believed before looking it up in the XML recommendation [1].

.... Thus, if I have understood correctly, I
believe this document is well-formed:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE p SYSTEM "http://www.example.com/p.dtd">
<p>&nbsp;</p>

(Since a non-validating parser won't even attempt to retrieve the DTD,
it doesn't really matter whether it exists or not. Also, note that (I
believe) the default value of standalone is "no", so it could be omitted
with the same result)
If it's merely a validity failure, then perhaps FF _is_ being a
little trigger happy on rejecting it outright?


It would be a validity error if the entity were not declared in the
external subset or parameter entity, but, AIUI, is still well-formed.


What's actually happening is that the entity resolver gives expat a
zero-length stream as the external entity ("DTD") that was not found in
the pseudo-DTD catalog. Therefore, expat thinks it has seen the DTD as
opposed to having skipped it. Now that the DTD has been seen (although
the real one has not really been seen), a reference to an undeclared
entity is a fatal error.

--
Henri Sivonen
hs******@iki.fi
http://hsivonen.iki.fi/
Mozilla Web Author FAQ: http://mozilla.org/docs/web-developer/faq.html
Oct 23 '05 #13
Henri Sivonen <hs******@iki.fi> wrote:
What's actually happening is that the entity resolver gives expat
a zero-length stream as the external entity ("DTD") that was not
found in the pseudo-DTD catalog. Therefore, expat thinks it has
seen the DTD as opposed to having skipped it. Now that the DTD has
been seen (although the real one has not really been seen), a
reference to an undeclared entity is a fatal error.


Ahh, thanks for the explanation :-)

--
David Håsäther
Oct 23 '05 #14