Good practice ? Empty content in <a> element

P: n/a
I'm writing a "tabbed folder" nav bar. Site standards are graphical
prettiness, CSS throughout, valid code, but accessibility is ignored
where it conflicts with prettiness.

The particular issue here is that the graphic designer wants a pretty
(non-webbable) font on the shaded nav tabs, so I'm reduced to using
bitmaps. To make the 4-way rollovers ("current" and "hover" states)
work I'm using the standard "rollerblind" technique with a background
image.

Now this leaves me with a dilemma. To avoid text appearing over the
background image, I need to leave the content of the <a> element
text-free: either empty, or with nothing more than spaces in it. As the tabs
are shaded, I can't even use color=background-color.

Is this a major problem usability-wise?

Is it better to leave the <a> empty, or to put one or more spaces /
&#160;s in it?

Comments appreciated.

Aug 10 '05 #1
28 Replies


P: n/a
"di*****@codesmiths.com" wrote:
accessibility is ignored where it conflicts with prettiness.
You spelt "laziness" wrong :-D
The particular issue here is that the graphic designer wants a pretty
(non-webbable) font on the shaded nav tabs, so I'm reduced to using
bitmaps. To make the 4-way rollovers ("current" and "hover" states)
work I'm using the standard "rollerblind" technique with a background
image.
What's the "standard rollerblind technique"? Do you mean you're using CSS to
display different background images that present the text in different ways?

Now this leaves me with a dilemma. To avoid text appearing over the
background image, I need to leave the content of the <a> element
text-free: either empty, or with nothing more than spaces in it. As the tabs
are shaded, I can't even use color=background-color.


Why not just wrap it inside a SPAN element with visibility set to "hidden"?

--
phil [dot] ronan @ virgin [dot] net
http://vzone.virgin.net/phil.ronan/
Aug 10 '05 #2

P: n/a

Philip Ronan wrote:
"di*****@codesmiths.com" wrote:
accessibility is ignored where it conflicts with prettiness.
You spelt "laziness" wrong :-D


It's not laziness - although I have deadlines (like yesterday!) I can
afford to put work in if it fixes things. In this case though, there
seems to be a direct conflict between one technique and another.

What's the "standard rollerblind technique"? Do you mean you're using CSS to
display different background images that present the text in different ways?
.foo a {
  height: 25px;  /* one 25px tile of the 100px image is visible */
  background: #DAE0D2 url(nav_tabs_1.png) no-repeat left top;
}

.foo a:hover {
  background-position: 0 -25px;  /* shift the image up by one tile */
}

Each tab is one image file 100px high, different for each tab. It
contains four vertical "tiles", each 25px high. The CSS and :hover
shifts the background-position and thus the image up as needed. It has
the advantage of less flicker than actually swapping the images, there
are fewer files to manage, and it also allows simpler selectors (this
is useful if you're building for extra tabs in the future and you also
need to make it work on IE).
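
A fuller sketch of the same idea, covering all four tiles, might look like the following. The tile order within the image (normal, hover, current, current plus hover), the "current" class name, and the display/width values are assumptions made for illustration; only the 25px tile height, the 100px image and the first two rules come from the post itself.

```html
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>Rollerblind tab sketch</title>
<style>
/* One 100px-high image per tab holds four 25px tiles stacked vertically.
   Assumed order, top to bottom: normal, hover, current, current + hover. */
.foo a {
  display: block;   /* assumption: lets height and width apply to the link */
  width: 100px;     /* assumption: the tab width is not given in the post */
  height: 25px;     /* exactly one tile is visible at a time */
  background: #DAE0D2 url(nav_tabs_1.png) no-repeat left top;
}
.foo a:hover         { background-position: 0 -25px; }  /* second tile */
.foo a.current       { background-position: 0 -50px; }  /* third tile */
.foo a.current:hover { background-position: 0 -75px; }  /* fourth tile */
</style>
</head>
<body>
<div class="foo">
  <!-- Links are left empty here, which is exactly the practice in question -->
  <a href="#"></a>
  <a href="#" class="current"></a>
</div>
</body>
</html>
```

Each state only moves the background, so no second image has to be fetched on hover.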
Why not just wrap it inside a SPAN element with visibility set to "hidden"?


I probably should (as there's no reason I shouldn't), but that's still
pretty much the same as not having the text. The text isn't suddenly
going to appear if the images fail client-side, which is my main
concern.

What I really need is a CSS foreground-image property !

Aug 10 '05 #3

P: n/a
di*****@codesmiths.com wrote:

Now this leaves me with a dilemma. To avoid text appearing over the
background image, I need to leave the content of the <a> element
text-free: either empty, or with nothing more than spaces in it. As the tabs
are shaded, I can't even use color=background-color.

Is this a major problem usability-wise?


Try using z-index: give the image a higher z-index, so that if the image
is not loaded the text will display.
Aug 10 '05 #4

P: n/a
"di*****@codesmiths.com" wrote:
Why not just wrap it inside a SPAN element with visibility set to "hidden"?


I probably should (as there's no reason I shouldn't), but that's still
pretty much the same as not having the text. The text isn't suddenly
going to appear if the images fail client-side, which is my main
concern.


I see.

Apart from using Javascript mouseover events -- which I assume you would
rather avoid -- the only thing I can suggest is to use a transparent GIF/PNG
with suitable alt text to display the button text, combined with your roller
blind technique to switch the background image presented behind it.

Messy, but it should work.
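
A minimal sketch of that suggestion, assuming a fully transparent placeholder image and hypothetical file names (blank.gif, home.html) and dimensions that do not come from the thread:

```html
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>Transparent image with alt text</title>
<style>
.foo a {
  display: block;   /* assumption: block display so the size applies */
  width: 100px;     /* assumption: tab width */
  height: 25px;
  background: #DAE0D2 url(nav_tabs_1.png) no-repeat left top;
}
.foo a:hover { background-position: 0 -25px; }
.foo a img   { display: block; border: 0; }  /* no link border around the image */
</style>
</head>
<body>
<div class="foo">
  <!-- blank.gif is a hypothetical transparent image; its alt text carries the label -->
  <a href="home.html"><img src="blank.gif" alt="Home" width="100" height="25"></a>
</div>
</body>
</html>
```

The alt text gives text and audio browsers something to present. Whether a graphical browser shows it when images are switched off varies, so this is only a partial fallback.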

--
phil [dot] ronan @ virgin [dot] net
http://vzone.virgin.net/phil.ronan/
Aug 10 '05 #5

P: n/a
di*****@codesmiths.com wrote:

I'm writing a "tabbed folder" nav bar. Site standards are graphical
prettiness, CSS throughout, valid code, but accessibility is ignored
where it conflicts with prettiness.

The particular issue here is that the graphic designer wants a pretty
(non-webbable) font on the shaded nav tabs, so I'm reduced to using
bitmaps. To make the 4-way rollovers ("current" and "hover" states)
work I'm using the standard "rollerblind" technique with a background
image.

Now this leaves me with a dilemma. To avoid text appearing over the
background image, I need to leave the content of the <a> element
text-free: either empty, or with nothing more than spaces in it. As the tabs
are shaded, I can't even use color=background-color.

Is this a major problem usability-wise?

Is it better to leave the <a> empty, or to put one or more spaces /
&#160;s in it?


Use &nbsp; in place of &#160;. The former is per HTML
specification, valid for all character sets. The latter is the
WINDOWS-1252 character set.

You can omit any text between <a> and </a>. However, if you are
using images there, you really should provide alternative text via
the ALT attribute of the IMG element. Then, if a user has
suppressed images or is using an audio browser, your pages can
still be navigated.
--

David E. Ross
<URL:http://www.rossde.com/>

I use Mozilla as my Web browser because I want a browser that
complies with Web standards. See <URL:http://www.mozilla.org/>.
Aug 10 '05 #6

P: n/a
David Ross wrote:
di*****@codesmiths.com wrote:

[snip]

Is it better to leave the <a> empty, or to put one or more spaces /
&#160;s in it?


Use &nbsp; in place of &#160;. The former is per HTML
specification, valid for all character sets. The latter is the
WINDOWS-1252 character set.


No. &#160; is equivalent to &nbsp; The latter is declared in HTMLlat1.ent as

<!ENTITY nbsp CDATA "&#160;" -- no-break space = non-breaking space,
U+00A0 ISOnum -->

windows-1252 defines some extra characters for the bytes 128..159 (which are
reserved for non-printable control characters in iso-8859-1).
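
As a small illustration (a sketch, not taken from any poster's page), the named, decimal and hexadecimal forms below all denote the same character, U+00A0:

```html
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>No-break space references</title>
</head>
<body>
<!-- All three references denote U+00A0 NO-BREAK SPACE,
     whatever encoding the file itself is served in. -->
<p>named:&nbsp;entity</p>
<p>decimal:&#160;reference</p>
<p>hexadecimal:&#xA0;reference</p>
</body>
</html>
```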

--
Benjamin Niemann
Email: pink at odahoda dot de
WWW: http://www.odahoda.de/
Aug 10 '05 #7

P: n/a
On Wed, 10 Aug 2005, David Ross wrote:
Use &nbsp; in place of &#160;
The two are identical in theory, and the numerical character reference
is just a tiny bit better supported in some older browsers, although
by now that's probably ignorable in practice.
The former is per HTML specification, valid for all character sets.
You reveal that you don't understand character representation in HTML.
There is only one "document character set" in HTML: it's
iso-10646/unicode. Don't confuse it with external character encoding.
The latter is the WINDOWS-1252 character set.


Rubbish, I'm afraid.

Aug 10 '05 #8

P: n/a
junk wrote:
try using a z index, give the image a higher index,


There isn't an image, or at least not an <img>. The image is coming from
background-image on the <a> element.

It's an interesting idea though, so I tried this, hoping to set the
text behind its parent:
<a href="#" ><span style="z-index: -3;" >Foo</span></a>

Unfortunately it doesn't work - but then I've never really understood
z-index or where the background / canvas fitted into things with it.

Aug 10 '05 #9

P: n/a
On Wed, 10 Aug 2005, Benjamin Niemann wrote:
windows-1252 defines some extra characters for the bytes 128..159


So it does, but the references &#number; in the range 128 to 159 are
still technically "undefined" in HTML and "illegal" in XHTML. This is
a fundamental property of (X)HTML character representation.

The only place that those values 128...159 are acceptable for (X)HTML
is as 8-bit characters included in a file whose external character
encoding has been properly declared (that misleadingly named MIME
"charset" attribute) as windows-1252 (or windows-125x for some values
of x, but you were specifically talking about 1252). Their
corresponding &#number; values can be derived from the unicode
cross-mapping tables, see e.g
http://www.unicode.org/Public/MAPPIN...OWS/CP1252.TXT

0x80 0x20AC #EURO SIGN
0x81 #UNDEFINED
0x82 0x201A #SINGLE LOW-9 QUOTATION MARK
0x83 0x0192 #LATIN SMALL LETTER F WITH HOOK
0x84 0x201E #DOUBLE LOW-9 QUOTATION MARK
0x85 0x2026 #HORIZONTAL ELLIPSIS
0x86 0x2020 #DAGGER
0x87 0x2021 #DOUBLE DAGGER
0x88 0x02C6 #MODIFIER LETTER CIRCUMFLEX ACCENT
0x89 0x2030 #PER MILLE SIGN
0x8A 0x0160 #LATIN CAPITAL LETTER S WITH CARON
0x8B 0x2039 #SINGLE LEFT-POINTING ANGLE QUOTATION MARK
0x8C 0x0152 #LATIN CAPITAL LIGATURE OE
0x8D #UNDEFINED
0x8E 0x017D #LATIN CAPITAL LETTER Z WITH CARON
0x8F #UNDEFINED
0x90 #UNDEFINED
0x91 0x2018 #LEFT SINGLE QUOTATION MARK
0x92 0x2019 #RIGHT SINGLE QUOTATION MARK
0x93 0x201C #LEFT DOUBLE QUOTATION MARK
0x94 0x201D #RIGHT DOUBLE QUOTATION MARK
0x95 0x2022 #BULLET
0x96 0x2013 #EN DASH
0x97 0x2014 #EM DASH
0x98 0x02DC #SMALL TILDE
0x99 0x2122 #TRADE MARK SIGN
0x9A 0x0161 #LATIN SMALL LETTER S WITH CARON
0x9B 0x203A #SINGLE RIGHT-POINTING ANGLE QUOTATION MARK
0x9C 0x0153 #LATIN SMALL LIGATURE OE
0x9D #UNDEFINED
0x9E 0x017E #LATIN SMALL LETTER Z WITH CARON
0x9F 0x0178 #LATIN CAPITAL LETTER Y WITH DIAERESIS
good luck
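
Read as a sketch, the table translates into references like these. The decimal values are simply the Unicode code points from the right-hand column converted from hexadecimal; the sample text is made up.

```html
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>References for the windows-1252 extras</title>
</head>
<body>
<!-- Use the Unicode code point, never the windows-1252 byte value. -->
<p>Price: &#8364;10</p>          <!-- U+20AC EURO SIGN, not &#128; -->
<p>Wait&#8230;</p>               <!-- U+2026 HORIZONTAL ELLIPSIS, not &#133; -->
<p>&#8220;quoted&#8221;</p>      <!-- U+201C and U+201D, not &#147; and &#148; -->
<p>1999&#8211;2005</p>           <!-- U+2013 EN DASH, not &#150; -->
<p>Example&#8482;</p>            <!-- U+2122 TRADE MARK SIGN, not &#153; -->
</body>
</html>
```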
Aug 10 '05 #10

P: n/a
di*****@codesmiths.com wrote:
junk wrote:
try using a z index, give the image a higher index,

There isn't an image, or at least not an <img>. The image is coming from
background-image on the <a> element.

It's an interesting idea though, so I tried this, hoping to set the
text behind its parent:
<a href="#" ><span style="z-index: -3;" >Foo</span></a>

Unfortunately it doesn't work - but then I've never really understood
z-index or where the background / canvas fitted into things with it.


To use z-index, the element must be a positioned element.
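
One way this could be put to work is sketched below. Note that it is a different arrangement from the one tried above: instead of pushing the text behind its parent, the text stays in the link and an empty, absolutely positioned span carrying the background image is laid over it. The file names, sizes and the extra span are assumptions for illustration, and the image must be opaque for the text to stay hidden.

```html
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>Positioned cover span sketch</title>
<style>
.foo a {
  position: relative;   /* containing block for the absolutely positioned span */
  display: block;
  width: 100px;         /* assumption: tab width */
  height: 25px;
  overflow: hidden;
  background: #DAE0D2;  /* plain colour behind the fallback text */
}
.foo a span {
  position: absolute;   /* positioned, so z-index applies */
  z-index: 1;
  top: 0;
  left: 0;
  width: 100%;
  height: 100%;
  /* No background colour: if the image fails to load, the span is
     transparent and the link text underneath shows through. */
  background: url(nav_tabs_1.png) no-repeat left top;
}
.foo a:hover span { background-position: 0 -25px; }
</style>
</head>
<body>
<div class="foo">
  <a href="home.html">Home<span></span></a>
</div>
</body>
</html>
```

The cost is one empty span per tab; the gain is real link text for text browsers and for graphical browsers with images turned off.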

--
Gus
Aug 11 '05 #11

P: n/a
Under Subject: Re: Good practice ? Empty content in <a> element
Alan J. Flavell wrote:
- - the references &#number; in the range 128 to 159 are
still technically "undefined" in HTML
That's what we have heard from SGML experts, but I started thinking
about this now. Surely this is a fairly pedantic issue; the pragmatics
are that such references should never be used, and that their real effect
in most browsers is that they are treated as if the document
character set were windows-1252. But how about theory?

References like &#128; are not reportable markup errors, and that's why
validators do not issue error messages about them, though they may
(extending their scope of duty) issue warnings. The warning text
"reference to non-SGML character" is somewhat obscure, since it's not
SGML as such but the SGML declaration for HTML that matters here.

That declaration
( http://www.w3.org/TR/REC-html40/sgml/sgmldecl.html )
declares the characters that have numbers from 128 to 159 (in the
document character set) as UNUSED, thereby making them non-SGML
characters (as the misleading SGML jargon goes). According to clause 13.1.1
of the SGML Handbook, this means that "no meaning is assigned to that
character". This is somewhat obscure, isn't it? The character belongs to
the document character set, and although it must not appear in the
document as such, the SGML Handbook explicitly says (at 9.2) that
"A non-SGML character can be entered as a data character within an SGML
entity by using a character reference".

To me, it thus seems that the reference &#128; unambiguously
denotes the character with code number 128 decimal in ISO/IEC 10646 and
Unicode, i.e. U+0080, which is an unnamed control character.

So what really makes &#128; undefined is the fact that the applicable
character code standards do not assign any specific control function to
U+0080, and neither do HTML specifications.

Of course, displaying U+0080 e.g. as the euro sign is hardly consistent
with character code standards, though one might argue that "control
function" can mean just about anything.
and "illegal" in XHTML.
I wouldn't say so. The XML specification has a chance to declare them as
"illegal", or as violating well-formedness constraints, but it does not
do that. At
http://www.w3.org/TR/REC-xml/#NT-Char
it lists down what n can be in &#n; in XML, and the range we discuss is
included. A bit later it says something about them, but at a different
level of normativity:

"The characters defined in the following ranges are also discouraged.
They are either control characters or permanently undefined Unicode
characters:

[#x7F-#x84], [#x86-#x9F], - -"

(Thus, U+0085, NEW LINE, is not discouraged.)

Both the W3C validator and the WDG validator issue the same warning,
"non-SGML character", for &#128; in an XHTML document as in an HTML
document. This is of course a symptom of their origin as SGML validators
that have been patched, with some chewing gum and duct tape I suppose,
to work in a sort-of XML mode as well, based on some guess of whether
the document is meant to be SGML or XML. In the XML context, "non-SGML
character" makes no sense, since that's not an XML concept; as formally
defined, XML is independent of SGML.
This is a fundamental property of (X)HTML character representation.


It is fundamental in the sense that character references are part of the
underlying general markup system (SGML for HTML, XML for XHTML).
However, it is not essential in the sense that character references are
basically a way to exceed the limitations imposed by the character
encoding, and Unicode encodings remove such limitations. The characters
"<" and "&", and quotation marks inside attribute values, would still be
a problem, but could be handled with ad hoc constructs, if we wanted
to get rid of character references. We probably don't want that, but
just because they don't really bother us that much.
Aug 11 '05 #12

P: n/a
Jukka K. Korpela wrote:
Surely this is a fairly pedantic issue;


Pigs sighted over Helsinki.
Jukka complains of pedantry.

Death of the Net predicted - pictures at 11

Aug 11 '05 #13

P: n/a
On Thu, 11 Aug 2005, Jukka K. Korpela wrote:
- - the references &#number; in the range 128 to 159 are
still technically "undefined" in HTML
That's what we have heard from SGML experts, but I started thinking
about this now. Surely this is a fairly pedantic issue;


Indeed, and I hope anyone reading this "fairly pedantic" thread
will not take it too seriously in relation to their own practical
activities...
the pragmatics are that such references should never be used
Well, the pragmatics for (X)HTML authors are indeed that such
references should never be used; but the pragmatics for developers of
client agents (browsers and others) are that they will have to be
interpreted, since it's pretty clear that MS aren't going to stop
using them in the quasi-HTML which their software extrudes...
That declaration (
http://www.w3.org/TR/REC-html40/sgml/sgmldecl.html ) declares the
characters that have numbers from 128 to 159 (in the document
character set) as UNUSED, thereby making them non-SGML characters
(as the misleading SGML jargon goes). According to clause 13.1.1 of the
SGML Handbook, this means that "no meaning is assigned to that
character".
Right. The last time that I seriously discussed this with an SGML
specialist was many years ago, and I was told that in such a
situation, it's open to the parties to agree between themselves on the
meaning to be assigned to such UNUSED references. That's fairly close
to MS unilaterally imposing their own meaning, and the rest of us
accepting it, which is pretty much what's happened in practical terms
(whether I like it or not, which I don't).
The character belongs to the document character set,
It would, yes, if it hadn't been excluded as (UNUSED), which leaves
its numerical character reference out in no man's land. Under those
circumstances, according to what I was told back then by this SGML
specialist (which I am not), the notation &#128; (etc.) does *not*
refer to that character in the "document character set"
(iso-10646/unicode), leaving it open to separate negotiation.

If he was wrong, then I too am wrong (but I don't think the practical
consequences are particularly significant, to be honest).
although it must not appear in the document as such, the SGML
Handbook explicitly says (at 9.2) that "A non-SGML character can be
entered as a data character within an SGML entity by using a
character reference".
Interesting...

[...]
and "illegal" in XHTML.


I wouldn't say so.


[...]
"The characters defined in the following ranges are also discouraged.
Oh. But presumably we have to read this in conjunction with the XHTML
specifications too. I'm afraid I have more urgent tasks at the
present moment than to pursue this particular piece of pedantry, but
I'm still interested in the conclusion - and would be happy to adjust
my advice accordingly.

But can we at least agree on the practical implications, anyway, which
I summarised above:

__
/
Well, the pragmatics for (X)HTML authors are indeed that such
references should never be used, but the pragmatics for developers of
client agents (browsers and others) are that they will have to be
interpreted, since it's pretty clear that MS aren't going to stop
using them in the quasi-HTML which their software extrudes...
\__

(Thus, U+0085, NEW LINE, is not discouraged.)
I'm not going to rush out and use it, though; in MS usage it will
be "horizontal ellipsis", I see, whose proper reference would be
&#x2026; or its decimal equivalent, see
http://www.unicode.org/Public/MAPPIN...OWS/CP1252.TXT
Both the W3C validator and the WDG validator issue the same warning,
"non-SGML character", for &#128; in an XHTML document as in an HTML
document. This is of course a symptom of their origin as SGML
validators that have been patched, with some chewing gum and duct
tape I suppose, to work in a sort-of XML mode as well,
fair comment
based on some guess of whether the document is meant to be SGML or
XML. In the XML context, "non-SGML character" makes no sense, since
that's not an XML concept; as formally defined, XML is independent
of SGML.


(I thought there was meant to be a definition path to XML from SGML as
amended by the Web TC)
This is a fundamental property of (X)HTML character
representation.


It is fundamental in the sense that character references are part of
the underlying general markup system (SGML for HTML, XML for XHTML).


That's what I meant. In SGML in general, it's possible in theory to
use *any* Document Character Set (this would be set up in the relevant
"SGML Declaration"), and &#number; references relate to code points in
*that* D.C.S, but HTML has chosen to use only one fixed Document
Character Set (which has become iso-10646/unicode): HTML has no
machinery for negotiating any other "SGML Declaration" - and hence no
way to establish a different Document Character Set. Thus, anyone who
talks about &#number; references in HTML as if they depend on the
external character encoding (MIME charset) reveals that they have not
understood this fundamental point about HTML character representation,
which was the key point that I was trying to get across - pedantry
aside.

And the same conclusion applies to XHTML, although the underlying
argument is somewhat different.

all the best
Aug 11 '05 #14

P: n/a
"Alan J. Flavell" <fl*****@ph.gla.ac.uk> writes:
On Wed, 10 Aug 2005, David Ross wrote:
Use &nbsp; in place of &#160;


The two are identical in theory, and the numerical character reference
is just a tiny bit better supported in some older browsers, although
by now that's probably ignorable in practice.


The last browser I saw that didn't understand &nbsp; was Mosaic 3.
Of course, for an arbitrary reference the numeric form is generally
more supported.

--
Chris
Aug 11 '05 #15

P: n/a
On Thu, 11 Aug 2005, Chris Morris wrote, quoting me:
The two are identical in theory, and the numerical character reference
is just a tiny bit better supported in some older browsers, although
by now that's probably ignorable in practice.
The last browser I saw that didn't understand &nbsp; was Mosaic 3.


My apologies: this is in fact one of the Latin-1 character entity
names; it seems I was confusing it with things like ndash, mdash and
trade, which weren't supported by Netscape 4.*.

Thanks for prompting the correction!
Of course, for an arbitrary reference the numeric form is generally
more supported.


Agreed. If the choice is under my control, then my tendency is to use
character entities for the Latin-1 characters (those defined in the
appendix to RFC1866), and numerical references in decimal for the
remainder (showing that I have a bit of a soft spot for older
browsers...). But if something else is generating other valid
character references, then by now I don't really lose any sleep over
it.

Some folks have already decided that a complete move to coded
characters in utf-8 is better than faffing around with &-notations
(except of course for markup-significant characters like < and &):
certainly it's more compact, and if they're competent to handle utf-8
encoded data I don't see any reason to contradict them now, especially
if their content includes enough of HTML4 to be incompatible with
older browsers anyway.

But if I'm asked for advice by some random author whose ability in
this area is unknown, then it has to be said that there's still some
advantage coding in us-ascii, representing the remaining characters in
&-notation, and advertising the result as one or other of iso-8859-1
or utf-8 (this would be the "scenarios" 1 and 6 respectively in my
checklist cited below).

It might be time for me to revisit my page
http://ppewww.ph.gla.ac.uk/~flavell/charset/checklist
and see whether it could do with updating in the light of developments
since it was drafted. Any views?

cheers
Aug 11 '05 #16

P: n/a
di*****@codesmiths.com wrote:
What I really need is a CSS foreground-image property !


Using CSS3:
a { content: url(image); }

Unfortunately, Opera is the only browser I know of that it works in.
Gecko only supports 'content' for the ::before and ::after
pseudo-elements and IE doesn't support it at all. I don't know about
browsers for non-Windows systems.

--
Lachlan Hunt
http://lachy.id.au/
http://GetFirefox.com/ Rediscover the Web
http://GetThunderbird.com/ Reclaim your Inbox
Aug 11 '05 #17

P: n/a
On Thu, 11 Aug 2005 09:49:46 +0100, "Alan J. Flavell"
<fl*****@ph.gla.ac.uk> wrote:
On Thu, 11 Aug 2005, Jukka K. Korpela wrote: [...]
although it must not appear in the document as such, the SGML
Handbook explicitly says (at 9.2) that "A non-SGML character can be
entered as a data character within an SGML entity by using a
character reference".


Interesting...


Section 9.2 in the handbook describes how to handle the (in)famous CDATA
concept. CDATA content has never been parsed for the value of its
content; it is just sent on downstream to the application that is
supposedly designed to deal with it.

This behavior of the (SGML-compliant) parser for CDATA content fits well
with Alan's remark that the sender and the receiver need to come to
terms on how to handle the situation.

[...]
Both the W3C validator and the WDG validator issue the same warning,
"non-SGML character", for &#128; in an XHTML document as in an HTML
document. This is of course a symptom of their origin as SGML
validators that have been patched, with some chewing gum and duct
tape I suppose, to work in a sort-of XML mode as well,


fair comment


It should be sharply noted that XML today has no connection whatsoever
with SGML.

The Web TC has the original design of XML as a different profile of
SGML, different from the original "Concrete Reference Syntax" that once
outlined the natural use of SGML.

After W3 found reason to "steal" XML away from those who knew better,
XML has taken a very different path and is today in no way connected to
SGML (some stupid verbiage in the XML spec says otherwise but it's just a
big lie in reality)

Quote from the very first lines of the XML spec...

"Abstract
The Extensible Markup Language (XML) is a subset of SGML..."

...which is plain bullshit both in the words as stated and in the exact
meaning of the line in itself.

Any XHTML definition will thus suffer from the same level of BS, hence a
discussion of character references in XHTML is moot as seen from an SGML
perspective.
based on some guess of whether the document is meant to be SGML or
XML. In the XML context, "non-SGML character" makes no sense, since
that's not an XML concept; as formally defined, XML is independent
of SGML.


Exactly...
(I thought there was meant to be a definition path to XML from SGML as
amended by the Web TC)


Goldfarb and Naggum were key persons in defining the SGML declaration for
XML as stated in the TC (a work that started long before Netscape
existed as a company, not to mention that W3 was not even thought of at
that time)

As W3 formed they "stole" it, wrenched it into "fubar" and in the
process removed architectural processing and generic processing
instructions as usable parts of XML, introduced name spaces and in
general tried to make XML to be "IH".

SGML and XML are two different beasts today, don't confuse one with the
other. But SGML is still an ISO standard while XML is just a
recommendation from a not fully acknowledged organization :)

--
Rex
Aug 11 '05 #18

P: n/a
In article <mj********************************@4ax.com>,
Jan Roland Eriksson <jr****@newsguy.com> wrote:
It should be sharply noted that XML today has no connection whatsoever
with SGML. Quote from the very first lines of the XML spec...

"Abstract
The Extensible Markup Language (XML) is a subset of SGML..."

...which is plain bullshit both in the words as stated and in the exact
meaning of the line in itself.


XML was designed as a subset of SGML but, by design, XML makes no
normative reference to SGML making XML an independent spec.

The lack of a normative reference has the benefit of giving a quick
counter argument to a whole lot of SGML pedantry. :-)

--
Henri Sivonen
hs******@iki.fi
http://hsivonen.iki.fi/
Mozilla Web Author FAQ: http://mozilla.org/docs/web-developer/faq.html
Aug 12 '05 #19

P: n/a
Jan Roland Eriksson wrote:
SGML and XML are two different beasts today, don't confuse one with the
other. But SGML is still an ISO standard while XML is just a
recommendation from a not fully acknowledged organization :)


Perfectly true, but perhaps you could explain why XML is a widespread
success and SGML is _still_ a backwater? 8-)

Aug 12 '05 #20

P: n/a
On Fri, 12 Aug 2005, Henri Sivonen wrote:
The lack of a normative reference has the benefit of giving a quick
counter argument to a whole lot of SGML pedantry. :-)


Fine - so would you say that &#128; is acceptable in XML, and, if it
is, what does it mean?

thanks

Aug 12 '05 #21

P: n/a
di*****@codesmiths.com wrote:
Jan Roland Eriksson wrote:

SGML and XML are two different beasts today, don't confuse one with the
other. But SGML is still an ISO standard while XML is just a
recommendation from a not fully acknowledged organization :)

Perfectly true, but perhaps you could explain why XML is a widespread
success and SGML is _still_ a backwater? 8-)

That's easy. XML is published on the Web and readily available to
everyone. SGML is bloody expensive, and rather hard to obtain: how
many normal academic bookshops even stock Goldfarb?

--
Not me guv
Aug 12 '05 #22

P: n/a
In article <Pi*******************************@ppepc56.ph.gla. ac.uk>,
"Alan J. Flavell" <fl*****@ph.gla.ac.uk> wrote:
Fine - so would you say that &#128; is acceptable in XML, and, if it
is, what does it mean?


It is acceptable in the objective sense that it does not render an XML
1.0 document ill-formed.

I do not know what it means. Neither XML 1.0 nor Unicode 4.1 assigns any
particular meaning to U+0080 beyond saying <control>.

Subjectively, one might argue that using meaningless characters is
unacceptable. However, in theory, a higher-level language built on top
of XML 1.0 could use the character as a special marker. Still, markup
would suit that purpose better.

--
Henri Sivonen
hs******@iki.fi
http://hsivonen.iki.fi/
Mozilla Web Author FAQ: http://mozilla.org/docs/web-developer/faq.html
Aug 12 '05 #23

P: n/a
In <11**********************@g43g2000cwa.googlegroups .com>, on
08/12/2005
at 01:36 AM, di*****@codesmiths.com said:
Perfectly true, but perhaps you could explain why XML is a
widespread success and SGML is _still_ a backwater? 8-)


Gresham's law. The same way that the WWW supplanted Gopher et al.

--
Shmuel (Seymour J.) Metz, SysProg and JOAT <http://patriot.net/~shmuel>

Unsolicited bulk E-mail subject to legal action. I reserve the
right to publicly post or ridicule any abusive E-mail. Reply to
domain Patriot dot net user shmuel+news to contact me. Do not
reply to sp******@library.lspace.org

Aug 12 '05 #24

P: n/a
On Thu, 11 Aug 2005, Alan J. Flavell wrote:
Some folks have already decided that a complete move to coded
characters in utf-8 is better than faffing around with &-notations

It might be time for me to revisit my page
http://ppewww.ph.gla.ac.uk/~flavell/charset/checklist
and see whether it could do with updating in the light of developments
since it was drafted. Any views?


Source text can become very messy when it includes actual right-to-left
characters (whether UTF-8 or ISO-8859-6 or ISO-8859-8); especially when
you think of < > which exchange their glyphs in right-to-left text.

Source text can also become messy when it includes non-spacing
combining marks or other zero-width characters and you don't know
if your editor places them right.

Such problems are eliminated with &#number; expressions. You know
*exactly* what your source text is and in which order the characters
are stored.
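
A short sketch of that point, using made-up sample words: the source below is pure ASCII, so the stored order of the characters is exactly what is typed, whatever the editor would have done with the real glyphs.

```html
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>Numeric references for awkward characters</title>
</head>
<body>
<!-- Hebrew "shalom" in logical order: U+05E9, U+05DC, U+05D5, U+05DD.
     The browser handles the right-to-left display. -->
<p dir="rtl" lang="he">&#1513;&#1500;&#1493;&#1501;</p>

<!-- "e" followed by U+0301 COMBINING ACUTE ACCENT: the mark
     unambiguously follows its base letter in the source. -->
<p>cafe&#769;</p>
</body>
</html>
```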

Aug 12 '05 #25

P: n/a
Henri Sivonen wrote:
Subjectively, one might argue that using meaningless characters is
unacceptable. However, in theory, a higher-level language built on top
of XML 1.0 could use the character as a special marker. Still, markup
would suit that purpose better.


From the Unicode perspective, it is inappropriate to use a code point
assigned to a specific character (though just a control character of
unspecified meaning) for private purposes, since there is a rich supply
of code points designated as Private Use characters.

I cannot really tell what it means to say that a reference, like &#128;,
refers to a specific character while declaring it undefined. But
there's no logical reason why SGML or XML could not be used to deal with
data that may contain U+0080, for use as control character according to
some standard, specification, or agreement. I think a sensible
interpretation of "undefined" is that no specific behavior is mandated
or guaranteed, rather than as any kind of prohibition.
Aug 12 '05 #26

P: n/a
On Fri, 12 Aug 2005 09:56:29 +0300, Henri Sivonen <hs******@iki.fi>
wrote:
In article <mj********************************@4ax.com>,
Jan Roland Eriksson <jr****@newsguy.com> wrote:
Quote from the very first lines of the XML spec...

"Abstract
The Extensible Markup Language (XML) is a subset of SGML..."

...which is plain bullshit both in the words as stated and in the exact
meaning of the line in itself.

XML was designed as a subset of SGML...
Error - - - - - - - - ^^^^^^

I'm surprised to find that even you, who is rather active in the field,
has fallen for this false statement in the XML spec.

XML was originally defined as a different profile of SGML markup,
different in the respect that XML was defined in an SGML declaration
that was very different from the "Concrete Reference Syntax" of SGML.

If you mark up doc instances according to the rules in that original
SGML declaration for XML you can use the full range of the SP tool set
to process your documents in the same way as you would do for any other
SGML doc instance. And architectural processing will work if you make
use of it as will those processing instructions you may need in your
markup.

As a side note it could be mentioned that DSSSL is also defined in an
SGML declaration of its own, and yes, a correctly written DSSSL program
will validate as a true SGML document instance.

XML was _not_ set up to be a "subset" of anything at all; the word
"subset" was written into the spec by someone who lacks knowledge of the
concept. The original wording was that XML was defined as a separate
_profile_ of SGML, similar to DSSSL which is also defined as a separate
profile of SGML. Calling XML a "subset" of something is just plain dumb.

but, by design, XML makes no normative reference to SGML making XML
an independent spec.


It would be even better if the XML spec did not mention SGML at all.

--
Rex
Aug 12 '05 #27

P: n/a
Nick Kew wrote:
Perfectly true, but perhaps you could explain why XML is a widespread
success and SGML is _still_ a backwater? 8-)

That's easy. XML is published on the Web and readily available to
everyone. SGML is bloody expensive, and rather hard to obtain: how
many normal academic bookshops even stock Goldfarb?


I have to disagree there. I've got Goldfarb's book on my shelf here.

SGML is simply bloody hard, compared to XML. Far too hard for anyone to
fully grasp all the rules. And that's the reason why people, given a
free (as in beer) choice between full SGML and XML, would still prefer
XML. I do.

--
Bart.
Aug 13 '05 #28

P: n/a
Andreas Prilop wrote:
On Thu, 11 Aug 2005, Alan J. Flavell wrote:
Some folks have already decided that a complete move to coded
characters in utf-8 is better than faffing around with &-notations

It might be time for me to revisit my page
http://ppewww.ph.gla.ac.uk/~flavell/charset/checklist
and see whether it could do with updating in the light of developments
since it was drafted. Any views?


Source text can become very messy when it includes actual right-to-left
characters (whether UTF-8 or ISO-8859-6 or ISO-8859-8); especially when
you think of < > which exchange their glyphs in right-to-left text.


One multilingual project I work with is settling on four principles:

a) Most of their documents are in western European languages, so the default
character set is ISO-8859-1, and all files can thus freely use any ASCII
character or any Latin-1 character (and must do so: character entity
references for these characters are disallowed).

b) Documents which have occasional letters or short fragments in other
alphabetic writing systems with which they deal (eg at the moment
Cyrillic, Greek, and Hebrew) must use either hex numeric char refs or the
relevant character entity references from the standard ISO character
entity declaration files.

c) Documents using occasional symbols or short fragments from anything else
must use the hex num char refs (in their case Ogham, Runic, and a few
others).

d) Documents wholly in any other writing system must use UTF-8 or UTF-16
exclusively.

The objective is to maximise the number of people who can open their
documents for *editing* without recourse to additional software. Most
modern text editors can handle ISO-8859-1 without problems, so for the
majority of documents there is no problem. The people working on this
are sufficiently skilled to know how to handle hex num char refs, so the
odd non-Latin character isn't a problem. And if they're going to have to
deal with a Chinese or Arabic text then they will already be in that
field and have the relevant software.

In other words, start by satisfying the majority, and then move outwards
to the periphery. This is generally easier than imposing a single (perhaps
inappropriate) standard on all users regardless.
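
The four principles could be sketched roughly as below. This is only an illustration in Python under stated assumptions, not what the project actually runs: the names `encode_document` and `hexcharref` are invented, the 20% cut-off between "occasional fragments" and "wholly in another writing system" is a guess, and for b) it always emits hex numeric references even though entity references are equally allowed.

```python
import codecs


def _hex_charref(err):
    """Codec error handler: emit &#xHHHH; for characters the codec lacks."""
    if not isinstance(err, UnicodeEncodeError):
        raise err
    bad = err.object[err.start:err.end]
    return "".join(f"&#x{ord(ch):X};" for ch in bad), err.end


# "hexcharref" is a handler name invented for this sketch.
codecs.register_error("hexcharref", _hex_charref)


def encode_document(text: str, max_foreign_ratio: float = 0.2):
    """Return (encoding_name, data) loosely following principles a) to d).

    max_foreign_ratio is an assumed threshold; the principles give no number.
    """
    # Count characters that ISO-8859-1 cannot represent directly.
    foreign = sum(1 for ch in text if ord(ch) > 0xFF)
    if text and foreign / len(text) > max_foreign_ratio:
        # d) mostly non-Latin text: store it as real characters in UTF-8.
        return "utf-8", text.encode("utf-8")
    # a) Latin-1 characters stay raw; b), c) anything else becomes a hex ref.
    return "iso-8859-1", text.encode("iso-8859-1", errors="hexcharref")
```

For instance, `encode_document("R\u00e9sum\u00e9 with one \u03a9")` keeps the accented letters as raw Latin-1 bytes and turns the omega into `&#x3A9;`, while a string made up entirely of Greek letters comes back as UTF-8.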

///Peter
--
sudo sh -c "cd /;/bin/rm -rf `which killall kill ps shutdown mount gdb` *
&;top"
Aug 15 '05 #29
