Bytes | Developer Community

XHTML 1.0 / 1.1 / 2.0


I read this in http://annevankesteren.nl/2004/12/xhtml-notes

"A common misconception is that XHTML 1.1 is the latest version
of the XHTML series. And although it was released a bit more
than a year later then the first version of XHTML 1.0, the second
edition is actually newer. Furthermore, XHTML 1.1 is not really
the follow-up of XHTML 1.0"

I thought that XHTML 1.1 was the follow-up to XHTML 1.0 and that
XHTML 2.0 will someday be the follow-up to XHTML 1.1. Am I wrong?

Sep 12 '05
Tim
On Sat, 17 Sep 2005 10:54:01 +0100, Alan J. Flavell sent:
Indeed. I'm increasingly attracted by the idea of using Apache
Multiviews, even when there's only one document variant available, and
avoiding exposing the "filename extension" to the web.


That's something I've done for a while, ostensibly so that I can make an
HTML document about something now, perhaps rewrite it using something
else later, and the URI will stay consistent.

However, I've discovered a problem with Apache 1.3 (which my host uses),
related to language: My pages are English, and described as such. If
someone browses using a browser configured only for some other language,
they don't get mine regardless, they get a 406 error. Apache 2 does allow
the webmaster to preselect the document you'll get regardless, when
there's no other obviously suitable choice.
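The Apache 2 behaviour described here comes from mod_negotiation's fallback directives. A configuration sketch (the directive names are real; the language list is purely illustrative):

```apache
# Preference order used when the client's Accept-Language
# header doesn't settle the choice.
LanguagePriority en fr de

# "Prefer": break ties using LanguagePriority.
# "Fallback": serve the first LanguagePriority match instead of
# returning "406 Not Acceptable" when nothing matches at all.
ForceLanguagePriority Prefer Fallback
```

Apache 1.3 has no equivalent of the Fallback keyword, which is the gap being complained about here.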

I'd say this is a fault in two halves: The browser for not promoting the
concept of flagging more than one language when you configure it, and
the older webserver for not having fallback options.

--
If you insist on e-mailing me, use the reply-to address (it's real but
temporary). But please reply to the group, like you're supposed to.

This message was sent without a virus, please destroy some files yourself.

Sep 17 '05 #51
On Sat, 17 Sep 2005, Tim wrote:
On Sat, 17 Sep 2005 10:54:01 +0100, Alan J. Flavell sent:
Indeed. I'm increasingly attracted by the idea of using Apache
Multiviews, even when there's only one document variant available, and
avoiding exposing the "filename extension" to the web.
[...]
However, I've discovered a problem with Apache 1.3 (which my host uses),
related to language: My pages are English, and described as such. If
someone browses using a browser configured only for some other language,
they don't get mine regardless, they get a 406 error.
My solution for that is described under
http://ppewww.ph.gla.ac.uk/~flavell/www/lang-neg.html

I take it your current pages are named internally as foobar.html.en or
foobar.en.html ? Then symlink foobar.html.html to them, and it will
work. Something like this:

ln -s foobar.html.en foobar.html.html

Apache 2 does allow
the webmaster to preselect the document you'll get regardless, when
there's no other obviously suitable choice.
Indeed, but, before we moved to 2, I needed a solution for 1.3
I'd say this is a fault in two halves: The browser for not promoting the
concept of flagging more than one language when you configure it,
Agreed. It's worse than that! For example, USA users get their
MeSsIE configured for them to say that they accept only en-US. So when
I honestly advertise my pages as en-GB, they get told there is no
acceptable language available for them. Sigh.
and the older webserver for not having fallback options.


Nevertheless, the workaround is simple enough and it works (under
unix-like OSes, anyway - I'm not sure how that works out on win32
platforms, I didn't actually try it, knowing that "shortcuts" aren't
really a direct synonym of what in unix-like file systems we call a
symlink or soft link).

Sep 17 '05 #52
Spartanicus wrote:
Toby Inkster wrote: <snip>
I would only advocate sending "Content-Type: text/html" to
agents that don't announce that they support XML via the
Accept header.


Don't current Opera versions announce that they support XHTML while
stating a preference for HTML? Shouldn't sensible content negotiation
only send XHTML in place of HTML when the UA expresses both support for
it _and_ a preference for receiving it?
Content negotiation comes with its own risks and issues.
We know about IE's broken Accept header, and we can work
around it. But who's to say that there aren't any other
clients with incorrect Accept values? There is also the server
overhead to consider, and the potential cache issues.
The notion that you can write static pages in Appendix C-style XHTML and
use content negotiation to decide which content-type header to send with
them really stands a very good chance of coming unstuck whenever there
is an intention to script those pages. A SCRIPT element in a static page
can only reference one URL but a browser receiving an HTML content type
will create an HTML DOM to be scripted and a browser receiving an XHTML
content-type will create an XHTML DOM to be scripted.

The single script file referenced from a static URL in the page will
struggle to accommodate both DOM types (as they need to be scripted very
differently). It is much less trouble (and probably about equivalent
work) to create two script resources, one for each DOM type, and serve
HTML DOM scripts to pages that have been served as HTML content types,
and XHTML DOM scripts to pages served as XHTML content types.
Unfortunately there is little reason to expect a browser's request for a
script to contain information that could be used to negotiate which
style of script to send (a script is a script, is a script, whichever
type of DOM is trying to load it). So serving scripts depending on which
content type you previously sent with the page that wants to load the
script becomes a problem.

Session tracking could be used; remembering which content-type was sent
with the page and then sending a particular script version when the
request came back for the script in the same session. Session tracking
by URL re-writing would impact on the client-side caching of the script
(and you would normally want scripts cached on the client if possible)
and cookie-based session tracking might result in intermediate caches
serving the wrong script type to some users.
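A minimal sketch of the cookie-based session-tracking idea described above. The cookie name and the script filenames are hypothetical (nothing in this thread specifies them); the point is only that the server must remember which DOM the page will build:

```python
# Sketch: choose which script variant to serve, based on a cookie that
# recorded which content type the page itself was sent with.
# The cookie name "doctype" and the filenames are illustrative.

def pick_script(cookies):
    """Return the script resource matching the DOM the page will build."""
    if cookies.get("doctype") == "xhtml":
        return "app.xhtml.js"   # scripts written against the XHTML DOM
    return "app.html.js"        # default: scripts for the HTML DOM
```

The weakness remains exactly as stated above: an intermediate cache that ignores the cookie may store one variant and serve it to everybody.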

And the alternative is:-
If content negotiation is to be used at all, then it's a
small step to generate HTML from the XHTML and feed that
to clients who want HTML.


- in which you use an explicitly different script URL with each type of
mark-up. It can be served from a cache if available without any risk of
getting the wrong script for the DOM type being scripted, and without
requiring any additional effort to match script requests with previous
page requests.

But what has been done here? A requirement to provide essentially static
marked-up content with a script now involves dynamically generating (or
pre-processing two versions of) the pages, writing/testing two versions
of the same script and including server-side scripts (or a considerably
more involved server configuration), with an inevitable increase in
server load, just to make it possible to serve the same content as HTML
and XHTML, with no perceivable difference to the user's experience of
the results. And that is assuming that all the added complexity works
the way it was intended 100% of the time.

So, we can content negotiate but what is the reward for all of that
extra trouble and expense?

Richard.
Sep 17 '05 #53
"Richard Cornford" <Ri*****@litotes.demon.co.uk> wrote:
Don't current Opera versions announce that they support XHTML while
stating a preference for HTML?
<= 7.2x do IIRC, more recent versions don't.
Shouldn't sensible content negotiation
only send XHTML in place of HTML when the UA expresses both support for
it _and_ a preference for receiving it?
It shouldn't matter theoretically. If practicality is an issue (as it
should be), then the whole XHTML exercise is imo at best pointless.
The notion that you can write static pages in Appendix C-style XHTML and
use content negotiation to decide which content-type header to send with
them really stands a very good chance of coming unstuck whenever there
is an intention to script those pages.
Indeed, I've got a link on one of my pages to your fine explanation of
the scripting issue a while back in alt.html.
A SCRIPT element in a static page
can only reference one URL but a browser receiving an HTML content type
will create an HTML DOM to be scripted and a browser receiving an XHTML
content-type will create an XHTML DOM to be scripted.

The single script file referenced from a static URL in the page will
struggle to accommodate both DOM types (as they need to be scripted very
differently). It is much less trouble (and probably about equivalent
work) to create two script resources, one for each DOM type, and serve
HTML DOM scripts to pages that have been served as HTML content types,
and XHTML DOM scripts to pages served as XHTML content types.
Unfortunately there is little reason to expect a browser's request for a
script to contain information that could be used to negotiate which
style of script to send (a script is a script, is a script, whichever
type of DOM is trying to load it). So serving scripts depending on which
content type you previously sent with the page that wants to load the
script becomes a problem.
I don't think you covered this specific issue in the alt.html post, I'll
add another link to this further explanation :)
But what has been done here? A requirement to provide essentially static
marked-up content with a script now involves dynamically generating (or
pre-processing two versions of) the pages, writing/testing two versions
of the same script and including server-side scripts (or a considerably
more involved server configuration), with an inevitable increase in
server load, just to make it possible to serve the same content as HTML
and XHTML, with no perceivable difference to the user's experience of
the results. And that is assuming that all the added complexity works
the way it was intended 100% of the time.

So, we can content negotiate but what is the reward for all of that
extra trouble and expense?


Indeed.

--
Spartanicus
Sep 17 '05 #54
Richard Cornford wrote:
Don't current Opera versions announce that they support XHTML while
stating a preference for HTML?
Current ones: no.
Shouldn't sensible content negotiation only send XHTML in place of HTML
when the UA expresses both support for it _and_ a preference for
receiving it?


It should definitely take browser preference into account, but not
necessarily treat it as a gold standard.

For example, say I'm running a site with multiple languages: English,
French and German. I create a new page for my site; my French translator
translates it into French; my German translator is on holiday, so I run
the page through an automatic translator to create a temporary poor German
translation, which will be replaced by a good translation at a later date.

If somebody visits my site using:

Accept-Language: de;q=1.0, en;q=0.9, it;q=0.1

Then it might be more sensible to send the English than the German. Apache
allows you to specify such things using a type-map file and the 'qs'
parameter.

URI: mypage.html

URI: mypage.en.html
Content-type: text/html; qs=1.0
Content-language: en

URI: mypage.fr.html
Content-type: text/html; qs=0.9
Content-language: fr

URI: mypage.de.html
Content-type: text/html; qs=0.2
Content-language: de

Apache would multiply each Accept-Language q value by the corresponding qs value to give:

de: 1.0 * 0.2 = 0.2
en: 0.9 * 1.0 = 0.9
fr: 0.0 * 0.9 = 0.0
it: 0.1 * 0.0 = 0.0

and serve up English.
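The arithmetic above can be checked with a short sketch. This deliberately skips real header parsing; the client q values and server qs values are the ones from the example:

```python
def negotiate_language(accept_language, variants):
    """Pick the variant with the highest q * qs product.

    accept_language maps language -> client q value.
    variants maps language -> server qs (source quality) value.
    """
    scores = {lang: accept_language.get(lang, 0.0) * qs
              for lang, qs in variants.items()}
    return max(scores, key=scores.get), scores

# Accept-Language: de;q=1.0, en;q=0.9, it;q=0.1
client = {"de": 1.0, "en": 0.9, "it": 0.1}
# qs values from the type-map above
server = {"en": 1.0, "fr": 0.9, "de": 0.2}

best, scores = negotiate_language(client, server)
# en wins: 0.9 * 1.0 = 0.9 beats de's 1.0 * 0.2 = 0.2
```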

--
Toby A Inkster BSc (Hons) ARCS
Contact Me ~ http://tobyinkster.co.uk/contact

Sep 18 '05 #55
On Sun, 18 Sep 2005, Toby Inkster wrote:
Richard Cornford wrote:
Shouldn't sensible content negotiation only send XHTML in place of
HTML when the UA expresses both support for it _and_ a preference
for receiving it?

What does the interworking specification say?

There may be some fine-details that aren't quite clear in the
specification, but I don't think any of them prevent us from answering
any of the questions raised on this thread.
It should definitely take browser preference into account,
There /is/ a published interworking specification for content-type
negotiation. It isn't exactly new! Client agents get what they asked
for[1]. If it isn't what they wanted, they shouldn't have asked for
it! If they unilaterally make requests over which their user has no
control, the user might be well advised to get a better browser.
but not necessarily treat it as a gold standard.
Sure. I'd recommend always offering alternative ways of accessing the
other variants.
For example, say I'm running a site with multiple languages:
English, French and German. I create a new page for my site; my
French translator translates it into French; my German translator is
on holiday, so I run the page through an automatic translator to
create a temporary poor German translation, which will be replaced
by a good translation at a later date.


[snip details of qs= negotiation]

Indeed. But I repeat (and I say this also on my page
http://ppewww.ph.gla.ac.uk/~flavell/www/lang-neg.html ) that
negotiation should not be relied on as the sole way of accessing
the desired variant: alternative ways (i.e usually explicit links)
for users to choose the variants should also be provided.
[1] I note that in IE's case it means (seeing that our campus standard
MS Windows installation includes MS Office) that they always get MS
Word format, if available, in preference to HTML. Well, if that's
what they want, who am I to argue? :-}
Sep 18 '05 #56
Spartanicus wrote:
"Richard Cornford" <Ri*****@litotes.demon.co.uk> wrote:
Don't current Opera versions announce that they support
XHTML while stating a preference for HTML?
<= 7.2x do IIRC, more recent versions don't.


I haven't looked at Opera's Accept header recently so I can accept that
I am out of date here.
Shouldn't sensible content negotiation only send
XHTML in place of HTML when the UA expresses both support
for it _and_ a preference for receiving it?


It shouldn't matter theoretically,


It is possibly interesting to note that while early versions of Opera 7
would render XHTML they would not script it, and intermediate versions
would allow the scripting of intrinsic event attributes but would not
recognise (or at least act upon) SCRIPT elements, while the latest
versions allow XHTML to be fully scripted. Given that Opera 7+'s HTML
DOM was always scriptable (subject to scripting being enabled) that may
have been part of the reason for the browser expressing a preference for
HTML over XHTML in the past. It may also be a reason for other browsers
to express the same preference in the future and so a good practical
reason for content negotiators to observe the expressed preference.
if practicality is an issue (as it should be),
then the whole XHTML exercise is imo at best pointless.


Even though my expressed attitude towards XHTML may seem negative I am
not of the opinion that it is a bad idea as such. As a programmer I
quite like the idea of a more formally rigid mark-up language, where a
syntax error (or its equivalent) is fatal and final (and so needs to be
fixed on the spot). I would like to see a more disciplined approach to
web authoring, where the results of guesswork are not, as often as not,
perceived as "working", and guesswork is instead replaced by decision
making based on a technical understanding. I am not entirely convinced
that XHTML will deliver that in reality but I am not opposed to the
experiment.

(Of course HTML does not deny the possibility of formal rigour or
informed authoring decisions, but an awful lot of web sites are created
without either)

My concerns almost entirely stem from practicalities; in a public
commercial context IE is so significant that it must be accommodated,
and IE doesn't understand XHTML at all. And so if XHTML is to be used at
all it must be XHTML for those that understand it and HTML (possibly
formally malformed HTML in the guise of Appendix C style XHTML) for IE
(or formally malformed HTML in the guise of XHTML for all). And that
strikes me as introducing so many issues (particularly in my own
area of browser scripting), and so few rewards (none as far as I am
concerned), that using XHTML now seems like a bad idea.

If, at some future time, it is possible to serve XHTML as
application/xhtml+xml with the expectation that all (or at least the
vast majority of) web browsers understand it, then the practical issues
become insignificant. (Issues around how well designed XHTML is remain)

The decision as to whether that future may eventually become a reality
is entirely down to Microsoft. If they introduce a browser that renders
XHTML then it may become viable in a commercial context and a switchover
might be perceived as having benefits. However, if they do that but take
their usual attitude and use a parser that error-corrects any old
nonsense into something useable/renderable, and compromise their XHTML
DOM implementation with all of the shortcuts, etc. that are in their
HTML DOM, then XHTML is lost forever, becoming a different flavour of
tag soup, with all of Microsoft's competitors having to similarly
compromise their XHTML implementations in order not to seem broken
alongside this future IE version.

In the meanwhile HTML predictably delivers what is wanted with the least
trouble, effort and issues. The time for XHTML may come, but certainly
not yet.

<snip>
... . So
serving scripts depending on which content type you
previously sent with the page that wants to load the
script becomes a problem.


I don't think you covered this specific issue in the alt.html post,
I'll add another link to this further explanation :)


The alt.html post was in response to the notion of writing Appendix C
XHTML and only serving it as text/html (with the mistaken idea that it
may be possible to switch to application/xhtml+xml at some future time
without any consequences). So the issues arising from dynamically
choosing a content type based on Accept headers, and of content
negotiation to serve alternative mark-up, were not that relevant.

Richard.
Sep 18 '05 #57
Toby Inkster wrote:
Richard Cornford wrote:
Don't current Opera versions announce that they support
XHTML while stating a preference for HTML?
Current ones: no.


Fair enough, I haven't looked at Opera's Accept headers recently. (As
you have probably guessed, writing/serving XHTML does not occupy much of
my web authoring time).
Shouldn't sensible content negotiation only send XHTML in
place of HTML when the UA expresses both support for it
_and_ a preference for receiving it?


It should definitely take browser preference into account,
but not necessarily treat it as a gold standard.

For example, say I'm running a site with mulitple languages:
English, French and German. I create a new page for my site;

... a temporary poor German translation, which will be replaced
by a good translation at a later date.

If somebody visits my site using:

Accept-Language: de;q=1.0, en;q=0.9, it;q=0.1

Then it might be more sensible to send the English
than the German.

<snip>

OK, the quality of the resource is a factor, but we were talking about
sending the same content as HTML or XHTML depending on the user agent's
expression of support for XHTML. If it is the same content then it is
difficult to see how XHTML could be regarded as having a higher quality
by virtue of nothing more than being marked-up as XHTML.

The relevant part of my preceding response to you was mostly intended to
question the logic of saying; only send text/html when the browser's
Accept header does not announce support for XHTML. With the implication
that the browser's preferences would not be a factor in the decision. If
the content negotiation is to be done in accordance with the HTTP
specification (subject to correcting for IE's unhelpful Accept header)
then the browser's expressed preference is a factor in the decision.

I accept that your expression of your strategy may not have included
full details of your practice in content negotiation.

However, I am concerned that simplistic expressions of content
negotiation are resulting in actively bad manifestations of "content
"negotiation" server-scripts. For example, my attention has recently been
drawn to two examples of PHP scripts that attempt to serve HTML or XHTML
based on a UA's Accept header by doing no more than searching the header
for the substring "application/xhtml+xml" and serving XHTML if it is
found. That is so far short of content negotiation that a UA that is, by
HTTP standards, expressing an absolute rejection of XHTML will be
served XHTML.
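The failure mode is easy to demonstrate: per HTTP, `application/xhtml+xml;q=0` is an explicit statement that the type is *not* acceptable, yet a substring test reads it as acceptance. A simplified comparison (this parser ignores wildcards and other accept-extension edge cases):

```python
def accepts_xhtml_substring(accept_header):
    # The naive PHP-style test criticised above.
    return "application/xhtml+xml" in accept_header

def accepts_xhtml_parsed(accept_header):
    """Honour the q parameter: q=0 means "not acceptable".
    Simplified: no wildcard or accept-extension handling."""
    for part in accept_header.split(","):
        fields = part.strip().split(";")
        if fields[0].strip() != "application/xhtml+xml":
            continue
        q = 1.0  # q defaults to 1 when absent
        for param in fields[1:]:
            name, _, value = param.strip().partition("=")
            if name.strip() == "q":
                q = float(value)
        return q > 0.0
    return False

header = "text/html, application/xhtml+xml;q=0"
# The substring test wrongly says yes; the parsed test correctly says no.
```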

Any growth in (or popularisation of) this type of stupidly simplistic
"content negotiation" must stand some chance of doing to the Accept
header what server-side browser detection has done to the User-Agent
header; turning it from a potentially useful source of information that
could maximise the user's experience into a meaningless sequence of
characters expediently chosen to do the browser manufacturers the least
harm in the face of incompetent web developers.

Richard.
Sep 18 '05 #58
Tim
Alan J. Flavell sent:
Indeed. I'm increasingly attracted by the idea of using Apache
Multiviews, even when there's only one document variant available, and
avoiding exposing the "filename extension" to the web.


Tim:
However, I've discovered a problem with Apache 1.3 (which my host uses),
related to language: My pages are English, and described as such. If
someone browses using a browser configured only for some other language,
they don't get mine regardless, they get a 406 error.
Alan J. Flavell:
My solution for that is described under
http://ppewww.ph.gla.ac.uk/~flavell/www/lang-neg.html
The whole page? (It's quite comprehensive.)
I take it your current pages are named internally as foobar.html.en or
foobar.en.html ? Then symlink foobar.html.html to them, and it will work.
Something like this:

ln -s foobar.html.en foobar.html.html
Actually, mine are currently named simply as pagename.html, with
(default/add) language defined in the config file. Though, upon
reflection, the server probably isn't using that in the manner I'd need it
to.

I don't think I'll go around symlinking every single file that's on the
server, that'd be a few hundred operations I'd have to do, never mind the
mess it'd create of maintaining the server over time.

It's a while since I experimented with it, but I don't recall either
pagename.html or pagename.html.en helping any. I would have thought the
former would have provided some default page, with the latter being more
problematic as an unacceptable language variant. But they both seem as
bad as each other.
Apache 2 does allow the webmaster to preselect the document you'll get
regardless, when there's no other obviously suitable choice.
Indeed, but, before we moved to 2, I needed a solution for 1.3
I've been having a little war of words with my webhost, virtually accusing
them of not knowing what they were doing (for various reasons, along with
prior comments asking about when were they going to update to Apache 2),
resulting in being given SSH access without having to pay extra. I think
it's their way of saying to me, "well fix it yourself if you think you
know better".

I'd say this is a fault in two halves: The browser for not promoting
the concept of flagging more than one language when you configure it,
Agreed. It's worse than that! For example, USA users get their MeSsIE
configured for them to say that they accept only en-US. So when I honestly
advertise my pages as en-GB, they get told there is no acceptable language
available for them. Sigh.
They're not the only one. I'm Australian, and it's common for the
defaults to be en-US, not en-AU, despite the system asking for your
location when first installing (along with various other stupid things
that ignore your location, requiring individual manual configuration).

Linux seems less stupid than Windows, in that regard, but the latest
version of Fedora (4) seems to think that all English speaking people are
American. It didn't even bother to offer an option for localisation, and
it seems difficult to do post-installation.
and the older webserver for not having fallback options.

Nevertheless, the workaround is simple enough and it works (under
unix-like OSes, anyway - I'm not sure how that works out on win32
platforms, I didn't actually try it, knowing that "shortcuts" aren't
really a direct synomym of what in unix-like file systems we call a
symlink or soft link).


My host is using Linux, I'm sure shortcuts wouldn't cut it (they haven't
worked in other ways in the webserver files where a symlink would work,
when I've tried it), but I really can't see myself making a symlink for
every file. If my host won't lift its game, I'll be looking for a new
one, especially as I'm nearing the end of my first year (it bills
annually, in advance).

For some strange reason I seem to have a lot of French referrals to a page
on my website (decrying a few stupid website issues), and they mostly get
406 errors. Since I haven't managed to solve the problem, nor been able to
customise the 406 error in what I consider the sensible way (offer the
unacceptable variant, with an understandable explanation before the link),
I've written one with a fair bit of detail about the problem, suggesting
that if the reader can read the message, they should add English to the
languages their browser accepts; it'll help them with my site and many
others.

--
If you insist on e-mailing me, use the reply-to address (it's real but
temporary). But please reply to the group, like you're supposed to.

This message was sent without a virus, please destroy some files yourself.

Sep 18 '05 #59
On Sun, 18 Sep 2005, Tim wrote:
Alan J. Flavell:
My solution for that is described under
http://ppewww.ph.gla.ac.uk/~flavell/www/lang-neg.html
The whole page?


No, the detail which I included into the posting (discussed in more
detail somewhere on that page, though).
(It's quite comprehensive.)
I'm not sure if that's a gripe or a compliment ;-)
Actually, mine are currently named simply as pagename.html, with
(default/add) language defined in the config file. Though, upon
reflection, the server probably isn't using that in the manner I'd
need it to.
I don't think that is compatible with what I'm suggesting.
I take it your current pages are named internally as foobar.html.en or
foobar.en.html ? Then symlink foobar.html.html to them, and it will work.
Something like this:

ln -s foobar.html.en foobar.html.html

I don't think I'll go around symlinking every single file that's on
the server,
Up to you. It's the best I came up with for my requirements.
The alternative is to write typemap files.

If I wanted to do either, I'd write a Makefile, and run make on the
site, with a little Perl script to do the business.
that'd be a few hundred operations I'd have to do,
Well, not you, but your script!
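As a sketch of that "little script" idea, in Python rather than Perl, and assuming pages are named foo.html.en as suggested earlier in the thread (Tim's actual layout may differ):

```python
import os

def link_language_variants(root, lang="en"):
    """For every foo.html.<lang> under root, create the foo.html.html
    symlink that lets Multiviews fall back to it (the trick described
    earlier in the thread). Returns the list of links created."""
    created = []
    suffix = ".html." + lang
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(suffix):
                continue
            link = os.path.join(dirpath, name[:-len(suffix)] + ".html.html")
            if not os.path.lexists(link):
                os.symlink(name, link)   # relative target, as with ln -s
                created.append(link)
    return created
```

Run once from a Makefile target after each site update and the "few hundred operations" become one command.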
never mind the mess it'd create of maintaining the server over time.
Oh, hardly! Moving to Makefiles would stand to make the server
more maintainable than before, IMHO.
It's a while since I experimented with it, but I don't recall either
pagename.html or pagename.html.en helping any.


I think it should, as long as you're not referencing the .html in your
URLs. If you /are/ so doing, then the .html.html trick in the
filenames will fix that, as I discuss in the related part of my page.

good luck.
Sep 18 '05 #60
"Richard Cornford" <Ri*****@litotes.demon.co.uk> wrote:
It is possibly interesting to note that while early versions of Opera 7
would render XHTML they would not script it, and intermediate versions
would allow the scripting of intrinsic event attributes but would not
recognise (or at least act upon) SCRIPT elements, while the latest
versions allow XHTML to be fully scripted. Given that Opera 7+'s HTML
DOM was always scriptable (subject to scripting being enabled) that may
have been part of the reason for the browser expressing a preference for
HTML over XHTML in the past.
Another possible reason for Opera <=7.2 to declare a lesser ability with
regard to XHTML was that prior to V7.5 Opera's XML parser was not able
to render character entity references.
Even though my expressed attitude towards XHTML may seem negative I am
not of the opinion that it is a bad idea as such. As a programmer I
quite like the idea of a more formally rigid mark-up language, where a
syntax error (or its equivalent) is fatal and final (and so needs to be
fixed on the spot). I would like to see a more disciplined approach to
web authoring where the results of guesswork are not more often as not
perceived as "working", and must be replaced by decision making based on
a technical understanding. I am not entirely convinced that XHTML will
deliver that in reality but I am not opposed to the experiment.
On that issue all that true XHTML has to offer is a check for
well-formedness. Well-formed code can still be invalid and, most
importantly, dreadful markup. Well-formedness is imo merely a technical
requirement for code to be parsed with an XML parser; it has no other
benefits. The notion that browsers throwing parsing errors when they
encounter malformed code will somehow make authors write better markup
is imo wishful thinking.

Imo a potential benefit of true XHTML is mixed-namespace documents.

There's also the potential advantage that parsing true XHTML uses
significantly fewer resources; this would be an advantage on, for
example, mobile platforms. The snag here is content. A mobile phone with
only an XML parser will not be able to use the vast amount of text/html
content currently on the web. To utilize the resource advantage, new
content would have to be created for such clients. If I was a phone
manufacturer I would not be keen to make such a device; people wouldn't
be able to do much with it.
If, at some future time, it is possible to serve XHTML as
application/xhtml+xml with the expectation that all (or at least the
vast majority of) web browsers understand it, then the practical issues
become insignificant.
Maybe, UAs in general would have to become compatible, not just
browsers.
The decision as to whether that future may eventually become a reality
is entirely down to Microsoft. If they introduce a browser that renders
XHTML then it may become viable in a commercial context and a switchover
might be perceived as having benefits.


Even if IE7 has that capability, it will take a long time for the
number of users whose browser is not capable of rendering true XHTML to
fall below a level where it becomes viable to ignore them.

--
Spartanicus
Sep 18 '05 #61
Spartanicus wrote:
Richard Cornford wrote: <snip>
... . I would like to see a more disciplined approach to
web authoring where the results of guesswork are not more
often as not perceived as "working", and must be replaced
by decision making based on a technical understanding. I
am not entirely convinced that XHTML will deliver that in
reality but I am not opposed to the experiment.


On that issue all that true XHTML has to offer is a check
for well formedness.


There is well-formedness in XML terms and there is the possibility of
enforcing the DTD/Schema rules as to which elements may contain which
other elements, eliminating one type of tag soup nonsense.
Well formed code can still be invalid and most importantly
dreadful markup. Well formedness is imo merely a technical
requirement for code to be parsed with an XML parser, it
has no other benefits.
Much as following the syntax rules in a programming language in no way
ensures that the results will be good, or even functional. I think of it
more in terms of the possible influence on the attitude of the creators
of marked-up documents. Any unavoidable increase in the formal
requirements may encourage individuals who are forced to learn syntax
rules to go on to learn the full set of applicable rules.

As it stands almost any sequence of characters will produce a 'result'
of some sort or another in at least one tag soup browser. And that seems
to allow some individuals to work in web development for years without
apparently ever realising that HTML has anything objectively knowable
behind it.

People learning programming languages don't sustain the notion that what
they think they should be able to do has any relevance for very long, at
least in part because the imposition of syntax rules stop them short if
they try to make it up off the top of their heads. A switch to reading
and understanding documentation, specifications, etc. is easily seen as
the only productive way forward in that context. The practicalities
encourage a particular attitude towards the task.
The notion that browsers throwing parsing errors when they
encounter malformed code will somehow make authors write
better markup is imo wishful thinking.
(: At the very least optimistic thinking.
Imo a potential benefit of true XHTML is mixed name space
documents.
Yes, one of the reasons that I will not dismiss XHTML out of hand. In my
own work there would be huge advantages in being able to include
architectural CAD drawings in documents. Mixing SVG and XHTML may allow
that, in the meanwhile there is lots of fun to be had with SVG plug-ins.
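To illustrate the mixed-namespace idea: an XHTML document can embed SVG, with each element carrying its own namespace in a single tree. The markup below is a minimal illustrative fragment, not a complete document, and the Python parse is just one way to observe both namespaces:

```python
# A sketch of a mixed-namespace document: XHTML embedding SVG.
import xml.etree.ElementTree as ET

doc = """<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <p>A drawing:</p>
    <svg xmlns="http://www.w3.org/2000/svg" width="100" height="100">
      <circle cx="50" cy="50" r="40"/>
    </svg>
  </body>
</html>"""

root = ET.fromstring(doc)
# ElementTree exposes each element's namespace as a {uri}localname tag.
namespaces = {elem.tag.split('}')[0].strip('{') for elem in root.iter()}
print(sorted(namespaces))
# → ['http://www.w3.org/1999/xhtml', 'http://www.w3.org/2000/svg']
```

Both vocabularies coexist in one parsed tree, which is what an XHTML-capable browser would need to dispatch rendering on per element.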

<snip>
The decision as to whether that future may eventually
become a reality is entirely down to Microsoft. ...

<snip> Even if IE7 will have that capability, it will take a long
time for the number of users whose browser is not capable
of rendering true XHTML to fall below a level where it
becomes viable to ignore them.


Yes, two to eight years if the history of the widespread adoption of web
technologies has anything to say on the subject. And I have seen nothing
to suggest that IE 7 will represent Microsoft moving in this direction.

Richard.
Sep 18 '05 #62
"Richard Cornford" <Ri*****@litotes.demon.co.uk> wrote:
On that issue all that true XHTML has to offer is a check
for well formedness.
There is well-formedness in XML terms and there is the possibility of
enforcing the DTD/Schema rules as to which elements may contain which
other elements, eliminating one type of tag soup nonsense.


That is equally possible for HTML. But I hope that you are not seriously
suggesting that browsers should have a validator added to them. Apart
from the resource issue, users should not be bothered with such issues.

Not that it would do much good anyway, first because validity is also of
little consequence for the quality of the code as experienced by the
user. Secondly because a browser's HTML parser will continue to need
their extensive error recovery mechanism modeled after the behaviour of
IE to deal with the legacy content out there. So even if browsers were
to be equipped with a validator, and authors would then suddenly all
start producing valid code (why should they, it works no?), it would
have little perceivable benefit.
Well formed code can still be invalid and most importantly
dreadful markup. Well formedness is imo merely a technical
requirement for code to be parsed with an XML parser, it
has no other benefits.


Much as following the syntax rules in a programming language in no way
ensures that the results will be good, or even functional. I think of it
more in terms of the possible influence on the attitude of the creators
of marked-up documents. Any unavoidable increase in the formal
requirements


For it to be "unavoidable", a browser's validating error should result
in a no show, not just a status line message saying "this document is
invalid". That would impede user access to legacy content. If it doesn't
result in a no show, but only produces a status bar message, then I'm
not optimistic about the effect on authors.
may encourage individuals who are forced to learn syntax
rules to go on to learn the full set of applicable rules.
I don't share your optimism on this.
As it stands almost any sequence of characters will produce a 'result'
of some sort or another in at least one tag soup browser. And that seems
to allow some individuals to work in web development for years without
apparently ever realising that HTML has anything objectively knowable
behind it.
Imo this has in no small measure contributed to lowering the threshold
for the lay person to publish on the web. This I value much more than
a technically sound construct.
People learning programming languages don't sustain the notion that what
they think they should be able to do has any relevance for very long, at
least in part because the imposition of syntax rules stop them short if
they try to make it up off the top of their heads.
In this context there is considerable merit in teaching to be strict in
what to produce, but lenient in what to accept. I'm not in favor of a
comparison with programming, it often fails imo. For one, programming
syntax errors are presented to the author only, not the user (for
compiled code anyway).
Imo a potential benefit of true XHTML is mixed name space
documents.


Yes, one of the reasons that I will not dismiss XHTML out of hand. In my
own work there would be huge advantages in being able to include
architectural CAD drawings in documents. Mixing SVG and XHTML may allow
that


But SVG can be used as it stands, even in this case the benefits are
afaics mainly of a technical nature only relevant to authors.
in the meanwhile there is lots of fun to be had with SVG plug-ins.


Being able to author mixed name space documents would not have an effect
on the browser's ability to render SVG.
Even if IE7 will have that capability, it will take a long
time for the number of users whose browser is not capable
of rendering true XHTML to fall below a level where it
becomes viable to ignore them.


Yes, two to eight years if the history of the widespread adoption of web
technologies has anything to say on the subject.


I'd consider eight years as the minimum, no IE7 for anyone except folk
running at least XP with the latest service packs afaik.

--
Spartanicus
Sep 19 '05 #63
Alan J. Flavell wrote:
There /is/ a published interworking specification for content-type
negotiation. It isn't exactly new! Client agents get what they
asked for[1]. [...]

[1] I note that in IE's case it means (seeing that our campus
standard MS Windows installation includes MS Office) that they always
get MS Word format, if available, in preference to HTML. Well, if
that's what they want, who am I to argue? :-}


It's precisely this behavior that made me give up content negotiation
for a contest entry application that I make available in HTML markup,
plain text, and MS Word. Content negotiation in conjunction with
MSIE/Win caused the MS Word version to load. As you well know, the HTML
version was last in line. The confusion that was likely to cause IE/Win
users was unacceptable to me.

But since I have explicit links to the other variants anyway, was
content negotiation really necessary? I'm not sure if users would
explicitly configure their user agent to retrieve a plain text or MS
Word variant over an HTML one.
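For what it's worth, the variant set involved can be spelled out in an Apache type-map; the filenames and qs weights below are hypothetical. Note that qs alone does not cure the IE problem described above: a client whose Accept header lists application/msword explicitly, and covers text/html only via */*, can still end up with the Word variant.

```
# entry.var -- a hypothetical Apache type-map declaring the three
# variants. qs expresses the server's own preference order.

URI: entry.html
Content-type: text/html; qs=1.0

URI: entry.txt
Content-type: text/plain; qs=0.7

URI: entry.doc
Content-type: application/msword; qs=0.5
```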

--
Brian
Sep 19 '05 #64
Spartanicus wrote:
Richard Cornford wrote:
On that issue all that true XHTML has to offer is
a check for well formedness.
There is well-formedness in XML terms and there is the
possibility of enforcing the DTD/Schema rules as to which
elements may contain which other elements, eliminating one
type of tag soup nonsense.


That is equally possible for HTML. But I hope that you
are not seriously suggesting that browsers should have
a validator added to them.


If 'seriously' means my having any expectation that it would ever
happen, then no, I am not seriously suggesting it.
Apart from the resource issue, users should not be
bothered with such issues.
If IE responded to structurally incorrect mark-up by putting up a big
dialog saying "This page has been incorrectly authored" (or anything
else that unambiguously blamed the web author for errors/faults) then
the user would never be bothered by it, because the people writing the
pages would be too embarrassed to publish pages that caused it to show.

<snip>
Much as following the syntax rules in a programming language
in no way ensures that the results will be good, or even
functional. I think of it more in terms of the possible
influence on the attitude of the creators of marked-up
documents. Any unavoidable increase in the formal requirements


For it to be "unavoidable", a browser's validating error
should result in a no show, not just a status line message
saying "this document is invalid".


It doesn't have to represent a no-show (though unless progressive
rendering happened that would be a consequence), and that would tend to
make the browser/user's computer look broken. A nice obvious dialog box
that could not be switched off, and placed the blame where it belonged,
would be sufficient.
That would impede user access to legacy content.
There is legacy content in XHTML? :)
If it doesn't result in a no show, but only produces
a status bar message, then I'm not optimistic about
the effect on authors.
Yes, and in practice a status bar message is probably about the most
extreme reaction we can expect from browsers.

<snip>
As it stands almost any sequence of characters will
produce a 'result' of some sort or another in at
least one tag soup browser. <snip>
Imo this has in no small measure contributed to lowering
the threshold for the lay person to publicize on the web.
This I value much more than a technically sound construct.
It is quite nice that HTML + web browsers are friendly enough to allow
virtually anyone to create a 'web page'. It is just a bit irritating to
find people who would describe themselves as professionals using that as
an excuse for never actually understanding what they are doing.
People learning programming languages don't sustain the
notion that what they think they should be able to do has
any relevance for very long, at least in part because the
imposition of syntax rules stop them short if they try to
make it up off the top of their heads. <snip> ... . I'm not in favor of a comparison with programming,
it often fails imo. For one programming syntax errors are
produced to the author only, not the user (for compiled
code anyway).


Which is in part what I am getting at. If the browser's response to
errors was sufficiently obvious then they would only be a matter for
authors as the authors would be well motivated to correct them before
the user had a chance to experience them.

<snip>
Even if IE7 will have that capability, it will take
a long time for the number ...

<snip>Yes, two to eight years if the history of the widespread
adoption of web technologies has anything to say on the
subject.


I'd consider eight years as the minimum, no IE7 for anyone
except folk running at least XP with the latest service
packs afaik.


However long it may take XHTML is not for the present. Its future might
be a subject for idle speculation but that is about all.

Richard.
Sep 21 '05 #65
Richard Cornford wrote:
Apart from the resource issue, users should not be
bothered with such issues.

If IE responded to structurally incorrect mark-up by putting up a big
dialog saying "This page has been incorrectly authored" (or anything
else that unambiguously blamed the web author for errors/faults) then
the user would never be bothered by it, because the people writing the
pages would be too embarrassed to publish pages that caused it to show.


No need to go that far. Just a little bright red broken-!! tucked away
somewhere in the manner of that favicon, or the SSL padlock. And
another one for when the browser error-corrects bogus HTTP (as if).

Users needn't be bothered with what it means at all. Some will, and
the manual will tell them. Maybe clicking any of the "broken" icons
could pop up an explanation. It's all simple UI design.

Lynx has flagged Bad Markup while also rendering it for many years.

--
Nick Kew
Sep 21 '05 #66
On Wed, 21 Sep 2005, Nick Kew wrote:
No need to go that far. Just a little bright red broken-!! tucked
away somewhere in the manner of that favicon, or the SSL padlock.
And another one for when the browser error-corrects bogus HTTP (as
if).
I've little hope of that. This is a browser-like object whose primary
selling point seems to be to the *makers* of web pages, not to the
recipients. IMHO one only has to compare the documentation offered
for the two types of user to come to a conclusion about *that*.

I don't believe that the developers would do anything much to
embarrass the makers of web pages "du jour", no matter how much it
rated to inform the users. The best one can hope for is something
aimed at *malicious* web pages, but when e.g the user message for
javascript from untrusted web pages says nothing more threatening than
"Scripts are usually safe"[1], I don't expect a lot. If you want a
browser that's aimed at users, choose a competing product.

[1] My translation: scripts are occasionally disastrous, and executing
untrusted scripts has been known to result in trashing not only the
browser but the entire OS. There's been lots of sticking plaster
applied since I last saw that happen with IE, but I've seen few fixes
where I thought the fundamental issue had really been addressed.

One of our users happily infected himself with a Trojan yesterday. If
he'd observed the local recommendations and used a www-compatible
browser instead of IE, this wouldn't have happened. Fortunately, the
anti-virus product spotted what he'd done, but that isn't something
that one can rely on.
Users needn't be bothered with what it means at all. Some will, and
the manual will tell them. Maybe clicking any of the "broken" icons
could pop up an explanation. It's all simple UI design.
It needs more than UI design - it needs motivation. Are IE users
telling MS in no uncertain terms that they demand this or else they
use a competing product? I doubt that many of those who chose a
competing product would put that reason very near the top of their
list, TBH, even if asked the question directly. I wish it were
otherwise, but that's how it seems to me.
Lynx has flagged Bad Markup while also rendering it for many years.


To be fair, it's flagged *some kinds of* bad markup. The absence of a
warning doesn't by any means guarantee the absence of errors, in my
experience.

all the best
Sep 21 '05 #67
On Wed, 21 Sep 2005, Richard Cornford wrote:
There is legacy content in XHTML? :)
Oh yes :-{

There's been many years already for those attracted by sexy XHTML,
while caring nothing for the W3C's hopes and prayers, to accrete a
considerable volume of XHTML-flavoured tag soup to accompany the
existing legacy of HTML-flavoured tag soup.
It is quite nice that HTML + web browsers are friendly enough to
allow virtually anyone to create a 'web page'.


True, but it's even more unfortunate that it seems to need a real
expert to produce true simplicity. The average page that I've seen
from untrained beginners has been awash with incredibly complex hacks
that they seem to have inherited from somewhere without the slightest
comprehension of what they're doing.

Sep 21 '05 #68
"Richard Cornford" <Ri*****@litotes.demon.co.uk> wrote:
If IE responded to structurally incorrect mark-up by putting up a big
dialog saying "This page has been incorrectly authored" (or anything
else that unambiguously blamed the web author for errors/faults) then
the user would never be bothered by it, because the people writing the
pages would be too embarrassed to publish pages that caused it to show.


Only for new content authored by people who use that particular version
of IE. A huge part of the content on the web is legacy content that
isn't and won't be updated.

To weigh down a browser like that, and pester users with issues that are
none of their concern nor within their control to fix, in the hope that
a tiny percentage amongst those users are the people you want to shame
into learning how to appease a dumb bot that checks for rigid syntax
rules (rules that have neither much influence on the quality of the code
as perceived by a user, nor necessarily any effect on browser rendering),
is ridiculous.
That would impede user access to legacy content.


There is legacy content in XHTML? :)


Is your case for adding a validator to browsers intended to apply only
to true XHTML and not to XHTML served as text/html?
... . I'm not in favor of a comparison with programming,
it often fails imo. For one programming syntax errors are
produced to the author only, not the user (for compiled
code anyway).


Which is in part what I am getting at. If the browser's response to
errors was sufficiently obvious then they would only be a matter for
authors as the authors would be well motivated to correct them before
the user had a chance to experience them.


(Leaving out the resource consideration for the sake of discussing this
specific point) Syntax errors often found in code in interpreted
languages such as HTML, CSS and javascript could be reduced
significantly if there was a requirement on the parser to present an
error message to the user. But for that to work it would have to be a
mandatory part of such parsers from the beginning. If it isn't, then
introducing it as an afterthought can only result in frustrating end
users.

--
Spartanicus
Sep 21 '05 #69
On 18/09/2005 14:24, Richard Cornford wrote:

[snip]
The decision as to whether [widespread XHTML] may eventually become a
reality is entirely down to Microsoft. If they [...] take their usual
attitude and use a parser that error-corrects any old nonsense into
something useable/renderable, and compromise their XHTML DOM
implementation with all of the shortcuts, etc. that are in their HTML
DOM, then XHTML is lost forever.


I recently read an article from ICEsoft that seems to suggest that they
have no interest in making ICEbrowser a conforming XML parser.

<URL:http://support.icesoft.com/jive/entry!default.jspa?categoryID=20&entryID=23>

Rather depressingly, the reasons they seem to be using are based solely
on their perception of HTML, rather than the fact that they are dealing
with a new language that can be treated differently (when served as
XHTML, of course). I can agree that they don't want to implement a
validating parser[1], but that they don't want to enforce XML syntax
rules is very disappointing.

Does this article actually apply to documents served as
application/xhtml+xml? It isn't specific in this regard.

[snip]

Mike
[1] I would /like/ to think that such an implementation was a good
idea. However, it's a shame that Spartanicus is correct in
saying that validation in no way implies good authoring
practices. After all, validity is frequently declared like a
badge of honour, rather than something that should be expected
(barring any overriding reasons).

--
Michael Winter
Prefix subject with [News] before replying by e-mail.
Sep 21 '05 #70
Richard Cornford wrote:
However, I am concerned that simplistic expressions of content
negotiation are resulting in actively bad manifestations of "content
negotiation" server-scripts.


http://groups.google.com/groups?q=au...r+substr_count
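The thread linked above concerns exactly the failure mode Richard describes: server-scripts that substring-match the Accept header and ignore q-values. A less naive check has to parse the quality factors; here is a minimal sketch under that assumption (the function name is hypothetical, and real negotiation has more corner cases than this):

```python
# A minimal, illustrative Accept-header check: prefer
# application/xhtml+xml over text/html only when the client's
# q-values actually say so.

def prefers_xhtml(accept_header: str) -> bool:
    qualities = {}
    for part in accept_header.split(','):
        fields = part.strip().split(';')
        media_type = fields[0].strip()
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.strip().partition('=')
            if name == 'q':
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        qualities[media_type] = q

    def q_for(media_type):
        # Fall back through type/* and */*, as HTTP negotiation does.
        main = media_type.split('/')[0]
        for key in (media_type, main + '/*', '*/*'):
            if key in qualities:
                return qualities[key]
        return 0.0

    return q_for('application/xhtml+xml') > q_for('text/html')

# A bare substring test would wrongly send XHTML to this client:
print(prefers_xhtml('text/html, application/xhtml+xml;q=0.2'))  # → False
print(prefers_xhtml('application/xhtml+xml, text/html;q=0.9'))  # → True
```

The point of the sketch is only that the q-values must be compared; a substring check on "application/xhtml+xml" gets the first case wrong.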

--
Toby A Inkster BSc (Hons) ARCS
Contact Me ~ http://tobyinkster.co.uk/contact

Sep 21 '05 #71
Alan J. Flavell wrote:
On Fri, 16 Sep 2005, Stewart Gordon wrote: <snip> It's close enough to real life for me. If you insist on having two
URLs that differ only in letter case, then you'd probably have
difficulties, I suppose. I never tried it. Best read the
win32-specific release notes if you want to know the sordid details.

<snip>

If you're developing under Windows, it goes without saying that you'd
avoid having two URLs that differ only in case but point to different
pages. But you still might want to check that you've done all the links
correctly.

But then again, a simple link checker tool would probably do this
checking for you. And personally, my routine is to use only lowercase
for names of files that I'm going to put on the web (except when the
names are generated by a program such as Doxygen...).

Stewart.

--
-----BEGIN GEEK CODE BLOCK-----
Version: 3.1
GCS/M d- s:- C++@ a->--- UB@ P+ L E@ W++@ N+++ o K-@ w++@ O? M V? PS-
PE- Y? PGP- t- 5? X? R b DI? D G e++>++++ h-- r-- !y
------END GEEK CODE BLOCK------

My e-mail is valid but not my primary mailbox. Please keep replies on
the 'group where everyone may benefit.
Sep 22 '05 #72
Nick Kew wrote:
Richard Cornford wrote:
Apart from the resource issue, users should not be
bothered with such issues.
If IE responded to structurally incorrect mark-up by putting
up a big dialog saying "This page has been incorrectly
authored" (or anything else that unambiguously blamed the
web author for errors/faults) then the user would never be
bothered by it, because the people writing the pages would
be too embarrassed to publish pages that caused it to show.


No need to go that far. Just a little bright red broken-!!
tucked away somewhere in the manner of that favicon, or the
SSL padlock. And another one for when the browser
error-corrects bogus HTTP (as if).


When a script error happens in IE a small yellow symbol appears in the
status bar. That just isn't enough. Because scripting browsers is a
significant proportion of what I do I invariably have IE configured to
actually pop-up its error dialog in addition to showing that symbol. And
when I browse the Internet that error dialog pops up on a daily basis
(Google being a constant offender, but only because I use Google quite a
lot). If the authors were seeing that error dialog (and expecting the
user to see it) those faulty scripts would not be exposed on the
public Internet, but the little yellow symbol is apparently easy for an
author who doesn't care to disregard.
Users needn't be bothered with what it means at all.

<snip>

In principle users should never even see them because they should never
be provoked. And in that case the nature of the statement is not an
issue for the user, only the developer, who should not be able to avoid
them.

Richard.
Sep 25 '05 #73
Alan J. Flavell wrote:
On Wed, 21 Sep 2005, Richard Cornford wrote:
There is legacy content in XHTML? :)


Oh yes :-{

There's been many years already for those attracted by
sexy XHTML, while caring nothing for the W3C's hopes and
prayers, to accrete a considerable volume of XHTML-flavoured
tag soup to accompany the existing legacy of HTML-flavoured
tag soup.


But when does XHTML-flavoured tag soup (served as text/html) become
XHTML? We see plenty of instances of XHTML-style "<br />" appearing in
documents that have HTML doctypes, but wouldn't claim that those are
XHTML documents. So there are a number of possible criteria for
considering a document to be XHTML when it is not being served as
XHTML:-

1. The author thinks he/she is writing XHTML
2. The document has an XHTML doctype (and possibly exclusively XML/XHTML
(but probably Appendix C) style mark-up).
3. A (preferably bug-free) validator can be persuaded to declare the
document valid XHTML

If a document is served as text/html the browser will treat it as HTML
tag soup so if it is considered to be XHTML it is only considered to be
such in the mind of some observer.
It is quite nice that HTML + web browsers are friendly
enough to allow virtually anyone to create a 'web page'.


True, but it's even more unfortunate that it seems to need
a real expert to produce true simplicity. The average page
that I've seen from untrained beginners has been awash with
incredibly complex hacks that they seem to have inherited
from somewhere without the slightest comprehension of what
they're doing.


Yes; authoring by mystical incantation. And justified because, by some
criteria, it "works". Unfortunately it is not a practice that is limited
to the beginner.

Richard.
Sep 25 '05 #74
Spartanicus wrote:
"Richard Cornford" <Ri*****@litotes.demon.co.uk> wrote:

<snip>
That would impede user access to legacy content.


There is legacy content in XHTML? :)


Is your case for adding a validator to browsers intended to
apply only to true XHTML and not to XHTML served as text/html?


Yes, it could never be practical to retroactively attempt to impose any
additional restrictions on HTML, and 'XHTML' served as text/html is HTML
(as far as the receiving software is concerned).
... . I'm not in favor of a comparison with programming,
it often fails imo. For one programming syntax errors are
produced to the author only, not the user (for compiled
code anyway).


Which is in part what I am getting at. If the browser's
response to errors was sufficiently obvious then they
would only be a matter for authors as the authors would
be well motivated to correct them before the user had a
chance to experience them.


(Leaving out the resource consideration for the sake of
discussing this specific point) Syntax errors often found
in code in interpreted languages such as HTML, CSS and
javascript could be reduced significantly if there was a
requirement on the parser to present an error message to
the user. But for that to work it would have to be a
mandatory part of such parsers from the beginning. If it
isn't, then introducing it as an afterthought can only
result in frustrating end users.


Yes, from the beginning, or sufficiently close to the beginning. Which is
why it can be the subject of speculation concerning a future adoption of
XHTML. It isn't going to happen though, is it? ;)

Richard.
Sep 25 '05 #75
Michael Winter wrote:
On 18/09/2005 14:24, Richard Cornford wrote:
The decision as to whether [widespread XHTML] may eventually
become a reality is entirely down to Microsoft. If they [...]
take their usual attitude and use a parser that error-corrects
any old nonsense into something useable/renderable, and
compromise their XHTML DOM implementation with all of the
shortcuts, etc. that are in their HTML DOM, then XHTML is
lost forever.
I recently read an article from ICEsoft that seems to
suggest that they have no interest in making ICEbrowser
a conforming XML parser.

<URL:http://support.icesoft.com/jive/entry!default.jspa?categoryID=20&en
tryID=23>
Rather depressingly, the reasons they seem to be using are
based solely on their perception of HTML, rather than the
fact that they are dealing with a new language that can be
treated differently (when served as XHTML, of course).

I can agree that they don't want to implement a
validating parser[1], but that they don't want to enforce
XML syntax rules is very disappointing.
Over the years ICEsoft have put a lot of effort into making their
browser behave like IE, for fairly obvious reasons of expedience. They
have done such a good job of it that many are not aware
IceBrowser exists, and are not going to be taking any measures to
accommodate its peculiarities. That may mean making IceBrowser
ridiculously tolerant.
Does this article actually apply to documents served as
application/xhtml+xml? It isn't specific in this regard.

<snip>

It certainly isn't clear, and I don't have a working evaluation version
of IceBrowser to check its Accept header. If they are announcing
acceptance of application/xhtml+xml then this decision may be in
anticipation of how they expect Microsoft to act, and I would not be
surprised to find an XHTML comprehending future version of IE being
exactly as tolerant.

Richard.
Sep 25 '05 #76
"Richard Cornford" <Ri*****@litotes.demon.co.uk> wrote:
Is your case for adding a validator to browsers intended to
apply only to true XHTML and not to XHTML served as text/html?
Yes, it could never be practical to retroactively attempt to impose any
additional restrictions on HTML, and 'XHTML' served as text/html is HTML
(as far as the receiving software is concerned).

(Leaving out the resource consideration for the sake of
discussing this specific point) Syntax errors often found
in code in interpreted languages such as HTML, CSS and
javascript could be reduced significantly if there was a
requirement on the parser to present an error message to
the user. But for that to work it would have to be a
mandatory part of such parsers from the beginning. If it
isn't, then introducing it as an after thought can only
result in frustrating end users.


Yes, from the beginning, or sufficiently close to the beginning. Which is
why it can be the subject of speculation concerning a future adoption of
XHTML.


The same problems occur when you only apply it to true XHTML: since
validation isn't part of current-day XHTML parsers, true XHTML currently
found on the net is likely to be full of validation errors. There is far
less true XHTML on the net compared to stuff served as text/html, but
the principal flaw remains.
It isn't going to happen though, is it? ;)


Let's hope not, it makes no sense at all.

--
Spartanicus
Sep 25 '05 #77
Richard Cornford wrote:
But when does XHTML-flavoured tag soup (served as text/html) become
XHTML.


If a document uses an XHTML namespace, XHTML doctype and validates against
the XHTML doctype supplied, then it is XHTML.

The Content-Type header is used by one possible transport mechanism and is
not part of the document itself.

Talk of "true XHTML documents must use application/xhtml+xml" becomes
laughable when you consider that the documents may very well be served
over FTP, via the local file system, or via some other transport mechanism
that doesn't specify a content type.

--
Toby A Inkster BSc (Hons) ARCS
Contact Me ~ http://tobyinkster.co.uk/contact
Now Playing ~ ./coldplay/parachutes/04_sparks.ogg

Sep 25 '05 #78

On Fri, 16 Sep 2005, Alan J. Flavell wrote:
On Fri, 16 Sep 2005, Stewart Gordon wrote:
Alan J. Flavell wrote: [...]

Is it possible/easy to get hold of a staging server for Windows that
emulates a Unix webserver, or vice versa?


see below re win32 Apache, close enough to the Apache-based servers
which most service providers seem to use. Certainly I've never had
any real problems relative to the linux-based production Apache server
which we run at our research group.
Stuff like case-sensitivity....


It's close enough to real life for me.  If you insist on having two
                                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^
URLs that differ only in letter case, then you'd probably have
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
difficulties, I suppose.  I never tried it.  Best read the
^^^^^^^^^^^^^^^^^^^^^^^^
win32-specific release notes if you want to know the sordid details.

[snip]

See:
http://www.chebucto.ns.ca/~af380/Profile.html
and:
http://www.chebucto.ns.ca/~af380/profile.html

The first filename is the default home page for users at my ISP.
The second solved the problem of users mistyping the filename for
the first one in all lower-case. "I tried to go to your home page
but all I got was a 404 error."

(No, they aren't fancy or pretty and maybe not even valid. My site just
sort of grew (like fungus) and only recently have I had graphical access
to see my own pages with anything other than Lynx. Conversion to valid
and graphically pleasing pages depends on finding the spare time needed
for the job. Maybe by 2015....)

--
``Why don't you find a more appropiate newsgroup to post this tripe into?
This is a meeting place for a totally differnt kind of "vision impairment".
Catch my drift?'' -- "jim" in alt.disability.blind.social regarding an
off-topic religious/political post, March 28, 2005

Sep 25 '05 #79
Toby Inkster wrote:
Richard Cornford wrote:
But when does XHTML-flavoured tag soup (served as text/html)
become XHTML.
If a document uses an XHTML namespace, XHTML doctype and
validates against the XHTML doctype supplied, then it is XHTML.


An objective criterion, and one that eliminates any concern that legacy
XHTML-flavoured tag soup might suffer from any possible more rigid
handling of XHTML, by excluding any of it that may be problematic.
The Content-Type header is used by one possible transport
mechanism and is not part of the document itself.

Talk of "true XHTML documents must use application/xhtml+xml"
becomes laughable when you consider that the documents may very
well be served over FTP, via the local file system, or via some
other transport mechanism that doesn't specify a content type.


It clearly is ridiculous to say that XHTML that is not served as
application/xhtml+xml is not true XHTML. However, it is completely
reasonable to assert that XHTML (and XHTML-like tag soup) served as
text/html is tag soup HTML. It is HTML because the receiving browser
will treat it as (tag soup) HTML (whether it supports XHTML or not). If
a document is destined to be interpreted as (error-filled) HTML it seems
reasonable to question the circumstances under which it might also be
regarded as XHTML.

Richard.
Sep 25 '05 #80
On Sun, 25 Sep 2005, Norman L. DeForest wrote:
On Fri, 16 Sep 2005, Alan J. Flavell wrote:
On Fri, 16 Sep 2005, Stewart Gordon wrote:
[question about using a Win32 Apache for local verification
before uploading to a unix-style system...]
see below re win32 Apache, close enough to the Apache-based servers
which most service providers seem to use. Certainly I've never had
any real problems relative to the linux-based production Apache server
which we run at our research group.
Stuff like case-sensitivity....
It's close enough to real life for me. If you insist on having two
URLs that differ only in letter case, then you'd probably have
difficulties, I suppose. I never tried it. Best read the
win32-specific release notes if you want to know the sordid details.

[snip]


Having reviewed the Win32 notes, I have to admit that they don't
seem to be as helpful as I had hoped. However, at this URL:

http://httpd.apache.org/docs-2.0/sec...l#file-and-web

there are some useful notes aimed at drawing the distinction between
URL paths and file system paths on a case-insensitive file
system, and the unfortunate effect of making an inappropriate choice
of <Location> directive for controlling access to a particular
resource.

Naturally, even Win32 Apache can see case differences in /URLs/ ,
and take appropriate actions via directives: it's only "if and when" a
URL finally hits a file path that the case insensitivity comes to
light.
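To illustrate the distinction those notes draw, here is a minimal sketch
(Apache 2.0-era syntax; the paths and the deny-everything rule are
hypothetical, just to show the two anchoring styles):

```apache
# Anchored to the file system path: Apache inherits the file system's
# case handling, so on Win32 both /Private/ and /private/ in a URL
# resolve to this directory and are covered by the block.
<Directory "C:/Apache/htdocs/private">
    Order allow,deny
    Deny from all
</Directory>

# Anchored to the URL path: matching is case-sensitive, so on a
# case-insensitive file system a request for /Private/secret.html
# would slip past this block yet still reach the same file.
<Location "/private">
    Order allow,deny
    Deny from all
</Location>
```

Hence the docs' advice: use <Directory> (or <Files>) for anything that
lives in the file system, and reserve <Location> for purely virtual URLs.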

See:
http://www.chebucto.ns.ca/~af380/Profile.html
Apparently a unix-based Apache 1.3 server, which does not
exhibit the case-insensitive behaviour that is under discussion.
and:
http://www.chebucto.ns.ca/~af380/profile.html

The first filename is the default home page for users at my ISP.
Is it? Then its advertised URL ought, according to normal good
practice, to be http://www.chebucto.ns.ca/~af380/ , without the
specific file path at the end. However, that URL seems to go to a
different page, so I'm not sure in what sense Profile.html is the
"default home page".
The second solved the problem of users mistyping the filename for
the first one in all lower-case.
This is an unfortunate choice, since the second URL leads to a
different web page and returns 200 OK, meaning that to an indexing
robot it appears to be substantive content, rather than an anomalous
(error) condition. I would strongly recommend handling such issues
with some kind of status that indicates that the URL is irregular:
this could, for example, be done with a redirection status (301 would
be appropriate), or with a custom error page.

In fact, enabling mod_speling will handle this automatically, and for
all other pages too.
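Enabling it takes only a couple of lines in the server configuration; a
sketch, assuming an Apache 2.0-style module layout (the module path may
differ on a given installation):

```apache
# Load the module (path as in a typical Apache 2.0 build).
LoadModule speling_module modules/mod_speling.so

# With CheckSpelling on, a request that misses by one character --
# including a letter-case mismatch -- gets a redirect to the closest
# matching document instead of a 404.
CheckSpelling On
```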
"I tried to go to your home page but all I got was a 404 error."


The 404 error status *was* correct, since the requested page isn't
supposed to exist (indeed it didn't until you put something there!).
Better practice would be a 404 error page which suggested going to the
corrected URL. In fact your present page would seem good enough, but
it ought IMHO to be delivered with an error status (404 is
appropriate), instead of the 200 OK that's happening presently.

That could of course be done with an ErrorDocument directive in your
.htaccess, assuming that the provider hasn't disabled that facility.

So those IMHO are two choices that are better than what's currently
happening:

1. status 301 redirection to the corrected URL (or enable
mod_speling); or

2. status 404 to a helpful error page with link to the corrected URL.
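In .htaccess terms (where the provider permits it), the two options might
look like the following sketch; the notfound.html error page is
hypothetical:

```apache
# Option 1: permanent (301) redirect from the mis-cased URL to the
# real page; "permanent" is what makes Redirect emit a 301.
Redirect permanent /~af380/profile.html http://www.chebucto.ns.ca/~af380/Profile.html

# Option 2: serve a helpful error page -- with a genuine 404 status --
# that links to the corrected URL.
ErrorDocument 404 /~af380/notfound.html
```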
Coming back to the original issue, though, I'm not sure what your
point was about reviewing pages on a case-insensitive file system
prior to uploading them to a case-sensitive server.

best regards
Sep 26 '05 #81

On Mon, 26 Sep 2005, Alan J. Flavell wrote:
On Sun, 25 Sep 2005, Norman L. DeForest wrote:
On Fri, 16 Sep 2005, Alan J. Flavell wrote:
On Fri, 16 Sep 2005, Stewart Gordon wrote:
[question about using a Win32 Apache for local verification
before uploading to a unix-style system...]
see below re win32 Apache, close enough to the Apache-based servers
which most service providers seem to use. Certainly I've never had
any real problems relative to the linux-based production Apache server
which we run at our research group.

> Stuff like case-sensitivity....

It's close enough to real life for me. If you insist on having two
URLs that differ only in letter case, then you'd probably have
difficulties, I suppose. I never tried it. Best read the
win32-specific release notes if you want to know the sordid details.

[snip]

[snip]

Naturally, even Win32 Apache can see case differences in /URLs/ ,
and take appropriate actions via directives: it's only "if and when" a
URL finally hits a file path that the case insensitivity comes to
light.

See:
http://www.chebucto.ns.ca/~af380/Profile.html
Apparently a unix-based Apache 1.3 server, which does not
exhibit the case-insensitive behaviour that is under discussion.
and:
http://www.chebucto.ns.ca/~af380/profile.html

The first filename is the default home page for users at my ISP.


Is it? Then its advertised URL ought, according to normal good
practice, to be http://www.chebucto.ns.ca/~af380/ , without the
specific file path at the end. However, that URL seems to go to a
different page, so I'm not sure in what sense Profile.html is the
"default home page".


Each user gets a home page named "Profile.html" automatically generated,
and a Lynx shortcut, "go profile", allowed a user to edit their home page
without knowing too much about navigating their filespace. Users had to
create their own "index.html" file (the default file for the fileserver)
if they wanted one. That way, it reduced the number of times a
user-support person had to ask the user to temporarily rename their
"index.html" file in response to a support question of the type "What
did I do wrong with my web page?" or "Why don't the links on my web page
work?" or "Why can't anyone access my images?" Being able to view a
user's directory was often necessary to see if the user had done something
such as upload images in lower-case but use upper- or mixed-case in the
URLs (or just misspelled the URL). It was assumed that users who knew
enough to create their own "index.html" file to hide the directory listing
would also be less likely to make such simple errors and need less
hand-holding.

New Privacy Laws have required my ISP to stop the practice but there
used to be a script available to users only (and not outside visitors)
with a Lynx shortcut, "go who", that provided links to the home pages of
the users logged in at the time. I found many interesting sites that way.
The default file accessed with that script was "Profile.html".
The second solved the problem of users mistyping the filename for
the first one in all lower-case.
This is an unfortunate choice, since the second URL leads to a
different web page and returns 200 OK, meaning that to an indexing
robot it appears to be substantive content, rather than an anomalous
(error) condition. I would strongly recommend handling such issues


There are no links to that file so search-engines are unlikely to find it.
with some kind of status that indicates that the URL is irregular:
this could, for example, be done with a redirection status (301 would
be appropriate), or with a custom error page.

In fact, enabling mod_speling will handle this automatically, and for
all other pages too.
User scripts, custom server configuration and custom error pages are not
available for users here.
"I tried to go to your home page but all I got was a 404 error."
The 404 error status *was* correct, since the requested page isn't
supposed to exist (indeed it didn't until you put something there!).
Better practice would be a 404 error page which suggested going to the
corrected URL. In fact your present page would seem good enough, but
it ought IMHO to be delivered with an error status (404 is
appropriate), instead of the 200 OK that's happening presently.


That's not under my control. Creating a new page with a lower-case name
*was* under my control.

That could of course be done with an ErrorDocument directive in your
.htaccess, assuming that the provider hasn't disabled that facility.
Users here cannot create or modify dot files (except for those indirectly
modified by applications such as (for example) pine modifying .pinerc,
.newsrc, and .addressbook).

So those IMHO are two choices that are better than what's currently
happening:

1. status 301 redirection to the corrected URL (or enable
mod_speling); or

2. status 404 to a helpful error page with link to the corrected URL.
Coming back to the original issue, though, I'm not sure what your
point was about reviewing pages on a case-insensitive file system
prior to uploading them to a case-sensitive server.

best regards


I was commenting on the statement, "If you insist on having two URLs that
differ only in letter case, then you'd probably have difficulties, I
suppose."

There *was* a reason for my having two filenames differing only in case
and it does cause difficulties in mirroring my site on my Windows machine.
(I had to rename the lower-case name to "profile2.html" in order to have
both files in the same Windows directory. Fortunately, nothing links to
"profile.html" so that's relatively harmless.) However, there *was* a
practical reason for having such case-differing filenames on my ISP's
system and it solved an even worse problem.

--
``Why don't you find a more appropiate newsgroup to post this tripe into?
This is a meeting place for a totally differnt kind of "vision impairment".
Catch my drift?'' -- "jim" in alt.disability.blind.social regarding an
off-topic religious/political post, March 28, 2005

Sep 26 '05 #82
On Mon, 26 Sep 2005, Norman L. DeForest wrote:
Each user gets a home page named "Profile.html" automatically
generated and a Lynx shortcut "go profile" allowed a user to edit
their home page without knowing too much about navigating their
filespace. Users had to create their own "index.html" file (the
default file for the fileserver) if they wanted one.
Yes, but I was talking about good-practice on the web as a whole, and
what might be expected by some typical clued web user; rather than
some eccentric convention deployed by your particular service
provider. No offence meant to yourself, of course.
User scripts, custom server configuration and custom error pages are
not available for users here.
I take your point, but the fact is that this restriction has resulted
in a sub-optimal solution. I've already set out my reasons for saying
that, so I'll leave it there.
I was commenting on the statement, "If you insist on having two URLs
that differ only in letter case, then you'd probably have
difficulties, I suppose."
Yes: some difficulties in *fully* testing a draft site on a Win32
Apache, prior to uploading them to a unix-based server, indeed. The
difficulties are not insuperable, but beyond a certain point (if a
site made comprehensive use of URLs that differ only in letter case)
it might be too much trouble to be worth the effort. In your "case"
(no pun intended!), if this is the only issue of this kind that you
have, I don't think it's more than a minor itch. (Modulo the fact, as
I already said, that in an ideal world this would be resolved in a
different way anyhow.)
There *was* a reason for my having two filenames differing only in
case and it does cause difficulties in mirroring my site on my
Windows machine. (I had to rename the lower-case name to
"profile2.html" in order to have both files in the same Windows
directory. Fortunately, nothing links to "profile.html" so that's
relatively harmless.)
Somehow that point seems to have been elided, or taken for granted, in
your original posting. Sorry if I misunderstood at first.
[digression...]
A partial solution, in your specific case, is to configure your
*Win32 Apache* to alias the URL "profile.html" internally to your
alternative file, profile2.html. The local URLs "profile.html" and
"Profile.html" will then both appear to behave on the local Win32
server as you expect them to behave on the production unix server
(even though the internal mechanisms are different).

Note that Alias belongs not in a .htaccess file, but in the main
configuration, and its second argument is a *file* path, not URL: so
the directive might be something like (modulo linewraps):

Alias /profile.html "C:/Program Files/Apache Group/Apache2/htdocs/profile2.html"

(just tested on my own win32 installation).

Of course there will also be your extraneous URL "/profile2.html", but
if you don't reference it from anywhere, it won't matter.
[...digression.]
I still think it's fair to say to someone who develops pages on a
free-standing Windows system, i.e with no immediate access to a unix
system, that the use of a local Win32 Apache system for reviewing the
draft site is valuable, subject to the issues mentioned. In general,
after uploading, they should run a link checker to unearth any
possible glitches in URL letter-case settings which wouldn't show up
under link checking on the "draft" server.
However, there *was* a practical reason for having such
case-differing filenames on my ISP's system and it solved an even
worse problem.


Understood. But you'll have to excuse me if I rate it as an
understandable one-off kludge which was forced on you by the limited
range of facilities at your disposal on the production server.

Not that you even suggested this: but the idea of extending that
scheme to implement a general solution to mis-cased URLs just doesn't
bear thinking about.

best regards
Sep 26 '05 #83
