Bytes IT Community

When plain text page is treated as HTML

This may be too far off topic, however I was looking at this page
http://www.hixie.ch/advocacy/xhtml about XHTML problems by Ian Hickson.

It is served as text/plain, according to Firefox
Response Headers - http://www.hixie.ch/advocacy/xhtml

Date: Wed, 23 Nov 2005 21:36:06 GMT
Server: Apache/1.3.33 (Unix) DAV/1.0.3 mod_fastcgi/2.4.2
mod_gzip/1.3.26.1a PHP/4.3.10 mod_ssl/2.8.22 OpenSSL/0.9.7e
Vary: Accept-Encoding,User-agent
X-Pingback: http://tracking.damowmow.com/
Content-Language: en-GB-Hixie
Last-Modified: Sat, 17 Sep 2005 12:16:19 GMT
Etag: "17063c7-4a12-432c0913"
Accept-Ranges: bytes
Keep-Alive: timeout=15, max=100
Connection: Keep-Alive
Content-Type: text/plain; charset=utf-8
Content-Encoding: gzip
Content-Length: 7452

200 OK

The page displays in Firefox, and in Opera as if the text were
surrounded by pre tags. In Safari 2, the page displays as a single long
(but word wrapped) string, as if Safari were treating it as HTML markup.

The interesting point to me is that the displayed contents are
incomplete. Safari has the contents, as looking at source confirms.
The places where the contents are not displayed are

<script type="text/javascript"><!--//--><![CDATA[//><!--
...
//--><!]]></script>

which is replaced by *

<script type="text/javascript"><!--//--><![CDATA[//><!--
...
//--><!]]></script>

which is not replaced by anything.

The document as displayed truncates on the next paragraph, when it
encounters

<script> and <style>

Given that the script element never closes, it seems reasonable to hide
the contents.

So my question is, should a browser display a file served as text/plain
the way Firefox and Opera do, or should a browser look deep inside the
file for HTML (or other tags) the way Safari does?

Or should it use some heuristic to second guess the server, given the
number of servers that do not correctly identify content-type?

If a browser pays attention only to the content-type as provided by the
server, what should it do about a file.css served as text/html instead
of text/css? Or isn't that a problem when the css file could be
considered to be included in the html file that calls it?

--
http://www.ericlindsay.com
Nov 23 '05 #1
10 Replies


On Thu, 24 Nov 2005, Eric Lindsay wrote:

> This may be too far off topic, however I was looking at this page
> http://www.hixie.ch/advocacy/xhtml about XHTML problems by Ian Hickson.

I've often seen plain-text documents from Hixie, but I must admit
I hadn't looked at their headers.

> It is served as text/plain, according to Firefox
> Response Headers - http://www.hixie.ch/advocacy/xhtml
> [...]
> Vary: Accept-Encoding,User-agent
> [...]
> Content-Type: text/plain; charset=utf-8

Which is at least *suggestive* that there might be other variants
available, although we don't know what they are...

But a visit to http://www.hixie.ch/advocacy/ shows a conventional
directory listing. If there's any alternative version served out to
other browsers or in other character encodings, it would have to be
done by some kind of server conversion...? *Do* note that
accept-language is *not* one of the negotiation dimensions according
to that Vary header, even though there appears to be a French
translation available in the directory listing.

> The page displays in Firefox, and in Opera as if the text were
> surrounded by pre tags.

Well no, it displays "as plain text". There are big differences
between the two assertions, when the material contains markup and
&-notations - which this does.

> In Safari 2, the page displays as a single long (but word wrapped)
> string, as if Safari were treating it as HTML markup.

Booooooh!

> The interesting point to me is that the displayed contents are
> incomplete. Safari has the contents, as looking at source confirms.
> The places where the contents are not displayed are
>
> <script type="text/javascript"><!--//--><![CDATA[//><!--
> ...
> //--><!]]></script>

This is fun stuff, but you really mustn't let yourself be so grossly
diverted from making real web pages, or you'll risk ending up like me
- posting too much about pedantic detail, and never getting around to
updating my sadly obsolescent web pages. Not good.

> So my question is, should a browser display a file served as
> text/plain the way Firefox and Opera do,

Of course.

> or should a browser look deep inside the
> file for HTML (or other tags) the way Safari does?

Sigh. I've been battering on about the mandate of RFC2616, but
somehow it doesn't seem to have sunk home. See the notes below the
table at
http://ppewww.ph.gla.ac.uk/~flavell/....html#browconf ,
which now take you directly to the relevant section of (the W3C's
HTML-ised copy of) RFC2616 -
http://www.w3.org/Protocols/rfc2616/....html#sec7.2.1

> Or should it use some heuristic to second guess the server,

Absolutely and utterly not. RFC2616 forbids it.

> given the number of servers that do not correctly identify
> content-type?

It would still be permissible for a browser to say to its user "excuse
me, this content seems to be the wrong type. At some risk to your
security, I could try to guess this, are you prepared to take that
chance?". What RFC2616 is ruling out is that a client agent should
take it upon itself to unilaterally second-guess, without informed
consent from its user. That's my best interpretation, anyway.

> If a browser pays attention only to the content-type as provided by the
> server, what should it do about a file.css served as text/html instead
> of text/css?


Per RFC2616, it's mandated to ignore it, i.e. to render the HTML
without it, and Mozilla does so[1]: that's correct behaviour.
Unfortunately, some other browsers are not so cautious. The web would
be a better place if they were.

[1] at least in its Standards mode.
Nov 23 '05 #2
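
As a rough illustration of the RFC 2616 section 7.2.1 rule discussed in the post above, here is a minimal Python sketch. The function name effective_media_type and the sniff callback are hypothetical and do not come from any browser or library. The only point it makes is that guessing is reached solely when the server sent no Content-Type at all, and that the fallback for an unknown type is application/octet-stream.

```python
from email.message import Message
from typing import Callable, Optional


def _unknown(body: bytes) -> str:
    # RFC 2616 7.2.1: if the type remains unknown, treat it as octet-stream.
    return "application/octet-stream"


def effective_media_type(
    content_type: Optional[str],
    body: bytes,
    sniff: Callable[[bytes], str] = _unknown,
) -> str:
    """Return the media type a client should act on (hypothetical helper).

    The declared type always wins; the body is inspected only when the
    server sent no Content-Type header at all.
    """
    if content_type:
        # Parse "text/plain; charset=utf-8" and keep just "text/plain".
        msg = Message()
        msg["Content-Type"] = content_type
        return msg.get_content_type()
    return sniff(body)
```

Under this sketch, a body full of script tags served as text/plain stays text/plain, and a stylesheet served as text/html is reported as text/html, which is why a strict client would decline to apply it as CSS.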

On Wed, 23 Nov 2005, Alan J. Flavell wrote:

>> Vary: Accept-Encoding,User-agent
>
> If there's any alternative version served out to other browsers or
> in other character encodings, it would have to be done by some kind
> of server conversion...?


Sorry, I shot my mouth off too quickly on that point. It wasn't
"accept-charset" in that header, it was "accept-encoding". That's why
his server has sent gzip-ed content, because the browser said it was
willing to accept that encoding. Nothing to do with
character-encoding ("charset"). Sorry for that - spotted my mistake
just too late!

--
Post in haste, repent at leisure...

Nov 23 '05 #3
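
The distinction drawn in the correction above, between the character encoding (charset) and the content coding (Content-Encoding), can be sketched in a few lines of Python. The helper names encode_for_wire and decode_from_wire are made up for illustration; gzip is used because that is the coding shown in the response headers quoted earlier.

```python
import gzip


def encode_for_wire(text: str, charset: str = "utf-8") -> bytes:
    """Hypothetical server side: apply the charset, then the content coding."""
    entity_bytes = text.encode(charset)   # charset: characters -> bytes
    return gzip.compress(entity_bytes)    # Content-Encoding: bytes -> packed bytes


def decode_from_wire(data: bytes, charset: str = "utf-8") -> str:
    """Hypothetical client side: undo the two layers in reverse order."""
    entity_bytes = gzip.decompress(data)  # undo Content-Encoding: gzip
    return entity_bytes.decode(charset)   # undo charset=utf-8
```

The two layers are independent: changing the charset does not affect whether the body is gzipped, and vice versa.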

Eric Lindsay <NO**********@ericlindsay.com> writes:

> The page displays in Firefox, and in Opera as if the text were
> surrounded by pre tags. In Safari 2, the page displays as a single long
> (but word wrapped) string, as if Safari were treating it as HTML markup.


Filed bug #4353871, at: <http://bugreporter.apple.com>.

sherm--

--
Cocoa programming in Perl: http://camelbones.sourceforge.net
Hire me! My resume: http://www.dot-app.org
Nov 24 '05 #4

Alan J. Flavell said the following on 11/24/2005 00:04 +0200:

> But a visit to http://www.hixie.ch/advocacy/ shows a conventional
> directory listing. If there's any alternative version served out to
> other browsers or in other character encodings, it would have to be
> done by some kind of server conversion...? *Do* note that
> accept-language is *not* one of the negotiation dimensions according
> to that Vary header, even though there appears to be a French
> translation available in the directory listing.


Which is a directory, which contains a French version in HTML.

Speaking of language, what's it with his "Content-Language:
en-GB-Hixie"? Is it a valid Content-Language?

--
Regards
Harrie
Nov 24 '05 #5

In article <Pi*******************************@ppepc56.ph.gla. ac.uk>,
"Alan J. Flavell" <fl*****@ph.gla.ac.uk> wrote:

>> It is served as text/plain, according to Firefox
>> Response Headers - http://www.hixie.ch/advocacy/xhtml
>
> Which is at least *suggestive* that there might be other variants
> available, although we don't know what they are...

Like you, I couldn't see anything in the directory to indicate that some
user agent may have received a different version. But that was why I
used Firefox rather than curl when I looked at the header. Now that you
folks have pointed out that dynamically generated sites may be doing all
sorts of things according to user agent (or spider), I am no longer sure
of anything.

>> The page displays in Firefox, and in Opera as if the text were
>> surrounded by pre tags.
>
> Well no, it displays "as plain text". There are big differences
> between the two assertions, when the material contains markup and
> &-notations - which this does.

So now I need to chase up what really happens with pre? That was
another tag I thought I could just ignore.

> This is fun stuff, but you really mustn't let yourself be so grossly
> diverted from making real web pages, or you'll risk ending up like me
> - posting too much about pedantic detail, and never getting around to
> updating my sadly obsolescent web pages. Not good.

I already have a large collection of sadly obsolescent web pages. I
just hope when I understand this a bit better I eventually get around to
putting together a content management system and update them.

>> So my question is, should a browser display a file served as
>> text/plain the way Firefox and Opera do,
>
> Of course.

Good, that is what I thought.

>> or should a browser look deep inside the
>> file for HTML (or other tags) the way Safari does?
>
> Sigh. I've been battering on about the mandate of RFC2616, but
> somehow it doesn't seem to have sunk home. See the notes below the
> table at
> http://ppewww.ph.gla.ac.uk/~flavell/....html#browconf ,
> which now take you directly to the relevant section of (the W3C's
> HTML-ised copy of) RFC2616 -
> http://www.w3.org/Protocols/rfc2616/....html#sec7.2.1

Thanks for that direct link Alan. That certainly is a clear demand that
it not be done. I was pretty sure Safari was wrong, but given they seem
to keep working on it, I thought maybe they knew something I didn't.
Bug report sent.

>> If a browser pays attention only to the content-type as provided by the
>> server, what should it do about a file.css served as text/html instead
>> of text/css?
>
> Per RFC2616, it's mandated to ignore it, i.e. to render the HTML
> without it, and Mozilla does so[1]: that's correct behaviour.
> Unfortunately, some other browsers are not so cautious. The web would
> be a better place if they were.
>
> [1] at least in its Standards mode.


I had noticed that Mozilla said it ignored incorrectly served CSS files.
I wasn't actually sure that was really the case, because I originally
checked servers with curl, and in one case found my css file served as
text/html

curl --head www.sheltersrus.com.au/sheltersrus.css
HTTP/1.1 302 Found
Date: Thu, 24 Nov 2005 03:06:05 GMT
Server: Apache/1.3.29 Sun Cobalt (Unix) mod_ssl/2.8.16 OpenSSL/0.9.6m
PHP/4.3.4 mod_auth_pam_external/0.1 FrontPage/5.0.2.2510 mod_perl/1.26
Location: http://site.sheltersrus.com.au/sheltersrus.css
Content-Type: text/html; charset=iso-8859-1

However look at the different header for the same file from Firefox

Response Headers - http://site.sheltersrus.com.au/sheltersrus.css
Date: Thu, 24 Nov 2005 03:07:34 GMT
Server: Apache/1.3.29 Sun Cobalt (Unix) mod_ssl/2.8.16 OpenSSL/0.9.6m
PHP/4.3.4 mod_auth_pam_external/0.1 FrontPage/5.0.2.2510 mod_perl/1.26
Last-Modified: Sat, 19 Nov 2005 06:58:32 GMT
Etag: "844310-524-437ecd18"
Accept-Ranges: bytes
Content-Length: 1316
Keep-Alive: timeout=15
Connection: Keep-Alive
Content-Type: text/css
200 OK

The charset variation is also interesting. They almost look like
different files. Oops. I think they are. Look at this.

curl --head site.sheltersrus.com.au/sheltersrus.css
HTTP/1.1 200 OK
Date: Thu, 24 Nov 2005 03:14:51 GMT
Server: Apache/1.3.29 Sun Cobalt (Unix) mod_ssl/2.8.16 OpenSSL/0.9.6m
PHP/4.3.4 mod_auth_pam_external/0.1 FrontPage/5.0.2.2510 mod_perl/1.26
Last-Modified: Sat, 19 Nov 2005 06:58:32 GMT
ETag: "844310-524-437ecd18"
Accept-Ranges: bytes
Content-Length: 1316
Content-Type: text/css

So I guess I need to check every site for 302 responses instead of 200.
I thought the thing was named www.sheltersrus.com.au, not
site.sheltersrus.com.au

--
http://www.ericlindsay.com
Nov 24 '05 #6

Harrie wrote in message news:43***********************@news.xs4all.nl...

> Alan J. Flavell said the following on 11/24/2005 00:04 +0200:
>> But a visit to http://www.hixie.ch/advocacy/ shows a conventional
>> directory listing. If there's any alternative version served out to
>> other browsers or in other character encodings, it would have to be
>> done by some kind of server conversion...? *Do* note that
>> accept-language is *not* one of the negotiation dimensions according
>> to that Vary header, even though there appears to be a French
>> translation available in the directory listing.
>
> Which is a directory, which contains a French version in HTML.
>
> Speaking of language, what's it with his "Content-Language:
> en-GB-Hixie"? Is it a valid Content-Language?


In theory, yes.

http://www.faqs.org/rfcs/rfc3066.html
"2.1 Language tag syntax

The language tag is composed of one or more parts: A primary language
subtag and a (possibly empty) series of subsequent subtags."

*series of subsequent subtags*

Practically, no.

"2.2 Language tag sources

The namespace of language tags is administered by the Internet
Assigned Numbers Authority (IANA) [RFC 2860] according to the rules
in section 3 of this document."

http://www.iana.org/assignments/language-tags
doesn't show "en-GB-Hixie" to be registered.
Nov 24 '05 #7
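
The "in theory, yes" half of the answer above concerns syntax only. As a small sketch of that syntax check, assuming the RFC 3066 section 2.1 grammar (a primary subtag of 1 to 8 letters, followed by any number of hyphen-separated subtags of 1 to 8 letters or digits), the following Python function accepts en-GB-Hixie. The name is_well_formed is hypothetical, and the check says nothing about whether a tag is registered with IANA, which is the "practically, no" half.

```python
import re

# RFC 3066 section 2.1: primary subtag is 1-8 letters; each later
# subtag is 1-8 letters or digits, separated by hyphens.
LANGUAGE_TAG = re.compile(r"[A-Za-z]{1,8}(-[A-Za-z0-9]{1,8})*")


def is_well_formed(tag: str) -> bool:
    """Check syntax only; this does not consult the IANA registry."""
    return LANGUAGE_TAG.fullmatch(tag) is not None
```

A tag such as en_GB fails this check because the separator must be a hyphen.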

On Thu, 24 Nov 2005, Eric Lindsay wrote:

> "Alan J. Flavell" <fl*****@ph.gla.ac.uk> wrote:
>> Which is at least *suggestive* that there might be other variants
>> available, although we don't know what they are...
>
> Like you, I couldn't see anything in the directory to indicate that
> some user agent may have received a different version.

Right - other than maybe sending the file compressed (e.g. gzip) if the
client agent says via Accept-encoding that it accepts that.

>>> The page displays in Firefox, and in Opera as if the text were
>>> surrounded by pre tags.
>>
>> Well no, it displays "as plain text". There are big differences
>> between the two assertions, when the material contains markup and
>> &-notations - which this does.
>
> So now I need to chase up what really happens with pre?

Just normal HTML parsing!

> checked servers with curl, and in one case found my css file served
> as text/html
>
> curl --head www.sheltersrus.com.au/sheltersrus.css
> HTTP/1.1 302 Found
> Date: Thu, 24 Nov 2005 03:06:05 GMT
> Server: Apache/1.3.29 Sun Cobalt (Unix) mod_ssl/2.8.16 OpenSSL/0.9.6m
> PHP/4.3.4 mod_auth_pam_external/0.1 FrontPage/5.0.2.2510 mod_perl/1.26
> Location: http://site.sheltersrus.com.au/sheltersrus.css
> Content-Type: text/html; charset=iso-8859-1

No, that's a redirection response. If you want curl to follow
redirection responses, you need this option:

-L/--location
(HTTP/HTTPS) If the server reports that the requested page has a
different location (indicated with the header line Location:)
this flag will let curl attempt to reattempt the get on the new
place.

See its man page.

When you get a status 30x redirection response, it usually comes
with a text/html body part. Most normal client agents however will
respond to the 30x status by proceeding to the new URL given in the
Location: header of the response. (RFC2616 for details).

In this case, the server is recognising the request for
http://www.sheltersrus.com.au/sheltersrus.css and redirecting
the request to http://site.sheltersrus.com.au/sheltersrus.css
(a rather curious thing to do - I would rather have expected
the opposite, seeing that www.sheltersrus.com.au is likely to
be the human-expected name for the site).

> However look at the different header for the same file from Firefox
>
> Response Headers - http://site.sheltersrus.com.au/sheltersrus.css

Exactly!

> So I guess I need to check every site for 302 responses instead of 200.

As I say, you can use the -L option on curl.

Nov 24 '05 #8
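
On the pre question answered in the post above ("Just normal HTML parsing!"), a short Python sketch may make the difference concrete. It uses the standard library's html.parser as a stand-in for a browser's HTML parser, which is an assumption made for illustration; the helper names are hypothetical. Inside pre, tags and &-notations are still interpreted, whereas text/plain shows every character as sent.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collect the character data that an HTML parser hands to the renderer."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def shown_inside_pre(source: str) -> str:
    # HTML case: wrap the source in pre tags and parse it as markup.
    parser = TextCollector()
    parser.feed("<pre>" + source + "</pre>")
    parser.close()
    return "".join(parser.chunks)


def shown_as_plain_text(source: str) -> str:
    # text/plain case: nothing is interpreted.
    return source
```

For the input `a &lt; b <em>really</em>`, the first function yields `a < b really`, while the second returns the input unchanged.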

Harrie wrote:

> Speaking of language, what's it with his "Content-Language:
> en-GB-Hixie"? Is it a valid Content-Language?


There's a spec for it [1], but as Hixie has previously admitted [2],
it's mostly just for fun.

[1] http://ian.hixie.ch/bible/english
[2]
http://lists.whatwg.org/htdig.cgi/wh...er/005003.html
(see the last paragraph of that post)

--
Lachlan Hunt
http://lachy.id.au/
http://GetFirefox.com/ Rediscover the Web
http://GetThunderbird.com/ Reclaim your Inbox
Nov 24 '05 #9

In article <Pi*******************************@ppepc56.ph.gla. ac.uk>,
"Alan J. Flavell" <fl*****@ph.gla.ac.uk> wrote:

>> So I guess I need to check every site for 302 responses instead of 200.
>
> As I say, you can use the -L option on curl.


Thanks Alan. curl -L --head URL works fine, and reports the redirect
and the actual page just fine. I'll use that as my default set of
options in future, so I actually notice redirects.

I tend to get a bit lost in the options in curl. I had only really
looked at it for doing something like uploading web pages. Having
looked at curl, I used a here document with the command line ftp
instead. Seemed a whole heap easier to understand.

--
http://www.ericlindsay.com
Nov 24 '05 #10
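
What curl -L --head reports in the case discussed above can be sketched without touching the network. The Python below walks a table of canned responses whose status codes, Location and Content-Type values are copied from the headers quoted in this thread; the function name follow_redirects and the table itself are hypothetical scaffolding, not curl's implementation.

```python
from typing import Dict, List, Tuple

# (status, headers) keyed by URL; canned values copied from the thread.
CANNED: Dict[str, Tuple[int, Dict[str, str]]] = {
    "http://www.sheltersrus.com.au/sheltersrus.css": (
        302,
        {
            "Location": "http://site.sheltersrus.com.au/sheltersrus.css",
            "Content-Type": "text/html; charset=iso-8859-1",
        },
    ),
    "http://site.sheltersrus.com.au/sheltersrus.css": (
        200,
        {"Content-Type": "text/css"},
    ),
}

REDIRECT_STATUSES = {301, 302, 303, 307}


def follow_redirects(
    url: str,
    responses: Dict[str, Tuple[int, Dict[str, str]]],
    max_hops: int = 10,
) -> List[Tuple[str, int, str]]:
    """Return (url, status, content_type) for every hop up to the final response."""
    trail = []
    for _ in range(max_hops):
        status, headers = responses[url]
        trail.append((url, status, headers.get("Content-Type", "")))
        if status in REDIRECT_STATUSES and "Location" in headers:
            url = headers["Location"]  # go to the new place, as -L does
            continue
        return trail
    raise RuntimeError("too many redirects")
```

The first hop carries the text/html type of the redirect notice itself; only the last entry in the trail describes the stylesheet, which is the Content-Type a browser ends up acting on.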

In article <m2************@Sherm-Pendleys-Computer.local>,
Sherm Pendley <sh***@dot-app.org> wrote:

> Eric Lindsay <NO**********@ericlindsay.com> writes:
>> The page displays in Firefox, and in Opera as if the text were
>> surrounded by pre tags. In Safari 2, the page displays as a single long
>> (but word wrapped) string, as if Safari were treating it as HTML markup.
>
> Filed bug #4353871, at: <http://bugreporter.apple.com>.


Thanks Sherm. I'm not a developer, so I can only use the bug reporting
menu item in Safari.

--
http://www.ericlindsay.com
Nov 24 '05 #11
