When plain text page is treated as HTML

This may be too far off topic, however I was looking at this page
http://www.hixie.ch/advocacy/xhtml about XHTML problems by Ian Hickson.

It is served as text/plain, according to Firefox
Response Headers - http://www.hixie.ch/advocacy/xhtml

Date: Wed, 23 Nov 2005 21:36:06 GMT
Server: Apache/1.3.33 (Unix) DAV/1.0.3 mod_fastcgi/2.4.2
mod_gzip/1.3.26.1a PHP/4.3.10 mod_ssl/2.8.22 OpenSSL/0.9.7e
Vary: Accept-Encoding,User-agent
X-Pingback: http://tracking.damowmow.com/
Content-Language: en-GB-Hixie
Last-Modified: Sat, 17 Sep 2005 12:16:19 GMT
Etag: "17063c7-4a12-432c0913"
Accept-Ranges: bytes
Keep-Alive: timeout=15, max=100
Connection: Keep-Alive
Content-Type: text/plain; charset=utf-8
Content-Encoding: gzip
Content-Length: 7452

200 OK

The page displays in Firefox, and in Opera as if the text were
surrounded by pre tags. In Safari 2, the page displays as a single long
(but word wrapped) string, as if Safari were treating it as HTML markup.

The interesting point to me is that the displayed contents are
incomplete. Safari has the contents, as looking at source confirms.
The places where the contents are not displayed are

<script type="text/javascript"><!--//--><![CDATA[//><!--
...
//--><!]]></script>

which is replaced by *

<script type="text/javascript"><!--//--><![CDATA[//><!--
...
//--><!]]></script>

which is not replaced by anything.

The document as displayed truncates on the next paragraph, when it
encounters

<script> and <style>

Given that the script element never closes, it seems reasonable to hide
the contents.

So my question is, should a browser display a file served as text/plain
the way Firefox and Opera do, or should a browser look deep inside the
file for HTML (or other tags) the way Safari does?

Or should it use some heuristic to second guess the server, given the
number of servers that do not correctly identify content-type?

If a browser pays attention only to the content-type as provided by the
server, what should it do about a file.css served as text/html instead
of text/css? Or isn't that a problem when the css file could be
considered to be included in the html file that calls it?

--
http://www.ericlindsay.com
Nov 23 '05 #1
On Thu, 24 Nov 2005, Eric Lindsay wrote:
This may be too far off topic, however I was looking at this page
http://www.hixie.ch/advocacy/xhtml about XHTML problems by Ian Hickson.
I've often seen plain-text documents from Hixie, but I must admit
I hadn't looked at their headers.
It is served as text/plain, according to Firefox
Response Headers - http://www.hixie.ch/advocacy/xhtml
[...] Vary: Accept-Encoding,User-agent [...] Content-Type: text/plain; charset=utf-8
Which is at least *suggestive* that there might be other variants
available, although we don't know what they are...

But a visit to http://www.hixie.ch/advocacy/ shows a conventional
directory listing. If there's any alternative version served out to
other browsers or in other character encodings, it would have to be
done by some kind of server conversion...? *Do* note that
accept-language is *not* one of the negotiation dimensions according
to that Vary header, even though there appears to be a French
translation available in the directory listing.
The page displays in Firefox, and in Opera as if the text were
surrounded by pre tags.
Well no, it displays "as plain text". There are big differences
between the two assertions, when the material contains markup and
&-notations - which this does.
In Safari 2, the page displays as a single long (but word wrapped)
string, as if Safari were treating it as HTML markup.
Booooooh!
The interesting point to me is that the displayed contents are
incomplete. Safari has the contents, as looking at source confirms.
The places where the contents are not displayed are

<script type="text/javascript"><!--//--><![CDATA[//><!--
...
//--><!]]></script>
This is fun stuff, but you really mustn't let yourself be so grossly
diverted from making real web pages, or you'll risk ending up like me
- posting too much about pedantic detail, and never getting around to
updating my sadly obsolescent web pages. Not good.
So my question is, should a browser display a file served as
text/plain the way Firefox and Opera do,
Of course.
or should a browser look deep inside the
file for HTML (or other tags) the way Safari does?
Sigh. I've been battering on about the mandate of RFC2616, but
somehow it doesn't seem to have sunk home. See the notes below the
table at
http://ppewww.ph.gla.ac.uk/~flavell/....html#browconf ,
which now take you directly to the relevant section of (the W3C's
HTML-ised copy of) RFC2616 -
http://www.w3.org/Protocols/rfc2616/....html#sec7.2.1
Or should it use some heuristic to second guess the server,
Absolutely and utterly not. RFC2616 forbids it.
given the number of servers that do not correctly identify
content-type?
It would still be permissible for a browser to say to its user "excuse
me, this content seems to be the wrong type. At some risk to your
security, I could try to guess this, are you prepared to take that
chance?". What RFC2616 is ruling out is that a client agent should
take it upon itself to unilaterally second-guess, without informed
consent from its user. That's my best interpretation, anyway.
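
To make that reading concrete, here is a minimal sketch of a client that decides how to render from the declared Content-Type alone and never looks inside the body. The function names and the three return labels are made up for illustration, and the "ask-user" branch is only my interpretation of the informed-consent point above, not anything RFC2616 spells out.

```python
def media_type(content_type: str) -> str:
    """Return the bare type/subtype, lower-cased, with parameters dropped."""
    return content_type.split(";", 1)[0].strip().lower()


def choose_rendering(content_type: str) -> str:
    """Pick a rendering strategy from the header only (hypothetical labels)."""
    kind = media_type(content_type)
    if kind == "text/plain":
        # Shown literally, even if the body happens to contain <script> tags.
        return "show-as-text"
    if kind in ("text/html", "application/xhtml+xml"):
        return "parse-as-markup"
    # Anything else: do not guess silently; leave the decision to the user.
    return "ask-user"


print(choose_rendering("text/plain; charset=utf-8"))
```

With the header from the page under discussion, this picks the literal text display, which matches what Firefox and Opera do.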
If a browser pays attention only to the content-type as provided by the
server, what should it do about a file.css served as text/html instead
of text/css?


Per RFC2616, it's mandated to ignore it, i.e. to render the HTML
without it, and Mozilla does so[1]: that's correct behaviour.
Unfortunately, some other browsers are not so cautious. The web would
be a better place if they were.

[1] at least in its Standards mode.
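
As a rough sketch of that rule (not Mozilla's actual code): the stylesheet is applied only when it is labelled text/css, unless the page is being rendered in a more lenient mode. The function name and the standards_mode flag are hypothetical, and the lenient branch is an assumption drawn from the footnote above.

```python
def should_apply_stylesheet(content_type: str, standards_mode: bool = True) -> bool:
    """Decide whether a linked stylesheet is used, based on its declared type."""
    kind = content_type.split(";", 1)[0].strip().lower()
    if kind == "text/css":
        return True
    # Mislabelled stylesheet: ignored in standards mode. Assumed to be
    # tolerated otherwise, per the "at least in its Standards mode" footnote.
    return not standards_mode


print(should_apply_stylesheet("text/html; charset=iso-8859-1"))
```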
Nov 23 '05 #2
On Wed, 23 Nov 2005, Alan J. Flavell wrote:
Vary: Accept-Encoding,User-agent
If there's any alternative version served out to other browsers or
in other character encodings, it would have to be done by some kind
of server conversion...?


Sorry, I shot my mouth off too quickly on that point. It wasn't
"accept-charset" in that header, it was "accept-encoding". That's why
his server has sent gzip-ed content, because the browser said it was
willing to accept that encoding. Nothing to do with
character-encoding ("charset"). Sorry for that - spotted my mistake
just too late!
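
For anyone else who mixes the two up, a small sketch of how they apply in sequence: Content-Encoding (gzip here) is undone first, on the raw bytes, and only then does the charset parameter of Content-Type say how to turn those bytes into characters. The function name decode_body is made up, and the ISO-8859-1 fallback is my assumption about the HTTP/1.1 default for text types.

```python
import gzip
from email.message import Message


def decode_body(raw: bytes, headers: dict) -> str:
    """Undo the transfer compression, then decode using the declared charset."""
    # Step 1: Content-Encoding describes compression of the payload.
    if headers.get("Content-Encoding", "").lower() == "gzip":
        raw = gzip.decompress(raw)
    # Step 2: the charset parameter describes the character encoding.
    msg = Message()
    msg["Content-Type"] = headers.get("Content-Type", "text/plain")
    charset = msg.get_content_charset() or "iso-8859-1"  # assumed default
    return raw.decode(charset)


headers = {
    "Content-Type": "text/plain; charset=utf-8",
    "Content-Encoding": "gzip",
}
body = gzip.compress("caf\u00e9 <script>".encode("utf-8"))
print(decode_body(body, headers))
```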

--
Post in haste, repent at leisure...

Nov 23 '05 #3
Eric Lindsay <NO**********@ericlindsay.com> writes:
The page displays in Firefox, and in Opera as if the text were
surrounded by pre tags. In Safari 2, the page displays as a single long
(but word wrapped) string, as if Safari were treating it as HTML markup.


Filed bug #4353871, at: <http://bugreporter.apple.com>.

sherm--

--
Cocoa programming in Perl: http://camelbones.sourceforge.net
Hire me! My resume: http://www.dot-app.org
Nov 24 '05 #4
Alan J. Flavell said the following on 11/24/2005 00:04 +0200:
But a visit to http://www.hixie.ch/advocacy/ shows a conventional
directory listing. If there's any alternative version served out to
other browsers or in other character encodings, it would have to be
done by some kind of server conversion...? *Do* note that
accept-language is *not* one of the negotiation dimensions according
to that Vary header, even though there appears to be a French
translation available in the directory listing.


Which is a directory, which contains a French version in HTML.

Speaking of language, what's it with his "Content-Language:
en-GB-Hixie"? Is it a valid Content-Language?

--
Regards
Harrie
Nov 24 '05 #5
In article <Pi*******************************@ppepc56.ph.gla.ac.uk>,
"Alan J. Flavell" <fl*****@ph.gla.ac.uk> wrote:
It is served as text/plain, according to Firefox
Response Headers - http://www.hixie.ch/advocacy/xhtml
Which is at least *suggestive* that there might be other variants
available, although we don't know what they are...
Like you, I couldn't see anything in the directory to indicate that some
user agent may have received a different version. But that was why I
used Firefox rather than curl when I looked at the header. Now that you
folks have pointed out that dynamically generated sites may be doing all
sorts of things according to user agent (or spider), I am no longer sure
of anything.
The page displays in Firefox, and in Opera as if the text were
surrounded by pre tags.


Well no, it displays "as plain text". There are big differences
between the two assertions, when the material contains markup and
&-notations - which this does.


So now I need to chase up what really happens with pre? That was
another tag I thought I could just ignore.
This is fun stuff, but you really mustn't let yourself be so grossly
diverted from making real web pages, or you'll risk ending up like me
- posting too much about pedantic detail, and never getting around to
updating my sadly obsolescent web pages. Not good.


I already have a large collection of sadly obsolescent web pages. I
just hope when I understand this a bit better I eventually get around to
putting together a content management system and update them.
So my question is, should a browser display a file served as
text/plain the way Firefox and Opera do,


Of course.


Good, that is what I thought.
or should a browser look deep inside the
file for HTML (or other tags) the way Safari does?


Sigh. I've been battering on about the mandate of RFC2616, but
somehow it doesn't seem to have sunk home. See the notes below the
table at
http://ppewww.ph.gla.ac.uk/~flavell/....html#browconf ,
which now take you directly to the relevant section of (the W3C's
HTML-ised copy of) RFC2616 -
http://www.w3.org/Protocols/rfc2616/....html#sec7.2.1


Thanks for that direct link Alan. That certainly is a clear demand that
it not be done. I was pretty sure Safari was wrong, but given they seem
to keep working on it, I thought maybe they knew something I didn't.
Bug report sent.
If a browser pays attention only to the content-type as provided by the
server, what should it do about a file.css served as text/html instead
of text/css?


Per RFC2616, it's mandated to ignore it, i.e. to render the HTML
without it, and Mozilla does so[1]: that's correct behaviour.
Unfortunately, some other browsers are not so cautious. The web would
be a better place if they were.

[1] at least in its Standards mode.


I had noticed that Mozilla said it ignored incorrectly served CSS files.
I wasn't actually sure that was really the case, because I originally
checked servers with curl, and in one case found my css file served as
text/html

curl --head www.sheltersrus.com.au/sheltersrus.css
HTTP/1.1 302 Found
Date: Thu, 24 Nov 2005 03:06:05 GMT
Server: Apache/1.3.29 Sun Cobalt (Unix) mod_ssl/2.8.16 OpenSSL/0.9.6m
PHP/4.3.4 mod_auth_pam_external/0.1 FrontPage/5.0.2.2510 mod_perl/1.26
Location: http://site.sheltersrus.com.au/sheltersrus.css
Content-Type: text/html; charset=iso-8859-1

However look at the different header for the same file from Firefox

Response Headers - http://site.sheltersrus.com.au/sheltersrus.css
Date: Thu, 24 Nov 2005 03:07:34 GMT
Server: Apache/1.3.29 Sun Cobalt (Unix) mod_ssl/2.8.16 OpenSSL/0.9.6m
PHP/4.3.4 mod_auth_pam_external/0.1 FrontPage/5.0.2.2510 mod_perl/1.26
Last-Modified: Sat, 19 Nov 2005 06:58:32 GMT
Etag: "844310-524-437ecd18"
Accept-Ranges: bytes
Content-Length: 1316
Keep-Alive: timeout=15
Connection: Keep-Alive
Content-Type: text/css
200 OK

The charset variation is also interesting. They almost look like
different files. Oops. I think they are. Look at this.

curl --head site.sheltersrus.com.au/sheltersrus.css
HTTP/1.1 200 OK
Date: Thu, 24 Nov 2005 03:14:51 GMT
Server: Apache/1.3.29 Sun Cobalt (Unix) mod_ssl/2.8.16 OpenSSL/0.9.6m
PHP/4.3.4 mod_auth_pam_external/0.1 FrontPage/5.0.2.2510 mod_perl/1.26
Last-Modified: Sat, 19 Nov 2005 06:58:32 GMT
ETag: "844310-524-437ecd18"
Accept-Ranges: bytes
Content-Length: 1316
Content-Type: text/css

So I guess I need to check every site for 302 responses instead of 200.
I thought the thing was named www.sheltersrus.com.au, not
site.sheltersrus.com.au

--
http://www.ericlindsay.com
Nov 24 '05 #6
Harrie wrote in message news:43***********************@news.xs4all.nl...
Alan J. Flavell said the following on 11/24/2005 00:04 +0200:
But a visit to http://www.hixie.ch/advocacy/ shows a conventional
directory listing. If there's any alternative version served out to
other browsers or in other character encodings, it would have to be
done by some kind of server conversion...? *Do* note that
accept-language is *not* one of the negotiation dimensions according
to that Vary header, even though there appears to be a French
translation available in the directory listing.


Which is a directory, which contains a French version in HTML.

Speaking of language, what's it with his "Content-Language:
en-GB-Hixie"? Is it a valid Content-Language?


In theory, yes.

http://www.faqs.org/rfcs/rfc3066.html
"2.1 Language tag syntax

The language tag is composed of one or more parts: A primary language
subtag and a (possibly empty) series of subsequent subtags."

*series of subsequent subtags*
Practically, no.

"2.2 Language tag sources

The namespace of language tags is administered by the Internet
Assigned Numbers Authority (IANA) [RFC 2860] according to the rules
in section 3 of this document."

http://www.iana.org/assignments/language-tags
doesn't show "en-GB-Hixie" to be registered.
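
The "in theory" half can be checked mechanically. Below is a minimal sketch of the RFC 3066 syntax rule only: a primary subtag of 1 to 8 letters, followed by any number of subtags of 1 to 8 letters or digits. The names are made up for illustration, and the check does not consult the IANA registry, so it says nothing about the "practically" half.

```python
import re

# Syntax only: 1*8ALPHA, then zero or more "-" 1*8(ALPHA / DIGIT).
RFC3066_SYNTAX = re.compile(r"[A-Za-z]{1,8}(-[A-Za-z0-9]{1,8})*")


def is_well_formed(tag: str) -> bool:
    """True if the tag matches the RFC 3066 grammar (registration not checked)."""
    return RFC3066_SYNTAX.fullmatch(tag) is not None


for tag in ("en-GB", "en-GB-Hixie", "en_GB"):
    print(tag, is_well_formed(tag))
```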
Nov 24 '05 #7
On Thu, 24 Nov 2005, Eric Lindsay wrote:
"Alan J. Flavell" <fl*****@ph.gla.ac.uk> wrote:
Which is at least *suggestive* that there might be other variants
available, although we don't know what they are...
Like you, I couldn't see anything in the directory to indicate that
some user agent may have received a different version.


Right - other than maybe sending the file compressed (e.g. gzip) if the
client agent says via Accept-encoding that it accepts that.
The page displays in Firefox, and in Opera as if the text were
surrounded by pre tags.


Well no, it displays "as plain text". There are big differences
between the two assertions, when the material contains markup and
&-notations - which this does.


So now I need to chase up what really happens with pre?


Just normal HTML parsing!
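
A small sketch of the difference, using Python's standard html.parser as a stand-in for a browser's parser (an assumption made purely for illustration): text/plain is shown character for character, whereas the content of a pre element still has its tags interpreted and its &-notations resolved. The sample string and function names are made up.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collects character data; tags are consumed and entities resolved."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def as_plain_text(source: str) -> str:
    """text/plain: every character is displayed literally."""
    return source


def as_pre_content(source: str) -> str:
    """Inside pre: whitespace is kept, but markup is still parsed."""
    parser = TextCollector()
    parser.feed(source)
    parser.close()
    return "".join(parser.parts)


sample = "AT&amp;T uses <em>markup</em>"
print(as_plain_text(sample))
print(as_pre_content(sample))
```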
checked servers with curl, and in one case found my css file served
as text/html

curl --head www.sheltersrus.com.au/sheltersrus.css
HTTP/1.1 302 Found
Date: Thu, 24 Nov 2005 03:06:05 GMT
Server: Apache/1.3.29 Sun Cobalt (Unix) mod_ssl/2.8.16 OpenSSL/0.9.6m
PHP/4.3.4 mod_auth_pam_external/0.1 FrontPage/5.0.2.2510 mod_perl/1.26
Location: http://site.sheltersrus.com.au/sheltersrus.css
Content-Type: text/html; charset=iso-8859-1
No, that's a redirection response. If you want curl to follow
redirection responses, you need this option:

-L/--location
(HTTP/HTTPS) If the server reports that the requested page has a
different location (indicated with the header line Location:)
this flag will let curl attempt to reattempt the get on the new
place.

See its man page.

When you get a status 30x redirection response, it usually comes
with a text/html body part. Most normal client agents however will
respond to the 30x status by proceeding to the new URL given in the
Location: header of the response. (RFC2616 for details).

In this case, the server is recognising the request for
http://www.sheltersrus.com.au/sheltersrus.css and redirecting
the request to http://site.sheltersrus.com.au/sheltersrus.css
(a rather curious thing to do - I would rather have expected
the opposite, seeing that www.sheltersrus.com.au is likely to
be the human-expected name for the site).
However look at the different header for the same file from Firefox

Response Headers - http://site.sheltersrus.com.au/sheltersrus.css
Exactly!
So I guess I need to check every site for 302 responses instead of 200.


As I say, you can use the -L option on curl.
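
For a scripted check, the same idea can be sketched in Python: keep requesting the URL named in Location: until a non-redirect status comes back, and record each hop so the redirect stays visible. The fetch argument is a hypothetical callable returning (status, headers); here it is backed by a canned table built from the two responses quoted above, so nothing touches the network.

```python
from typing import Callable, Dict, List, Tuple

Response = Tuple[int, Dict[str, str]]


def follow(url: str, fetch: Callable[[str], Response],
           max_hops: int = 5) -> List[Tuple[str, int, str]]:
    """Return (url, status, content type) for every hop, final response last."""
    chain = []
    for _ in range(max_hops + 1):
        status, headers = fetch(url)
        chain.append((url, status, headers.get("Content-Type", "")))
        if status in (301, 302, 303, 307) and "Location" in headers:
            url = headers["Location"]  # go round again with the new URL
            continue
        return chain
    raise RuntimeError("too many redirects")


# Canned responses taken from the curl output quoted in this thread.
CANNED = {
    "http://www.sheltersrus.com.au/sheltersrus.css": (302, {
        "Location": "http://site.sheltersrus.com.au/sheltersrus.css",
        "Content-Type": "text/html; charset=iso-8859-1",
    }),
    "http://site.sheltersrus.com.au/sheltersrus.css": (200, {
        "Content-Type": "text/css",
    }),
}

chain = follow("http://www.sheltersrus.com.au/sheltersrus.css", CANNED.__getitem__)
for hop in chain:
    print(hop)
```

The text/html type belongs to the body of the 302 response itself; the stylesheet at the final URL is text/css.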

Nov 24 '05 #8
Harrie wrote:
Speaking of language, what's it with his "Content-Language:
en-GB-Hixie"? Is it a valid Content-Language?


There's a spec for it [1], but as Hixie has previously admitted [2],
it's mostly just for fun.

[1] http://ian.hixie.ch/bible/english
[2]
http://lists.whatwg.org/htdig.cgi/wh...er/005003.html
(see the last paragraph of that post)

--
Lachlan Hunt
http://lachy.id.au/
http://GetFirefox.com/ Rediscover the Web
http://GetThunderbird.com/ Reclaim your Inbox
Nov 24 '05 #9
In article <Pi*******************************@ppepc56.ph.gla.ac.uk>,
"Alan J. Flavell" <fl*****@ph.gla.ac.uk> wrote:
So I guess I need to check every site for 302 responses instead of 200.


As I say, you can use the -L option on curl.


Thanks Alan. curl -L --head URL works, and reports both the redirect
and the actual page. I'll use that as my default set of options in
future, so I actually notice redirects.

I tend to get a bit lost in the options in curl. I had only really
looked at it for doing something like uploading web pages. Having
looked at curl, I used a here document with the command line ftp
instead. Seemed a whole heap easier to understand.

--
http://www.ericlindsay.com
Nov 24 '05 #10
In article <m2************@Sherm-Pendleys-Computer.local>,
Sherm Pendley <sh***@dot-app.org> wrote:
Eric Lindsay <NO**********@ericlindsay.com> writes:
The page displays in Firefox, and in Opera as if the text were
surrounded by pre tags. In Safari 2, the page displays as a single long
(but word wrapped) string, as if Safari were treating it as HTML markup.


Filed bug #4353871, at: <http://bugreporter.apple.com>.


Thanks Sherm. I'm not a developer, so I can only use the bug reporting
menu item in Safari.

--
http://www.ericlindsay.com
Nov 24 '05 #11
