unicode meta tag, http header

If Http headers specify the character encoding, what is the point of
the Meta tag specifying it?
Jul 20 '05 #1
On 8 Jul 2004 10:04:14 -0700, Anon <q_***********@yahoo.co.uk> wrote:
If Http headers specify the character encoding, what is the point of
the Meta tag specifying it?

It's literally an HTTP-EQUIV. In cases like a free webspace, where you
cannot modify the actual headers, this is an equivalent. Doing it
server-level is preferable.
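
The correspondence is direct: what the server would send as a real
header line, say

Content-Type: text/html; charset=utf-8

is restated inside the document as

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

(utf-8 here is just an example value, of course.)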
Jul 20 '05 #2
Anon wrote:
If Http headers specify the character encoding, what is the point of
the Meta tag specifying it?


1) As an inferior method of specifying it for people who can't or won't
get their server to send proper HTTP headers.

2) Because some authoring tool excreted this code into the crappy HTML
it created.

3) Because the page author saw other pages using it and mindlessly
imitated them in cargo-cult manner.

4) Because the page needs to be used in non-HTTP contexts, such as being
browsed locally on a hard disk or CD-ROM, or uploaded to be validated,
where no HTTP headers are present so the META tag must do, inferior as
it may be.

What's really fun is when a page is served with contradictory
information in the HTTP headers and its META tag; in that case, some
browsers (e.g., Mozilla) follow the standards properly and respect the
HTTP headers, ignoring the contradictory META information, while others
(e.g., MSIE) do the opposite, leading to pages that seem to "work" in
one browser but come out garbled in others.
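
A concrete conflicting setup might be a server sending

Content-Type: text/html; charset=iso-8859-1

for a document whose bytes are really UTF-8 and which claims

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

Under the HTML 4.01 precedence rules the HTTP header wins, so a
standards-following browser displays each multi-byte UTF-8 sequence as a
pair of Latin-1 characters.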
Jul 20 '05 #3
Neal wrote:
On 8 Jul 2004 10:04:14 -0700, Anon <q_***********@yahoo.co.uk> wrote:
If Http headers specify the character encoding, what is the point of
the Meta tag specifying it?


It's literally an HTTP-EQUIV. In cases like a free webspace, where you
cannot modify the actual headers, this is an equivalent. Doing it
server-level is preferable.


Of course, if it's in a META element it also stays with the page when
the page is saved.

Sometimes there isn't a server (e.g. html files on a CD) so it should be
used there.

--
Matt
Jul 20 '05 #4
"Daniel R. Tobias" <da*@tobias.name> wrote:
What's really fun is when a page is served with contradictory
information in the HTTP headers and its META tag; in that case, some
browsers (e.g., Mozilla) follow the standards properly and respect
the HTTP headers, ignoring the contradictory META information, while
others (e.g., MSIE) do the opposite, leading to pages that seem to
"work" in one browser but come out garbled in others.


Are you sure? We know that MSIE violates the HTTP protocol, but does it
do that in this respect? It seems to me that IE 6 correctly gives
preference to the charset parameter in an actual HTTP header, when such a
header is in conflict with a corresponding META tag.

--
Yucca, http://www.cs.tut.fi/~jkorpela/
Pages about Web authoring: http://www.cs.tut.fi/~jkorpela/www.html

Jul 20 '05 #5
On Thu, 8 Jul 2004, Jukka K. Korpela wrote:
"Daniel R. Tobias" <da*@tobias.name> wrote:
What's really fun is when a page is served with contradictory
information in the HTTP headers and its META tag; in that case, some
browsers (e.g., Mozilla) follow the standards properly and respect
the HTTP headers, ignoring the contradictory META information, while
others (e.g., MSIE) do the opposite, leading to pages that seem to
"work" in one browser but come out garbled in others.


Are you sure? We know that MSIE violates the HTTP protocol, but does it
do that in this respect?


The last browser which I recall violating the specified priority order
in this regard was Netscape 4.* in its earlier versions; but that was
corrected at some point and was OK in later versions.

As this has been a special interest of mine for a considerable time, I
*think* I'd have been aware of any IE version which got it wrong.
(Although the little grey cells aren't so good as they used to be...)

(There -have- been some oddities of IE behaviour in the reload/refresh
area.)
Jul 20 '05 #6
Alan J. Flavell wrote:
As this has been a special interest of mine for a considerable time, I
*think* I'd have been aware of any IE version which got it wrong.
(Although the little grey cells aren't so good as they used to be...)


Try these pages:
http://arbiter.wipo.int/domains/deci...2001-0915.html
http://www.quicktopic.com/25/D/cD8dwc52A3p.html
http://www.25hoursaday.com/draft-oba...scheme-01.html

They all have contradictory HTTP and meta information regarding their
character encoding. When I tried them in Mozilla, they were rendered in
accordance with their HTTP-announced character encoding, which caused
some gibberish to appear scattered throughout the document where
characters were intended to be which were actually encoded in the manner
suggested by the meta tag. However, MSIE rendered all of these
documents "correctly" (in the DWIM sense, not the standards-compliant one).

--
== Dan ==
Dan's Mail Format Site: http://mailformat.dan.info/
Dan's Web Tips: http://webtips.dan.info/
Dan's Domain Site: http://domains.dan.info/
Jul 20 '05 #7
On 8 Jul 2004, Anon wrote:
Subject: unicode meta tag, http header
Why "unicode"?
If Http headers specify the character encoding, what is the point of
the Meta tag specifying it?


I wonder what's the point of specifying a nonsense <meta charset> in
these examples from Google's cache:
<http://google.com/search?q=cache:www.unics.uni-hannover.de/nhtcapri/cyrillic.win>
<http://google.com/search?q=cache:www.unics.uni-hannover.de/nhtcapri/greek.html7>

Google is getting sicker and sicker. They are more concerned about
useless mirrors in other countries (see sig) than about indexing
non-Latin-1 documents correctly.

--
No longer evil: http://www.google.com.ly

Jul 20 '05 #8
On Thu, 8 Jul 2004, Daniel R. Tobias wrote:
Try these pages:
OK, I'm looking at this one:
http://www.quicktopic.com/25/D/cD8dwc52A3p.html
They all have contradictory HTTP and meta information regarding their
character encoding.
This one has ISO-8859-1 on the HTTP header, and utf-8 in the meta, and
is evidently really in utf-8.
When I tried them in Mozilla, they were rendered in
accordance with their HTTP-announced character encoding,
Confirmed. (Win Moz 1.7).
However, MSIE rendered all of these documents "correctly"
Perhaps "as intended" is a clearer way of putting it, since
your "correctly" means "not to specification".
(in the DWIM sense, not the standards-compliant one).


Not here. I'm seeing the same clutter as on Mozilla. And
View->Encoding shows "Western European (ISO)". This is Win IE6 (with
current security fixes applied).

If I manually set the encoding to utf-8, then it looks as intended.
But if I reload, the encoding reverts to iso-8859-1. I'd say that
(for all its other faults) this IE is performing to specification in
this respect.

Jul 20 '05 #9
JRS: In article <61**************************@posting.google.com >, seen
in news:comp.infosystems.www.authoring.html, Anon
<q_***********@yahoo.co.uk> posted at Thu, 8 Jul 2004 10:04:14 :
If Http headers specify the character encoding, what is the point of
the Meta tag specifying it?


(a) There may be no HTTP headers, for example if the file is displayed
locally without using a server;

(b) The HTTP headers may not be under author control, for example in
sites such as mine;

(c) It is more logical to have the meaning of a file depend only on what
is within the file, and not on auxiliary variable data.

Accepted practice may, of course, differ.

--
John Stockton, Surrey, UK. ?@merlyn.demon.co.uk DOS 3.3, 6.20; Win98.
Web <URL:http://www.merlyn.demon.co.uk/> - FAQqish topics, acronyms & links.
PAS EXE TXT ZIP via <URL:http://www.merlyn.demon.co.uk/programs/00index.htm>
My DOS <URL:http://www.merlyn.demon.co.uk/batfiles.htm> - also batprogs.htm.
Jul 20 '05 #10
On Fri, 9 Jul 2004 13:16:20 +0100, Dr John Stockton
<sp**@merlyn.demon.co.uk> wrote:
JRS: In article <61**************************@posting.google.com >, seen
in news:comp.infosystems.www.authoring.html, Anon
<q_***********@yahoo.co.uk> posted at Thu, 8 Jul 2004 10:04:14 :
If Http headers specify the character encoding, what is the point of
the Meta tag specifying it?

(a) There may be no HTTP headers, for example if the file is displayed
locally without using a server;
And why would anyone want to do that? Simple HTTP servers for local use
come for free by the dozen at e.g. Tucows, and so does Apache (for
several platforms, Windoze 95 and everything up to XP included).

Given today's de facto standard of well in excess of 20GB hard drives,
not even the Apache footprint will be noticed among all the other
bloatware that gets installed.
(b) The HTTP headers may not be under author control, for example in
sites such as mine;
Hmmm...

from <URL:http://www.merlyn.demon.co.uk/batfiles.htm>

HTTP/1.0 200 OK
Date: Fri, 09 Jul 2004 22:45:47 GMT
Server: thttpd/1.00.disbu
Content-type: text/html
Content-length: 30504
Last-modified: Fri, 18 Jun 2004 22:51:58 GMT

...who is it that keeps you locked into the 'merlyn' sub domain at
'demon'?

I mean, we are living in the year 2004 and your docs get served as
HTTP/1.0??? From a server that names itself 'thttpd/1.00.disbu'; what
on earth is that?

Suggestion; instruct your 'merlyn.demon' provider to give you means to
configure your server space to correct status, or switch to an Apache
based provider that allows you to use .htaccess for your own
convenience.
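
As a sketch of how little is involved (assuming the host allows these
directives in .htaccess at all): a single Apache line such as

AddDefaultCharset UTF-8

adds "charset=UTF-8" to the Content-Type header of text responses, and

AddCharset UTF-8 .html

does the same keyed on the file extension.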

I'm in no way affiliated with any one of the following two but have used
their services for some 6-7 years now and have not experienced any
problems at all with that...

<URL:http://www.he.net>

This is where e.g. <URL:http://www.css.nu> is hosted, all Apache based.
Hurricane Electric runs their operation out of the upper floors of the
same building that houses the mae.west Internet router system. They do
have several very short Tx fibers running straight into mae.west :-)

<URL:http://www.newsguy.com>

My usenet news provider who gives me more than ample room for www
documents on an Apache server, because I have opted to pay approx $70 on
a yearly basis for my usenet news access with them.

Both of my ISP's above do allow me to set up my own .htaccess
configuration of course. And I live in Sweden so you can see that there
is no need to be fixed into ones own country for ISP services.
(c) It is more logical to have the meaning of a file depend only on what
is within the file, and not on auxiliary variable data.
You did put this in to "light the fire", right ? :-)

Well, your line of thinking may seem reasonable at first, but it breaks
down when we try to apply it to a very loose network where arbitrary
clients request data from arbitrary servers (i.e. the Internet).

The original idea behind the <META HTTP-EQUIV...> construct was that
servers should scan documents before serving them out on the network and
use that META info (if found) to adjust its real HTTP header info for
the document.

Since that time in history we now have literally billions of www
resources available on millions of servers. There is no way a server
can be allowed today to scan docs before serving them, unless it is
explicitly instructed to do so through configuration of course; in the
vast majority of cases the excess processing time cannot be justified.

The situation we have landed in is that it is now the client that reads
HTTP-EQUIV metas and sometimes makes use of it to alter its own
interpretation of the resource content.

And the "snag" here is then that the client has to start with some
initial assumption of what kind of char encoding it is looking at and
then, after reading several lines of markup, it finds out that the part
of the reading has been done is in error and it must restart from top.

A partly lucky circumstance is that there is no way to include a
_correctly_ written HTTP-EQUIV meta in any char repertoire other than
US-ASCII, so correctly written meta info in this area should be
correctly interpreted regardless of whatever default char encoding the
client is set to use initially.
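
That is also why the trick works at all: a declaration like

<meta http-equiv="Content-Type" content="text/html; charset=koi8-r">

consists of nothing but ASCII bytes, so it decodes to the same
characters under iso-8859-1, koi8-r, utf-8 or any other ASCII-superset
default the client may have started with, and the client can read the
label before it knows the encoding. An encoding like utf-16, where
those same characters become two-byte units, is exactly the case where
the trick falls apart, as discussed later in this thread.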

Still, I'm pretty sure that, if I put my mind to it, I would be able to
"hack up" some meta markup in a document that would totally confuse
WIN-IE, while the same doc would work fine in Mozilla, served identically
to both with correct HTTP headers of course :-)
Accepted practice may, of course, differ.


The thing is that lots and lots of people have already put their heads
together to come up with interoperable solutions to most of the
problems inherent in the "loose network". The result of those people's
input is available as 'RFC' documents at...

<URL:http://www.ietf.org/>

...where those docs that are labeled as "Category: Standards Track"
represent final consensus on how things are supposed to work for the
issue(s) addressed by that particular RFC.

This one is a classic example among the available crowd...

<URL:http://www.ietf.org/rfc/rfc1866.txt>

It should be noted that ISO/IEC does not take a direct active part in
the standardization of Internet procedures, but has at one point in time
appointed the IETF as the party to do what is best for the Internet as a
whole. "Category: Standards Track" documents are the best we will get
to define how to make use of this "random medium", but they have served
a good purpose and produced good results so far; there's no need to
abandon them just yet.

--
Rex

Jul 20 '05 #11
Jan Roland Eriksson <jr****@newsguy.com> wrote in
news:lj********************************@4ax.com:
On Fri, 9 Jul 2004 13:16:20 +0100, Dr John Stockton
<sp**@merlyn.demon.co.uk> wrote:
JRS: In article <61**************************@posting.google.com >, seen
in news:comp.infosystems.www.authoring.html, Anon
<q_***********@yahoo.co.uk> posted at Thu, 8 Jul 2004 10:04:14 :

If Http headers specify the character encoding, what is the point of
the Meta tag specifying it?

(a) There may be no HTTP headers, for example if the file is displayed
locally without using a server;


And why would any one want to do that?


Simple - what if I have an HTML file (e.g. example.html) on the
hard drive of my Windows-based PC, and a file association that
associates .html files with my Mozilla browser, and I tell
Windows to open the file.

--
Dave Patton
Canadian Coordinator, Degree Confluence Project
http://www.confluence.org/
My website: http://members.shaw.ca/davepatton/
Jul 20 '05 #12
On Sat, 10 Jul 2004 07:23:38 +0200, Jan Roland Eriksson
<jr****@newsguy.com> wrote:
On Fri, 9 Jul 2004 13:16:20 +0100, Dr John Stockton
<sp**@merlyn.demon.co.uk> wrote:

(a) There may be no HTTP headers, for example if the file is displayed
locally without using a server;


And why would anyone want to do that? Simple HTTP servers for local use
come for free by the dozen at e.g. Tucows, and so does Apache (for
several platforms, Windoze 95 and everything up to XP included).


How about the common user who doesn't even know what a server is? How will
the browser know the charset?
Jul 20 '05 #13
Neal wrote:
Jan Roland Eriksson <jr****@newsguy.com> wrote:
Dr John Stockton wrote:

(a) There may be no HTTP headers, for example if the file is
displayed locally without using a server;


And why would anyone want to do that? Simple HTTP servers for
local use come for free by the dozen


How about the common user who doesn't even know what a server is?
How will the browser know the charset?


Under the meta http-equiv hack, where the info is *in* the file, how
will the client know what the charset is when it starts to read the
characters? Meta http-equiv is not a good solution when the file
contents are to be transferred over the network.

--
Brian (remove ".invalid" to email me)
http://www.tsmchughs.com/
Jul 20 '05 #14
On Sat, 10 Jul 2004, Neal wrote:
How about the common user who doesn't even know what a server is? How will
the browser know the charset?


Well, the same way that you get to know the "charset" for text/plain,
or text/tab-separated-values etc.

HTTP is the natural protocol for the WWW, and it has the option (and,
since the security alert CERT CA-2000-02, it is strongly recommended
to *use* this option, correctly) for informing the client of the
character coding (that unfortunately-named MIME "charset" attribute).

The fact that many people won't do that, claiming variously that their
"server won't let them" or "service provider doesn't support it" and
so on, doesn't change those recommendations. Those people simply
aren't playing their proper part in the WWW concordat.

But you ask about protocols which don't have their own means of
informing the recipient about character codings, such as ftp, direct
file access, and so forth? The answer, at least for text/plain,
tab-separated-values etc. is that the recipient has to be informed
"out of band", and has to take appropriate steps to pass this
out-of-band knowledge into the client agent, e.g by setting the
view->encoding menu for the individual document, or temporarily
changing the client's default character coding.

And this technique works equally well (or, one might say, "equally
badly", since it is obviously very inconvenient and error-prone) for
HTML.

The idea of having the *client agent* parsing HTML to locate meta
charset settings is frankly a kludge. Perhaps, in a way, it's
unfortunate that the places where it causes real distress (on-the-fly
text/* transcoding proxies, for example - see also Russian Apache) are
few and far between, so those who favour the easy way rather than the
architecturally-sound way are sure to win this dispute by force of
numbers; but please don't try to tell me that makes it the right
answer.

On the other hand, this meta thingy for HTML works for HTML and
nothing else: not plain text, not XML, not tab-separated-values, not
XHTML (aside from the limited provisions of appendix-C compatibility).
And XML introduces yet another (bad) solution for this problem, but
that's another story: at least the recognition of the <?xml ...
charset thingy at the head of an XML-based document doesn't call for a
transcoding proxy to use a full-scale XML parser in order to be sure
of getting the right answer.
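
For comparison, the XML mechanism referred to is the encoding
declaration at the very start of the document, e.g.

<?xml version="1.0" encoding="utf-8"?>

which sits at a known position in a known (ASCII-compatible or
BOM-detectable) form, so a proxy can find it by inspecting a short
prefix rather than parsing the whole document.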

Whereas when you've worked out how to properly control your HTTP
server, the solution works for all kinds of textual content; it's not
limited to HTML.

In a WWW context, I'd suggest we do best to concentrate on an
architecturally-sound solution that works for the WWW context:
worrying about the coding of local files, ftp protocol etc. is a
distraction from that principle IMHO. (Thank goodness I haven't had
to mess with PCNFS for some years now. CP850/iso-8859-1 transcoding,
bleagh.)

best regards
Jul 20 '05 #15
On Sat, 10 Jul 2004 10:36:37 -0400, Brian
<us*****@julietremblay.com.invalid> wrote:

Under the meta http-equiv hack, where the info is *in* the file, how
will the client know what the charset is when it starts to read the
characters? Meta http-equiv is not a good solution when the file
contents are to be transferred over the network.


It relies on the assumption that most character sets share the first
128 codepoints with ASCII. This assumption holds in quite a few cases,
fortunately.

Best regards,
-Claire
Jul 20 '05 #16
Anon wrote:
If Http headers specify the character encoding, what is the point of
the Meta tag specifying it?


The real fun comes when specifications collide:
http://ln.hixie.ch/?start=1037398795&count=1

and with evil tests such as this:
http://www.hixie.ch/tests/adhoc/html...-type/002.html
(read the comment in the source code)

--
Lachlan Hunt
http://www.lachy.id.au/
la**********@lachy.id.au.update.virus.scanners

Remove .update.virus.scanners to email me,
NO SPAM and NO VIRUSES!!!
Jul 20 '05 #17
On Sat, 10 Jul 2004, Lachlan Hunt wrote:
and with evil tests such as this:
http://www.hixie.ch/tests/adhoc/html...-type/002.html
(read the comment in the source code)


The comment makes no sense to me. It refers to iso-8859-1, but the HTTP
header says utf-8, and the actual content appears to be us-ascii (for
which the HTTP setting of utf-8 is acceptable).

The document is defective, since it contains a meta purporting to say
that the charset is utf-16, which is false (or am I missing
something?). However, since the HTTP header takes precedence over the
meta according to the published rules, this document defect is of no
further concern, AFAICS.

The client agent passes the test, according to my interpretation of
what I'm seeing. What fails is Hixie's defective document. Am I
missing something?
Jul 20 '05 #18
On Sat, 10 Jul 2004, Alan J. Flavell wrote:
But you ask about protocols which don't have their own means of
informing the recipient about character codings, such as ftp, direct
file access, and so forth? The answer, at least for text/plain,
tab-separated-values etc. is that the recipient has to be informed
"out of band", and has to take appropriate steps to pass this
out-of-band knowledge into the client agent,


I have to correct myself on this point, sorry. I was reminded of this
while confirming for myself that Hixie seems to have been mistaken
about charset precedence. See the HTML4 specification,
http://www.w3.org/TR/REC-html40/charset.html#h-5.2.2

where it says:

To sum up, conforming user agents must observe the following
priorities when determining a document's character encoding (from
highest priority to lowest):

and lists the HTTP protocol charset first. Confirming what I was
convinced that I already knew...
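
For reference, the three items in that list are: first, an HTTP
"charset" parameter in a "Content-Type" field; second, a META
declaration with "http-equiv" set to "Content-Type" and a value set for
"charset"; third, the charset attribute set on an element that
designates an external resource.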

According to the third item in that list, in theory you can write
something like

<a href="file:///wibble.txt" charset="utf-16LE">Wibble</a>

to inform the client agent about the character coding of a resource
(in this case apparently a plain text file) as a last resort where
there is no other source of character coding information.

I don't recall seeing a browser which supports that, although I got
bored of trying it without success several years back, so maybe I'll
be proved wrong.

Jul 20 '05 #19
"Alan J. Flavell" <fl*****@ph.gla.ac.uk> wrote:
<a href="file:///wibble.txt" charset="utf-16LE">Wibble</a> - - I don't recall seeing a browser which supports that


I'm afraid the situation is still the same, as you suspect.

On Mozilla, when you right-click on the link text and select
"Properties", the browser will open a small window showing some link
properties. If you have, say, the hreflang="sv"
attribute there, Mozilla will report, in the small window,
Target language: Swedish

But for some odd reason, Mozilla does not support the charset attribute
even this way, in the sense of showing it to the user (which might, in
some cases, help the user make some informed decisions). This is
particularly odd since it would have to just display the value (which
is much simpler than what it does with the hreflang attribute, mapping
language codes to language names).

I could use
type="text/plain;charset=utf-16LE"
and Mozilla would then report
Target type: text/plain;charset=utf-16

I wonder why the HTML specification does not mention, in its
description of the procedure of deciding on the encoding, the
possibility of using the charset parameter of the media type specified
in the type attribute. I guess this was just an oversight; they didn't
think about the possibility of using the type attribute that way.

--
Yucca, http://www.cs.tut.fi/~jkorpela/
Pages about Web authoring: http://www.cs.tut.fi/~jkorpela/www.html

Jul 20 '05 #20
On Sat, 10 Jul 2004, Jukka K. Korpela wrote:
I could use
type="text/plain;charset=utf-16LE"
and Mozilla would then report
Target type: text/plain;charset=utf-16


Interesting. But do you know if it would then actually honour the
encoding when the link was taken?

Jul 20 '05 #21
Alan J. Flavell wrote:
On Sat, 10 Jul 2004, Lachlan Hunt wrote:
and with evil tests such as this:
http://www.hixie.ch/tests/adhoc/html...-type/002.html
(read the comment in the source code)


The comment makes no sense to me. The comment refers to iso-8859-1,
but the HTTP header says utf-8, and the actual content appears to be
us-ascii (for which the HTTP setting of utf-8 is acceptable).

The document is defective, since it contains a meta purporting to say
that the charset is utf-16, which is false (or am I missing
something?). However, since the HTTP header takes precedence over the
meta according to the published rules, this is document defect is of
no further concern, AFAICS.

The client agent passes the test, according to my interpretation of
what I'm seeing. What fails is Hixie's defective document. Am I
missing something?


Yes. The META element says utf-16, that's _sixteen_. That uses two bytes
for each character, so ordinary one-byte ASCII-encoded text doesn't work.
Try forcing your browser to show utf-16 on that page, I see this:

㰡䑏䍔奐*䡔䵌⁐啂䱉**⼯圳䌯⽄ ⁈呍*㐮〯⽅丢㸊㱨**污湧㴢敮∾ 敡搾㱴楴汥㹈呍*䵅呁⁃潮瑥湴ⵔ祰

for several lines (that's chinese, korean, a mess)

So, the browser takes the page and interprets it according to the header:
ISO-8859-1. OK. Then it sees the META tag -- oh, we're in utf-16 -- so it
should reparse everything as utf-16. This results in the mess of Chinese:
the meta tag isn't there any more, so the browser goes back to ISO-8859-1.

You see? It's not possible to pass.

US-ASCII is part of unicode, directly. The byte-sequences match. An ASCII
'A' is a unicode 'A'. This is not the case with utf-16 (or 32).

--
Matt
Jul 20 '05 #22
On Sun, 11 Jul 2004 00:05:45 +0100, Matt
<no******@spam.matt.blissett.me.uk> wrote:
Alan J. Flavell wrote:
The client agent passes the test, according to my interpretation of
what I'm seeing. What fails is Hixie's defective document. Am I
missing something?

So, the browser takes the page and interprets it according to the header:
ISO-8859-1. OK. Then it sees the META tag -- oh, we're in utf-16 -- so it
should reparse everything as utf-16.


No, because the META element does not overrule the HTTP header; that is
definitive. So once it's decided it is ISO-8859-1 (which is irrelevant,
as Alan says it claims to be UTF-8) then that's what it is; the meta
element is meaningless.
You see? It's not possible to pass.


In this case it is. You may have the problem you describe when the page
is loaded from a file system that does not provide charset information,
but even in that case the document is clearly in error, so anything that
would happen then is error-fixup.

Jim.
--
comp.lang.javascript FAQ - http://jibbering.com/faq/

Jul 20 '05 #23
"Alan J. Flavell" <fl*****@ph.gla.ac.uk> wrote:
On Sat, 10 Jul 2004, Jukka K. Korpela wrote:
I could use
type="text/plain;charset=utf-16LE"
and Mozilla would then report
Target type: text/plain;charset=utf-16


Interesting. But do you know if it would then actually honour the
encoding when the link was taken?


(I meant
Target type: text/plain;charset=utf-16LE
i.e. actually repeating the charset value, as you probably figured out.)

I'm very surprised now. Mozilla Firebird actually uses utf-16LE (and
indicates this if I check it with View/Character coding) when I use that
type attribute. It does not seem to care about charset attribute.
(It was a little difficult to test this, since Firebird seems to
"remember" an encoding it has got for a URL, so I used different file
names to test the different cases.)

This seems to work for links to HTML documents as well. So if I have to
link to a page that does not itself advertise its character encoding, I
can add type="text/html;charset=..." and make Firebird show it
correctly without intervention. Actually I have such a situation but with
<iframe> rather than a link, and unfortunately - and illogically IMHO -
the <iframe> element does not allow a type attribute (by HTML specs or by
Firebird practice).

And some bad news: if I use <object> with e.g.
type="text/html"
then Firebird shows it OK, but it fails to render the embedded document
(and shows the content of <object> instead) if I use
type="text/html;charset=iso-8859-1"
So instead of recognizing and using the charset parameter there, Firebird
fails to understand the attribute at all!

--
Yucca, http://www.cs.tut.fi/~jkorpela/
Pages about Web authoring: http://www.cs.tut.fi/~jkorpela/www.html

Jul 20 '05 #24
JRS: In article <lj********************************@4ax.com>, seen in
news:comp.infosystems.www.authoring.html, Jan Roland Eriksson
<jr****@newsguy.com> posted at Sat, 10 Jul 2004 07:23:38 :
On Fri, 9 Jul 2004 13:16:20 +0100, Dr John Stockton
<sp**@merlyn.demon.co.uk> wrote:
JRS: In article <61**************************@posting.google.com >, seen
in news:comp.infosystems.www.authoring.html, Anon
<q_***********@yahoo.co.uk> posted at Thu, 8 Jul 2004 10:04:14 :
If Http headers specify the character encoding, what is the point of
the Meta tag specifying it?
(a) There may be no HTTP headers, for example if the file is displayed
locally without using a server;


And why would anyone want to do that? Simple HTTP servers for local use
come for free by the dozen at e.g. Tucows, and so does Apache (for
several platforms, Windoze 95 and everything up to XP included).


A local server is not needed to display pages locally. Eschew
surplusage.

Indeed, the displaying computer may not be owned by the page author. I
can, and have, put pages on floppy to be displayed elsewhere, on a
computer not connected to the Net. It would be unreasonable to expect a
server to be installed just for that.

Given today's de facto standard of well in excess of 20GB hard drives,
not even the Apache footprint will be noticed among all the other
bloatware that gets installed.
Many computers, including mine, are not today's. And I don't install
bloatware.

(b) The HTTP headers may not be under author control, for example in
sites such as mine;


Hmmm...

from <URL:http://www.merlyn.demon.co.uk/batfiles.htm>

HTTP/1.0 200 OK
Date: Fri, 09 Jul 2004 22:45:47 GMT
Server: thttpd/1.00.disbu
Content-type: text/html
Content-length: 30504
Last-modified: Fri, 18 Jun 2004 22:51:58 GMT

...who is it that keeps you locked into the 'merlyn' sub domain at
'demon'?


It is, in fact, quite a well-known domain; I need no other at present.

I mean, we are living in the year 2004 and your docs get served as
HTTP/1.0??? From a server that names itself 'thttpd/1.00.disbu'; what
on earth is that?
I neither know nor need to know; nor do I know that it is any of your
business to know. I do know that the logging provided indicates that my
pages can be widely read; and it is that which matters.
Suggestion; instruct your 'merlyn.demon' provider to give you means to
configure your server space to correct status, or switch to an Apache
based provider that allows you to use .htaccess for your own
convenience.
"Instruct" is somewhat authoritarian. I am a customer who has chosen
what Demon offered, modified by what they have chosen to add. It is
sufficient.

If you wish to buy me a newer, bigger, computer, configured to match
this one; to purchase for me a lifetime subscription to another
provider; to arrange that the merlyn.demon address be redirected (a
service not offered for my class of account; you might need to purchase
the business); etc.; then I'll consider that offer.

I'm in no way affiliated with any one of the following two but have used
their services for some 6-7 years now and have not experienced any
problems at all with that...

<URL:http://www.he.net>

This is where e.g. <URL:http://www.css.nu> is hosted, all Apache based.
Hurricane Electric runs their operation out of the upper floors of the
same building that houses the mae.west Internet router system. They do
have several very short Tx fibers running straight into mae.west :-)

<URL:http://www.newsguy.com>
Foreigners, I believe; and operating under foreign legislation. I
prefer services operated under UK law; services provided by a firm
operating under HMG.

(c) It is more logical to have the meaning of a file depend only on what
is within the file, and not on auxiliary variable data.


You did put this in to "light the fire", right ? :-)


It is my opinion; you can have a different one. Where meaning depends
on auxiliary information, there is a risk of the document being
separated from that information, and therefore losing meaning. Web
pages can be copied for reference to a reader's local disc; one would
not wish that to change the apparent meaning.
... And the "snag" here is then that the client has to start with some
initial assumption of what kind of char encoding it is looking at and
then, after reading several lines of markup, it finds out that the part
of the reading has been done is in error and it must restart from top.
All codings that I might wish to use are based on what is often called
ASCII. It is a reasonable world-wide default. It is a fair match to an
ISO standard, possibly numbered 646.
A partly lucky circumstance is that there is no way to include a
_correctly_ written HTTP-EQUIV meta in any char repertoire other than
US-ASCII, so correctly written meta info in this area should be
correctly interpreted regardless of whatever default char encoding the
client is set to use initially.
Somewhat US-imperialistic; but we are used to that. Ideally, the
repertoire would be described throughout by reference to an ISO
standard, with perhaps a footnote of enlightenment for those in
countries which disdain international standards. Where transmission
uses ASCII-based protocols, and an ASCII-based external header is sent,
it is reasonable to have an ASCII-based internal header too, unless the
external header has specified otherwise. If, say, the Russians want a
protocol such that the internal header can be in plain Cyrillic like the
rest of the document, I have no objection.

Still, I'm pretty sure that, if I put my mind to it, I would be able to
"hack up" some meta markup in a document that would totally confuse
WIN-IE, while the same doc would work fine in Mozilla, served identically
to both with correct HTTP headers of course :-)


Possibly so; that would be mere showing-off. Apparently, there are
simpler ways to upset IE.

--
John Stockton, Surrey, UK. ?@merlyn.demon.co.uk Turnpike v4.00 MIME.
Web <URL:http://www.merlyn.demon.co.uk/> - FAQish topics, acronyms, & links.
Proper <= 4-line sig. separator as above, a line exactly "-- " (SonOfRFC1036)
Do not Mail News to me. Before a reply, quote with ">" or "> " (SonOfRFC1036)
Jul 20 '05 #25
On Sun, 11 Jul 2004, Matt wrote:
Alan J. Flavell wrote:
The client agent passes the test, according to my interpretation of
what I'm seeing. What fails is Hixie's defective document. Am I
missing something?
Yes. The META element says utf-16, that's _sixteen_.


I didn't miss that.
so ordinary one-byte ASCII-encoded text doesn't work.
Yes, I -do- know what utf-16 encodings are. But the real HTTP header
takes precedence according to the HTML specification, and this
document isn't really coded in utf-16, so the meta is (a) just plain
wrong and anyway (b) is supposed to be ignored in favour of the real
HTTP header.
US-ASCII is part of unicode, directly.


Indeed; but "unicode" is not an encoding. utf-8 is, and that's what
the server advertised this document to be. (Contrary to what Hixie
said).

Sorry, but you're trying to explain to me a number of things that I'm
already familiar with, while missing a key point: Hixie's text gives
the impression that the meta charset overrules the real HTTP charset,
but I'm sure he really knows better than that.
Jul 20 '05 #26
On Sat, 10 Jul 2004, Dr John Stockton wrote:
I mean, we are living in the year 2004 and your docs get served as
HTTP/1.0??? From a server that names itself 'thttpd/1.00.disbu'; what
on earth is that?


I neither know nor need to know;


We've done this before.
<http://www.google.com/groups?q=thttpd+author%3APrilop>

--
<http://www.unics.uni-hannover.de/nhtcapri/temp/squareroot.html>

Jul 20 '05 #27
On Sat, 10 Jul 2004 21:41:54 +0100, Dr John Stockton
<sp**@merlyn.demon.co.uk> wrote:
JRS: In article <lj********************************@4ax.com>, seen in
news:comp.infosystems.www.authoring.html, Jan Roland Eriksson
<jr****@newsguy.com> posted at Sat, 10 Jul 2004 07:23:38 :
On Fri, 9 Jul 2004 13:16:20 +0100, Dr John Stockton
<sp**@merlyn.demon.co.uk> wrote:
(a) There may be no HTTP headers, for example if the file is displayed
locally without using a server;
And why would anyone want to do that? ...

A local server is not needed to display pages locally.
Doing that is "off topic" for any kind of document intended for www
access as AJF has already explained in another post.
Indeed, the displaying computer may not be owned by the page author.
I can, and have, put pages on floppy to be displayed elsewhere...
I can, and have on several occasions, delivered browsable technical
documentation for the projects I have been in charge of.

When my client stuffs my CD into his reader it autostarts an HTTP server
which in turn presents a proper index page to my client. (And yes, I
do have a license to distribute that CD-based HTTP server application; I
wrote it myself :-)
It would be unreasonable to expect a server to be installed just for that.
If you want to make _your_ info fully available to your clients, _you_
serve your info as it should be served. This is exactly where user
configs of www server space come in.
...not even the Apache footprint will be noticed...

Many computers, including mine, are not today's. And I don't install
bloatware.
Good for you. Still, my very first hard disk was a 20MB unit bought for
a small fortune in 1985. Today's Apache server would have consumed less
than 10% of that original unit.
(b) The HTTP headers may not be under author control...

HTTP/1.0 200 OK
Date: Fri, 09 Jul 2004 22:45:47 GMT
Server: thttpd/1.00.disbu
Content-type: text/html
Content-length: 30504
Last-modified: Fri, 18 Jun 2004 22:51:58 GMT

...who is it that keeps you locked into the 'merlyn' sub domain at
'demon'?


It is, in fact, quite a well-known domain; I need no other at present.


I have spent time reading as much info as I have been able to find
(thanks to the link Andreas gave) about the "thttpd" server.

My _very_serious_conclusion_ from that reading is that this server is
solely designed for use by old stock-conservative British people.

If, as you say, "daemon" is an old reliable British ISP, my comment is
that they know nothing about progress.

Or to put it in clear text: "You are being fucked for your money".

You can have much better service elsewhere.
...your docs get served as HTTP/1.0???

I neither know nor need to know;
You can stay ignorant if that pleases you...
nor do I know that it is any of your business to know.
Of course it may very well be my business to know what kind of "crap" I
eventually allow to enter my home. Mind you, a www resource is not to be
regarded as a benefit bestowed by the author on those who eventually
invite the author's resource in; on the contrary, any author should
produce and treat his resources so that they will be looked upon as
welcome guests in other people's houses.
I do know that the logging provided indicates that my
pages can be widely read; and it is that which matters.
Even I myself have proved to you that at least some of your docs are in
total error as served to the www.

From my reading of the Daemon docs, they are "screwing" you for your
money.
...instruct your 'merlyn.demon' provider...

"Instruct" is somewhat authoritarian. I am a customer who has chosen
Silly British attitude. Anyone can ask for better quality for the money
paid. In your case it will be difficult, since "daemon" is running a
"stone age" server that does not inherently allow for individual user
configs.

(That server operates on _one_ single config resource? And "daemon" gets
by with that in a modern European country today???)

Change to another provider with better value for your same money.
<URL:http://www.he.net>
<URL:http://www.newsguy.com>

Foreigners, I believe


Sure, they are both US, California based; but the thing is that they give
me better value for my money than any Swedish/UK/EU resources I have
been able to find. For me the Money/Quality ratio is important and I
could not care less about borders of countries. I'm a European so far,
and I hope to become a world citizen before my days end.

--
Rex
Jul 20 '05 #28
JRS: In article <3j********************************@4ax.com>, seen in
news:comp.infosystems.www.authoring.html, Jan Roland Eriksson
<jr****@newsguy.com> posted at Tue, 13 Jul 2004 03:29:16 :
On Sat, 10 Jul 2004 21:41:54 +0100, Dr John Stockton
<sp**@merlyn.demon.co.uk> wrote:
JRS: In article <lj********************************@4ax.com>, seen in
news:comp.infosystems.www.authoring.html, Jan Roland Eriksson
<jr****@newsguy.com> posted at Sat, 10 Jul 2004 07:23:38 :
On Fri, 9 Jul 2004 13:16:20 +0100, Dr John Stockton
<sp**@merlyn.demon.co.uk> wrote:
(a) There may be no HTTP headers, for example if the file is displayed
locally without using a server;
And why would anyone want to do that? ...
A local server is not needed to display pages locally.


Doing that is "off topic" for any kind of document intended for www
access as AJF has already explained in another post.


A page may be intended both for WWW access and for non-WWW access, while
still being the same file.

A page being written with the intent of presentation on the Web can be
read locally by the author as part of authoring.

My _very_serious_conclusion_ from that reading is that this server is
solely designed for use by old stock-conservative British people.
What could be better than serving them, and those who aspire to the same
standards?

If, as you say, "daemon" is an old reliable British ISP; my comment is
that they know nothing about progress.
The name is, fairly obviously, "Demon", and you should accept that.
I do know that the logging provided indicates that my
pages can be widely read; and it is that which matters.


Even I myself have proved to you that at least some of your docs are in
total error as served to the www.


Satisfied "clients" are of more importance to me than are people such as
yourself.

Sure, they are both US, California based; but the thing is that they give
me better value for my money than any Swedish/UK/EU resources I have
been able to find. For me the Money/Quality ratio is important and I
could not care less about borders of countries. I'm a European so far,
and I hope to become a world citizen before my days end.

It appears to me that there is a very considerable amount of time indeed
left before you reach an age at which ending your days is appropriate.

--
John Stockton, Surrey, UK. ??*@merlyn.demon.co.uk Turnpike v4.00 MIME.
Web <URL:http://www.merlyn.demon.co.uk/> - FAQish topics, acronyms, & links.
Check boilerplate spelling -- error is a public sign of incompetence.
Never fully trust an article from a poster who gives no full real name.
Jul 20 '05 #29
On Tue, 13 Jul 2004 14:19:18 +0100, Dr John Stockton
<sp**@merlyn.demon.co.uk> wrote:

Before I make any further comments here I would like to offer my sincere
apologies for the language used in my last post. It was inappropriate
and I would not have expressed myself in the same way if I had waited a
day before replying. I move on with hopes to be forgiven?
A page may be intended both for WWW access and for non-WWW access, while
still being the same file.
HTML documents were never intended to be made available through local
'file://...' access; you may want to read up on the history as given by
"Sir Timothy Berners-Lee". This URL may be a good start, with links from
there of course.

<URL:http://www.w3.org/Protocols/DesignIssues.html> (it's from 1991)

HTML documents shall be _served_ to clients in response to client
_requests_; that is the _conservative_ approach to handling them.

The 'file://...' misnomer came later and not from the original source of
the HTTP protocol as I recall history.
A page being written with the intent of presentation on the Web can be
read locally by the author as part of authoring.
At the risk of repeating myself, installing and running your own server
locally is a breeze (regardless of your current hardware), and doing
that will aid the quality of your page authoring in several ways that
you do not realize today.
My _very_serious_conclusion_ from that reading is that this server is
solely designed for use by old stock-conservative British people.

What could be better than serving them, and those who aspire to the same
standards?
Then think again about dropping the local 'file://...' access for
reasons already given.
If, as you say, "daemon" is an old reliable British ISP, my comment is
that they know nothing about progress.

The name is, fairly obviously, "Demon", and you should accept that.
English is not my native language so I do rely on computerized spell
checking for my outgoing posts. Sometimes my spell checker plays tricks
with me, I would be happy to find that you do accept that.
I do know that the logging provided indicates that my
pages can be widely read; and it is that which matters.


A web log says nothing of that kind.
Even I myself have proved to you that at least some of your docs are in
total error as served to the www.

Satisfied "clients" are of more importance to me than are people such as
yourself.
Well, the thing about www docs accessed over the Internet is that you
will almost never get anything but positive responses, _if_ you get a
response at all.

The reason behind that situation is that the common www-naut does not
bother to analyze something that does not come out acceptable and
understandable in his/her browsing situation.

You know; there is always another www page sitting "just around the
corner" that may be more interesting to invite as an alternative to the
"gibberish" just seen.

Following that "common standard behavior" of www-naut's, you will only
find feedback from those who happened to rely on the same browser bugs
as you did in your production process, and that feedback will always be
positive at major. It fools you to think that you have done something
good, while in reality your work may have missed a major target where it
could have made a real difference.
...I'm a European so far, and I hope to become a world citizen
before my days ends.

It appears to me that there is a very considerable amount of time
indeed left before you reach an age at which ending your days is
appropriate.


I like that :-)
I was born in early November 1948, so I could opt for an end no earlier
than 2048 then? What a life that would be :-) I hope to stay as curious
about life's aspects as I am today for all that time too.

--
Rex
Jul 20 '05 #30
JRS: In article <qg********************************@4ax.com>, seen in
news:comp.infosystems.www.authoring.html, Jan Roland Eriksson
<jr****@newsguy.com> posted at Wed, 14 Jul 2004 00:52:53 :
On Tue, 13 Jul 2004 14:19:18 +0100, Dr John Stockton
<sp**@merlyn.demon.co.uk> wrote:

Before I make any further comments here I would like to offer my sincere
apologies for the language used in my last post. It was inappropriate
and I would not have expressed myself in the same way if I had waited a
day before replying. I move on with hopes to be forgiven?
Yes, provided that further occasion for forgiveness is not given.

A page may be intended both for WWW access and for non-WWW access, while
still being the same file.


HTML documents were never intended to be made available through local
'file://...' access; you may want to read up on the history as given by
"Sir Timothy Berners-Lee". This URL may be a good start, with links from
there of course.


The plans of TBL, a decade and more ago, are historic. But that does
not prevent them from being extended by later authors, of whatever
talent.

Then think again about dropping the local 'file://...' access for
reasons already given.


What file:// access? The only file://-type access on my public pages,
other than javascript view-source, is where I as author find it
convenient to have a link, which works on my author's local view, to
pages which I do not intend to publish. I impose this small burden on
my outside readers, with uninviting <a...>..</a> contents; it is noted
in the index page. I could, for those files on the same drive, use a
relative URL; but then readers might expect the links to work for them.

If, as you say, "daemon" is an old reliable British ISP; my comment is
that they know nothing about progress.

The name is, fairly obviously, "Demon", and you should accept that.


English is not my native language so I do rely on computerized spell
checking for my outgoing posts. Sometimes my spell checker plays tricks
with me, I would be happy to find that you do accept that.


If your spelling checker does not have "demon" as an English word, then
it is inadequate. In fact, "daemon" is not normally used in current
English, except as computer jargon. BTW, in English a "spell checker"
would be a tester of magic incantations.

--
John Stockton, Surrey, UK. ?@merlyn.demon.co.uk Turnpike v4.00 IE 4
<URL:http://jibbering.com/faq/> JL / RC : FAQ for news:comp.lang.javascript
<URL:http://www.merlyn.demon.co.uk/js-index.htm> jscr maths, dates, sources.
<URL:http://www.merlyn.demon.co.uk/> TP/BP/Delphi/jscr/&c, FAQ items, links.
Jul 20 '05 #31
