On Mon, 27 Oct 2003, Jukka K. Korpela wrote:
[color=blue]
> I have tried to actively forget all the confusion that XHTML causes
> since, as I wrote, the simple answer is to stay away from it.[/color]
;-}
But there's an interesting point here, nevertheless, which has nothing
directly to do with XML/XHTML and a lot to do with i18n form
submission, and which, I now realise, relates tangentially to my
form-i18n page. Oh, and by chance Google just presented me with a discussion
thread which is also relevant to the underlying principle, in a way...
http://lists.w3.org/Archives/Public/...3Sep/0025.html etc.
[color=blue]
> Yes, we _could_ use the extended interface, which lets us specify
> the encoding the third time, and then we get
>
> Note: The HTTP Content-Type header sent by your web browser (unknown)
> did not contain a "charset" parameter, but the Content-Type was one of
> the XML text/* sub-types (text/xml).[/color]
This assertion presumably relates to the file upload "control" of the
multipart/form-data submission, yes? I'm not in the least surprised
by the absence of a "charset" specification, but I'm puzzled by the
fact that it's saying it was content-type "text/xml". Would this have
been sent by your client agent, or are they spoofing it in order to
make their validator accept it?
[quotation continues...][color=blue]
> The relevant specification (RFC
> 3023) specifies a strong default of "us-ascii" for such documents so we
> will use this value regardless of any encoding you may have indicated
> elsewhere. If you would like to use a different encoding, you should
> arrange to have your browser send this new encoding information.[/color]
Hmmm, yes, they have a point, despite its unfriendliness.
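To make the rule in that quoted note concrete, here is a minimal Python sketch of how a checker might pick the charset from a Content-Type value. The function name effective_charset is made up for illustration, and this is only my reading of the RFC 3023 default for the XML text/* types, not the validator's actual code.

```python
from email.message import Message
from typing import Optional


def effective_charset(content_type: str) -> Optional[str]:
    """Return the charset implied by a Content-Type value, or None.

    Hypothetical helper: it only covers the case discussed above,
    i.e. an explicit charset parameter, or the RFC 3023 default of
    us-ascii for text/xml and text/*+xml when no parameter is sent.
    """
    msg = Message()
    msg["Content-Type"] = content_type

    # An explicit charset parameter always wins.
    charset = msg.get_param("charset")
    if charset is not None:
        return str(charset).lower()

    # No parameter: the XML text/* types fall back to us-ascii.
    subtype = msg.get_content_subtype()
    if msg.get_content_maintype() == "text" and (
        subtype == "xml" or subtype.endswith("+xml")
    ):
        return "us-ascii"

    # Anything else is not decided by this sketch.
    return None
```

With this, effective_charset("text/xml") gives "us-ascii" no matter what the document says about itself, which is exactly the behaviour the note describes.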
[color=blue]
> which looks pretty strange after a _file upload_ submission.[/color]
Oh, I don't know: the client agent is in a far better position to know
what encoding to assign to this portion of the multipart/form-data
submission, than is any other participant in the proceedings.
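For illustration, this is roughly what a client agent that did label the uploaded file would have to send for that part. It is a hand-rolled Python sketch: the helper name, field name, file name and boundary are all invented, and a real browser would build the request body itself.

```python
def file_part(boundary: str, field: str, filename: str,
              payload: bytes, charset: str) -> bytes:
    """Build one multipart/form-data part whose Content-Type
    carries an explicit charset parameter (hypothetical helper)."""
    headers = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; '
        f'filename="{filename}"\r\n'
        # The client knows how the file is encoded, so it says so here.
        f"Content-Type: text/xml; charset={charset}\r\n"
        "\r\n"
    )
    return headers.encode("ascii") + payload + b"\r\n"


# Assumed example values, not taken from any real submission.
part = file_part(
    boundary="xyzzy",
    field="uploaded_file",
    filename="page.xml",
    payload="<p>caf\u00e9</p>".encode("iso-8859-1"),
    charset="iso-8859-1",
)
```

The receiving end then has the external information it needs and no guessing is called for.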
What it basically means is: because implementers have been avoiding
implementing the necessary features of the i18n specifications (in
some cases alleging that they couldn't do it because it would upset
other incomplete implementations), this kind of file upload can't do
the job that is needed at this point.
If the validator folk were to start applying heuristics at this point
then they'd defeat their own purpose, presumably. It's a shame about
the users who are caught out by this, though.
As you may recall, my thesis has always been that no text file is
complete without external information about its character encoding,
and that it's an architectural error to smuggle that information into
the content of the file itself. But I've long since lost that battle,
what with the meta http-equiv thingy and the <?xml...encoding thingy. I could
almost live with the BOM, but of course the BOM doesn't solve anything
for non-Unicode encodings.
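A small Python sketch of the BOM point, using the BOM constants from the standard codecs module. The function name sniff_bom is invented, and this is only the BOM check, not a full encoding detector.

```python
import codecs
from typing import Optional

# UTF-32 must be tested before UTF-16: the UTF-32 LE BOM (FF FE 00 00)
# begins with the same two bytes as the UTF-16 LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data: bytes) -> Optional[str]:
    """Return the Unicode encoding form signalled by a leading BOM,
    or None when the data carries no BOM at all."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None


# The same byte is a perfectly good character in two 8-bit encodings,
# and nothing in the data says which one was meant.
ambiguous = b"\xe9"
as_latin1 = ambiguous.decode("iso-8859-1")
as_koi8r = ambiguous.decode("koi8-r")
```

Here sniff_bom(ambiguous) returns None while as_latin1 and as_koi8r are different characters, so for these encodings the information still has to come from outside the file.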
And Mark C made dire threats about the dangers of going anywhere near
ISO-2022 (which I hadn't even mentioned!) when I got involved in a
discussion about character code support in PINE recently.
I think the bottom line here is that the file upload feature of the
validator is of very limited usefulness, given the shortcomings which
have been raised here, and needs Some Big Text to warn users of the
pitfalls, relative to putting the content onto a server and pointing
the validator/checker at its URL.
Thanks for the explanation!
All the best