Server handling of Charset

Someone just drew attention to an open bug report in Apache concerning
shipping with an AddDefaultCharset set by default in httpd.conf.
This leads to bogus charsets being served in many cases.

I've just put forward a suggestion, but I'd welcome review from those
familiar with character encoding issues and CA-2000-02 (which is the
main argument in favour of the current behaviour).

Comments and suggestions please:
http://issues.apache.org/bugzilla/show_bug.cgi?id=23421

--
Nick Kew
Jul 20 '05 #1
On Sun, 1 Aug 2004, Nick Kew wrote:
> Someone just drew attention to an open bug report in Apache concerning
> shipping with an AddDefaultCharset set by default in httpd.conf.
Thanks for pointing this issue out. This is only a first attempt at
an answer.
> This leads to bogus charsets being served in many cases.
Inevitably, it does, yes, I can't disagree with that.
> I've just put forward a suggestion, but I'd welcome review from those
> familiar with character encoding issues
I think that might include me...
> and CA-2000-02
Well, I know that its conclusion is that documents should not be
served out without a charset. On the other hand, Martin D's comment -
that serving out a wrong charset is worse than useless - cannot be
refuted.

What I cannot claim to be an expert on are the minutiae of cross-site
scripting as explored in that CERT advisory.
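The thread sums up CA-2000-02 as saying that documents should not be served without a charset. One often-cited way an unlabelled page can be abused (our choice of illustration, not something the thread spells out) is a browser guessing an encoding such as UTF-7, in which markup can be written without any literal "<" or ">". A minimal Python sketch, using a made-up payload and the standard html.escape as a stand-in for a naive input filter:

```python
import html

# Hypothetical attacker-supplied text, spelled in UTF-7 form.
payload = "+ADw-script+AD4-alert(1)+ADw-/script+AD4-"

# A naive filter that escapes HTML metacharacters finds nothing to escape,
# because the payload contains no literal '<', '>', '&' or quotes.
escaped = html.escape(payload)
print(escaped == payload)  # True

# If a browser, given no charset, guessed UTF-7, the same bytes decode to markup.
decoded = escaped.encode("ascii").decode("utf-7")
print(decoded)  # <script>alert(1)</script>
```

An explicit, correct charset in the HTTP header removes the browser's need to guess, which is the behaviour the advisory is arguing for.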
> Comments and suggestions please:
> http://issues.apache.org/bugzilla/show_bug.cgi?id=23421


It seems to me to be most unfortunate that Martin D presents a
detailed essay on the harmfulness of this procedure without apparently
making a single mention of the CA-2000-02 issue which motivated the
original introduction of this default. That makes it so much harder
to form a balanced view of the logic.

<advocatus-diaboli>
Maybe the default should be x-user-defined ?
</>

The implication of his point (2), which I interpret as saying it would
be better to drop the server-supplied charset altogether, and rely on meta
http-equiv for HTML and the <?xml..encoding for XHTML, would, I think,
be energetically disputed by quite a number of respected contributors.

[Aside - I wish people wouldn't refer to cross-site scripting as CSS!
Those who insist on using a TLA would be better advised to use XSS -
Google suggests
http://www.cgisecurity.com/articles/...shtml#whatdoes ]

Keep in mind that character encoding is an issue for most kinds of
text/* content, as well as being an issue for some kinds of
application/* content. Some of those content types have their own
machinery for indicating character representations, but most of them
have not (text/plain for example). In general, you need an HTTP
mechanism which works for all of those content-types: and Apache *has*
such a mechanism. The hard part seems to be persuading people to use
it!!
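The HTTP mechanism meant here is the charset parameter of the Content-Type header, which works the same way whatever the body contains. A minimal Python sketch, assuming the header value is available as a string: it reads the parameter with the standard library's email.message.Message, and supplies a default only when a text/* type has none, roughly in the spirit of AddDefaultCharset. The helper names are hypothetical, and Apache's own directive may apply to a narrower set of types than text/*.

```python
from email.message import Message


def declared_charset(content_type):
    """Return the charset parameter of a Content-Type value, lower-cased, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()  # None when there is no charset parameter


def with_default_charset(content_type, default):
    """Append charset=default to a text/* type that lacks one (hypothetical helper)."""
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type.startswith("text/") and declared_charset(content_type) is None:
        return f"{content_type}; charset={default}"
    return content_type


print(declared_charset("text/plain; charset=UTF-8"))   # utf-8
print(with_default_charset("text/plain", "utf-8"))     # text/plain; charset=utf-8
print(with_default_charset("image/png", "utf-8"))      # image/png
```

Note that this is exactly where a wrong default does harm: the function cannot know whether the body really is in the default encoding.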

However, this report seems to be confined to (X)HTML content.

The complainant is probably right that more emphasis should be put on
producing author-oriented documentation about this issue. (But I'm
not about to volunteer to write it, I'm afraid.)

My *reluctant* conclusion, bearing in mind the item (2) in Martin D's
original report, and the widespread observed reliance on meta
http-equiv and f(r)iends, would be that there needs to be a module
which parses the actual documents at least as far as the meta
http-equiv charset, the <?xml...encoding and BOM, and copies the
"correct"[1] information into the real HTTP header. And that this
behaviour should be enabled by default, with the documentation saying
to more clueful authors/admins that they would do well to turn this
default behaviour off, and handle the job properly for themselves.
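A rough Python sketch of the detection half of such a module. The precedence (BOM, then <?xml...encoding, then a meta charset), the 1024-byte scan limit and the regular expressions are all our assumptions; a real module would need a proper parser, and would still have to decide what to do when the sources disagree.

```python
import codecs
import re

# Byte-order marks, checked so that UTF-32 LE is not mistaken for UTF-16 LE.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

# Heuristic patterns, not a real XML or HTML parser.
XML_DECL = re.compile(rb'^<\?xml[^>]*\bencoding\s*=\s*["\']([A-Za-z0-9._:-]+)["\']')
META_CHARSET = re.compile(rb'<meta[^>]+charset\s*=\s*["\']?([A-Za-z0-9._:-]+)', re.IGNORECASE)


def sniff_charset(body, scan_limit=1024):
    """Return a charset declared inside the document, or None if none is found."""
    head = body[:scan_limit]
    for bom, name in BOMS:
        if head.startswith(bom):
            return name
    m = XML_DECL.match(head)
    if m:
        return m.group(1).decode("ascii").lower()
    m = META_CHARSET.search(head)
    if m:
        return m.group(1).decode("ascii").lower()
    return None


page = b'<html><head><meta http-equiv="Content-Type" content="text/html; charset=windows-1252">'
print(sniff_charset(page))  # windows-1252
```

Whatever sniff_charset returns would then be copied into the real Content-Type header, with a configured default used only when it returns None.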

[1] Unfortunately, what is "correct" is by no means obvious, when
confronted with an arbitrary document. I've seen occasional documents
claiming to be XHTML/1.0 and served out as text/html, in which the
HTTP header, the BOM, the meta..http-equiv and the <?xml..encoding
were apparently saying different things.
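To make the footnote's problem concrete, here is a hypothetical Python helper that groups the declared labels by the codec they name, so that different spellings of one encoding (such as "ISO-8859-1" and "latin1") are not reported as a conflict, while a genuine disagreement is. It relies on Python's codecs.lookup alias table, which is an assumption: browsers map some labels differently (for example, they commonly treat ISO-8859-1 as windows-1252). The sample declarations are made up.

```python
import codecs


def canonical(label):
    """Map a charset label to Python's canonical codec name, or None if unknown."""
    try:
        return codecs.lookup(label).name
    except LookupError:
        return None


def group_declarations(declarations):
    """declarations: dict of source -> declared label (None if absent).

    Returns a dict of canonical encoding -> list of sources declaring it.
    More than one key means the sources disagree.
    """
    groups = {}
    for source, label in declarations.items():
        if label is None:
            continue
        key = canonical(label) or label.lower()
        groups.setdefault(key, []).append(source)
    return groups


decls = {
    "http header": "ISO-8859-1",
    "bom": "utf-8",
    "meta http-equiv": "latin1",
    "xml encoding": "UTF-8",
}
result = group_declarations(decls)
print(result)
```

Here the header and meta agree with each other, and the BOM and XML declaration agree with each other, but the two pairs conflict; choosing which pair is "correct" remains a policy decision.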

Your hint about MultiViews is a Good Thing, but I'm not sure that the
use of multiple filename extensions has been sufficiently promoted yet
for it to be simply dropped onto users as a new Apache default. If
that route is chosen, I think it'll need to be staged-in, and there
still will need to be a default for use when there is no other source
of information - unless someone can interpret CA-2000-02 as harmless,
which I'm certainly not keen to do.
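For the multiple-extension route, here is a Python sketch of how a server might pick a charset from a filename such as index.html.utf8, falling back to a configured default when no extension names one. The extension table is hypothetical, loosely modelled on Apache's AddCharset directive; we assume that every extension is examined and that the last matching one wins, and the default value is only an illustration.

```python
from pathlib import PurePosixPath

# Hypothetical extension-to-charset table, in the spirit of AddCharset.
CHARSET_BY_EXTENSION = {
    ".utf8": "utf-8",
    ".latin1": "iso-8859-1",
    ".sjis": "shift_jis",
}


def charset_for_path(path, default="iso-8859-1"):
    """Pick a charset from the filename's extensions, else return the default."""
    found = None
    for ext in PurePosixPath(path).suffixes:  # e.g. ['.en', '.html', '.sjis']
        found = CHARSET_BY_EXTENSION.get(ext.lower(), found)
    return found if found is not None else default


print(charset_for_path("index.html.utf8"))     # utf-8
print(charset_for_path("page.en.html.sjis"))   # shift_jis
print(charset_for_path("index.html"))          # iso-8859-1 (the fallback default)
```

The fallback branch is the "default for use when there is no other source of information" discussed above; the extension table only narrows how often it is reached.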

<digression> One should note that MultiViews sometimes accidentally
exposes the existence of other documents in a web subdirectory, which
the author had intended should be hidden from casual browsing.
(mod_speling sometimes does that, too).

Was that any use? These are only first reactions.
Jul 20 '05 #2