
how to tell server from PHP that charset is UTF-8??

P: n/a
How do I get PHP to tell the server that when I echo text to the
screen, I need for the text to be sent as UTF-8? How does Apache know
the right encoding when all the text is being generated by PHP? If I
build a content management system (I have) and I make sure that all
input is encoded as UTF-8, how will the
server know that the text in the MySql database is UTF-8?

I'm taking all user input and using this function on the input:

http://us4.php.net/manual/en/function.utf8-encode.php

I'm doing this so I can output to XML without getting errors about
"You should not sent plain text".

But how will the server know how to serve these pages? How do I tell
it from PHP? I realize I can send a http equiv tag, but that's rather
weak, right?

Is this enough? Any conflicts with Apache?

$sent = headers_sent();
if (!$sent) header("Content-type:text/html;charset:UTF-8");
Jul 17 '05 #1
12 Replies


P: n/a
On 4 Sep 2004 09:08:41 -0700, lk******@geocities.com (lawrence) wrote:
> How do I get PHP to tell the server that when I echo text to the
> screen, I need for the text to be sent as UTF-8?

Send a content-type header with a charset attribute.

> How does Apache know
> the right encoding when all the text is being generated by PHP?

It doesn't, nor does it need to - that information's just for the end user.

> If I
> build a content management system (I have) and I make sure that all
> input is encoded as UTF-8, how will the
> server know that the text in the MySql database is UTF-8?
>
> I'm taking all user input and using this function on the input:
>
> http://us4.php.net/manual/en/function.utf8-encode.php
>
> I'm doing this so I can output to XML without getting errors about
> "You should not sent plain text".

Don't know what you mean here. XML content doesn't have to be UTF-8 encoded,
just properly escaped and the encoding set correctly.

> But how will the server know how to serve these pages? How do I tell
> it from PHP? I realize I can send a http equiv tag, but that's rather
> weak, right?

Yep.

> Is this enough? Any conflicts with Apache?
>
> $sent = headers_sent();
> if (!$sent) header("Content-type:text/html;charset:UTF-8");

Shouldn't the : after charset be an = sign? i.e.

Content-type: text/html; charset=utf-8

That would be enough, provided it's actually sent (i.e. $sent is false).
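
As a minimal sketch of the corrected call, assuming PHP 7 or later: the helper name content_type_header() is made up for illustration, while header() and headers_sent() are the standard PHP functions already used in the question.

```php
<?php
// Hypothetical helper (not from the thread): builds the header line
// with "=" after charset, as suggested above.
function content_type_header(string $mime, string $charset): string
{
    return sprintf('Content-Type: %s; charset=%s', $mime, $charset);
}

$line = content_type_header('text/html', 'utf-8');

// header() only has an effect before any output has been sent,
// so check headers_sent() first, as the original snippet does.
if (!headers_sent()) {
    header($line);
}
```

Building the string separately makes it easy to eyeball or log the exact header before it goes out, which is how a colon-for-equals slip tends to get noticed.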

--
Andy Hassall / <an**@andyh.co.uk> / <http://www.andyh.co.uk>
<http://www.andyhsoftware.co.uk/space> Space: disk usage analysis tool
Jul 17 '05 #2

P: n/a
Andy Hassall <an**@andyh.co.uk> wrote in message news:<4i********************************@4ax.com>...
> On 4 Sep 2004 09:08:41 -0700, lk******@geocities.com (lawrence) wrote:
>> How do I get PHP to tell the server that when I echo text to the
>> screen, I need for the text to be sent as UTF-8?
>
> Send a content-type header with a charset attribute.
>
>> How does Apache know
>> the right encoding when all the text is being generated by PHP?
>
> It doesn't, nor does it need to - that information's just for the end user.


I'm not sure if I follow you here. Yes, the information is for the end
user, or rather, the web browser (or other ua) that the end user is
using. But something has to send that information out from the
webserver. Normally Apache has some idea what it is dealing with, and
sends some kind of info, yes? A weaker solution is to send a meta
http-equiv tag specifying the charset. But something somewhere has to
send that info. If the web server has no way to know the charset
because all the characters are being generated by PHP, the PHP should
send a charset header, yes?

By the way, in general, when you use echo or print in PHP, what is the
charset of the text being generated? Raw ASCII?

>> I'm doing this so I can output to XML without getting errors about
>> "You should not sent plain text".
>
> Don't know what you mean here. XML content doesn't have to be UTF-8 encoded,
> just properly escaped and the encoding set correctly.


Let's put it this way. Right now users can input whatever the hell
they want. Sometimes they write an essay in Microsoft Word and then
copy and paste the text to the input form, and input that as a weblog
entry. That post then gets added to the RSS feed for that weblog. At
first I tried to write my RSS output using Plain Text, but most
validators throw an error at that (all but radioland's). So I need to
give it a charset. So I decided to give all outgoing XML the charset
of UTF-8. Then I immediately started getting errors because lots of
users had input stuff that was not UTF-8. So what I need to do is take
all input and cast it to UTF-8. If that happens to change some
characters to garbage characters, that is fine - that throws the
problem back at the user, which is where I want it. I merely need to
let them see that they are being idiots. I'll tell them they need to
save any text from Microsoft Word as plain text. Once they start doing
that, then they won't get garbage characters and the software will
output valid XML and RSS.

>> Is this enough? Any conflicts with Apache?
>>
>> $sent = headers_sent();
>> if (!$sent) header("Content-type:text/html;charset:UTF-8");
>
> Shouldn't the : after charset be an = sign? i.e.
>
> Content-type: text/html; charset=utf-8
>
> That would be enough, provided it's actually sent (i.e. $sent is false).


Thanks for catching the bit about the equal sign.
Jul 17 '05 #3

P: n/a
try header('content-type:text/html; charset=UTF-8');

--
Tony Marston

http://www.tonymarston.net
Jul 17 '05 #4

P: n/a
"Tony Marston" <to**@NOSPAM.demon.co.uk> wrote in message news:<ci*******************@news.demon.co.uk>...
> try header('content-type:text/html; charset=UTF-8');


The only difference I see in what you wrote is that "content" starts
with a lower case "c". Are you saying these headers are case
sensitive?
Jul 17 '05 #5

P: n/a

"lawrence" <lk******@geocities.com> wrote in message
news:da**************************@posting.google.com...
> "Tony Marston" <to**@NOSPAM.demon.co.uk> wrote in message
> news:<ci*******************@news.demon.co.uk>...
>> try header('content-type:text/html; charset=UTF-8');
>
> The only difference I see in what you wrote is that "content" starts
> with a lower case "c". Are you saying these headers are case
> sensitive?


No, but that is what I use and it works.

--
Tony Marston

http://www.tonymarston.net

Jul 17 '05 #6

P: n/a
On 12 Sep 2004 11:14:10 -0700, lk******@geocities.com (lawrence) wrote:
> Andy Hassall <an**@andyh.co.uk> wrote in message news:<4i********************************@4ax.com>...
>> On 4 Sep 2004 09:08:41 -0700, lk******@geocities.com (lawrence) wrote:
>>> How do I get PHP to tell the server that when I echo text to the
>>> screen, I need for the text to be sent as UTF-8?
>>
>> Send a content-type header with a charset attribute.
>>
>>> How does Apache know
>>> the right encoding when all the text is being generated by PHP?
>>
>> It doesn't, nor does it need to - that information's just for the end user.
>
> I'm not sure if I follow you here. Yes, the information is for the end
> user, or rather, the web browser (or other ua) that the end user is
> using. But something has to send that information out from the
> webserver. Normally Apache has some idea what it is dealing with, and
> sends some kind of info, yes?


It may send Content-type determined by the MIME type for the extension, or
looked up through mime-magic, but it generally doesn't know character set, and
to my knowledge Apache itself won't send the character set part of the header
itself - it just sends 'data' in a character-set agnostic way.

You can set it up so that Apache sends a character set header with content
negotiation settings, though, but you need to provide the server with more
information in that case.

> A weaker solution is to send a meta
> http-equiv tag specifying the charset. But something somewhere has to
> send that info. If the web server has no way to know the charset
> because all the characters are being generated by PHP, the PHP should
> send a charset header, yes?

Yes. There's an option in php.ini as to which character set to default to - I
think the default default is iso8859-1. (Although really ought to be iso8859-15
due to the Euro).
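
The php.ini option in question is default_charset. A small sketch, assuming you would rather set it at runtime than edit php.ini; note that the shipped default has changed between PHP versions, so it is worth checking what your own installation reports rather than relying on the value guessed above.

```php
<?php
// default_charset is a real php.ini directive; changing it at runtime
// with ini_set() is one option, editing php.ini is another.
ini_set('default_charset', 'UTF-8');

// PHP uses this value for the charset it adds to the Content-Type
// header it sends by default for text output.
echo ini_get('default_charset'), "\n";
```

An explicit header() call with its own charset still takes precedence over this default for that response.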

> By the way, in general, when you use echo or print in PHP, what is the
> charset of the text being generated? Raw ASCII?


(ASCII only goes up to 127)

Depends what Content-type header has been sent as to how the output is
interpreted. PHP won't do any conversion from the binary representation of
anything output, it's just sent as-is. (It might be image data, for example, if
you've sent an image/jpeg content-type header.)
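
To make the "sent as-is" point concrete, here is a small sketch showing the same character held as two different byte sequences. The variable names are only for illustration.

```php
<?php
// The same character, e with an acute accent, as two byte sequences.
$latin1 = "\xE9";       // ISO-8859-1: one byte
$utf8   = "\xC3\xA9";   // UTF-8: two bytes

// PHP strings are byte strings: strlen() counts bytes, not characters,
// and echo would send these bytes out unchanged.
printf("%s %d\n", bin2hex($latin1), strlen($latin1)); // e9 1
printf("%s %d\n", bin2hex($utf8), strlen($utf8));     // c3a9 2
```

Which of the two the browser displays correctly depends only on the charset it was told to expect, not on anything echo does.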
>I'm doing this so I can output to XML without getting errors about
>"You should not sent plain text".


Don't know what you mean here. XML content doesn't have to be UTF-8 encoded,
just properly escaped and the encoding set correctly.


> Let's put it this way. Right now users can input whatever the hell
> they want. Sometimes they write an essay in Microsoft Word and then
> copy and paste the text to the input form, and input that as a weblog
> entry. That post then gets added to the RSS feed for that weblog. At
> first I tried to write my RSS output using Plain Text, but most
> validators throw an error at that (all but radioland's). So I need to
> give it a charset. So I decided to give all outgoing XML the charset
> of UTF-8. Then I immediately started getting errors because lots of
> users had input stuff that was not UTF-8. So what I need to do is take
> all input and cast it to UTF-8. If that happens to change some
> characters to garbage characters, that is fine - that throws the
> problem back at the user, which is where I want it. I merely need to
> let them see that they are being idiots. I'll tell them they need to
> save any text from Microsoft Word as plain text. Once they start doing
> that, then they won't get garbage characters and the software will
> output valid XML and RSS.


OK, but you might have a piece of the puzzle missing here - you need to determine
what character set the user posted in in the first place, since it's impossible
to convert from an encoding of one character set to an encoding of another one
without knowing what the first character set encoding was.

I *think* form data is always in the character set of the page containing the
original form. I haven't got a reference to back that up, though.

I also seem to recall that some browsers (e.g. IE) will send HTML entity
encoded versions of characters pasted into a form whose character set does not
support them; e.g. Chinese characters into an iso8859-15 form turn up in their
&#xxxx; representation in the data.

Once you know that, then the mbstring extension has a function for converting
between encodings.
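
A rough sketch of that conversion, assuming the mbstring extension is loaded and that text pasted from Word arrives as Windows-1252. That source encoding is a guess the caller has to supply, and to_utf8() is a made-up helper name. For comparison, utf8_encode() always treats its input as ISO-8859-1, which is why it can mangle Word's curly quotes.

```php
<?php
// Hypothetical helper; $assumedSource is a guess, not something detected.
function to_utf8(string $input, string $assumedSource = 'Windows-1252'): string
{
    // Leave the input alone if it is already valid UTF-8.
    // This is a heuristic: some non-UTF-8 input can pass by coincidence.
    if (mb_check_encoding($input, 'UTF-8')) {
        return $input;
    }
    // Otherwise convert from the encoding we were told to assume.
    return mb_convert_encoding($input, 'UTF-8', $assumedSource);
}

// Curly double quotes around "Hi", as Windows-1252 bytes.
$pasted = "\x93Hi\x94";
echo bin2hex(to_utf8($pasted)), "\n";
```

The check-then-convert order avoids double-encoding input that was UTF-8 all along, which a blanket conversion would do.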

--
Andy Hassall / <an**@andyh.co.uk> / <http://www.andyh.co.uk>
<http://www.andyhsoftware.co.uk/space> Space: disk usage analysis tool
Jul 17 '05 #7

P: n/a
lawrence wrote:
> "Tony Marston" <to**@NOSPAM.demon.co.uk> wrote in message
> news:<ci*******************@news.demon.co.uk>...
>> try header('content-type:text/html; charset=UTF-8');
>
> The only difference I see in what you wrote is that "content" starts
> with a lower case "c". Are you saying these headers are case
> sensitive?

Hi,
No, the difference between your code and Mr. Marston's is that yours
uses a colon after the word "charset" and his uses an equals sign.
The equals sign is correct.

Chris
Jul 17 '05 #8

P: n/a
Andy Hassall <an**@andyh.co.uk> wrote in message news:<p4********************************@4ax.com>...
> It may send Content-type determined by the MIME type for the extension, or
> looked up through mime-magic, but it generally doesn't know character set, and
> to my knowledge Apache itself won't send the character set part of the header
> itself - it just sends 'data' in a character-set agnostic way.
>
> You can set it up so that Apache sends a character set header with content
> negotiation settings, though, but you need to provide the server with more
> information in that case.
>
>> A weaker solution is to send a meta
>> http-equiv tag specifying the charset. But something somewhere has to
>> send that info. If the web server has no way to know the charset
>> because all the characters are being generated by PHP, the PHP should
>> send a charset header, yes?
>
> Yes. There's an option in php.ini as to which character set to default to - I
> think the default default is iso8859-1. (Although really ought to be iso8859-15
> due to the Euro).


Okay, I don't get this at all. What sends the character encoding
information? If you have a set of static HTML files sitting on a
server, what is responsible for sending the character encoding? If I,
as a web-designer, am not supposed to use http-equiv meta tags,
because they are weak, then the information is not inside of the HTML
file. So the information needs to be outside of the HTML file. And
what is outside of the HTML file? If Apache remains agnostic about
character encoding, then at what point does character encoding get
sent? Where is the information stored, and how is it sent out to web
browsers?

Every character has an encoding by default, right? If no encoding is
given, then there are a series of possible defaults, right? An Apache
server may have a default, or PHP may have a default encoding set in
the php.ini file, right? If no default is set anywhere then the
characters are basically raw text, right? In other words, ASCII? Or do
I have it all wrong?



>I'm doing this so I can output to XML without getting errors about
>"You should not sent plain text".

Don't know what you mean here. XML content doesn't have to be UTF-8 encoded,
just properly escaped and the encoding set correctly.

Sorry, I meant RSS. Most RSS validators throw an error if you try to
set up an RSS feed using plain text.

[snip]

> OK, but you might have a piece of the puzzle missing here - you need to determine
> what character set the user posted in in the first place, since it's impossible
> to convert from an encoding of one character set to an encoding of another one
> without knowing what the first character set encoding was.
>
> I *think* form data is always in the character set of the page containing the
> original form. I haven't got a reference to back that up, though.


Yes, we had quite a conversation about that over on another newsgroup.
It was quite informative. You can read it here, if you've any
interest:

http://groups.google.com/groups?hl=e...%3D10%26sa%3DN
Jul 17 '05 #9

P: n/a
On 21 Sep 2004 11:30:45 -0700, lk******@geocities.com (lawrence) wrote:
> Andy Hassall <an**@andyh.co.uk> wrote in message news:<p4********************************@4ax.com>...
>> It may send Content-type determined by the MIME type for the extension, or
>> looked up through mime-magic, but it generally doesn't know character set, and
>> to my knowledge Apache itself won't send the character set part of the header
>> itself - it just sends 'data' in a character-set agnostic way.
>>
>> You can set it up so that Apache sends a character set header with content
>> negotiation settings, though, but you need to provide the server with more
>> information in that case.
>>
>>> A weaker solution is to send a meta
>>> http-equiv tag specifying the charset. But something somewhere has to
>>> send that info. If the web server has no way to know the charset
>>> because all the characters are being generated by PHP, the PHP should
>>> send a charset header, yes?
>>
>> Yes. There's an option in php.ini as to which character set to default to - I
>> think the default default is iso8859-1. (Although really ought to be iso8859-15
>> due to the Euro).
>
> Okay, I don't get this at all. What sends the character encoding
> information? If you have a set of static HTML files sitting on a
> server, what is responsible for sending the character encoding?


Done a bit more digging, and there's this in my httpd.conf:

#
# Specify a default charset for all pages sent out. This is
# always a good idea and opens the door for future internationalisation
# of your web site, should you ever want it. Specifying it as
# a default does little harm; as the standard dictates that a page
# is in iso-8859-1 (latin1) unless specified otherwise i.e. you
# are merely stating the obvious. There are also some security
# reasons in browsers, related to javascript and URL parsing
# which encourage you to always set a default char set.
#
AddDefaultCharset ISO-8859-1

OK, so Apache sends out a character set header under the recommended
configuration - although it's effectively hardcoded; it doesn't 'detect' the
encoding of the file since that's basically impossible in isolation.

To get Apache to send out a character set header for a specific file, you'd
then need to use Apache content negotiation if you wanted to select a different
character set for a particular file - either with a type-map or I believe it
can base it off suffixes of the filename (index.html.iso8859-p15 and so on).

Consider the following response from Apache:

andyh@server:~/public_html$ touch utf8.html.utf8
andyh@server:~/public_html$ telnet localhost 80
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
HEAD /~andyh/utf8.html HTTP/1.0

HTTP/1.1 200 OK
Date: Tue, 21 Sep 2004 19:19:03 GMT
Server: Apache/2.0.51 (Unix) PHP/5.0.1 DAV/2 SVN/1.0.6
Content-Location: utf8.html.utf8
Vary: negotiate
TCN: choice
Last-Modified: Tue, 21 Sep 2004 19:18:47 GMT
ETag: "3811f-0-7f9b93c0;7f9b93c0"
Accept-Ranges: bytes
Connection: close
Content-Type: text/html; charset=utf-8

Connection closed by foreign host.

OK - so a filename of utf8.html.utf8 means that a request for utf8.html comes
out in utf8 encoding. (I've got content negotiation enabled on my server).

Presumably in the case of multiple encodings for the same URI then the
browser's Accept-charset header comes into play for Apache to pick which to
serve.

> If I,
> as a web-designer, am not supposed to use http-equiv meta tags,
> because they are weak, then the information is not inside of the HTML
> file. So the information needs to be outside of the HTML file. And
> what is outside of the HTML file? If Apache remains agnostic about
> character encoding, then at what point does character encoding get
> sent? Where is the information stored, and how is it sent out to web
> browsers?

Either a type map, or encoded in the filename. (can't speak for other servers
apart from Apache).

> Every character has an encoding by default, right? If no encoding is
> given, then there are a series of possible defaults, right? An Apache
> server may have a default, or PHP may have a default encoding set in
> the php.ini file, right?

Right.

> If no default is set anywhere then the
> characters are basically raw text, right? In other words, ASCII?

Ah, but even ASCII isn't raw text, depending on your definition of raw - it's
the ASCII encoding of a small-ish character set.

'Binary' is the usual definition of completely raw data - it's just a stream
of bytes with no defined correspondence to characters.

As to what the default in HTTP is - time to dig out the HTTP standards.

RFC 2616: Hypertext Transfer Protocol -- HTTP/1.1
<ftp://ftp.isi.edu/in-notes/rfc2616.txt>

"
3.4.1 Missing Charset

Some HTTP/1.0 software has interpreted a Content-Type header without
charset parameter incorrectly to mean "recipient should guess."
Senders wishing to defeat this behavior MAY include a charset
parameter even when the charset is ISO-8859-1 and SHOULD do so when
it is known that it will not confuse the recipient.

Unfortunately, some older HTTP/1.0 clients did not deal properly with
an explicit charset parameter. HTTP/1.1 recipients MUST respect the
charset label provided by the sender; and those user agents that have
a provision to "guess" a charset MUST use the charset from the
content-type field if they support that charset, rather than the
recipient's preference, when initially displaying a document. See
section 3.7.1.
"

"
3.7.1 Canonicalization and Text Defaults

[...]

The "charset" parameter is used with some media types to define the
character set (section 3.4) of the data. When no explicit charset
parameter is provided by the sender, media subtypes of the "text"
type are defined to have a default charset value of "ISO-8859-1" when
received via HTTP. Data in character sets other than "ISO-8859-1" or
its subsets MUST be labeled with an appropriate charset value. See
section 3.4.1 for compatibility problems.
"

OK - so we officially default to ISO-8859-1, at least for text/* content
types, which is a superset of ASCII, but definitely a well-defined character
set and not just a raw stream of bytes. Makes sense.
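
As a sketch of what that rule means for a recipient, here is a made-up helper, effective_charset(), that reads the charset parameter out of a Content-Type value and falls back to ISO-8859-1 for text/* types, following only the RFC 2616 section 3.7.1 text quoted above. Real browsers may behave differently.

```php
<?php
// Hypothetical helper: returns the charset a recipient would use under
// the RFC 2616 section 3.7.1 rule, or null for non-text types.
function effective_charset(string $contentType): ?string
{
    // An explicit charset parameter wins: "; charset=utf-8".
    if (preg_match('/;\s*charset\s*=\s*"?([^";\s]+)"?/i', $contentType, $m)) {
        return $m[1];
    }
    // No charset parameter: text/* defaults to ISO-8859-1 over HTTP.
    if (stripos(ltrim($contentType), 'text/') === 0) {
        return 'ISO-8859-1';
    }
    return null;
}
```

Under this reading, the header from the original question, with a colon after charset, carries no valid charset parameter at all, so the recipient would fall back to the default rather than UTF-8.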

> Or do I have it all wrong?

Definitely sounds like you've got the idea.

>>>> I'm doing this so I can output to XML without getting errors about
>>>> "You should not sent plain text".
>>>
>>> Don't know what you mean here. XML content doesn't have to be UTF-8 encoded,
>>> just properly escaped and the encoding set correctly.
>
> Sorry, I meant RSS. Most RSS validators throw an error if you try to
> set up an RSS feed using plain text.


Oh, is this just a case of the wrong Content-type though - text/plain or
text/html vs. text/xml or whatever it is?
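
If it is the Content-type, a sketch along these lines might be a starting point. It assumes text/xml (the type guessed at above) and PHP 5.4 or later for ENT_XML1; xml_text() and rss_item() are made-up helpers, and the output is a fragment rather than a complete, valid feed.

```php
<?php
// Hypothetical helper: escapes text for use inside an XML element.
function xml_text(string $s): string
{
    return htmlspecialchars($s, ENT_XML1 | ENT_QUOTES, 'UTF-8');
}

// Hypothetical helper: wraps an escaped title in a minimal <item>.
function rss_item(string $title): string
{
    return '<item><title>' . xml_text($title) . '</title></item>';
}

// Declare the same encoding in the HTTP header and in the XML prolog.
if (!headers_sent()) {
    header('Content-Type: text/xml; charset=utf-8');
}
echo '<?xml version="1.0" encoding="utf-8"?>', "\n";
echo rss_item('Fish & <Chips>'), "\n";
```

This covers the two points made earlier in the thread: the content is properly escaped, and the encoding is stated rather than left for the validator to guess.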

[snip]
>> I *think* form data is always in the character set of the page containing the
>> original form. I haven't got a reference to back that up, though.
>
> Yes, we had quite a conversation about that over on another newsgroup.
> It was quite informative. You can read it here, if you've any
> interest:
>
> http://groups.google.com/groups?hl=e...%3D10%26sa%3DN


Hm - Netscape 4 as ever is a complete mess then! Does anyone actually use NN4
any more? It's well past time it was blasted out of existence - does it do
_anything_ right?

--
Andy Hassall / <an**@andyh.co.uk> / <http://www.andyh.co.uk>
<http://www.andyhsoftware.co.uk/space> Space: disk usage analysis tool
Jul 17 '05 #10

P: n/a
Andy Hassall <an**@andyh.co.uk> wrote:
> OK - so we officially default to ISO-8859-1, at least for text/* content
> types, which is a superset of ASCII, but definitely a well-defined character
> set and not just a raw stream of bytes. Makes sense.


Completely true... almost. text/html has Unicode as its characterset
according to W3C[1]; the charset header is nothing more than the encoding
used to transport the data. ISO-8859-1 is the best choice if you need
up to the first 256 characters in Unicode. If one needs more characters,
the UTF-x encodings should be used.

[1] http://www.w3.org/TR/html401/charset.html

--

Daniel Tryba

Jul 17 '05 #11

P: n/a
Daniel Tryba wrote:
> Andy Hassall <an**@andyh.co.uk> wrote:
>
>> OK - so we officially default to ISO-8859-1, at least for text/* content
>> types, which is a superset of ASCII, but definitely a well-defined character
>> set and not just a raw stream of bytes. Makes sense.
>
> Completely true... almost. text/html has Unicode as its characterset
> according to W3C[1],


'Character set', with or without a space, breeds confusion.

http://www.w3.org/MarkUp/html-spec/charset-harmful.html

If by 'characterset' you meant HTML4.01's document character
set, you're right. But HTML's document character set is
unrelated to this discussion. If however you meant
character encoding, you're wrong, because any encoding is
allowed. Did you mean something else?

RFC2854 sec. 6 lists sources that specify the default when a
text/html document is served without explicitly declaring
its character encoding. Despite RFC2616 defining text/*'s
default character encoding as ISO-8859-1, HTML4.01
conforming user-agents mustn't assume any default value:

'The HTTP protocol ([RFC2616], section 3.7.1) mentions ISO-
8859-1 as a default character encoding when the "charset"
parameter is absent from the "Content-Type" header field. In
practice, this recommendation has proved useless because
some servers don't allow a "charset" parameter to be sent,
and others may not be configured to send the parameter.
Therefore, user agents must not assume any default value for
the "charset" parameter.' (HTML4.01 sec. 5.2.2.)

So it'd be absurd to heed the advice given in RFC2616 sec.
19.3, which says that 'not labelling the entity is preferred
over labelling the entity with the labels US-ASCII or ISO-
8859-1'. The usual ciwa* recommendation stands, discord
notwithstanding: send a charset parameter.

[ ... ]

Roll on the weekend!

--
Jock
Jul 17 '05 #12

P: n/a
Andy Hassall <an**@andyh.co.uk> wrote in message news:<36********************************@4ax.com>...
> OK, so Apache sends out a character set heading under the recommended
> configuration - although it's effectively hardcoded; it doesn't 'detect' the
> encoding of the file since that's basically impossible in isolation.

[snip]

> OK - so a filename of utf8.html.utf8 means that a request for utf8.html comes
> out in utf8 encoding. (I've got content negotiation enabled on my server).
>
> Presumably in the case of multiple encodings for the same URI then the
> browser's Accept-charset header comes into play for Apache to pick which to
> serve.


That's very interesting. Thanks for doing that bit of digging.

I'm sorry to say I've temporarily been handed responsibility for
keeping an Apache server going, though I don't know much about Apache.
We're hosting about 30 different domains on this machine. Most of
those domains have individuals who are handling all the web design for
that domain. If I set a default charset for Apache, how do the
individual web designers override the decision, if they need to? An
.htaccess file? http-equiv meta tags?

Just curious.
Jul 17 '05 #13
