Determining Charset used by system or software

Question:
How can you determine the character set used by a webpage you built?

My understanding of the issue is that the character set used by an HTML
file (or any other file, for that matter) depends on your own system,
and the encoding used by it; you cannot randomly insert a

<META HTTP-EQUIV="Content-Type" content="text/html;
charset=xxx-1234-567">

entry and expect it to work. Ie, the web server doesn't re-encode your
page when serving it according to the charset specified in a meta tag.

But how do you know? Can anyone provide me with pointers as to how I
might determine this on a given machine?

Regards,
Remi.

Jan 19 '06 #1


Rémi wrote:
> How can you determine the character set used by a webpage you built?
It depends on what you understand "character set" to be. HTML 4 defines
that term here:
<http://www.w3.org/TR/html4/charset.html#h-5.1>
and defines the "document character set" as the "Universal Character Set"
(UCS) ("character-by-character equivalent to Unicode") for all HTML documents.
> My understanding of the issue is that the character set used by an HTML
> file (or any other file, for that matter) depends on your own system,
> and the encoding used by it; you cannot randomly insert a
>
> <META HTTP-EQUIV="Content-Type" content="text/html;
> charset=xxx-1234-567">


The "character encoding" is usually a choice your text/html editor
offers. If yours does not offer that then consider getting an editor
that does.

--

Martin Honnen
http://JavaScript.FAQTs.com/
Jan 19 '06 #2
Rémi wrote:
> Question:
> How can you determine the character set used by a webpage you built?
>
> My understanding of the issue is that the character set used by an HTML
> file (or any other file, for that matter) depends on your own system,
The character set used by an HTML file does not depend on your own
operating system. However, a *font* capable of rendering the document's
characters must be installed on the user's operating system.
> and the encoding used by it; you cannot randomly insert a
>
> <META HTTP-EQUIV="Content-Type" content="text/html;
> charset=xxx-1234-567">
>
> entry and expect it to work.
The HTML document must be written and saved in that encoding
(charset=xxx-1234-567) to begin with: that's the web author's job.
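
A minimal Python sketch of why the file has to be saved in the encoding it declares. The sample string and the two encodings are assumptions chosen for illustration; nothing here is specific to any editor or server.

```python
# Illustration only: the sample text and both encodings are assumptions.
text = "R\u00e9mi"  # "Rémi"

# Saving the same text under two encodings produces different bytes.
saved_as_utf8 = text.encode("utf-8")
saved_as_latin1 = text.encode("iso-8859-1")

# A label that matches the saved bytes recovers the original text.
matches = saved_as_utf8.decode("utf-8") == text

# A label that does not match gives mojibake: every byte is valid
# ISO-8859-1, so decoding succeeds but shows the wrong characters.
mislabelled = saved_as_utf8.decode("iso-8859-1")

# The opposite mismatch fails outright, because the lone byte 0xE9
# followed by an ASCII letter is not a valid UTF-8 sequence.
try:
    saved_as_latin1.decode("utf-8")
    decoded_cleanly = True
except UnicodeDecodeError:
    decoded_cleanly = False
```

The declaration never changes the bytes; it only tells the reader how to interpret them.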

> Ie, the web server doesn't re-encode your page when serving it
> according to the charset specified in a meta tag.

The server will serve the document with the charset that its own
configuration specifies.

"How to make the server send out appropriate 'charset' information
depends on the server.
For Apache, this can be done via the AddCharset (Apache 1.3.10 and
later) or AddType directives, for directories or individual resources
(files). With AddDefaultCharset (Apache 1.3.12 and later), it is
possible to set the default 'charset' for a whole server."
http://www.w3.org/International/O-HTTP-charset.html

"If you are serving static files, this information can be associated
with the files by the server. The method of setting up a server to pass
character encoding information in this way will vary from server to
server. You should check with the server administrator.
As an example, Apache servers typically provide a default encoding,
which can usually be overridden by user settings. For example, a user
might add the following line to a .htaccess file to serve all files with
a .html extension as UTF-8 in this and all child directories (...)"
http://www.w3.org/International/tuto...Slide0280.html
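
As a rough illustration of what such server settings amount to, here is a Python sketch that picks a Content-Type value from a per-extension table with a server-wide fallback. The tables, the default, and the function name are hypothetical; they are only loosely modelled on the idea behind AddCharset and AddDefaultCharset and are not Apache's implementation.

```python
import posixpath

# Hypothetical settings: a per-extension charset (the AddCharset idea)
# and a server-wide fallback (the AddDefaultCharset idea).
MEDIA_TYPE_BY_EXTENSION = {".html": "text/html", ".txt": "text/plain"}
CHARSET_BY_EXTENSION = {".html": "utf-8"}
DEFAULT_CHARSET = "iso-8859-1"


def content_type_for(path):
    """Build the Content-Type header value this sketch would send for path."""
    extension = posixpath.splitext(path)[1].lower()
    media_type = MEDIA_TYPE_BY_EXTENSION.get(extension, "application/octet-stream")
    if not media_type.startswith("text/"):
        # This sketch only labels text/* types with a charset.
        return media_type
    charset = CHARSET_BY_EXTENSION.get(extension, DEFAULT_CHARSET)
    return f"{media_type}; charset={charset}"
```

With these tables, `content_type_for("index.html")` yields `text/html; charset=utf-8`, while a `.txt` file falls back to the default charset.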

> But how do you know?

Live HTTP Headers, a Mozilla/Firefox extension that displays the HTTP
headers exchanged with the server, is one good tool for this.

http://livehttpheaders.mozdev.org/

http://dotavery.com/blog/archive/2004/07/23/1717.aspx
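
The same check can be scripted. The Python sketch below reads the charset parameter from a Content-Type header using only the standard library; the function names are made up for this example, and `served_charset` needs network access and a URL of your own.

```python
from email.message import Message
from urllib.request import urlopen


def charset_from_content_type(header_value):
    """Return the charset parameter of a Content-Type value, or None."""
    message = Message()
    message["Content-Type"] = header_value
    # get_content_charset() lower-cases the value and returns None
    # when the header carries no charset parameter.
    return message.get_content_charset()


def served_charset(url):
    """Fetch url and report the charset declared in the real HTTP header.

    Needs network access; returns None if the server declares no charset.
    """
    with urlopen(url) as response:
        return response.headers.get_content_charset()
```

A result of None means the server sent no charset at all, in which case the browser has to fall back on other hints such as the meta tag.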

> Can anyone provide me with pointers as to how I might determine this
> on a given machine?
>
> Regards,
> Remi.


The web author should be the one writing and saving the HTML document in
the proper character encoding. Then he should set up his web server (or
ask his web server admin) to make sure that his HTML document will be
served with the correct character encoding.

Gérard
--
remove blah to email me
Jan 20 '06 #3
On Thu, 19 Jan 2006, Rémi wrote:
> How can you determine the character set used by a webpage you built?
You need to understand the character model of HTML: it's not evident
from your question that you do, and, until you do, any answer that you
get to your question is likely to be unhelpful.

The relevant section of the HTML4 specification is reasonably clearly
set out, provided one reads it without any preconceived earlier
notions from other fields (e.g. word processing).
http://www.w3.org/TR/html401/charset.html

In HTML4 the "document character set" is ISO 10646/Unicode: that's
firmly defined and not open to negotiation.

The other important issue is what's nowadays accurately known as the
"character encoding scheme", which (for historical reasons) is defined
by that misleading MIME parameter "charset=".
> My understanding of the issue is that the character set used by an
> HTML file (or any other file, for that matter) depends on your own
> system, and the encoding used by it;
That's basically wrong: the "document character set" in HTML4 and
afterwards is Unicode. The character encoding which is served out by
the server is very often the same as is used on the system, but that
isn't necessarily so - it depends.
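
The distinction can be seen in a few lines of Python. This is only an illustrative sketch; the sample character and the three encodings are assumptions picked for the example.

```python
import html

# The sample character is an assumption chosen for illustration.
character = "\u00e9"  # "é"

# The document character set assigns the character one number (code point).
code_point = ord(character)

# A numeric character reference names that code point, so it resolves to
# the same character whatever encoding the file is stored in.
from_reference = html.unescape("&#233;")

# The character encoding (the "charset=" parameter) decides which bytes
# actually represent that character in the file and on the wire.
encoded = {
    name: character.encode(name)
    for name in ("iso-8859-1", "utf-8", "utf-16-le")
}
```

One code point, three different byte sequences: the first is the "document character set" side of the story, the second is the "charset=" side.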
> you cannot randomly insert a
>
> <META HTTP-EQUIV="Content-Type" content="text/html;
> charset=xxx-1234-567">
>
> entry and expect it to work.

Correct - you can't.

> Ie, the web server doesn't re-encode your page when serving it
> according to the charset specified in a meta tag.
In theory, that very much depends on the server. Russian Apache can
transcode "on the fly" to any of the Cyrillic encodings which it
supports, and it will then advertise that encoding in its real HTTP
header (in that unfortunately-named "charset=" attribute), which is
the final arbiter of the matter. Your "meta http-equiv" is only a
nuisance when that happens...

Servers which run on platforms whose native encoding is EBCDIC will
also want to transcode the EBCDIC content into an appropriate
ASCII-based encoding for the web.
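
For the curious, transcoding on the fly boils down to a decode followed by an encode, plus advertising the new encoding in the header. The Python sketch below shows the idea only; the function name, the sample strings, and the choice of code page 037 as the EBCDIC variant are assumptions, and this is not how any particular server implements it.

```python
def transcode(body, stored_as, serve_as):
    """Re-encode body and return (content_type, new_body).

    stored_as and serve_as are codec names; the header advertises the
    encoding of the bytes actually sent, not the one used on disk.
    """
    text = body.decode(stored_as)
    new_body = text.encode(serve_as)
    return f"text/html; charset={serve_as}", new_body


# Cyrillic text stored as KOI8-R and served as windows-1251.
sample = "\u041f\u0440\u0438\u0432\u0435\u0442"  # "Привет"
stored = sample.encode("koi8-r")
header, served = transcode(stored, "koi8-r", "windows-1251")

# EBCDIC content (code page 037 assumed here) served as plain ASCII.
ebcdic = "<p>Hello</p>".encode("cp037")
ebcdic_header, ascii_body = transcode(ebcdic, "cp037", "us-ascii")
```

Any meta tag inside the body is left untouched by this step, which is why it can end up disagreeing with the header that was actually sent.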

But for most of the cases which simple-minded folk come into contact
with, it's true that the content is stored with the same encoding as
is served-out. Just how that encoding is "known", and gets into the
real HTTP header, is a matter for server configuration.
> But how do you know?


It's a property which needs to be maintained alongside every text file
(not only HTML), and appropriately advertised when the document is
served-out. Just how that's done is a question of server
configuration etc.
Jan 20 '06 #4
On 19 Jan 2006, Rémi wrote:
> X-HTTP-UserAgent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.7.12)
>
> How can you determine the character set used by a webpage you built?
> [ ... ]
> But how do you know? Can anyone provide me with pointers as to how I
> might determine this on a given machine?


What do you mean by "given machine"? Each and every computer on earth?
Or do you want to know for your own "Windows NT 5.0"?

Your editor should tell you. On MS Windows, you can generally use
UTF-8, UTF-16, or the Microsoft-specific code pages from
http://www.unicode.org/Public/MAPPIN...ICSFT/WINDOWS/
In Mozilla Composer, for example, you can choose
File > Save and Change Character Encoding
and save your document in MacBelgian if you like.
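
If the editor does not say, a script can give partial hints. The Python sketch below asks the system for its preferred text encoding and looks for a byte-order mark at the start of a file's bytes. The function name is made up for this example, UTF-32 is ignored, and the absence of a mark proves nothing: single-byte code pages carry no signature at all.

```python
import codecs
import locale


def sniff_bom(data):
    """Return an encoding name suggested by a leading byte-order mark, or None."""
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    # No mark: could be UTF-8 without a BOM, a Windows code page, or anything else.
    return None


# The encoding this machine's locale prefers for text files; the value
# differs from system to system, so no particular result is expected.
system_default = locale.getpreferredencoding(False)
```

To check a saved page, read it in binary mode (`open(path, "rb").read()`) and pass the bytes to `sniff_bom`.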

--
Netscape 3.04 does everything I need, and it's utterly reliable.
Why should I switch? Peter T. Daniels in <news:sci.lang>

Jan 20 '06 #5
