Bytes | Software Development & Data Engineering Community

utf-8 or UTF-8?

How is this for correct HTML 4.01 headers?:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
<html lang="zh-tw"><head>
<meta http-equiv="Content-Type" content=
"text/html; charset=utf-8">
<meta http-equiv="Content-Language" content="zh-tw">
And for my English pages, en-us instead of zh-tw.
Did I screw up any details? utf-8 or UTF-8 like Google?
The page should work via http:// or file:///.

Feb 25 '06 #1
Dan Jacobson <ji*****@jidann i.org> wrote:
How is this for correct HTML 4.01 headers?:
You actually didn't reveal the _headers_, namely the HTTP headers,
which are what really matters. If they specify the encoding
("charset"), they trump any <meta> tags (as explained so often in this
group).
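The precedence described here (an HTTP "charset" parameter beats a <meta> declaration, the order HTML 4.01 section 5.2.2 also gives) can be sketched in Python. The helper names are hypothetical, and the regex is a shortcut rather than a real HTML parser. A page loaded via file:/// arrives with no HTTP headers at all, so there only the <meta> value is available.

```python
import re
from email.message import Message

def charset_of(content_type):
    """Charset parameter of a Content-Type value, lowercased; None if absent."""
    if not content_type:
        return None
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()

# Rough pattern for <meta http-equiv="Content-Type" content="...">;
# good enough for a sketch, but not a real HTML parser.
META_CONTENT_TYPE = re.compile(
    r'<meta\s+http-equiv=["\']?Content-Type["\']?\s+content=\s*["\']([^"\']*)["\']',
    re.IGNORECASE,
)

def effective_charset(http_content_type, html_text):
    """Real HTTP header first; the <meta> imitation only as a fallback."""
    from_header = charset_of(http_content_type)
    if from_header:
        return from_header
    match = META_CONTENT_TYPE.search(html_text)
    return charset_of(match.group(1)) if match else None

page = '<meta http-equiv="Content-Type" content=\n"text/html; charset=utf-8">'
print(effective_charset("text/html; charset=ISO-8859-1", page))  # iso-8859-1 (header wins)
print(effective_charset(None, page))  # utf-8 (no header, e.g. a file:/// load)
```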
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
<html lang="zh-tw"><head>
<meta http-equiv="Content-Type" content=
"text/html; charset=utf-8">
OK, but real HTTP headers still have preference. (Some people would
prefer zh-Hant to zh-tw, since the subcode is really about the variant of
the writing system rather than the geographic area, but that's mostly
politics.)
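The distinction turns on the kind of subtag: under the BCP 47 rules (RFC 5646, which postdates this thread), a two-letter subtag such as "tw" names a region, while a four-letter one such as "Hant" names a script. A minimal sketch with a hypothetical classify_subtags helper and deliberately simplified rules (no handling of extended-language, variant or extension subtags):

```python
def classify_subtags(tag):
    """Split a language tag and label the subtags after the primary language.

    Simplified BCP 47 shape rules: 4 letters = script, 2 letters or
    3 digits = region; anything else is lumped together as "other".
    """
    primary, *rest = tag.split("-")
    labelled = []
    for sub in rest:
        if len(sub) == 4 and sub.isalpha():
            labelled.append((sub, "script"))
        elif (len(sub) == 2 and sub.isalpha()) or (len(sub) == 3 and sub.isdigit()):
            labelled.append((sub, "region"))
        else:
            labelled.append((sub, "other"))
    return primary.lower(), labelled

print(classify_subtags("zh-tw"))    # ('zh', [('tw', 'region')])
print(classify_subtags("zh-Hant"))  # ('zh', [('Hant', 'script')])
```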
<meta http-equiv="Content-Language" content="zh-tw">
Do you know of _any_ software that actually _uses_ the information in
such a <meta> tag, as opposed to just emitting it?
And for my English pages, en-us instead of zh-tw.
That's fine in principle, if the pages are really in US English.
utf-8 or UTF-8 like Google?
There's no difference. Names of encodings are by definition
case-insensitive. For what it's worth, the official registry of
encoding names uses "UTF-8" in uppercase:
http://www.iana.org/assignments/character-sets
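The case-insensitivity can be illustrated in Python: a hypothetical same_charset helper compares labels case-insensitively, and Python's own codec registry (codecs.lookup) resolves both spellings to the same codec. This only illustrates the case rule; it does not implement the registry's full set of aliases.

```python
import codecs

def same_charset(a, b):
    """Charset names are case-insensitive, so compare them lowercased.
    (Registered names are plain ASCII, so str.lower() is enough.)"""
    return a.strip().lower() == b.strip().lower()

print(same_charset("utf-8", "UTF-8"))   # True
print(same_charset("utf-8", "UTF-16"))  # False

# Python's codec registry agrees: both spellings find the same codec.
print(codecs.lookup("utf-8").name, codecs.lookup("UTF-8").name)  # utf-8 utf-8
```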
The page should work via http:// or file:///.


Nothing works via file:// on the World Wide Web; the file:// URLs are
by definition system-dependent and work (at most) inside a computer or
across similar computers in a local network.

--
Yucca, http://www.cs.tut.fi/~jkorpela/
Pages about Web authoring: http://www.cs.tut.fi/~jkorpela/www.html

Feb 25 '06 #2
Dan Jacobson wrote:
How is this for correct HTML 4.01 headers?:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
<html lang="zh-tw"><head>
<meta http-equiv="Content-Type" content=
"text/html; charset=utf-8">
<meta http-equiv="Content-Language" content="zh-tw">
And for my English pages, en-us instead of zh-tw.
Did I screw up any details? utf-8 or UTF-8 like Google?
The page should work via http:// or file:///.


Did you save the file as UTF-8? I often forget that ;-)
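The point is that the bytes on disk must actually be UTF-8; the <meta> tag only claims they are. A minimal Python sketch of saving a file as UTF-8 and checking it; the file name and sample text are made up, and an editor's "save as UTF-8" option plays the same role in practice.

```python
import tempfile
from pathlib import Path

def save_as_utf8(path, text):
    """Write text as UTF-8 (no BOM) and return the bytes actually on disk."""
    # Without an explicit encoding, Python may fall back to a locale-dependent
    # default, which is how a page that *declares* UTF-8 can be stored otherwise.
    path.write_text(text, encoding="utf-8")
    data = path.read_bytes()
    data.decode("utf-8")  # raises UnicodeDecodeError if the bytes aren't valid UTF-8
    return data

with tempfile.TemporaryDirectory() as tmp:
    saved = save_as_utf8(Path(tmp) / "index.html", "繁體中文")  # hypothetical file name

print(len("繁體中文"), "characters ->", len(saved), "bytes")  # 4 characters -> 12 bytes
```

Python's "utf-8" codec writes no byte-order mark; the "utf-8-sig" codec would prepend one.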
Feb 27 '06 #3
Jukka K. Korpela wrote:
Dan Jacobson <ji*****@jidann i.org> wrote:
How is this for correct HTML 4.01 headers?:


You actually didn't reveal the _headers_, namely the HTTP headers,
which are what really matters. If they specify the encoding
("charset"), they trump any <meta> tags (as explained so often in this
group).


Is it wrong to refer to the HEAD element of an HTML document as an HTML
header?

What I noticed is:
- doesn't an HTML document have only one HTML header (if such
terminology is valid)?
- the snippet includes things that aren't part of the HEAD
- it isn't complete - no TITLE
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
<html lang="zh-tw"><head>
<meta http-equiv="Content-Type" content=
"text/html; charset=utf-8">


OK, but real HTTP headers still have preference. (Some people would
prefer zh-Hant to zh-tw, since the subcode is really about the variant of
the writing system rather than the geographic area, but that's mostly
politics.)
<meta http-equiv="Content-Language" content="zh-tw">


Do you know of _any_ software that actually _uses_ the information in
such a <meta> tag, as opposed to just emitting it?

<snip>

Specifying the language of an HTML document certainly has its uses.

http://webtips.dan.info/language.html

Where different programs look for this information is, of course,
another matter....

Stewart.

--
-----BEGIN GEEK CODE BLOCK-----
Version: 3.1
GCS/M d- s:- C++@ a->--- UB@ P+ L E@ W++@ N+++ o K-@ w++@ O? M V? PS-
PE- Y? PGP- t- 5? X? R b DI? D G e++>++++ h-- r-- !y
------END GEEK CODE BLOCK------

My e-mail is valid but not my primary mailbox. Please keep replies on
the 'group where everyone may benefit.
Feb 28 '06 #4
In our last episode,
<du**********@s un-cc204.lut.ac.uk >,
the lovely and talented Stewart Gordon
broadcast on comp.infosystems.www.authoring.html:
Jukka K. Korpela wrote:
Dan Jacobson <ji*****@jidann i.org> wrote:
How is this for correct HTML 4.01 headers?:
You actually didn't reveal the _headers_, namely the HTTP headers,
which are what really matters. If they specify the encoding
("charset"), they trump any <meta> tags (as explained so often in this
group).
Is it wrong to refer to the HEAD element of an HTML document as an HTML
header?
It is likely to lead to confusion with the http headers.
What I noticed is:
- doesn't an HTML document have only one HTML header (if such
terminology is valid)?
An HTML document may have only one HEAD element.
- the snippet includes things that aren't part of the HEAD
- it isn't complete - no TITLE


--
Lars Eighner us****@larseigh ner.com http://www.larseighner.com/
War on Terrorism: Okay, Unleash OUR Extreme Fundamentalists
"... all of them who have tried to secularize America, I point the finger in
their face and say, 'You helped this happen.'" --Jerry Falwell
Feb 28 '06 #5
Stewart Gordon wrote:
Jukka K. Korpela wrote:
Dan Jacobson <ji*****@jidann i.org> wrote:
How is this for correct HTML 4.01 headers?:
You actually didn't reveal the _headers_, namely the HTTP headers,
which are what really matters. If they specify the encoding
("charset"), they trump any <meta> tags (as explained so often in this
group).


Is it wrong to refer to the HEAD element of an HTML document as an HTML
header?


Yes, because in a document, "headers" are the introductory bits of text
at the beginning of the different sections of the content, represented
in HTML documents by the tags H1 through H6.

In a communication protocol, such as HTTP, "headers" are the attributes
of the communication itself, preceding the content and telling the
receiving application what it needs to know to process the
communication. My use of the term "receiving application" here is
narrow. In the case of a browser, I don't mean the whole browser with
all of its modules, such as the HTML renderer. I mean just the part that
is party to the communication: the HTTP-processing component.
What I noticed is:
- doesn't an HTML document have only one HTML header (if such
terminology is valid)?
It has any number of headers (H1 through H6 elements). It has only one
*head*.
- the snippet includes things that aren't part of the HEAD
- it isn't complete - no TITLE


It contained the DOCTYPE declaration and the HTML tag, neither of which
is part of the head. It also doesn't contain the complete head, because
the title is missing, as you observe. It also doesn't contain the
closing </head> tag, but that's technically not required.
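These observations (the TITLE is missing, and </head> may be omitted) can be turned into a rough check with Python's html.parser. HeadChecker and head_has_title are hypothetical names; the set of head elements follows HTML 4.01 Strict, and this is a sketch, not a validator.

```python
from html.parser import HTMLParser

class HeadChecker(HTMLParser):
    """Does the HEAD contain a TITLE? Start/end tags for HTML, HEAD and BODY
    are optional in HTML 4.01, so the first tag that cannot appear in the
    head is treated as an implied </head>."""

    # Elements allowed in an HTML 4.01 Strict HEAD, plus html/head themselves.
    HEAD_TAGS = {"html", "head", "title", "base", "meta", "link",
                 "script", "style", "object"}

    def __init__(self):
        super().__init__()
        self.in_head = True
        self.has_title = False

    def handle_starttag(self, tag, attrs):
        if not self.in_head:
            return
        if tag == "title":
            self.has_title = True
        elif tag not in self.HEAD_TAGS:
            self.in_head = False  # e.g. <body> or <p>: the head has ended

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False

def head_has_title(html_text):
    checker = HeadChecker()
    checker.feed(html_text)
    checker.close()
    return checker.has_title

snippet = '''<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
<html lang="zh-tw"><head>
<meta http-equiv="Content-Type" content=
"text/html; charset=utf-8">'''
print(head_has_title(snippet))  # False: the snippet from the thread has no TITLE
```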
Feb 28 '06 #6
On Tue, 28 Feb 2006 15:14:05 +0000, Stewart Gordon
<sm*******@yaho o.com> wrote:
Is it wrong to refer to the HEAD element of an HTML document as an HTML
header?


Hey Stewart. Perhaps whether it's right or wrong matters less than
whether it will cause other coders to cry foul. I've always avoided
referring to the <head> section as a header, for this very
reason. I've no idea whether it's actually correct or not, one way or the
other.

Ian
--
http://sundry.ws/
Feb 28 '06 #7
Stewart Gordon wrote:
<meta http-equiv="Content-Language" content="zh-tw">


Do you know of _any_ software that actually _uses_ the information in
such a <meta> tag, as opposed to just emitting it?


<snip>

Specifying the language of an HTML document certainly has its uses.

http://webtips.dan.info/language.html


Dan's page on specifying language is great, even though actual
utilization of such information is fairly limited at present, as Dan
mentions.

My point was the use of a <meta> tag to specify language. A <meta> tag
like this is just a surrogate for an HTTP header. The meaning of that
header is somewhat debatable in this case (by the HTTP specification,
Content-Language indicates the language(s) of the intended _audience_,
though that is admittedly almost splitting hairs). More importantly, does
any user agent actually make some use of the Content-Language header,
whether sent as an actual header or simulated via <meta>?

In any case, by the HTML specification, the lang attribute takes
precedence over the HTTP header, so the <meta> tag is pointless if you
use the lang attribute in <html>.
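This precedence argument can be sketched with Python's html.parser: the lang attribute on <html> wins, and the Content-Language value (a real header first, then its <meta> imitation, as argued earlier in the thread) is only a fallback. LangSources and document_language are hypothetical names; real user agents also resolve lang per element, which this sketch ignores.

```python
from html.parser import HTMLParser

class LangSources(HTMLParser):
    """Collect the two in-document language declarations from this thread."""

    def __init__(self):
        super().__init__()
        self.html_lang = None  # <html lang="...">
        self.meta_lang = None  # <meta http-equiv="Content-Language" content="...">

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # html.parser lowercases attribute names
        if tag == "html" and attrs.get("lang"):
            self.html_lang = attrs["lang"]
        elif tag == "meta" and (attrs.get("http-equiv") or "").lower() == "content-language":
            self.meta_lang = attrs.get("content")

def document_language(html_text, http_content_language=None):
    """lang on <html> first, then a real Content-Language header,
    then the <meta> imitation of that header."""
    sources = LangSources()
    sources.feed(html_text)
    sources.close()
    return sources.html_lang or http_content_language or sources.meta_lang

page = ('<html lang="zh-tw"><head>'
        '<meta http-equiv="Content-Language" content="en-us">')
print(document_language(page, "en"))  # zh-tw: the lang attribute wins
```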
Feb 28 '06 #8
Harlan Messinger wrote:
Is it wrong to refer to the HEAD element of an HTML document as an
HTML header?
Yes, because in a document, "headers" are the introductory bits of text
at the beginning of the different sections of the content, represented
in HTML documents by the tags H1 through H6.


I would say that calling the HEAD element a header is misleading, but on
other grounds. The serious confusion here is between data in the HEAD
element and data in HTTP headers, especially since some data in the HEAD
element actually "simulates" HTTP headers but does _not_ take precedence
over actual HTTP headers.

The elements H1 through H6 are called headings in HTML specs, and I'd
keep them that way.

We also have THEAD (table header part) and TH (table header cell), so we
run out of terms and have to use the same word for two rather
different constructs. But what we can do is distinguish between
a) HTTP headers
b) HEAD part of an HTML document
c) headings in the BODY element of an HTML document
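The three-way distinction can be made concrete by pulling each part out of a single response. The response below is invented for illustration, and HeadAndHeadings is a hypothetical helper built on Python's email.parser and html.parser.

```python
from email.parser import BytesHeaderParser
from html.parser import HTMLParser

# A made-up HTTP response, used only to show where each kind of "header" lives.
RAW = (b"HTTP/1.1 200 OK\r\n"
       b"Content-Type: text/html; charset=utf-8\r\n"
       b"Content-Language: zh-tw\r\n"
       b"\r\n"
       b'<html lang="zh-tw"><head><title>Demo</title></head>'
       b"<body><h1>Top</h1><h2>Sub</h2></body></html>")

class HeadAndHeadings(HTMLParser):
    """Record (b) which elements sit in the HEAD and (c) the H1-H6 headings."""
    HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.head_elements = []
        self.headings = []
        self._open = None  # [tag, text] of the heading currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif self.in_head:
            self.head_elements.append(tag)
        elif tag in self.HEADINGS:
            self._open = [tag, ""]

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False
        elif self._open and tag == self._open[0]:
            self.headings.append(tuple(self._open))
            self._open = None

    def handle_data(self, data):
        if self._open:
            self._open[1] += data

# (a) HTTP headers: the lines between the status line and the first blank line.
head_block, _, body = RAW.partition(b"\r\n\r\n")
_status_line, _, header_bytes = head_block.partition(b"\r\n")
http_headers = BytesHeaderParser().parsebytes(header_bytes + b"\r\n\r\n")

# (b) and (c) come from the body, decoded with the charset the HTTP header gave.
parser = HeadAndHeadings()
parser.feed(body.decode(http_headers.get_content_charset() or "utf-8"))
parser.close()

print(http_headers["Content-Language"])  # zh-tw
print(parser.head_elements)              # ['title']
print(parser.headings)                   # [('h1', 'Top'), ('h2', 'Sub')]
```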
It has any number of headers (H1 through H6 elements).


Technically, yes. But it is normally good practice to have a single H1
element, since you rarely have meaningful use for two or more
_top-level_ headings. (A bilingual document containing parallel texts
could be an exception.)
Feb 28 '06 #9
Jukka K. Korpela wrote:
The elements H1 through H6 are called headings in HTML specs, and I'd
keep them that way.
Ack, that always gets me. To me "header" and "heading" are virtually the
same word, but since we're being precise here I realize I should have
thought of that.

We also have THEAD (table header part) and TH (table header cell), so we
run out of terms and have to use the same word for two rather
different constructs. But what we can do is distinguish between
a) HTTP headers
b) HEAD part of an HTML document
c) headings in the BODY element of an HTML document
It has any number of headers (H1 through H6 elements).


Technically, yes. But it is normally good practice to have a single H1
element, since you rarely have meaningful use for two or more
_top-level_ headings. (A bilingual document containing parallel texts
could be an exception.)


I had a feeling someone would bring that up. Note that I didn't say at
each level! I meant in the aggregate.
Mar 1 '06 #10
