
utf-8 or UTF-8?

How is this for correct HTML 4.01 headers?:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
<html lang="zh-tw"><head>
<meta http-equiv="Content-Type" content=
"text/html; charset=utf-8">
<meta http-equiv="Content-Language" content="zh-tw">
And for my English pages, en-us instead of zh-tw.
Did I screw up any details? utf-8 or UTF-8 like Google?
The page should work via http:// or file:///.

Feb 25 '06 #1
Dan Jacobson <ji*****@jidanni.org> wrote:
How is this for correct HTML 4.01 headers?:
You actually didn't reveal the _headers_, namely the HTTP headers,
which are what really matters. If they specify the encoding
("charset"), they trump any <meta> tags (as explained so often in this
group).
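A minimal sketch of that precedence rule, in Python. The function names are hypothetical, and the regular expression is a simplification of real Content-Type parsing; it only shows that a charset in the real HTTP header wins over the one in <meta>.

```python
import re
from typing import Optional


def charset_from_content_type(value: str) -> Optional[str]:
    """Extract the charset parameter from a Content-Type value, if present."""
    match = re.search(r'charset\s*=\s*"?([^\s;"]+)', value, re.IGNORECASE)
    return match.group(1) if match else None


def effective_charset(http_content_type: Optional[str],
                      meta_content: Optional[str]) -> Optional[str]:
    """Prefer the charset from the HTTP header; fall back to the <meta> value."""
    for source in (http_content_type, meta_content):
        if source:
            charset = charset_from_content_type(source)
            if charset:
                return charset
    return None


# The server's header names a charset, so the <meta> tag is ignored.
print(effective_charset("text/html; charset=ISO-8859-1",
                        "text/html; charset=utf-8"))
# The server's header names no charset, so the <meta> tag is used.
print(effective_charset("text/html", "text/html; charset=utf-8"))
```

When the page is opened via file:// there are no HTTP headers at all, which corresponds to passing None as the first argument.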
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
<html lang="zh-tw"><head>
<meta http-equiv="Content-Type" content=
"text/html; charset=utf-8">
OK, but real HTTP headers still have preference. (Some people would
prefer zh-Hant to zh-tw, since the subcode is really about variant of
writing system rather than geographic area, but that's mostly
politics.)
<meta http-equiv="Content-Language" content="zh-tw">
Do you know of _any_ software that actually _uses_ the information in
such a <meta> tag, as opposed to just emitting it?
And for my English pages, en-us instead of zh-tw.
That's fine in principle, if the pages are really in US English.
utf-8 or UTF-8 like Google?
There's no difference. Names of encodings are by definition case
insensitive. For what it's worth, the official registry of names of
encodings uses "UTF-8" in uppercase:
http://www.iana.org/assignments/character-sets
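One way to see the case insensitivity in practice is Python's codec lookup, which resolves both spellings to the same codec. This illustrates one implementation's behaviour, not the IANA rule itself; same_encoding is a hypothetical helper name.

```python
import codecs


def same_encoding(a: str, b: str) -> bool:
    """True if Python resolves both names to the same codec."""
    return codecs.lookup(a).name == codecs.lookup(b).name


print(same_encoding("utf-8", "UTF-8"))    # same codec, different spelling
print(same_encoding("utf-8", "latin-1"))  # genuinely different encodings
```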
The page should work via http:// or file:///.


Nothing works via file:// on the World Wide Web; the file:// URLs are
by definition system-dependent and work (at most) inside a computer or
across similar computers in a local network.

--
Yucca, http://www.cs.tut.fi/~jkorpela/
Pages about Web authoring: http://www.cs.tut.fi/~jkorpela/www.html

Feb 25 '06 #2
Dan Jacobson wrote:
How is this for correct HTML 4.01 headers?:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
<html lang="zh-tw"><head>
<meta http-equiv="Content-Type" content=
"text/html; charset=utf-8">
<meta http-equiv="Content-Language" content="zh-tw">
And for my English pages, en-us instead of zh-tw.
Did I screw up any details? utf-8 or UTF-8 like Google?
The page should work via http:// or file:///.


Did you save the file as UTF-8? I often forget that ;-)
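A quick way to check, sketched in Python: try to decode the file's bytes as UTF-8. The helper name is_valid_utf8 is hypothetical. Passing the check does not prove the file was meant as UTF-8 (pure ASCII passes too), but failing it shows that the charset=utf-8 declaration is wrong for that file.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# The same two characters saved in two different encodings.
print(is_valid_utf8("中文".encode("utf-8")))
print(is_valid_utf8("中文".encode("big5")))

# For a real page, read the raw bytes first, for example:
#   from pathlib import Path
#   is_valid_utf8(Path("index.html").read_bytes())  # hypothetical file name
```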
Feb 27 '06 #3
Jukka K. Korpela wrote:
Dan Jacobson <ji*****@jidanni.org> wrote:
How is this for correct HTML 4.01 headers?:


You actually didn't reveal the _headers_, namely the HTTP headers,
which are what really matters. If they specify the encoding
("charset"), they trump any <meta> tags (as explained so often in this
group).


Is it wrong to refer to the HEAD element of an HTML document as an HTML
header?

What I noticed is:
- doesn't an HTML document have only one HTML header (if such
terminology is valid)?
- the snippet includes things that aren't part of the HEAD
- it isn't complete - no TITLE
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
<html lang="zh-tw"><head>
<meta http-equiv="Content-Type" content=
"text/html; charset=utf-8">


OK, but real HTTP headers still have preference. (Some people would
prefer zh-Hant to zh-tw, since the subcode is really about variant of
writing system rather than geographic area, but that's mostly
politics.)
<meta http-equiv="Content-Language" content="zh-tw">


Do you know of _any_ software that actually _uses_ the information in
such a <meta> tag, as opposed to just emitting it?

<snip>

Specifying the language of an HTML document certainly has its uses.

http://webtips.dan.info/language.html

Where different programs look for this information is, of course,
another matter....

Stewart.

--
-----BEGIN GEEK CODE BLOCK-----
Version: 3.1
GCS/M d- s:- C++@ a->--- UB@ P+ L E@ W++@ N+++ o K-@ w++@ O? M V? PS-
PE- Y? PGP- t- 5? X? R b DI? D G e++>++++ h-- r-- !y
------END GEEK CODE BLOCK------

My e-mail is valid but not my primary mailbox. Please keep replies on
the 'group where everyone may benefit.
Feb 28 '06 #4
In our last episode,
<du**********@sun-cc204.lut.ac.uk>,
the lovely and talented Stewart Gordon
broadcast on comp.infosystems.www.authoring.html:
Jukka K. Korpela wrote:
Dan Jacobson <ji*****@jidanni.org> wrote:
How is this for correct HTML 4.01 headers?:
You actually didn't reveal the _headers_, namely the HTTP headers,
which are what really matters. If they specify the encoding
("charset"), they trump any <meta> tags (as explained so often in this
group).
Is it wrong to refer to the HEAD element of an HTML document as an HTML
header?
It is likely to lead to confusion with the http headers.
What I noticed is:
- doesn't an HTML document have only one HTML header (if such
terminology is valid)?
An HTML document may have only one HEAD element.
- the snippet includes things that aren't part of the HEAD
- it isn't complete - no TITLE
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
<html lang="zh-tw"><head>
<meta http-equiv="Content-Type" content=
"text/html; charset=utf-8">


OK, but real HTTP headers still have preference. (Some people would
prefer zh-Hant to zh-tw, since the subcode is really about variant of
writing system rather than geographic area, but that's mostly
politics.)
<meta http-equiv="Content-Language" content="zh-tw">


Do you know of _any_ software that actually _uses_ the information in
such a <meta> tag, as opposed to just emitting it?

<snip>

Specifying the language of an HTML document certainly has its uses.

http://webtips.dan.info/language.html

Where different programs look for this information is, of course,
another matter....

Stewart.


--
Lars Eighner us****@larseighner.com http://www.larseighner.com/
War on Terrorism: Okay, Unleash OUR Extreme Fundamentalists
"... all of them who have tried to secularize America, I point the finger in
their face and say, 'You helped this happen.'" --Jerry Falwell
Feb 28 '06 #5
Stewart Gordon wrote:
Jukka K. Korpela wrote:
Dan Jacobson <ji*****@jidanni.org> wrote:
How is this for correct HTML 4.01 headers?:
You actually didn't reveal the _headers_, namely the HTTP headers,
which are what really matters. If they specify the encoding
("charset"), they trump any <meta> tags (as explained so often in this
group).


Is it wrong to refer to the HEAD element of an HTML document as an HTML
header?


Yes, because in a document, "headers" are the introductory bits of text
at the beginning of the different sections of the content, represented
in HTML documents by the tags H1 through H6.

In a communication protocol, such as HTTP, "headers" are the attributes
of the communication itself, preceding the content and telling the
receiving application what it needs to know to process the
communication. My use of the term "receiving application" here is
narrow. In the case of a browser, I don't mean all of the modules in the
browser, including the HTML renderer. I mean just the part that is party
to the communication: the HTTP-processing component.
What I noticed is:
- doesn't an HTML document have only one HTML header (if such
terminology is valid)?
It has any number of headers (H1 through H6 elements). It has only one
*head*.
- the snippet includes things that aren't part of the HEAD
- it isn't complete - no TITLE


It contained the DOCTYPE declaration and the HTML tag, neither of which
is part of the head. It also doesn't contain the complete head, because
the title is missing, as you observe. It also doesn't contain the
closing </head> tag, but that's technically not required.
Feb 28 '06 #6
On Tue, 28 Feb 2006 15:14:05 +0000, Stewart Gordon
<sm*******@yahoo.com> wrote:
Is it wrong to refer to the HEAD element of an HTML document as an HTML
header?


Hey Stewart. Perhaps right or wrong is not so relevant, as opposed to
what will cause other coders to call foul. I've always avoided
referring to the <head> section as a header, just for this very
reason. I've no idea if it's actually correct or not, one way or the
other.

Ian
--
http://sundry.ws/
Feb 28 '06 #7
Stewart Gordon wrote:
<meta http-equiv="Content-Language" content="zh-tw">


Do you know of _any_ software that actually _uses_ the information in
such a <meta> tag, as opposed to just emitting it?


<snip>

Specifying the language of an HTML document certainly has its uses.

http://webtips.dan.info/language.html


Dan's page on specifying language is great, even though actual
utilization of such information is fairly limited at present, as Dan
mentions.

My point was about the use of a <meta> tag to specify language. A <meta> tag
like this is just a surrogate for an HTTP header. The header is in this
case somewhat debatable (by the HTTP specification, Content-Language indicates
the language(s) of the intended _audience_, though this is admittedly almost
splitting hairs). More importantly, does any user agent actually make
some use of the Content-Language header, whether sent as an actual header or
simulated via <meta>?

In any case, by HTML specs, the lang attribute takes precedence over the
HTTP header, so the <meta> tag is pointless if you use the lang
attribute in <html>
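The precedence can be sketched as follows, assuming the order given in the HTML 4.01 section on language information as I read it: the element's own lang attribute, then the nearest ancestor's, then the HTTP Content-Language header, then user agent defaults. The function and parameter names are hypothetical.

```python
from typing import Optional, Sequence


def effective_language(lang_attrs: Sequence[Optional[str]],
                       http_content_language: Optional[str],
                       default: Optional[str] = None) -> Optional[str]:
    """Resolve an element's language.

    lang_attrs lists lang attribute values from the element itself outward
    to <html>; None means the attribute is absent on that element.
    """
    for lang in lang_attrs:
        if lang:
            return lang  # nearest lang attribute wins
    # No lang attribute anywhere: fall back to the header, then the default.
    return http_content_language or default


# A <p> without lang inside <html lang="zh-tw">, served with a header.
print(effective_language([None, "zh-tw"], "en-us"))
# No lang attributes at all, so the header (real or simulated) is used.
print(effective_language([None, None], "en-us"))
```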
Feb 28 '06 #8
Harlan Messinger wrote:
Is it wrong to refer to the HEAD element of an HTML document as an
HTML header?
Yes, because in a document, "headers" are the introductory bits of text
at the beginning of the different sections of the content, represented
in HTML documents by the tags H1 through H6.


I would say that calling the HEAD element a header is misleading, but on
other grounds. The serious confusion here is between data in the HEAD
element and data in HTTP headers, especially since some data in the HEAD
element actually "simulates" HTTP headers but does _not_ take precedence
over actual HTTP headers.

The elements H1 through H6 are called headings in HTML specs, and I'd
keep them that way.

We also have THEAD (table header part) and TH (table header cell), so we
run out of terms and have to use the same word about two rather
different constructs. But what we can do is distinguish between
a) HTTP headers
b) HEAD part of an HTML document
c) headings in the BODY element of an HTML document
It has any number of headers (H1 through H6 elements).


Technically, yes. But it is normally good practice to have a single H1
element, since you rarely have meaningful use for two or more
_top-level_ headings. (A bilingual document containing parallel texts
could be an exception.)
Feb 28 '06 #9
Jukka K. Korpela wrote:
The elements H1 through H6 are called headings in HTML specs, and I'd
keep them that way.
Ack, that always gets me. To me "header" and "heading" are virtually the
same word, but since we're being precise here I realize I should have
thought of that.

We also have THEAD (table header part) and TH (table header cell), so we
run out of terms and have to use the same word about two rather
different constructs. But what we can do is distinguish between
a) HTTP headers
b) HEAD part of an HTML document
c) headings in the BODY element of an HTML document
It has any number of headers (H1 through H6 elements).


Technically, yes. But it is normally good practice to have a single H1
element, since you rarely have meaningful use for two or more
_top-level_ headings. (A bilingual document containing parallel texts
could be an exception.)


I had a feeling someone would bring that up. Note that I didn't say at
each level! I meant in the aggregate.
Mar 1 '06 #10
Jukka K. Korpela wrote:
Technically, yes. But it is normally good practice to have a single H1
element, since you rarely have meaningful use for two or more
_top-level_ headings. (A bilingual document containing parallel texts
could be an exception.)


I use multiple H1 all the time,

e.g.

<h1>first heading</h1>
....
<h1>second heading</h1>
...
<h1>third heading</h1>
....
<h1>fourth heading</h1>
....

Is that bad?
Mar 1 '06 #11
On Thu, 2 Mar 2006 12:08:55 +1300, "windandwaves"
<wi*********@coldmail.com> wrote:
I use multiple H1 all the time,

Is that bad?


Not terribly, IMO, but I'm not schooled too well in the semantic
aspects of HTML ... it's a pretty complex subject. I would say <h1> is
more for a top-level heading, and <h2> more for section headings. Why,
who knows. This post is in case someone like Jukka doesn't jump in
with the official verdict. :-)

What you said reminded me of an HTML version that never took off,
ISO-HTML, and what they have to say about headings:

https://www.cs.tcd.ie/15445/UG.HTML#H1

I guess the idea is that we want the structure of the document to be
logical. 'Fraid I can't explain more, as I don't know more. I'm
curious to see what others say on the subject.

Ian
--
http://sundry.ws/
Mar 1 '06 #12
"windandwaves" <wi*********@coldmail.com> wrote:
I use multiple H1 all the time, - - Is that bad?


H1 means first level heading. How many first levels has your document
got?

The first level heading is a heading for the document as a whole,
since "level" refers to division into structural parts. If your
document does not contain such a heading but only headings for parts
of the document, the logical move is to make them H2 elements.
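To see how many first-level headings a page actually has, a small sketch with Python's standard html.parser can list the heading levels in document order. HeadingCollector and heading_levels are hypothetical names, and the sample markup is modelled on the example quoted above.

```python
from html.parser import HTMLParser
from typing import List


class HeadingCollector(HTMLParser):
    """Record the level of every h1..h6 start tag, in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.levels: List[int] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so "H1" arrives as "h1".
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self.levels.append(int(tag[1]))


def heading_levels(html: str) -> List[int]:
    collector = HeadingCollector()
    collector.feed(html)
    collector.close()
    return collector.levels


page = "<h1>first heading</h1><p>...</p><h1>second heading</h1><h2>part</h2>"
levels = heading_levels(page)
print(levels)           # heading levels in order of appearance
print(levels.count(1))  # number of H1 elements
```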

On the practical side - perhaps even too practical to some people's
taste -, using H1 elements illogically results in poor _default_
rendering of the document. The default rendering is typically in very
large font and bolded. As an author, you can suggest a different
rendering, but browsers may ignore some or all of your suggestions.

--
Yucca, http://www.cs.tut.fi/~jkorpela/
Pages about Web authoring: http://www.cs.tut.fi/~jkorpela/www.html

Mar 2 '06 #13
On Thu, 2 Mar 2006 12:08:55 +1300, "windandwaves"
<wi*********@coldmail.com> wrote:
I use multiple H1 all the time, Is that bad?


No - because W3C HTML has no semantics defined for "sibling" headers or
for their permitted nesting. ISO HTML does define this, although I think
they permit multiple siblings, they just don't permit <h1>...<h3>
directly. ISO get it wrong here, because HTML isn't theirs to define
and they certainly can't change the semantics from the "real" version
like this.

It might not be a recommended best practice to have multiple <h1>
elements, but it's not demonstrably wrong.
Mar 2 '06 #14
Under Subject: Re: utf-8 or UTF-8?
Andy Dingley wrote:
I use multiple H1 all the time,
Is that bad?


No - because W3C HTML has no semantics defined for "sibling" headers or
for their permitted nesting.


Sorry, but you are very confused now (and confusing). The HTML
specifications define the semantics (meaning) of H1, H2, etc. - not very
exactly, but still. They rigorously define permitted nesting: headings must
not be nested (no H2 inside H1, for example). What you actually mean by
"nesting" is a different issue - and a syntactic question. The specs
also recommend against skipping heading levels, though this more or less
follows from the semantics (you don't go from level 2 to level 4 without
going through level 3).
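The "no skipped levels" recommendation is easy to express as a check over a list of heading levels in document order. This is only a sketch; skipped_levels is a hypothetical helper, and it deliberately does not complain about the level at which a document starts.

```python
from typing import List, Optional, Tuple


def skipped_levels(levels: List[int]) -> List[Tuple[int, int]]:
    """Return (previous, current) pairs where a heading jumps down by more
    than one level, for example from level 2 straight to level 4."""
    skips: List[Tuple[int, int]] = []
    previous: Optional[int] = None
    for level in levels:
        if previous is not None and level > previous + 1:
            skips.append((previous, level))
        previous = level
    return skips


print(skipped_levels([1, 2, 4, 2, 3]))  # one jump: level 2 to level 4
print(skipped_levels([1, 1, 2]))        # repeated H1 is not a skip
```

Returning to a higher level (from 4 back to 2, say) is not a skip; that simply closes the deeper sections.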
ISO HTML does define this,
Nobody really cares about ISO HTML. And it takes a long way in its
attempt to express formally a simple principle about headings, and fails.
although I think they permit multiple siblings,
If I cared about ISO HTML, I would actually check what it says instead
of writing "I think". It's on the www and probably easily googleable.
ISO get it wrong here, because HTML isn't theirs to define
and they certainly can't change the semantics from the "real" version
like this.
What you write is about syntax (specifically, an attempt to add a
syntactic constraint in a rigorous way), not semantics.
It might not be a recommended best practice to have multiple <h1>
elements, but it's not demonstrably wrong.


That statement has very little content. Compare: It might not be a
recommended best practice to have an empty <title> element (or a <title>
element with 1,000 characters in it), but it's not
demonstrably wrong. (For some values of "demonstrably" and "wrong", as in
your statement.)
Mar 3 '06 #15
