Content-type META tag paradox?

Why would anyone ever have expected a content-type META tag to be effective
at all? Is it because someone was misled by the happenstance that the letters
of the alphabet, the digits, and the characters {< > / ; , " ' =} happen to be
at the same locations in several particular common encodings (US-ASCII,
ISO-8859-1, etc.)?
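The premise is easy to check. The following Python sketch (the function name `is_ascii_compatible` is made up for illustration) tests whether each of the characters listed above encodes to the same byte it has in US-ASCII. On a standard CPython build it should report True for US-ASCII, ISO-8859-1, windows-1252 and UTF-8, and False for EBCDIC (`cp037`) and UTF-16.

```python
import string

# The characters listed in the question: letters, digits, and < > / ; , " ' =
MARKUP_CHARS = string.ascii_letters + string.digits + "<>/;,\"'="


def is_ascii_compatible(encoding: str, chars: str = MARKUP_CHARS) -> bool:
    """True if every character in `chars` encodes to exactly the same bytes as in US-ASCII."""
    try:
        return all(c.encode(encoding) == c.encode("ascii") for c in chars)
    except UnicodeEncodeError:
        # The encoding cannot represent one of the characters at all.
        return False


for enc in ("us-ascii", "iso-8859-1", "cp1252", "utf-8", "cp037", "utf-16-le"):
    print(f"{enc:10} {is_ascii_compatible(enc)}")
```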

Even assuming that these characters are always in the same locations, before
it can find my META tag in the first place, the UA has to make an *initial*
assumption as to whether the document is 8-bit or 16-bit.

Let's say I were perverse enough to save my HTML document using an editor
that encodes text as EBCDIC. How would the UA figure out not only where my
META tag is, but where ANYTHING is?
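To make the EBCDIC case concrete, here is a small Python sketch. The `cp037` codec (IBM EBCDIC, US/Canada) is only one EBCDIC variant, standing in for the hypothetical editor, and the META tag is invented for the example. A UA that scans for the US-ASCII bytes of `<meta` finds nothing, and reading the bytes as ISO-8859-1 produces no `<` characters at all.

```python
# A META tag saved in EBCDIC, using Python's cp037 codec (IBM EBCDIC,
# US/Canada) as a stand-in for "an editor that encodes text as EBCDIC".
html = '<meta http-equiv="Content-Type" content="text/html; charset=IBM037">'
ebcdic = html.encode("cp037")

# A UA that assumes an ASCII-compatible encoding looks for these bytes
# and does not find them anywhere.
print(ebcdic.find(b"<meta"))  # -1

# Decoding the bytes as ISO-8859-1 anyway yields no "<" at all, so there
# is no markup for a parser to work with.
misread = ebcdic.decode("iso-8859-1")
print("<" in misread)  # False

# Only the correct codec gets the original text back.
print(ebcdic.decode("cp037") == html)  # True
```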

And that just addresses the encoding. How about the content-type? What if I
were twisted enough to tell the UA, which has been assuming that my document
is an HTML text document just long enough to reach and parse my META tag in
the first place, that the document is really text/plain? (In that case,
would it start over from the beginning based on text/plain, this time not
doing any parsing at all, and therefore not finding a META tag at all, and
therefore not finding an override for the default supposition of text/html,
and therefore AGAIN starting from the beginning and parsing the document as
HTML, and then finding the META tag and realizing the document is
text/plain, and starting all over again based on that assumption, and then
.....) Or image/gif?

So I'm just curious how the content-type META tag got into the spec in the
first place. It seems to defy logic.

--
Harlan Messinger
Remove the first dot from my e-mail address.

Jul 20 '05 #1
"Harlan Messinger" <h.*********@comcast.net> wrote in message
news:c5************@ID-114100.news.uni-berlin.de
> Let's say I were perverse enough to save my HTML document using an
> editor that encodes text as EBCDIC. How would the UA figure out not
> only where my META tag is, but where ANYTHING is?


I don't understand: HTML must use only US-ASCII characters, and so must the
content-type attribute values...
Huh?

Jul 20 '05 #2
Harlan Messinger wrote:
[snip]
> And that just addresses the encoding. How about the content-type? What if I
> were twisted enough to tell the UA, which has been assuming that my
> document is an HTML text document just long enough to reach and parse my
> META tag in the first place, that the document is really text/plain? (In
> that case, would it start over from the beginning based on text/plain,
> this time not doing any parsing at all, and therefore not finding a META
> tag at all, and therefore not finding an override for the default
> supposition of text/html, and therefore AGAIN starting from the beginning
> and parsing the document as HTML, and then finding the META tag and
> realizing the document is text/plain, and starting all over again based on
> that assumption, and then ....) Or image/gif?

The HTTP 1.1 specification makes it clear that the Content-Type header
should take precedence over anything that may be in the response body:

"If and only if the media type is not given by a Content-Type field, the
recipient MAY attempt to guess the media type via inspection of its content
and/or the name extension(s) of the URI used to identify the resource."

-- <URL:http://www.w3.org/Protocols/rfc2616/rfc2616-sec7.html#sec7.2.1>

So if the HTTP headers say it's text/plain, text/plain it is (unless you use
a browser that violates the HTTP 1.1 specification).
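The precedence rule quoted above can be sketched as a decision function. This is a simplified illustration, not any browser's actual algorithm: the name `effective_media_type`, the two content checks and the `application/octet-stream` fallback are all assumptions. Note that a META tag in the body is never consulted, so the restart loop described in the question cannot arise.

```python
import mimetypes
from typing import Mapping


def effective_media_type(headers: Mapping[str, str], body: bytes, url: str) -> str:
    """Decide the media type of an HTTP response body (hypothetical helper).

    Follows the rule quoted from RFC 2616 sec. 7.2.1: the Content-Type
    field wins; guessing is allowed only when that field is absent.
    """
    for name, value in headers.items():
        if name.lower() == "content-type":  # header names are case-insensitive
            # Drop parameters such as "; charset=..." and normalise case.
            return value.split(";", 1)[0].strip().lower()

    # No Content-Type field: the recipient MAY inspect the content...
    if body.startswith((b"GIF87a", b"GIF89a")):
        return "image/gif"
    if body.lstrip()[:14].lower().startswith((b"<!doctype html", b"<html")):
        return "text/html"

    # ...and/or the name extension of the URI.
    guessed, _encoding = mimetypes.guess_type(url)
    return guessed or "application/octet-stream"  # assumed last-resort default
```

For example, `effective_media_type({"Content-Type": "text/plain; charset=ISO-8859-1"}, b"<html><head></head></html>", "http://example.com/a.html")` returns `"text/plain"` even though the body looks like HTML.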

> So I'm just curious how the content-type META tag got into the spec in the
> first place. It seems to defy logic.


According to the HTML 4.01 specification, <meta> elements with http-equiv
attributes are designed to be parsed by the server and converted to proper
HTTP headers. In reality, this is usually impractical, and browsers
started to pay attention themselves a long time ago. As you've said, this
can lead to some stupid results.

"HTTP servers may use the property name specified by the http-equiv
attribute to create an [RFC822]-style header in the HTTP response."

-- <URL:http://www.w3.org/TR/html401/struct/global.html#h-7.4.4.2>
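For a server that does choose to act on that "may", the first step would be pulling the http-equiv pairs out of the document. Below is a minimal sketch using Python's standard `html.parser`; the class and function names are hypothetical, and it assumes the server already has the document as decoded text, which is the same chicken-and-egg problem the original question raises for the charset.

```python
from html.parser import HTMLParser


class HttpEquivCollector(HTMLParser):
    """Collects (http-equiv, content) pairs from <meta> elements (hypothetical helper)."""

    def __init__(self) -> None:
        super().__init__()
        self.headers: list[tuple[str, str]] = []

    def handle_starttag(self, tag, attrs):
        # html.parser lower-cases tag and attribute names, but not values.
        if tag != "meta":
            return
        attr = dict(attrs)
        name, content = attr.get("http-equiv"), attr.get("content")
        if name and content is not None:
            self.headers.append((name, content))


def headers_from_meta(document: str) -> list[tuple[str, str]]:
    """Return the header name/value pairs a server could emit for `document`."""
    parser = HttpEquivCollector()
    parser.feed(document)
    parser.close()
    return parser.headers
```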

As far as the paradox of figuring out the character encoding goes, the way I
understand it is that if the HTTP headers don't indicate the encoding, it
defaults to US-ASCII, and when it gets to the relevant <meta> element, the
browser has the option of starting again with that character encoding.
This means that as long as you use a superset of US-ASCII, it will "work".
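The two-pass approach described above could look roughly like this. It is a sketch under assumptions: the regular expression is deliberately loose, the 1024-byte prescan window is an arbitrary limit, and the ISO-8859-1 fallback is a choice made for the example (RFC 2616, sec. 3.7.1, names ISO-8859-1 as the default for text/* types when no charset parameter is sent). The first pass only succeeds for ASCII-compatible encodings, which is the "superset of US-ASCII" condition.

```python
import re
from typing import Optional

# Loose, hypothetical pattern for "charset=..." inside a <meta> tag, matched on raw bytes.
META_CHARSET = re.compile(
    rb"""<meta[^>]+charset\s*=\s*["']?([A-Za-z0-9._:-]+)""", re.IGNORECASE
)
PRESCAN_LIMIT = 1024  # assumption: only look at the first 1 KB


def decode_html(body: bytes, header_charset: Optional[str] = None) -> str:
    # 1. A charset parameter from the HTTP Content-Type header wins outright.
    if header_charset:
        return body.decode(header_charset)

    # 2. First pass: look for the META tag in the raw bytes. This only works
    #    if "<meta", "charset" and "=" have their US-ASCII byte values.
    match = META_CHARSET.search(body[:PRESCAN_LIMIT])
    if match:
        declared = match.group(1).decode("ascii")
        try:
            # 3. Second pass: start again from the top with the declared encoding.
            return body.decode(declared)
        except LookupError:
            pass  # unknown encoding name: fall through to the default

    # 4. Nothing usable found: assumed fallback, ISO-8859-1.
    return body.decode("iso-8859-1")
```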

I believe the rules for default character encodings change when you start
talking about XHTML (plus you have to throw the XML prolog into the mix).
Also remember that a browser can (reliably?) detect UTF-16 by the BOM.
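Checking for a byte order mark is simple enough to sketch. This Python version (names hypothetical) only recognises the UTF-8 and UTF-16 BOMs; it would misreport a UTF-32LE BOM (FF FE 00 00) as UTF-16LE, and a UTF-16 document saved without a BOM goes undetected, which is one reason to doubt "reliably".

```python
import codecs
from typing import Optional

# UTF-8's three-byte BOM is tested first, then the two UTF-16 byte orders.
BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]


def sniff_bom(body: bytes) -> Optional[tuple[str, int]]:
    """Return (encoding, BOM length) if `body` starts with a known BOM, else None."""
    for bom, encoding in BOMS:
        if body.startswith(bom):
            return encoding, len(bom)
    return None
```

For example, `sniff_bom("\ufeff<html>".encode("utf-16-le"))` returns `("utf-16-le", 2)`; a caller would skip those two bytes and decode the rest with that encoding.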

You'll probably want to read through this lot if you haven't already:

<URL:http://ppewww.ph.gla.ac.uk/~flavell/charset/>
--
Jim Dabell

Jul 20 '05 #4
"Jim Dabell" <ji********@jimdabell.com> wrote in message
news:7K********************@giganews.com
> According to the HTML 4.01 specification, <meta> elements with
> http-equiv attributes are designed to be parsed by the server and
> converted to proper HTTP headers.

The rec says "may", so this is not mandatory. By the way, if anyone could
post a list of HTTP servers that actually read the HTML content-type META
value and send it in the HTTP response header, I would be very glad 8) I
haven't been able to find this information anywhere for years.

> In reality, this is usually impractical, and browsers
> started to pay attention themselves a long time ago.


I don't understand; can you please explain?

Jul 20 '05 #6
"Pierre Goiffon" <pg******@nowhere.invalid> wrote:
>> Let's say I were perverse enough to save my HTML document using an
>> editor that encodes text as EBCDIC. How would the UA figure out not
>> only where my META tag is, but where ANYTHING is?


> I don't understand: HTML must use only US-ASCII characters, and so
> must the content-type attribute values...


Where did you get such ideas?

HTML surely needs some characters that belong to the Ascii repertoire,
such as "<" and "a". But there is no requirement that they be represented
in the Ascii encoding.
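The distinction between the repertoire and its encoding can be seen in a few lines of Python: the same characters produce different bytes in US-ASCII, UTF-16 (big-endian) and EBCDIC (`cp037`), yet each decodes back to identical text. The sample string is arbitrary.

```python
text = '<a href="x">'  # only characters from the ASCII repertoire

for enc in ("ascii", "utf-16-be", "cp037"):
    data = text.encode(enc)
    # Different bytes in each encoding...
    print(f"{enc:10} {data.hex(' ')}")
    # ...but the same characters once decoded with the right encoding.
    assert data.decode(enc) == text
```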

--
Yucca, http://www.cs.tut.fi/~jkorpela/
Pages about Web authoring: http://www.cs.tut.fi/~jkorpela/www.html

Jul 20 '05 #8
"Jukka K. Korpela" <jk******@cs.tut.fi> wrote in message
news:Xn*****************************@193.229.0.31
>> I don't understand: HTML must use only US-ASCII characters, and so
>> must the content-type attribute values...


> Where did you get such ideas?


I was talking about HTML code, not about content! Tags like <html>,
<head>, etc.
Content in an HTML document of course needs to be able to contain
international characters, from European to Chinese.

I was pretty sure I had read that HTML code is composed only of US-ASCII
characters, but... I can't find that anywhere now :(
If this is confirmed, then, as Jim Dabell said, a browser that retrieves a
document with no HTTP charset information could, I guess, easily get the
HTML content-type meta value by parsing the document as US-ASCII.
In a well-formed document, all the structural delimiters around that meta
(doctype, html, head) would indeed be encoded using US-ASCII only.

Did I explain it more clearly? Sorry, but it's just difficult for me to
express myself in English, which is not at all my mother tongue.

Jul 20 '05 #10
