I'm sure that I read somewhere that an HTML document might be
transcoded to a different character set at some stage in its journey,
so while it might start out as (for example) ISO-8859-15, by the time
it is actually viewed it's been converted to UTF-8. Maybe by whatever
the author used to upload the document to the server, maybe by a proxy,
maybe by the user agent (if it saves it to disk), maybe by the httpd
in some content negotiation.
Does anybody have any information on systems that do this in practice?
--
David Dorward http://dorward.me.uk/
On Tue, 28 Oct 2003, David Dorward wrote:
> I'm sure that I read somewhere that an HTML document might be
> transcoded to a different character set at some stage in its journey,
> so while it might start out as (for example) ISO-8859-15, by the time
> it is actually viewed it's been converted to UTF-8.
In theory this is true. In practice the use of such transcoding
features in servers or proxies seems to be confined to particular
communities where, for whatever reason, several incompatible character
codings are in use. I heard of Japanese transcoding proxies, but the
only ones I met directly were Russian ones, see Russian Apache for
details.
There's a URL here http://apache.lexa.ru/english/meta-http-eng.html
(with a rather remarkable figurehead ;-) but I suspect it may be out
of date. Still, it'll give you the flavour of the thing, I guess.
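As a rough illustration of what a transcoding server or proxy has to do, here is a minimal Python sketch. It is not how Russian Apache is implemented; the function name `transcode_response` and the ISO-8859-1 fallback for responses without a charset parameter are assumptions made for the example.

```python
import re

# Matches the charset parameter of a Content-Type header value.
CHARSET_RE = re.compile(r'charset="?([\w.:+-]+)"?', re.IGNORECASE)


def transcode_response(body: bytes, content_type: str, target: str):
    """Re-encode a response body and relabel its Content-Type header.

    Returns a (new_body, new_content_type) tuple.
    """
    match = CHARSET_RE.search(content_type)
    # Assumption for this sketch: unlabelled text is treated as ISO-8859-1.
    source = match.group(1) if match else "iso-8859-1"

    text = body.decode(source)
    # Characters the target encoding cannot represent are written as
    # numeric character references instead of being dropped.
    new_body = text.encode(target, errors="xmlcharrefreplace")

    if match:
        new_type = CHARSET_RE.sub(lambda _m: "charset=" + target, content_type)
    else:
        new_type = content_type + "; charset=" + target
    return new_body, new_type
```

The point to notice is that the body and the charset label have to change together; a proxy that re-encodes the bytes but leaves the old label (or a stale META element) in place produces a mislabelled document.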
David Dorward wrote:
> I'm sure that I read somewhere that an HTML document might be
> transcoded to a different character set at some stage in its journey,
> so while it might start out as (for example) ISO-8859-15, by the time
> it is actually viewed it's been converted to UTF-8. Maybe by whatever
> the author used to upload the document to the server, maybe by a proxy,
> maybe by the user agent (if it saves it to disk), maybe by the httpd
> in some content negotiation.
> Does anybody have any information on systems that do this in practice?
IE6 will often do this when saving a document locally. The FileSave
dialog box lets the user choose an encoding, and an appropriate element
like
<META http-equiv=Content-Type content="text/html; charset=utf-8">
is added or changed depending on whether the document had the element
originally.
Other changes that are made:
- <!DOCTYPE...> (HTML4.0 trans.) is added if it wasn't there.
- <META content="MSHTML 6.00.2800.1264" name=GENERATOR> is added
- All the elements are capitalized.
- Line breaks are adjusted.
- Quotes around attribute values are stripped where not required.
- Numeric character references like &#169; may be rewritten as the
actual character if supported by the encoding.
I'm sure more changes are made, but I noticed these in a quick
examination.
I'll speculate that IE6 creates the new document from its internal
representation without reference to the original source.
Even more oddly, sometimes the document is saved as a verbatim copy of
the source. Perhaps this only happens when the declared encoding and the
user's chosen encoding are identical.
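The character-reference behaviour described in the list above can be imitated with a short Python sketch. This only illustrates the idea and says nothing about what IE6 does internally; the function name `resolve_ncrs` is made up for the example, and keeping markup-significant characters escaped is an assumption of mine.

```python
import re

# Decimal (&#169;) and hexadecimal (&#xA9;) numeric character references.
NCR_RE = re.compile(r"&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));")


def resolve_ncrs(markup: str, encoding: str) -> str:
    """Replace numeric character references with the actual character
    wherever the target encoding can represent that character."""

    def replace(match):
        hex_digits, dec_digits = match.groups()
        code_point = int(hex_digits, 16) if hex_digits else int(dec_digits)
        try:
            char = chr(code_point)
            char.encode(encoding)  # raises if the encoding lacks this character
        except (ValueError, OverflowError):
            return match.group(0)  # leave the reference untouched
        if char in "<>&\"'":
            return match.group(0)  # keep markup-significant characters escaped
        return char

    return NCR_RE.sub(replace, markup)
```

With a UTF-8 target, `&#169;` becomes the copyright sign itself; with an ASCII target it stays as a reference, because ASCII has no such character.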
Andrew Graham
On Tue, 28 Oct 2003 18:09:47 GMT, "Andrew Graham"
<an*********************@nospam.invalid> wrote:
> I'll speculate that IE6 creates the new document from its internal
> representation without reference to the original source.
Yes, it's a representation of the document tree, and bears no relation
to the original source.
> Even more oddly, sometimes the document is saved as a verbatim copy of
> the source. Perhaps this only happens when the declared encoding and the
> user's chosen encoding are identical.
It normally depends on whether you say "save web page complete" or "save
web page html only": the first is a normalised source, the second the
actual source.
Jim.
--
comp.lang.javascript FAQ - http://jibbering.com/faq/
On Tue, 28 Oct 2003, Andrew Graham wrote:
> IE6 will often do this when saving a document locally.
Good point. Mozilla Composer can also do this when one chooses an
encoding and then saves the edited document.
I thought the questioner was more interested in automated transcoding
in servers and proxies...?
Alan J. Flavell wrote:
> I thought the questioner was more interested in automated transcoding
> in servers and proxies...?
No no, any system that does it is of interest.
--
David Dorward http://dorward.me.uk/
On Tue, 28 Oct 2003, David Dorward wrote:
>> I thought the questioner was more interested in automated transcoding
>> in servers and proxies...?
> No no, any system that does it is of interest.
Well, you're in the best position to know what you're interested in
;-) so please excuse me for assuming. Can't think of any other
examples at the moment though.
In article <ca*************************@posting.google.com>, one of infinite monkeys
at the keyboard of do*****@yahoo.com (David Dorward) wrote:
> I'm sure that I read somewhere that an HTML document might be
> transcoded to a different character set at some stage in its journey,
> so while it might start out as (for example) ISO-8859-15, by the time
> it is actually viewed it's been converted to UTF-8.
Yes, there are certainly reasons why that might happen.
Most markup parsers work internally with a selected charset, and
transcode documents at input. They can transcode back on output, but this
is then an extra overhead. Several of my modules generate all output
as UTF-8, leaving you the option to filter it through a transcoding
module if you want something else. XSLT of course has its own rules,
but will typically be fastest if you use the processor's internal
charset for output.
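The pipeline described above (decode at input, work in one internal charset, emit UTF-8, optionally filter to something else) might be sketched in Python like this. It is only an illustration of the design, not the code of any actual module; `process_to_utf8` and `transcode_filter` are hypothetical names.

```python
from typing import Callable


def process_to_utf8(raw: bytes, input_charset: str,
                    transform: Callable[[str], str]) -> bytes:
    """Decode once at input, work on the internal representation,
    and always emit UTF-8."""
    text = raw.decode(input_charset)   # transcoding at input
    result = transform(text)           # all processing happens on str
    return result.encode("utf-8")      # fixed output charset, no extra step


def transcode_filter(utf8_output: bytes, output_charset: str) -> bytes:
    """Optional extra stage for consumers that want another charset.
    This is the additional overhead mentioned above."""
    text = utf8_output.decode("utf-8")
    # Unrepresentable characters become numeric character references.
    return text.encode(output_charset, errors="xmlcharrefreplace")
```

Keeping the filter as a separate stage means the common case pays for one decode and one encode only; the second conversion is paid only by those who ask for it.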
> Does anybody have any information on systems that do this in practice?
Come and see my talk at ApacheCon!
--
Nick Kew
In urgent need of paying work - see http://www.webthing.com/~nick/cv.html
In article <3f***************@news.cis.dfn.de>, ji*@jibbering.com (Jim Ley) wrote:
> On Tue, 28 Oct 2003 18:09:47 GMT, "Andrew Graham" <an*********************@nospam.invalid> wrote:
>> I'll speculate that IE6 creates the new document from its internal
>> representation without reference to the original source.
> Yes it's a representation of the document tree, and bears no relation
> to the original source.
However, if the document is reparsed, the new tree is not necessarily
the same due to whitespace introduced by pretty printing, which may
affect scripts. Also, due to the doctype change, the layout mode may be
different after reparse.
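The whitespace effect is easy to reproduce. The following Python sketch uses the standard library's XML parser as a stand-in for a browser's HTML parser (an assumption, since the two do not behave identically in every respect) and compares the child nodes of the same list before and after pretty printing. The helper name `child_node_names` is made up for the example.

```python
from xml.dom import minidom

compact = "<ul><li>a</li><li>b</li></ul>"
pretty = "<ul>\n  <li>a</li>\n  <li>b</li>\n</ul>"


def child_node_names(markup: str) -> list:
    """Return the node names of the root element's direct children."""
    root = minidom.parseString(markup).documentElement
    return [node.nodeName for node in root.childNodes]


print(child_node_names(compact))  # ['li', 'li']
print(child_node_names(pretty))   # ['#text', 'li', '#text', 'li', '#text']
```

A script that relies on something like `firstChild` being the first list item works on the compact tree and gets a whitespace text node on the pretty-printed one.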
--
Henri Sivonen hs******@iki.fi http://iki.fi/hsivonen/
Mozilla Web Author FAQ: http://mozilla.org/docs/web-developer/faq.html
"Andrew Graham" <an*********************@nospam.invalid> wrote:
> IE6 will often do this when saving a document locally.
Don't do this then. Rather choose "View source" and save in your text
editor.