Bytes | Software Development & Data Engineering Community

UTF-8 & Unicode

Do web pages have to be created in Unicode in order to use UTF-8 encoding?
If so, can anyone name a free application that I can use under Windows 98
to create web pages?
Jul 20 '05
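Unicode is the set of characters. UTF-8 and ISO-8859-1 are two ways of storing those characters as bytes, so the practical question is which encoding the editor saves in and which label the page carries. Below is a minimal Python sketch of that distinction. The sample text is hypothetical.

```python
# Unicode defines the characters; an encoding defines the bytes.
text = "caf\u00e9"  # hypothetical sample: "cafe" with e-acute (U+00E9)

latin1_bytes = text.encode("iso-8859-1")  # one byte per character
utf8_bytes = text.encode("utf-8")         # U+00E9 becomes two bytes

print(latin1_bytes)  # b'caf\xe9'
print(utf8_bytes)    # b'caf\xc3\xa9'

# Decoding with the matching encoding recovers the same text...
assert latin1_bytes.decode("iso-8859-1") == utf8_bytes.decode("utf-8") == text

# ...but UTF-8 bytes read under the wrong label turn into mojibake.
print(utf8_bytes.decode("iso-8859-1"))  # cafÃ©
```

Whatever the editor saves, the page then needs a matching label, for example the HTTP `Content-Type: text/html; charset=utf-8` header or the equivalent `meta` element.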
In article <cu**********@news-sop.inria.fr>,
Philippe Poulard <Ph****************@SPAMsophia.inria.fr> wrote:
this is theory

Is there anybody who knows of a parser that doesn't handle ISO-8859-1
correctly?


I don't. I do, however, know a parser that does not support ISO-8859-15,
Windows-1252 or Shift-JIS (at least not by default, without extra work): expat.

AFAIK, in *practice* the set of safe encodings is US-ASCII, ISO-8859-1
and UTF-8. In theory, it is UTF-8 and UTF-16. The intersection of
reality and theory is UTF-8.

--
Henri Sivonen
hs******@iki.fi
http://iki.fi/hsivonen/
Mozilla Web Author FAQ: http://mozilla.org/docs/web-developer/faq.html
Jul 20 '05 #21
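The "safe" set above can be checked mechanically with strict decoding. Below is a minimal Python sketch. `candidate_encodings` is a hypothetical helper, not part of any library. Note the asymmetry it exposes: ISO-8859-1 assigns a character to every byte, so a successful ISO-8859-1 decode proves nothing. A successful strict UTF-8 decode of non-ASCII bytes is a much stronger signal.

```python
def candidate_encodings(data: bytes) -> list[str]:
    """Return which of US-ASCII, UTF-8 and ISO-8859-1 can strictly decode data."""
    found = []
    for name in ("us-ascii", "utf-8", "iso-8859-1"):
        try:
            data.decode(name)  # errors="strict" is the default: bad bytes raise
        except UnicodeDecodeError:
            continue
        found.append(name)
    return found


print(candidate_encodings(b"plain"))         # ['us-ascii', 'utf-8', 'iso-8859-1']
print(candidate_encodings(b"caf\xc3\xa9"))   # ['utf-8', 'iso-8859-1']
print(candidate_encodings(b"caf\xe9"))       # ['iso-8859-1']
```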
In article <hs****************************@news.dnainternet.net>,
Henri Sivonen <hs******@iki.fi> wrote:
That is not a safe conclusion. XML processors are only required to
support UTF-8 and UTF-16. Support for any other encoding is an XML
processor-specific extra feature. It follows that using any encoding
other than UTF-8 or UTF-16 is unsafe.
This is an exaggeration. You might as well say: XML processors are
not required to support any particular URI scheme, so referring to
a DTD at an HTTP URI is unsafe.
If communication fails, because
someone sent an XML document in an encoding other than UTF-8 or UTF-16,
the sender is to blame.


So phone them up and ask them to change it. Not every XML document has
to be instantly useful to everyone.

-- Richard
Jul 20 '05 #22
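The XML 1.0 specification has a non-normative Appendix F that describes how a processor can work out an entity's encoding from its first bytes. It covers a byte order mark, the byte pattern of "<?" in UTF-16, and otherwise the encoding declaration, with UTF-8 as the default. Below is a simplified Python sketch of that logic. `sniff_xml_encoding` is a hypothetical helper, and the UTF-32 and EBCDIC cases from the appendix are deliberately left out.

```python
import re

# Encoding declaration inside an ASCII-compatible XML declaration.
_DECL = re.compile(rb"""<\?xml\s[^>]*?encoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']""")


def sniff_xml_encoding(data: bytes) -> str:
    """Guess the encoding of an XML entity (simplified from XML 1.0 Appendix F)."""
    if data.startswith(b"\xef\xbb\xbf"):
        return "utf-8"        # UTF-8 byte order mark
    if data.startswith(b"\xfe\xff"):
        return "utf-16-be"    # UTF-16 BOM, big-endian
    if data.startswith(b"\xff\xfe"):
        return "utf-16-le"    # UTF-16 BOM, little-endian
    if data.startswith(b"\x00<\x00?"):
        return "utf-16-be"    # "<?" in big-endian UTF-16 without a BOM
    if data.startswith(b"<\x00?\x00"):
        return "utf-16-le"    # "<?" in little-endian UTF-16 without a BOM
    match = _DECL.match(data)
    if match:
        return match.group(1).decode("ascii").lower()
    return "utf-8"            # no BOM and no declaration: UTF-8 is the default


print(sniff_xml_encoding(b'<?xml version="1.0" encoding="ISO-8859-1"?><a/>'))  # iso-8859-1
print(sniff_xml_encoding(b"<a/>"))  # utf-8
```

Whatever name this returns, a processor that supports only UTF-8 and UTF-16 may still be unable to decode the rest. That is the interoperability risk under discussion.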
In article <cu***********@pc-news.cogsci.ed.ac.uk>,
ri*****@cogsci.ed.ac.uk (Richard Tobin) wrote:
You might as well say: XML processors are
not required to support any particular URI scheme, so referring to
a DTD at an HTTP URI is unsafe.


I consider external subsets on the Web harmful. Not because of HTTP URIs
but because non-validating processors are not required to process the
DTD and the usefulness of DTDs relative to their usual size is low.
Mozilla, for one, never dereferences an HTTP URI to retrieve an external
entity.

--
Henri Sivonen
hs******@iki.fi
http://iki.fi/hsivonen/
Mozilla Web Author FAQ: http://mozilla.org/docs/web-developer/faq.html
Jul 20 '05 #23
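Expat, the parser mentioned earlier in the thread, behaves this way by default. As far as I know, it does not read an external DTD subset unless parameter-entity parsing is switched on with `SetParamEntityParsing`. Below is a minimal sketch using Python's `xml.parsers.expat` bindings. The DTD URL is a hypothetical placeholder and is never fetched.

```python
import xml.parsers.expat

DOC = b"""<?xml version="1.0"?>
<!DOCTYPE note SYSTEM "http://example.com/note.dtd">
<note>hello</note>"""

seen = {"system_id": None, "external_refs": []}


def on_doctype(name, system_id, public_id, has_internal_subset):
    seen["system_id"] = system_id  # the parser does see the DTD reference


def on_external_ref(context, base, system_id, public_id):
    # Called only if the parser decides to load an external entity.
    seen["external_refs"].append(system_id)
    return 1  # report success without fetching anything


parser = xml.parsers.expat.ParserCreate()
parser.StartDoctypeDeclHandler = on_doctype
parser.ExternalEntityRefHandler = on_external_ref
parser.Parse(DOC, True)

print(seen["system_id"])      # http://example.com/note.dtd
print(seen["external_refs"])  # []
```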
In article <Pi******************************@ppepc56.ph.gla.ac.uk>,
"Alan J. Flavell" <fl*****@ph.gla.ac.uk> wrote:
On Fri, 4 Feb 2005, Henri Sivonen wrote:

XML processors are only required to
support UTF-8 and UTF-16. Support for any other encoding is an XML
processor-specific extra feature.


But that's OK, since any plausible encoding produced by the editor can
be transformed by rote into utf-8 prior to subsequent XML processing
(that's the XML relevance).


Such conversion leads to bugs like this one:
https://bugzilla.mozilla.org/show_bug.cgi?id=174351

--
Henri Sivonen
hs******@iki.fi
http://iki.fi/hsivonen/
Mozilla Web Author FAQ: http://mozilla.org/docs/web-developer/faq.html
Jul 20 '05 #24
On Fri, 4 Feb 2005, Henri Sivonen wrote:
Newsgroups: comp.infosystems.www.authoring.html,comp.text.xml
You should set a Followup-To. I've done so, and I remark only on
what's relevant to c.i.w.a.html.
AFAIK, in *practice* the set of safe encodings is US-ASCII, ISO-8859-1
and UTF-8. In theory, it is UTF-8 and UTF-16. The intersection of
reality and theory is UTF-8.


Google still doesn't support UTF-16, as can be seen from
http://www.google.com/search?q=U.T.F.1-6
Hence the recommendation in
http://ppewww.ph.gla.ac.uk/~flavell/charset/checklist
to use only UTF-8 as the Unicode encoding on the WWW.

Jul 20 '05 #25
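One plausible reason for UTF-16's poor tool support is that UTF-16 is not ASCII-compatible. This is offered here as an assumption, not something stated in the thread. Even pure ASCII text becomes twice as long and full of zero bytes, which byte-oriented software that expects ASCII-compatible input tends to mishandle. Below is a minimal Python illustration.

```python
text = "UTF-8"

utf8 = text.encode("utf-8")
utf16 = text.encode("utf-16")  # Python prepends a 2-byte byte order mark

print(len(utf8), len(utf16))              # 5 12
print(b"\x00" in utf8, b"\x00" in utf16)  # False True
```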
On Fri, 4 Feb 2005, Henri Sivonen wrote:
"Alan J. Flavell" <fl*****@ph.gla.ac.uk> wrote:
But that's OK, since any plausible encoding produced by the editor can
be transformed by rote into utf-8 prior to subsequent XML processing
(that's the XML relevance).


Such conversion leads to bugs like this one:
https://bugzilla.mozilla.org/show_bug.cgi?id=174351


Does it? I'll have to ask you to explain that in more detail, please.
As far as I can see, the bug relates to a byte stream which is not
valid utf-8 - which by definition is therefore not utf-8 at all.

What I'm talking about is taking a properly-labelled and
properly-formed character stream in some known encoding, and
transcoding that into properly-formed utf-8 (with appropriate
re-labelling, of course).
Jul 20 '05 #26
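Alan's proposal might look roughly like this in Python: decode a properly labelled stream in its known encoding, re-encode it as UTF-8, and relabel it. Below is a minimal sketch under stated assumptions. `transcode_to_utf8` is a hypothetical helper, and the caller is assumed to already know the source encoding. Only the XML declaration is relabelled; an HTML `meta` charset would need the same treatment. Decoding is strict, so malformed input raises instead of being silently repaired.

```python
import re

# Encoding declaration at the very start of the document, with either quote style.
_ENCODING_DECL = re.compile(r"""\A(<\?xml\s[^>]*?encoding\s*=\s*)(["'])[A-Za-z][A-Za-z0-9._-]*\2""")


def transcode_to_utf8(data: bytes, source_encoding: str) -> bytes:
    """Strictly decode data as source_encoding, relabel it, and re-encode it as UTF-8."""
    text = data.decode(source_encoding)  # strict: malformed input raises
    # Relabel: make an existing encoding declaration say UTF-8.
    text = _ENCODING_DECL.sub(r"\1\2UTF-8\2", text, count=1)
    return text.encode("utf-8")


src = '<?xml version="1.0" encoding="ISO-8859-1"?><p>caf\u00e9</p>'.encode("iso-8859-1")
print(transcode_to_utf8(src, "iso-8859-1"))
# b'<?xml version="1.0" encoding="UTF-8"?><p>caf\xc3\xa9</p>'
```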
In article <Pi******************************@ppepc56.ph.gla.ac.uk>,
"Alan J. Flavell" <fl*****@ph.gla.ac.uk> wrote:
On Fri, 4 Feb 2005, Henri Sivonen wrote:
"Alan J. Flavell" <fl*****@ph.gla.ac.uk> wrote:
But that's OK, since any plausible encoding produced by the editor can
be transformed by rote into utf-8 prior to subsequent XML processing
(that's the XML relevance).


Such conversion leads to bugs like this one:
https://bugzilla.mozilla.org/show_bug.cgi?id=174351


Does it? I'll have to ask you to explain that in more detail, please.
As far as I can see, the bug relates to a byte stream which is not
valid utf-8 - which by definition is therefore not utf-8 at all.

What I'm talking about is taking a properly-labelled and
properly-formed character stream in some known encoding, and
transcoding that into properly-formed utf-8 (with appropriate
re-labelling, of course).


The problem is that the XML spec is not only concerned with proper UTF-8
streams but also says what to do in improper cases. If the character
encoding conversion is decoupled from the XML processor, but this is
viewed as an implementation detail so that the combination of the
converter and actual XML processor is subjected to the conformance
requirements placed on XML processors, non-conformance ensues if the
converter is lenient, which they usually are.

--
Henri Sivonen
hs******@iki.fi
http://iki.fi/hsivonen/
Mozilla Web Author FAQ: http://mozilla.org/docs/web-developer/faq.html
Jul 20 '05 #27
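Henri's point can be demonstrated with expat through Python's `xml.parsers.expat`. Below is a minimal sketch. The malformed input is a made-up example, and Python's decoder with `errors="replace"` stands in for a lenient converter placed in front of the parser. Given the raw bytes, expat reports a fatal error as a conforming processor must. After lenient conversion, the damage becomes U+FFFD and the same document "parses", so the converter-plus-parser combination no longer conforms.

```python
import xml.parsers.expat

BAD = b"<p>\xc3(</p>"  # 0xC3 starts a two-byte UTF-8 sequence; "(" cannot continue it


def parse(data: bytes) -> str:
    """Parse data with expat and return 'ok' or the fatal error message."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(data, True)
    except xml.parsers.expat.ExpatError as err:
        return f"fatal error: {err}"
    return "ok"


# 1. The XML processor sees the raw bytes and rejects the malformed UTF-8.
print(parse(BAD))

# 2. A lenient converter runs first: U+FFFD hides the error and parsing succeeds.
repaired = BAD.decode("utf-8", errors="replace").encode("utf-8")
print(parse(repaired))  # ok
```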
On Sat, 5 Feb 2005, Henri Sivonen wrote:
"Alan J. Flavell" <fl*****@ph.gla.ac.uk> wrote:

> But that's OK, since any plausible encoding produced by the
> editor can be transformed by rote into utf-8 prior to
> subsequent XML processing (that's the XML relevance).

[...]
The problem is that the XML spec is not only concerned with proper
UTF-8 streams but also says what to do in improper cases. If the
character encoding conversion is decoupled from the XML processor,
but this is viewed as an implementation detail so that the
combination of the converter and actual XML processor is subjected
to the conformance requirements placed on XML processors,
non-conformance ensues if the converter is lenient, which they
usually are.


Thanks. I understand your point now.

I have this feeling that there's a lot of scope for practical utility
without running the risk of falling foul of this particular problem;
but I won't drag the argument out.

all the best
Jul 20 '05 #28
