473,753 Members | 7,236 Online
Bytes | Software Development & Data Engineering Community
+ Post

Home Posts Topics Members FAQ

UTF-8 & Unicode

Do web pages have to be created in unicode in order to use UTF-8 encoding?
If so, can anyone name a free application which I can use under Windows 98
to create web pages?
Jul 20 '05 #1
27 5147
EU citizen wrote:
Do web pages have to be created in unicode in order to use UTF-8 encoding?


Yes, but that doesn't mean you need a special text editor: any plain
US-ASCII (but not ISO 8859-1) file is automatically correct in UTF-8.
Jul 20 '05 #2
[Follow-up to comp.infosystem s.www.authoring.html]

EU citizen wrote:
Do web pages have to be created in unicode in order to use UTF-8 encoding?


UTF-8 is one of the encoding scheme used for Unicode.
You should read carefully :
http://www.unicode.org/faq/
http://www.cs.tut.fi/~jkorpela/unicode/guide.html
http://ppewww.ph.gla.ac.uk/~flavell/.../internat.html
Jul 20 '05 #3
EU citizen wrote:
Do web pages have to be created in unicode in order to use UTF-8 encoding?


That's kind of a silly question because UTF-8 is a unicode encoding.
See my 3 part guide to unicode for an in-depth tutorial on creating
unicode files.

http://lachy.id.au/blogs/log/2004/12...unicode-part-1
http://lachy.id.au/blogs/log/2004/12...unicode-part-2
http://lachy.id.au/blogs/log/2005/01...unicode-part-3

--
Lachlan Hunt
http://lachy.id.au/
http://GetFirefox.com/ Rediscover the Web
http://SpreadFirefox.com/ Igniting the Web
Jul 20 '05 #4
"Lachlan Hunt" <sp***********@ gmail.com> wrote in message
news:42******** **************@ per-qv1-newsreader-01.iinet.net.au ...
EU citizen wrote:
Do web pages have to be created in unicode in order to use UTF-8
encoding?
That's kind of a silly question because UTF-8 is a unicode encoding.
See my 3 part guide to unicode for an in-depth tutorial on creating
unicode files.

http://lachy.id.au/blogs/log/2004/12...unicode-part-1
http://lachy.id.au/blogs/log/2004/12...unicode-part-2
http://lachy.id.au/blogs/log/2005/01...unicode-part-3


I wish people would give simple answers to simple questions.
This is not a silly question; See
http://www.w3schools.com/xml/xml_encoding.asp on XML Encoding. Slightly
edited, this says:

XML documents can contain foreign characters like Norwegian æøå, or French
êèé.
To let your XML parser understand these characters, you should save your XML
documents as Unicode.
Windows 95/98 Notepad cannot save files in Unicode format.
You can use Notepad to edit and save XML documents that contain foreign
characters (like Norwegian or French æøå and êèé),
But if you save the file and open it with IE 5.0, you will get an ERROR
MESSAGE.

Windows 95/98 Notepad files must be saved with an encoding attribute.
To avoid this error you can add an encoding attribute to your XML
declaration, but you cannot use Unicode.
The encoding below (open it with IE 5.0), will NOT give an error message:
<?xml version="1.0" encoding="UTF-8"?>
Jul 20 '05 #5
In article <4Z************ *@newsfe5-win.ntli.net>,
EU citizen <no*******@form e.com> wrote:
> Do web pages have to be created in unicode in order to use UTF-8encoding? That's kind of a silly question because UTF-8 is a unicode encoding.

I wish people would give simple answers to simple questions.
It may be a simple question for you, because you know what you mean,
but for the rest of us it's a hard-to-understand question, because
if you use UTF-8, you are inevitably using Unicode, since it's a
way of writing Unicode.

But from what you say now, it looks as if your question is really
about some Windows software.
To let your XML parser understand these characters, you should save your XML
documents as Unicode.
Windows 95/98 Notepad cannot save files in Unicode format.
You can use Notepad to edit and save XML documents that contain foreign
characters (like Norwegian or French æøå and êèé),
But if you save the file and open it with IE 5.0, you will get an ERROR
MESSAGE.
Presumably this means that Notepad saves documents containing those
characters in some non-Unicode encoding, in which case you must put
an appropriate encoding declaration at the top of the document. But
you will need to know the name of the encoding that Notepad uses.

<?xml version="1.0" encoding="whate ver-the-notepad-encoding-is"?>
Windows 95/98 Notepad files must be saved with an encoding attribute.
This is mysterious. What does it mean? That Notepad won't save
them without one? Or that you have to add one to make it work
in the web browser?
To avoid this error you can add an encoding attribute to your XML
declaration, but you cannot use Unicode.
The encoding below (open it with IE 5.0), will NOT give an error message:
<?xml version="1.0" encoding="UTF-8"?>


It only makes sense to say that you're using UTF-8 if you are. If Notepad
really doesn't know about Unicode, this will only be true if you
restrict yourself to ASCII characters, because they're the same
in UTF-8 as they are in ASCII and most other common encodings.

-- Richard
Jul 20 '05 #6
On Wed, 2 Feb 2005, EU citizen wrote:
Do web pages have to be created in unicode in order to use UTF-8
encoding?


[...]
I wish people would give simple answers to simple questions.
I don't think you've understood the problem. If the questioner was in
a position to understand the "simple answer" which you say you want, I
can't imagine how they would have asked the question in that form in
the first place.
This is not a silly question;
The original questioner should not feel offended or dispirited by what
I'm going to say: but, in the form in which is was asked, the question
is incoherent.

This is not unusual: many people are confused both by the theory and
by the terminology of character representation, especially if they
gained an initial understanding in a simpler situation (typically,
character repertoires of 256 characters or less, represented by an
8-bit character encoding such as iso-8859-anything; and fonts that
were laid out accordingly).
See
http://www.w3schools.com/xml/xml_encoding.asp on XML Encoding.
How very strange. This claims to be XHTML, but, as far as I can see,
it has no character encoding specified on its HTTP Content-type header
*nor* on its <?xml...> thingy (indeed it doesn't have a <?xml...>
thingy).

In the absence of a BOM, XML is entitled to deduce that it's utf-8:
but since it's invalid utf-8, it *ought* to refuse to process it.
Unless someone can show me what I'm missing.

By looking at it, it is evidently encoded in iso-8859-1.
It purports to declare that via a "meta http-equiv", but for XML this
is meaningless - and anyway comes far too late.

I don't know why the W3C validator doesn't reject it out of hand?

(Of course the popular browsers will be slurping it as slightly
xhtml-flavoured tag soup, so we can't expect to deduce very much from
the fact that they calmly display what the author intended.)
Slightly
edited, this says:

XML documents can contain foreign characters like Norwegian æøå, or French
êèé.
And those characters are presented encoded in iso-8859-1 ...
To let your XML parser understand these characters, you should save
your XML documents as Unicode.
Two things wrong here. What do they suppose they mean by "save ... as
Unicode"? The XML Document Character Set is *by definition* Unicode,
there's nothing that an author can do to change that (unlike SGML).

Characters can be represented in at least two different ways in XML:
by /numerical character references/ (&#number;), or as /encoded
characters/ using some /character encoding scheme/. (In some contexts
there may also be named character entities, but they introduce no new
principles for the present purpose so we won't need to discuss them
here).

The only coherent interpretation I can put on their "should save as
Unicode" statement is "should save in one of the character encoding
schemes of Unicode". But /should/ we? Do they? No, they don't: they
are using iso-8859-1 (they *could* even do it correctly); and they
also discuss the use of windows-1252, although without giving much
detail about the implications of deploying a proprietary character
encoding on the WWW.

The /conclusions/ are fine, in their way:

* Use an editor that supports encoding.
* Make sure you know what encoding it uses.
* Use the same encoding attribute in your XML documents.

But the reader still hasn't really learned anything about the
underlying principles yet. And the page hasn't told them anything
useful about *which* encoding to choose for deploying their documents
on the WWW.
Windows 95/98 Notepad cannot save files in Unicode format.


Then it's unfit for composing the kind of document that we are
discussing here. No matter - there are plenty of competent editors
which can work on that platform.

My own tutorial pages weren't really aimed at XML, so I won't suggest
them as an appropriate answer here. Actually, the relevant chapter of
the Unicode specification is not unreasonable as an introduction to
the principles of character representation and encoding, even if they
might be a bit indigestible at a first reading.
Jul 20 '05 #7
/EU citizen/:
XML documents can contain foreign characters like Norwegian æøå, or French
êèé. [...] You can use Notepad to edit and save XML documents that contain foreign
characters (like Norwegian or French æøå and êèé),


Hm, I don't see any Norwegian or French characters but some Cyrillic
instead... could it be you forgot to label the encoding of your
message? ;-)

--
Stanimir
Jul 20 '05 #8
"Richard Tobin" <ri*****@cogsci .ed.ac.uk> wrote in message
news:ct******** **@pc-news.cogsci.ed. ac.uk...
In article <4Z************ *@newsfe5-win.ntli.net>,
EU citizen <no*******@form e.com> wrote:
> Do web pages have to be created in unicode in order to use UTF-8encoding?

That's kind of a silly question because UTF-8 is a unicode encoding.
I wish people would give simple answers to simple questions.


It may be a simple question for you, because you know what you mean,
but for the rest of us it's a hard-to-understand question, because
if you use UTF-8, you are inevitably using Unicode, since it's a
way of writing Unicode.

But from what you say now, it looks as if your question is really
about some Windows software.


No. I am using a version of Windows (like most computer users on this
planet). However, my question isn't specific to Windows. For all I knew,
declaring uft-8 encoding might've caused the file to be transformed into
utf-8 regardless of the original file format.

To let your XML parser understand these characters, you should save your
XMLdocuments as Unicode.
Windows 95/98 Notepad cannot save files in Unicode format.
You can use Notepad to edit and save XML documents that contain foreign
characters (like Norwegian or French æøå and êèé),
But if you save the file and open it with IE 5.0, you will get an ERROR
MESSAGE.


Presumably this means that Notepad saves documents containing those
characters in some non-Unicode encoding, in which case you must put
an appropriate encoding declaration at the top of the document. But
you will need to know the name of the encoding that Notepad uses.

<?xml version="1.0" encoding="whate ver-the-notepad-encoding-is"?>


Based on what I know now, I agree. I always assumed that Notepad, being a
simple text editor, saved files in Ascii format. Nothing in Notepad's Help,
Windows' Help or Microsoft's website says anything about the formt used by
Notepad. Through experimentation with the W3C HTML vakidator, I've worked
out that iso-8859-1will work for Notepad files with standard english text
plus acute accented vowels.
Windows 95/98 Notepad files must be saved with an encoding attribute.
This is mysterious. What does it mean? That Notepad won't save
them without one? Or that you have to add one to make it work
in the web browser?


I can't make head or tail of it.
To avoid this error you can add an encoding attribute to your XML
declaration, but you cannot use Unicode.
The encoding below (open it with IE 5.0), will NOT give an error message:
<?xml version="1.0" encoding="UTF-8"?>


It only makes sense to say that you're using UTF-8 if you are. If Notepad
really doesn't know about Unicode, this will only be true if you
restrict yourself to ASCII characters, because they're the same
in UTF-8 as they are in ASCII and most other common encodings.


The need for the XML encoding statement to match the original file format
was not mentioned in any of the (many) articles I've read on XM:/XHTML over
the last *four* years.
Jul 20 '05 #9
"Alan J. Flavell" <fl*****@ph.gla .ac.uk> wrote in message
news:Pi******** *************** *******@ppepc56 .ph.gla.ac.uk.. .
On Wed, 2 Feb 2005, EU citizen wrote:
> Do web pages have to be created in unicode in order to use UTF-8

encoding?


[...]
I wish people would give simple answers to simple questions.


I don't think you've understood the problem. If the questioner was in
a position to understand the "simple answer" which you say you want, I
can't imagine how they would have asked the question in that form in
the first place.
This is not a silly question;


The original questioner should not feel offended or dispirited by what
I'm going to say: but, in the form in which is was asked, the question
is incoherent.


I think there's a lot of miscommunicatio n going on, I don't entirely
understand what your posting.
See
http://www.w3schools.com/xml/xml_encoding.asp on XML Encoding.


How very strange. This claims to be XHTML, but, as far as I can see,
it has no character encoding specified on its HTTP Content-type header
*nor* on its <?xml...> thingy (indeed it doesn't have a <?xml...>
thingy).


<snip>

You makee a number of valid criticisms about the w3schools article, but they
turned up near the top of my Google search for information on this subject.
It just shows how difficult it is to get reliable information.
Windows 95/98 Notepad cannot save files in Unicode format.


Then it's unfit for composing the kind of document that we are
discussing here. No matter - there are plenty of competent editors
which can work on that platform.


My original question asked for suggestions about suitable applications, and
yet no one has named one.
Jul 20 '05 #10

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

9
4191
by: lawrence | last post by:
Someone on www.php.net suggested using a seems_utf8() method to test text for UTF-8 character encoding but didn't specify how to write such a method. Can anyone suggest a test that might work? Something that maybe gives 90% confidence that a given block of text is or is not UTF-8 encoded?
3
3384
by: aa | last post by:
Is it OK to include an ANSI file into a UTF-8 file?
38
5739
by: Haines Brown | last post by:
I'm having trouble finding the character entity for the French abbreviation for "number" (capital N followed by a small supercript o, period). My references are not listing it. Where would I find an answer to this question (don't find it in the W3C_char_entities document). -- Haines Brown brownh@hartford-hwp.com
48
4640
by: Zenobia | last post by:
Recently I was editing a document in GoLive 6. I like GoLive because it has some nice features such as: * rewrite source code * check syntax * global search & replace (through several files at once) * regular expression search & replace. Normally my documents are encoded with the ISO setting. Recently I was writing an XHTML document. After changing the encoding to UTF-8 I used the
7
5001
by: Philipp Lenssen | last post by:
How do I load and save a UTF-8 document in XML in ASP/VBS? Well, the loading* is not the problem actually -- the file is in UTF-8, and understood correctly -- but once saved, the UTF-8 is replaced by what seems to be iso-8859-1 (which Flash doesn't understand, but that's another problem). Any help greatly appreciated. * Something like this...
6
18764
by: jmgonet | last post by:
Hello everybody, I'm having troubles loading a Xml string encoded in UTF-8. If I try this code: ------------------------------ XmlDocument doc=new XmlDocument(); String s="<?xml version=\"1.0\" encoding=\"utf-8\" standalone=\"yes\"?><a>Schönbühl</a>"; doc.LoadXml(s); doc.Save("d:\\temp\\test.xml");
7
12149
by: Jimmy Shaw | last post by:
Hi everybody, Is there any SIMPLE way to convert from UTF-16 to UTF-32? I may be mixed up, but is it possible that all UTF-16 "code points" that are 16 bits long appear just the same in UTF-32, but with zero padding and hence no real conversion is necessary? If I am completely wrong and some intricate conversion operation needs to take place, can anyone give me some primer on the subject?
10
19579
by: Jed | last post by:
I have a form that needs to handle international characters withing the UTF-8 character set. I have tried all the recommended strategies for getting utf-8 characters from form input to email message and I cannot get it to work. I need to stay with classic asp for this. Here are some things I tried: 'CDONTS Call msg.SetLocaleIDs(65001)
23
5026
by: Allan Ebdrup | last post by:
I hava an ajax web application where i hvae problems with UTF-8 encoding oc chineese chars. My Ajax webapplication runs in a HTML page that is UTF-8 Encoded. I copy and paste some chineese chars from another HTML page viewed in IE7, that is also UTF-8 encoded (search for "china" on google.com). I paste the chineese chars into a content editable div. My Ajax webservice compiles an XML where the data from the content editable div is...
4
6875
by: =?ISO-8859-2?Q?Boris_Du=B9ek?= | last post by:
Hi, I have an API that returns UTF-8 encoded strings. I have a utf8 codevt facet available to do the conversion from UTF-8 to wchar_t encoding defined by the platform. I have no trouble converting when a UTF-8 encoded string comes from file - I just create a std::wifstream and imbue it with a locale that uses the utf-8 facet for std::locale::ctype. Then I just use operator>to get wstring properly decoded from UTF-8. I thought I could...
0
9072
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However, people are often confused as to whether an ONU can Work As a Router. In this blog post, we’ll explore What is ONU, What Is Router, ONU & Router’s main usage, and What is the difference between ONU and Router. Let’s take a closer look ! Part I. Meaning of...
0
8896
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can effortlessly switch the default language on Windows 10 without reinstalling. I'll walk you through it. First, let's disable language synchronization. With a Microsoft account, language settings sync across devices. To prevent any complications,...
0
9451
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven tapestry of website design and digital marketing. It's not merely about having a website; it's about crafting an immersive digital experience that captivates audiences and drives business growth. The Art of Business Website Design Your website is...
1
9421
by: Hystou | last post by:
Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows Update option using the Control Panel or Settings app; it automatically checks for updates and installs any it finds, whether you like it or not. For most users, this new feature is actually very convenient. If you want to control the update process,...
0
9333
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each protocol has its own unique characteristics and advantages, but as a user who is planning to build a smart home system, I am a bit confused by the choice of these technologies. I'm particularly interested in Zigbee because I've heard it does some...
1
6869
isladogs
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome a new presenter, Adolph Dupré who will be discussing some powerful techniques for using class modules. He will explain when you may want to use classes instead of User Defined Types (UDT). For example, to manage the data in unbound forms. Adolph will...
0
6151
by: conductexam | last post by:
I have .net C# application in which I am extracting data from word file and save it in database particularly. To store word all data as it is I am converting the whole word file firstly in HTML and then checking html paragraph one by one. At the time of converting from word file to html my equations which are in the word document file was convert into image. Globals.ThisAddIn.Application.ActiveDocument.Select();...
0
4771
by: TSSRALBI | last post by:
Hello I'm a network technician in training and I need your help. I am currently learning how to create and manage the different types of VPNs and I have a question about LAN-to-LAN VPNs. The last exercise I practiced was to create a LAN-to-LAN VPN between two Pfsense firewalls, by using IPSEC protocols. I succeeded, with both firewalls in the same network. But I'm wondering if it's possible to do the same thing, with 2 Pfsense firewalls...
1
3395
by: 6302768590 | last post by:
Hai team i want code for transfer the data from one system to another through IP address by using C# our system has to for every 5mins then we have to update the data what the data is updated we have to send another system

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.