chinese encoded in UTF-8 and XML

Hi, I wrote an XML file with GNU Emacs 21.2.2, with
Chinese character content encoded in UTF-8.
I wrote something like:

<?xml version="1.0" encoding="UTF-8"?>
<test>
<chinese>¼»</chinese>
<chinese>ÄÎ</chinese>
</test>

and then I used "C-x RET f" and chose utf-8.
Then I typed "C-x C-s" to save my file.
I hope this is the right way in Emacs to store the content
as UTF-8 encoded text?!
Then I tried to parse the file with xmllint, a small XML
parser program that comes with libxml2.
The parser complains that the second "chinese line" is not proper
UTF-8.

==>

uhu:4: error: Input is not proper UTF-8, indicate encoding !
<chinese>ÄÎ</chinese>
^
uhu:4: error: Bytes: 0xC4 0xCE 0x3C 0x2F
<chinese>ÄÎ</chinese>

It is interesting that the parser only grumbles about the second
Chinese line.

I'm anxious to see an explanation!

Jul 20 '05 #1
Knackeback <kn********@randspringer.de> wrote:
Content-Type: text/plain; charset=big5

> Hi, I wrote an XML file with GNU Emacs 21.2.2, with
> Chinese character content encoded in UTF-8.
> [...]
> I hope this is the right way in Emacs to store the content
> as UTF-8 encoded text?!

Probably not.

> uhu:4: error: Input is not proper UTF-8, indicate encoding !
> uhu:4: error: Bytes: 0xC4 0xCE 0x3C 0x2F


It seems your text was Big5-encoded, not UTF-8-encoded.
Jul 20 '05 #2
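
The Big5 diagnosis above can be checked from the four bytes in the xmllint message alone. The following is only a rough sketch in Python (not something used in the thread); the helper names `utf8_role` and `is_valid_utf8` are made up for illustration. In UTF-8, a lead byte such as 0xC4 must be followed by a continuation byte in the range 0x80-0xBF, but here it is followed by 0xCE, which is itself a lead byte.

```python
# Bytes reported by xmllint for line 4 of the file.
REPORTED = bytes([0xC4, 0xCE, 0x3C, 0x2F])


def utf8_role(byte: int) -> str:
    """Classify a single byte by the role it can play in UTF-8."""
    if byte < 0x80:
        return "ascii"
    if byte < 0xC0:
        return "continuation"
    if byte in (0xC0, 0xC1) or byte > 0xF4:
        return "never valid in UTF-8"
    if byte < 0xE0:
        return "lead of 2-byte sequence"
    if byte < 0xF0:
        return "lead of 3-byte sequence"
    return "lead of 4-byte sequence"


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


for b in REPORTED:
    print(f"0x{b:02X}: {utf8_role(b)}")

print("valid UTF-8:", is_valid_utf8(REPORTED))
# ascii() escapes the decoded character so the output is terminal-safe.
print("decoded as Big5:", ascii(REPORTED.decode("big5")))
```

Read as Big5, the same four bytes decode to one Chinese character followed by `</`, which fits the `<chinese>...</chinese>` markup in the original file.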
Knackeback <kn********@randspringer.de> writes:
> Hi, I wrote an XML file with GNU Emacs 21.2.2, with
> Chinese character content encoded in UTF-8.
> I wrote something like:
>
> <?xml version="1.0" encoding="UTF-8"?>
> <test>
> <chinese>¼»</chinese>
> <chinese>ÄÎ</chinese>
> </test>
>
> and then I used "C-x RET f" and chose utf-8.
> Then I typed "C-x C-s" to save my file.
> I hope this is the right way in Emacs to store the content
> as UTF-8 encoded text?!
> Then I tried to parse the file with xmllint, a small XML
> parser program that comes with libxml2.
> The parser complains that the second "chinese line" is not proper
> UTF-8.
>
> ==>


FWICT, Emacs doesn't have a Chinese input method which supports
Unicode output... :-( ...I've had similar troubles with
Japanese. I've also noted that, e.g. for Greek, there are input
methods which explicitly support Unicode, and others which do
not.

-Micah
Jul 20 '05 #3
>> and then I used "C-x RET f" and chose utf-8.
>> Then I typed "C-x C-s" to save my file.
> [...] FWICT, Emacs doesn't have a Chinese input method which supports
> Unicode output... :-( ...I've had similar troubles with

But since he specified utf-8, Emacs should have complained rather than
silently using some other coding system.
Please report the bug with M-x report-emacs-bug.

Stefan
Jul 20 '05 #4
Knackeback <kn********@randspringer.de> writes:
> Hi, I wrote an XML file with GNU Emacs 21.2.2, with
> Chinese character content encoded in UTF-8.
> I wrote something like:
>
> <?xml version="1.0" encoding="UTF-8"?>
> <test>
> <chinese>¼»</chinese>
> <chinese>ÄÎ</chinese>
> </test>

In my Gnus on Emacs 21.3, I saw the Chinese characters in Big5.
Maybe you should download the Mule-UCS package and install it. With the
package, I can just enter Big5-encoded Chinese characters, set the
coding system to utf-8, and get a UTF-8 encoded text file.
Download mule-ucs from ftp://ftp.m17n.org, and add the lines below
to your .emacs file. The Big5 to UTF-8 conversion function is
defined in big5c-ucs.el, which is located in mule-ucs/lisp/big5conv

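;; Make the Mule-UCS directories visible to Emacs (adjust the paths).
(add-to-list 'load-path "/path/to/your/mule-ucs/")
(add-to-list 'load-path "/path/to/your/mule-ucs/lisp")

;; Load Mule-UCS's Unicode support and the Big5 conversion.
(require 'un-define)
(require 'big5c-ucs)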

--
Chun-Chieh Huang, aka Albert | E-mail: jjhuang AT cm.nctu.edu.tw
黃俊傑 |
Department of Computer Science |
National Tsing Hua University | MIME/ASCII/PDF/PostScript are welcome!
HsinChu, Taiwan | NO MS WORD DOC FILE, PLEASE!
Jul 20 '05 #5
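
For a file that has already been saved with Big5 bytes under a UTF-8 declaration, the same Big5 to UTF-8 conversion can also be done outside Emacs. Below is a minimal Python sketch, not the Mule-UCS method described above. It assumes the whole file is Big5 (ASCII markup passes through unchanged), and it rebuilds a one-element document from the bytes in the xmllint message rather than using the poster's real file. The names `parses` and `big5_to_utf8` are hypothetical helpers.

```python
import xml.etree.ElementTree as ET

# A cut-down stand-in for the saved file: it declares UTF-8, but the
# element content holds the Big5 bytes 0xC4 0xCE from the xmllint error.
big5_doc = (
    b'<?xml version="1.0" encoding="UTF-8"?>\n'
    b"<test>\n"
    b"<chinese>\xc4\xce</chinese>\n"
    b"</test>\n"
)


def parses(data: bytes) -> bool:
    """Return True if the XML parser accepts the document."""
    try:
        ET.fromstring(data)
    except ET.ParseError:
        return False
    return True


def big5_to_utf8(data: bytes) -> bytes:
    """Re-encode Big5 bytes as UTF-8 (assumes the input really is Big5)."""
    return data.decode("big5").encode("utf-8")


utf8_doc = big5_to_utf8(big5_doc)

print("original parses:", parses(big5_doc))
print("converted parses:", parses(utf8_doc))
```

The original bytes are rejected by the parser, much as xmllint rejected them, while the converted bytes match the `encoding="UTF-8"` declaration and parse cleanly.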
