Question about using SHIFT-JIS encoding with libxml2

Hi,

I am using libxml2 for xml parsing. When the client application sends
data to libxml2 in UTF-8 format, it works fine.

But I have a scenario in which the client application sends data to
libxml2 parser in SHIFT-JIS format.

The following error is thrown by libxml2:

"Parsing error in results: Input is not proper UTF-8, indicate
encoding !"

In the libxml2 documentation at http://www.xmlsoft.org/encoding.html I
read that libxml2 can support any encoding by calling the
xmlSwitchEncoding() routine.

What do I have to do to make libxml2 support the SHIFT-JIS format? I want
to continue supporting UTF-8 as well.

Thanks,
Saumya

Apr 10 '07 #1
sa************@gmail.com wrote:
> But, I have a scenario in which the client application sends data to
> libxml2 parser in SHIFT-JIS format.
>
> The following error is thrown by libxml2 -
>
> "Parsing error in results: Input is not proper UTF-8, indicate
> encoding !"

Does the XML contain an XML declaration indicating the encoding e.g.
<?xml version="1.0" encoding="SHIFT-JIS"?>
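
To check what a document actually declares before it reaches the parser, the declaration can be inspected directly. The following is a minimal sketch, not libxml2 code: `declared_encoding` is a hypothetical helper, and it assumes the input bytes are ASCII-compatible for the characters used in an XML declaration (true for UTF-8 and Shift_JIS, not for UTF-16).

```c
#include <stddef.h>
#include <string.h>

/* Skip XML whitespace characters. */
static const char *skip_ws(const char *p)
{
    while (*p == ' ' || *p == '\t' || *p == '\r' || *p == '\n')
        p++;
    return p;
}

/*
 * Hypothetical helper: copy the value of the encoding pseudo-attribute
 * from a leading XML declaration into out (NUL-terminated).
 * Returns 1 if a value was found and fits in out, 0 otherwise.
 */
int declared_encoding(const char *xml, char *out, size_t out_cap)
{
    if (strncmp(xml, "<?xml", 5) != 0)
        return 0;                       /* no XML declaration at all */

    const char *end = strstr(xml, "?>");
    const char *key = strstr(xml, "encoding");
    if (end == NULL || key == NULL || key > end)
        return 0;                       /* "encoding" not inside the declaration */

    const char *p = skip_ws(key + strlen("encoding"));
    if (*p != '=')
        return 0;
    p = skip_ws(p + 1);

    char quote = *p;
    if (quote != '"' && quote != '\'')
        return 0;
    p++;

    const char *close = strchr(p, quote);
    if (close == NULL || close > end)
        return 0;

    size_t len = (size_t)(close - p);
    if (len == 0 || len >= out_cap)
        return 0;

    memcpy(out, p, len);
    out[len] = '\0';
    return 1;
}
```

Called on the declaration quoted above, the helper yields the string SHIFT-JIS. Note that the IANA-registered name is Shift_JIS; whether the hyphenated spelling is accepted depends on the converter that ends up handling it.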

--

Martin Honnen
http://JavaScript.FAQTs.com/
Apr 10 '07 #2
> Does the XML contain an XML declaration indicating the encoding e.g.
> <?xml version="1.0" encoding="SHIFT-JIS"?>

Yes, it does. I thought that should be enough to tell the libxml2
parser that the encoding is SHIFT-JIS.
Does libxml2 support the SHIFT-JIS encoding? I want to keep the support
for UTF-8 intact too. Is that possible?
Does libxml2 convert SHIFT-JIS to UTF-8 internally if it is declared
in the XML declaration as above?

Thanks,
Saumya

Apr 11 '07 #3
On Tue, 10 Apr 2007 22:13:25 -0700, sa************@gmail.com scripst:
> Yes, it does. I thought that should be enough to tell the libxml2
> parser that the encoding format is SHIFT-JIS. Does libxml2 support
> SHIFT-JIS encoding ? I want to keep the support for UTF-8 intact too. Is
> it possible? Does libxml2 convert SHIFT-JIS to UTF-8 internally if it is
> supplied in XML declaration as above?

This looks promising (and yes, do read both referenced tutorials):
http://xmlsoft.org/encoding.html

Matej
Apr 11 '07 #4
sa************@gmail.com wrote:
> Does libxml2 support SHIFT-JIS encoding ?

I don't know offhand. Find its documentation?

> Does libxml2 convert SHIFT-JIS to UTF-8 internally if it is supplied
> in XML declaration as above?

Most Java-based XML processors actually convert to UTF-16 internally,
since that's a native character representation in Java. I don't know
what libxml2 is using, but I would expect they're doing something
similar -- convert to some standardized internal form, process that,
then convert back. Some tools have tried to avoid the double conversion
when data is being passed straight through, but recognizing and taking
advantage of that optimization is not easy.
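
The convert-then-process idea can be tried outside the parser with POSIX iconv. For libxml2 specifically, the encoding page at xmlsoft.org describes UTF-8 as the internal form. Below is a minimal sketch, assuming the platform's iconv recognizes the source encoding name; `convert_to_utf8` is a hypothetical helper and not part of libxml2.

```c
#include <iconv.h>
#include <stddef.h>

/*
 * Hypothetical helper: convert in_len bytes from from_enc to UTF-8.
 * Returns the number of bytes written to out, or -1 if the encoding is
 * unknown to iconv, the input is invalid, or out is too small.
 */
long convert_to_utf8(const char *from_enc, const char *in, size_t in_len,
                     char *out, size_t out_cap)
{
    iconv_t cd = iconv_open("UTF-8", from_enc);
    if (cd == (iconv_t)-1)
        return -1;                      /* encoding name not supported */

    char *src = (char *)in;             /* iconv's prototype is not const-correct */
    char *dst = out;
    size_t src_left = in_len;
    size_t dst_left = out_cap;

    size_t rc = iconv(cd, &src, &src_left, &dst, &dst_left);
    if (rc != (size_t)-1)
        rc = iconv(cd, NULL, NULL, &dst, &dst_left);   /* flush any shift state */
    iconv_close(cd);

    if (rc == (size_t)-1)
        return -1;
    return (long)(out_cap - dst_left);
}
```

A caller would pass the raw bytes received from the client together with the declared encoding name, then hand the UTF-8 result to whatever processing follows. The fixed-size output buffer keeps the sketch short; real code would grow the buffer and retry when iconv reports E2BIG.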

--
() ASCII Ribbon Campaign | Joe Kesselman
/\ Stamp out HTML e-mail! | System architexture and kinetic poetry
Apr 11 '07 #5
On Apr 11, 7:13 am, "saumya.agar...@gmail.com"
<saumya.agar...@gmail.com> wrote:
> Does libxml2 support SHIFT-JIS encoding ? I want to keep the support
> for UTF-8 intact too. Is it possible?

For what it's worth, the source code contains the following (in
version 2.6.27):

        case XML_CHAR_ENCODING_2022_JP:
            __xmlErrEncoding(ctxt, XML_ERR_UNSUPPORTED_ENCODING,
                             "encoding not supported %s\n",
                             BAD_CAST "ISO-2022-JP", NULL);
            break;
        case XML_CHAR_ENCODING_SHIFT_JIS:
            __xmlErrEncoding(ctxt, XML_ERR_UNSUPPORTED_ENCODING,
                             "encoding not supported %s\n",
                             BAD_CAST "Shift_JIS", NULL);
            break;
        case XML_CHAR_ENCODING_EUC_JP:
            __xmlErrEncoding(ctxt, XML_ERR_UNSUPPORTED_ENCODING,
                             "encoding not supported %s\n",
                             BAD_CAST "EUC-JP", NULL);
            break;

Apr 12 '07 #6
"Arndt Jonasson" <ar************@gmail.com> writes:
> For what it's worth, the source code contains the following (in
> version 2.6.27):

However, according to the web page (the link to which I sent to this
thread), libxml can use iconv and all its supported codepages
(i.e., whatever you have ever dreamed about).
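
Whether a particular iconv installation knows a given encoding name can be probed directly. This is a minimal sketch with a hypothetical helper, `iconv_supports`; it only checks the system iconv and says nothing about whether a given libxml2 build was compiled with iconv support, which is a separate assumption.

```c
#include <iconv.h>

/*
 * Hypothetical helper: returns 1 if the system iconv can convert from
 * encoding_name to UTF-8, 0 otherwise.
 */
int iconv_supports(const char *encoding_name)
{
    iconv_t cd = iconv_open("UTF-8", encoding_name);
    if (cd == (iconv_t)-1)
        return 0;                       /* name unknown to this iconv */
    iconv_close(cd);
    return 1;
}
```

Trying both "Shift_JIS" and "SHIFT-JIS" with this helper shows which spellings the local converter accepts, which is worth knowing before relying on the name written in the XML declaration.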

Matej
Apr 12 '07 #7
