Errors parsing Japanese chars

I am trying to use the xerces-c SAX parser to parse Japanese characters. I
have a <?xml... utf-8> line in the XML file. When the parser
encounters the Japanese characters it throws a UTFDataFormatException.
I am quite new to XML and I am not sure how to deal with this
situation.
Is there a way to parse the Japanese characters? Or should the Japanese
characters be escaped in the XML file (e.g. &#1234;) for this to work?
Jul 20 '05 #1
On Tue, Jul 8, Sriv Chakravarthy inscribed on the eternal scroll:
> I am trying to use the xerces-c SAX parser to parse Japanese characters. I
> have a <?xml... utf-8> line in the XML file. When the parser
> encounters the Japanese characters it throws a UTFDataFormatException.
That seems to indicate that the Japanese characters are not in fact
encoded in UTF-8, then.
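
One quick way to test that suspicion outside the parser is to try decoding the raw bytes as UTF-8 and see where it fails. The following is only a minimal sketch in Python (not xerces-c); the function name `first_invalid_utf8_offset` and the sample strings are made up for illustration.

```python
def first_invalid_utf8_offset(data: bytes):
    """Return the byte offset of the first invalid UTF-8 sequence, or None if all of it decodes."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        # err.start is the index of the first byte that could not be decoded
        return err.start
    return None


# Illustrative data: the same Japanese text in two different encodings.
sample_utf8 = "日本語".encode("utf-8")
sample_sjis = "日本語".encode("shift_jis")

print(first_invalid_utf8_offset(sample_utf8))  # None: the bytes are valid UTF-8
print(first_invalid_utf8_offset(sample_sjis))  # an offset: these bytes are not UTF-8
```

In practice you would read the XML file in binary mode and pass its bytes to the function. A reported offset tells you roughly where the parser is likely to be choking.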
> I am quite new to XML and I am not sure how to deal with this
> situation.
Irrespective of xml or not xml, any text file needs to be accompanied
with information on its encoding if it's to be reliably read. (Modulo
some heuristics which claim to auto-recognise a limited number of
encodings[1]).
> Is there a way to parse the Japanese characters?
If I've understood what you're reporting, it's not a matter of
_parsing_ them, it's a matter of understanding them in the first
place.
> Or should the Japanese
> characters be escaped in the XML file (e.g. &#1234;) for this to work?


Not necessarily. And indeed it's a most inefficient way to represent
them if a large quantity of CJK text is involved. But yes, it's
certainly a legal possibility.
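
For illustration, here is a small Python sketch of what escaping to numeric character references looks like, and why it costs space. The helper name `to_char_refs` is hypothetical; it relies on Python's standard `xmlcharrefreplace` error handler, which leaves ASCII alone and replaces everything else with a decimal reference.

```python
def to_char_refs(text: str) -> str:
    """Replace every non-ASCII character with a decimal reference such as &#26085;."""
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


escaped = to_char_refs("<title>日本語</title>")
print(escaped)  # <title>&#26085;&#26412;&#35486;</title>

# Size comparison for a single character: 3 bytes in UTF-8, 8 bytes as a reference.
print(len("日".encode("utf-8")), len(to_char_refs("日")))
```

The result is pure ASCII, so it parses correctly whichever ASCII-compatible encoding the declaration names, at the price of a much larger file.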

Can you view your data (e.g. as plain text) in a web browser? (Or if
you haven't got a web browser, try MSIE...) Which character coding
does the browser need to be set to in order to make sense of the
Japanese? (You might try its auto recognition options and if it's
successful, then check to see which encoding it has chosen).
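
If a browser isn't handy, a rough equivalent of that experiment is to try a few likely encodings and see which ones decode the bytes without error. This is only a sketch: the candidate list is an assumption (common Japanese encodings, named as Python's codecs call them), and the helper name `plausible_encodings` is invented here.

```python
# Assumed candidates: encodings commonly used for Japanese text.
CANDIDATES = ("utf-8", "shift_jis", "euc_jp", "iso2022_jp")


def plausible_encodings(data: bytes, candidates=CANDIDATES):
    """Return the candidate encodings that decode the data without raising an error."""
    ok = []
    for name in candidates:
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue
        ok.append(name)
    return ok


# Illustrative data: Japanese text stored as EUC-JP.
data = "日本語".encode("euc_jp")
print(plausible_encodings(data))
```

More than one candidate can decode cleanly and still produce nonsense, so treat the result as a shortlist and look at the decoded text yourself, just as you would in the browser.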

Then, if the encoding is one that's supported by the parser software,
just nominate it on the <?xml... thingy (i.e. in the encoding part of
the XML declaration).
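
As a sketch of that last step, the Python snippet below rewrites the encoding named in the XML declaration of a byte string. It assumes the declaration sits at the very start of the data, is ASCII-compatible, and already has an encoding attribute; the helper name `declare_encoding` is hypothetical. Whether the parser accepts the name you put there still depends on what the parser supports.

```python
import re

# Matches the start of an XML declaration up to and including its encoding value.
DECL = re.compile(rb'^(<\?xml[^>]*?encoding=)(["\'])[^"\']*\2')


def declare_encoding(data: bytes, encoding: str) -> bytes:
    """Return data with the declared encoding replaced; unchanged if no declaration matches."""
    def swap(match):
        quote = match.group(2)
        return match.group(1) + quote + encoding.encode("ascii") + quote
    return DECL.sub(swap, data, count=1)


doc = b'<?xml version="1.0" encoding="utf-8"?><a/>'
print(declare_encoding(doc, "Shift_JIS"))
# b'<?xml version="1.0" encoding="Shift_JIS"?><a/>'
```

Only the label changes here; the bytes of the document body are left exactly as they were, which is the point when the label was the part that was wrong.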

hope this helps.

[1] or of course the BOM, if you know for a fact that it's
a Unicode encoding that you're dealing with.
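
A BOM check is easy to sketch in Python using the constants in the standard `codecs` module. The function name `sniff_bom` is made up for illustration; the UTF-32 marks are tested first because the UTF-32 little-endian BOM begins with the same two bytes as the UTF-16 little-endian one.

```python
import codecs

# Order matters: check the longer UTF-32 marks before the UTF-16 ones.
BOMS = (
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
)


def sniff_bom(data: bytes):
    """Return the Unicode encoding implied by a leading BOM, or None if there is no BOM."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return None


print(sniff_bom(codecs.BOM_UTF8 + b"<a/>"))  # utf-8
print(sniff_bom(b"<a/>"))                    # None
```

A missing BOM proves nothing either way, since UTF-8 files are often written without one.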
Jul 20 '05 #2
