473,769 Members | 2,348 Online
Bytes | Software Development & Data Engineering Community
+ Post

Home Posts Topics Members FAQ

Errors parsing Japanese chars

I am trying to use xerces-c SAX parser to parse japanese characters. I
have a <?xml... utf-8> line in the xml file. When the parser
encounters the jap characters it throws a UTFDataFormatEx ception.
I am quite new to xml and I am not sure how to deal with this
situation.
Is there a way to parse the jap characters ? or should the japanese
characters be escaped in the xml file (i.e. &#1234) for this to work.
Jul 20 '05 #1
1 3167
On Tue, Jul 8, Sriv Chakravarthy inscribed on the eternal scroll:
I am trying to use xerces-c SAX parser to parse japanese characters. I
have a <?xml... utf-8> line in the xml file. When the parser
encounters the jap characters it throws a UTFDataFormatEx ception.
Seems to be indicating that the Japanese characters are not in fact
encided in utf-8, then.
I am quite new to xml and I am not sure how to deal with this
situation.
Irrespective of xml or not xml, any text file needs to be accompanied
with information on its encoding if it's to be reliably read. (Modulo
some heuristics which claim to auto-recognise a limited number of
encodings[1]).
Is there a way to parse the jap characters ?
If I've understood what you're reporting, it's not a matter of
_parsing_ them, it's a matter of understanding them in the first
place.
or should the japanese
characters be escaped in the xml file (i.e. &#1234) for this to work.


Not necessarily. And indeed it's a most inefficent way to represent
them if a large quantity of CJK text is involved. But yes, it's
certainly a legal possibility.

Can you view your data (e.g as plain text) in a web browser? (Or if
you haven't got a web browser, try MSIE...) Which character coding
does the browser need to be set to in order to make sense of the
Japanese? (You might try its auto recognition options and if it's
successful, then check to see which encoding it has chosen).

Then, if the encoding is one that's supported by the parser software,
just nominate it on the <?xml... thingy.

hope this helps.

[1] or of course the BOM, if you know for a fact that it's
a unicode encoding that you're dealing with.
Jul 20 '05 #2

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

0
1701
by: Prakash | last post by:
Hi all, I am trying a parse a xml document containing japanese text by constructing a DOMBuilder object. The document created after parsing is empty. If the xml document does not contain japanese characters, the parsing goes thro' properly. Here is the sample document that is causing the problem. <root>
5
14061
by: Aleksandar Matijaca | last post by:
Hi there, I am in some need of help. I am trying to parse using the apache sax parser a file that has vaid UTF-8 characters - I keep end up getting a sun.io.MalformedInputException error. This is my code:
2
1660
by: Robert M. Gary | last post by:
I'm on a Solaris 9 Japanese machine w/ an Ultra 5 Sparc CPU. I'm using Xerces 2.6 DOM I've got a document in UTF-8 format.. <?xml version="1.0" encoding="UTF-8"?> <Name>ja_alert-\343\201\250\343\201\241\343\201\244\343\201\252\343\201\256\343\ 201\253</Name> (I'm not sure if the Japanese came out right here, but everything after ja_alert- is UTF-8 for Japanese).
16
5996
by: Christopher Benson-Manica | last post by:
I'm wondering about the best way to do the following: I have a string delimited by semicolons. The items delimited may be in any of the following formats: 1) 14 alphanum characters 2) 5 alphanums space 8 alphanums 3) 6 alphanums colon 8 alphanums 4) 5 alphanums colon 8 alphanums My task is to convert items in the third format to the first format, and items
4
6168
by: tim | last post by:
Hi there! I am in Japan right now fiddeling with an JP to AD date change program, for this I have constructed one block where the date is inputted, and which decides wether it will go to the AD to JP, or JP to AD change block.. Like so: public string ChangeDate(string tmpDate) { Match
2
3701
by: Joseph | last post by:
Hello. I have this problem. See I have a transformed XML file and I checked its contents prior to outputting it to excel file via responseset. here is the gist of the code: XmlReader reader = myEsiCommand.ExecuteXmlReader(); reader.MoveToContent(); string myCSV = reader.ReadInnerXml(); //Load the xml fragment into the document XmlDocument xmlDataDoc = new XmlDocument();
21
2510
by: Doug Lerner | last post by:
I'm working on a client/server app that seems to work fine in OS Firefox and Windows IE and Firefox. However, in OS X Safari, although the UI/communications themselves work fine, if the characters getting sent back and forth are in Japanese they come back from the server "moji bake" (corrupted). Anybody have any ideas why this might work differently in Safari than in Firefox or IE?
2
1744
by: Victor | last post by:
Hi guys i am facing a real big problem here. I bought a hosting plan and try to build my own website. my website has several language version(chinese english japanese). but i just found out the server in my plan only support the latin based character sets. So my chinese version and japanese version can not be displayed correctly. But i dont want to switch my plan( that means i need to pay much more). Is there anyway i can display chiese and...
13
4514
by: Chris Carlen | last post by:
Hi: Having completed enough serial driver code for a TMS320F2812 microcontroller to talk to a terminal, I am now trying different approaches to command interpretation. I have a very simple command set consisting of several single letter commands which take no arguments. A few additional single letter commands take arguments:
0
9586
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However, people are often confused as to whether an ONU can Work As a Router. In this blog post, we’ll explore What is ONU, What Is Router, ONU & Router’s main usage, and What is the difference between ONU and Router. Let’s take a closer look ! Part I. Meaning of...
0
9423
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can effortlessly switch the default language on Windows 10 without reinstalling. I'll walk you through it. First, let's disable language synchronization. With a Microsoft account, language settings sync across devices. To prevent any complications,...
0
10043
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven tapestry of website design and digital marketing. It's not merely about having a website; it's about crafting an immersive digital experience that captivates audiences and drives business growth. The Art of Business Website Design Your website is...
0
9861
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each protocol has its own unique characteristics and advantages, but as a user who is planning to build a smart home system, I am a bit confused by the choice of these technologies. I'm particularly interested in Zigbee because I've heard it does some...
0
8869
agi2029
by: agi2029 | last post by:
Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing, and deployment—without human intervention. Imagine an AI that can take a project description, break it down, write the code, debug it, and then launch it, all on its own.... Now, this would greatly impact the work of software developers. The idea...
1
7406
isladogs
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome a new presenter, Adolph Dupré who will be discussing some powerful techniques for using class modules. He will explain when you may want to use classes instead of User Defined Types (UDT). For example, to manage the data in unbound forms. Adolph will...
0
5446
by: adsilva | last post by:
A Windows Forms form does not have the event Unload, like VB6. What one acts like?
1
3956
by: 6302768590 | last post by:
Hai team i want code for transfer the data from one system to another through IP address by using C# our system has to for every 5mins then we have to update the data what the data is updated we have to send another system
2
3561
muto222
by: muto222 | last post by:
How can i add a mobile payment intergratation into php mysql website.

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.