Xerces and .NET System.Xml

Hi everybody:

I do not know if this is the correct list for a very specific
implementation problem, but if you can help me, it would be great! :)

I have an application that builds an XML document containing some strange
characters:

std::string str = "Code = ";
str += '\x04'; // strange character ASCII 4 (EOT), shown as ♦

and I serialize the XML using Xerces, which writes something like this
(no matter which encoding I use; I tried ISO-8859-1, UTF-8, UTF-16,
etc.):

<XmlTest>
Code = ♦
</XmlTest>

But when I try to load this XML using the Microsoft .NET
System.Xml.XmlDocument, I get an "Invalid character found" exception
and the XML cannot be loaded.

What is wrong here? If I serialize the same string using the MS
implementation, I get:

<XmlTest>
Code = &#x4;
</XmlTest>

Jun 27 '08 #1
Ernesto Bascón Pantoja wrote:
> I have an application that builds an XML document containing some strange
> characters:
>
> std::string str = "Code = ";
> str += '\x04'; // strange character ASCII 4 (EOT), shown as ♦

ASCII 4 is not allowed in XML 1.0 and is only allowed as a numeric character
reference in XML 1.1, I think.
Why do you want to put such characters into your XML documents?
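
For reference, the XML 1.0 rule being cited can be checked in a few lines. This is a minimal sketch, assuming the text is ASCII or ISO-8859-1 so that each byte is exactly one code point; isXml10Char and firstIllegalByte are hypothetical helper names, not part of Xerces or System.Xml.

```cpp
#include <string>

// True if the code point matches the XML 1.0 "Char" production:
// #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
bool isXml10Char(char32_t cp) {
    return cp == 0x9 || cp == 0xA || cp == 0xD ||
           (cp >= 0x20 && cp <= 0xD7FF) ||
           (cp >= 0xE000 && cp <= 0xFFFD) ||
           (cp >= 0x10000 && cp <= 0x10FFFF);
}

// Assumption: s holds ASCII or ISO-8859-1 text (one byte per code point).
// Returns the index of the first byte that is not a legal XML 1.0
// character, or std::string::npos if every byte is legal.
std::string::size_type firstIllegalByte(const std::string& s) {
    for (std::string::size_type i = 0; i < s.size(); ++i) {
        if (!isXml10Char(static_cast<unsigned char>(s[i]))) {
            return i;
        }
    }
    return std::string::npos;
}
```

Running firstIllegalByte over the text before handing it to the serializer would flag the ASCII 4 byte, while 'ß' passes the check.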

> What is wrong here? If I serialize the same string using the MS
> implementation, I get:
>
> <XmlTest>
> Code = &#x4;
> </XmlTest>

.NET does not necessarily comply with the XML 1.0 specification; it
allows you to serialize such characters as numeric character references.
You can turn that off by using an XmlWriter with XmlWriterSettings where
CheckCharacters is set to true.
--

Martin Honnen
http://JavaScript.FAQTs.com/
Jun 27 '08 #2
On Apr 22, 1:29 pm, Martin Honnen <mahotr...@yahoo.de> wrote:
> Ernesto Bascón Pantoja wrote:
>> I have an application that builds an XML document containing some strange
>> characters:
>> std::string str = "Code = ";
>> str += '\x04'; // strange character ASCII 4 (EOT), shown as ♦
>
> ASCII 4 is not allowed in XML 1.0 and is only allowed as a numeric character
> reference in XML 1.1, I think.
> Why do you want to put such characters into your XML documents?

I am getting clear text from a database and I serialize it into XML
so that a .NET client can receive the information;
the problem occurs when the "clear text" contains those characters
or international characters. Xerces performs the serialization
but does not transform the '♦' or the 'ß' in 'Straße'; it serializes
them as they come.

I do not know whether writing those characters directly with UTF-8
encoding is valid.
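
On the 'ß' part of the question: unlike ASCII 4, 'ß' (U+00DF) is a legal XML character, so writing it directly is valid as long as the bytes in the file match the declared encoding. Below is a minimal sketch of the transcoding step, assuming the database delivers ISO-8859-1 bytes and the document is declared as UTF-8; latin1ToUtf8 is a hypothetical helper, not a Xerces function.

```cpp
#include <string>

// Assumption: `in` holds ISO-8859-1 text, where each byte value equals
// its Unicode code point (U+0000..U+00FF).
std::string latin1ToUtf8(const std::string& in) {
    std::string out;
    out.reserve(in.size() * 2);
    for (unsigned char c : in) {
        if (c < 0x80) {
            // ASCII range: identical single byte in UTF-8.
            out += static_cast<char>(c);
        } else {
            // U+0080..U+00FF: two-byte UTF-8 sequence 110xxxxx 10xxxxxx.
            out += static_cast<char>(0xC0 | (c >> 6));
            out += static_cast<char>(0x80 | (c & 0x3F));
        }
    }
    return out;
}
```

With this, the single Latin-1 byte 0xDF for 'ß' becomes the two UTF-8 bytes 0xC3 0x9F. The function does nothing about ASCII 4, which stays illegal in XML 1.0 in every encoding.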
Jun 27 '08 #3
ASCII 4 is not a legal XML 1.0 character, no matter what your encoding
is or how you try to escape it. I'd suggest introducing something like
<mychar codepoint="4"/> and having your application code convert this
appropriately. Or do a base-64 encoding on your block of binary code and
have the application convert that appropriately.
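
The base-64 route could look roughly like this. It is a minimal sketch using only the standard library; base64Encode is a hypothetical helper name, and the receiving .NET side is assumed to decode the element content again (for example with Convert.FromBase64String).

```cpp
#include <string>

// Encodes arbitrary bytes with the standard base-64 alphabet and '='
// padding. The output contains only characters that are legal in XML 1.0.
std::string base64Encode(const std::string& data) {
    static const char kAlphabet[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    auto byteAt = [&data](std::string::size_type i) -> unsigned {
        return static_cast<unsigned char>(data[i]);
    };

    std::string out;
    out.reserve((data.size() + 2) / 3 * 4);

    std::string::size_type i = 0;
    // Full 3-byte groups become 4 output characters.
    for (; i + 2 < data.size(); i += 3) {
        unsigned n = (byteAt(i) << 16) | (byteAt(i + 1) << 8) | byteAt(i + 2);
        out += kAlphabet[(n >> 18) & 63];
        out += kAlphabet[(n >> 12) & 63];
        out += kAlphabet[(n >> 6) & 63];
        out += kAlphabet[n & 63];
    }

    // Handle the 1 or 2 leftover bytes, padding with '='.
    std::string::size_type rest = data.size() - i;
    if (rest == 1) {
        unsigned n = byteAt(i) << 16;
        out += kAlphabet[(n >> 18) & 63];
        out += kAlphabet[(n >> 12) & 63];
        out += "==";
    } else if (rest == 2) {
        unsigned n = (byteAt(i) << 16) | (byteAt(i + 1) << 8);
        out += kAlphabet[(n >> 18) & 63];
        out += kAlphabet[(n >> 12) & 63];
        out += kAlphabet[(n >> 6) & 63];
        out += '=';
    }
    return out;
}
```

The trade-off is that the encoded text is no longer human-readable in the XML file, so this fits opaque binary blocks better than mostly-textual data with an occasional control character.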

XML 1.1 relaxes restrictions on characters somewhat. I'm not sure
offhand whether it would let you get away with this one or not. But
support for 1.1 is, alas, still extremely rare; you may have to beat up
your XML library suppliers to get it, and having gotten it you may have
trouble interchanging those files with other applications or users that
haven't yet upgraded.
Jun 27 '08 #4
On Apr 22, 1:54 pm, "Joseph J. Kesselman" <keshlam-nos...@comcast.net>
wrote:
> ASCII 4 is not a legal XML 1.0 character, no matter what your encoding
> is or how you try to escape it. I'd suggest introducing something like
> <mychar codepoint="4"/> and having your application code convert this
> appropriately. Or do a base-64 encoding on your block of binary code and
> have the application convert that appropriately.

So, how can I tell Xerces: "given this string, transcode the special
characters to their Unicode escape sequence (e.g. &#4;)"?

> XML 1.1 relaxes restrictions on characters somewhat. I'm not sure
> offhand whether it would let you get away with this one or not. But
> support for 1.1 is, alas, still extremely rare; you may have to beat up
> your XML library suppliers to get it, and having gotten it you may have
> trouble interchanging those files with other applications or users that
> haven't yet upgraded.
Jun 27 '08 #5
Ernesto Bascón Pantoja wrote:
> So, how can I tell Xerces: "given this string, transcode the special
> characters to their Unicode escape sequence (e.g. &#4;)"?

Xerces deals with XML. The 0x04 character is not XML (or at least not
XML 1.0), so it isn't Xerces' responsibility to deal with it.

If you must represent this character in data that's expressed as XML,
it's your application's responsibility to use some alternate escaping
solution (such as the element I suggested, or base-64 encoding, or
whatever).
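
The element-based escaping suggested earlier can be sketched as follows. This assumes single-byte text and builds the serialized element content by hand; escapeForXml10 is a hypothetical helper, and mychar is just the illustrative element name from this thread. With a DOM API the application would create a real mychar element node instead of concatenating markup.

```cpp
#include <string>

// Assumption: `text` is ASCII or ISO-8859-1, one byte per character.
// Produces serialized element content in which:
//   - markup characters are replaced by the predefined entities, and
//   - control characters that XML 1.0 forbids (everything below 0x20
//     except tab, LF and CR) become <mychar codepoint="N"/> elements.
std::string escapeForXml10(const std::string& text) {
    std::string out;
    for (unsigned char c : text) {
        switch (c) {
            case '&': out += "&amp;"; break;
            case '<': out += "&lt;";  break;
            case '>': out += "&gt;";  break;
            default:
                if (c < 0x20 && c != '\t' && c != '\n' && c != '\r') {
                    out += "<mychar codepoint=\"";
                    out += std::to_string(static_cast<int>(c));
                    out += "\"/>";
                } else {
                    out += static_cast<char>(c);
                }
                break;
        }
    }
    return out;
}
```

The reader on the .NET side then has to map each mychar element back to the original character after parsing, since the parser itself never sees the raw control byte.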

If you really want this character to appear as itself in the file...
that isn't an XML file and you can't expect XML tools to either accept
it or generate it.

Take a long step back from this detail and look at the actual
problem you're trying to solve. You haven't told us that, so we can't
say more than that the specific solution you've proposed here doesn't work.
Jun 27 '08 #6
