embedding xml in xml as non-xml :)

Hi all,

I have an application that logs in XML.

Assume <xmlLog></xmlLog>. In this element the app logs anything it gets
from foreign hosts. Now if the host sends XML data, the structure of the
document changes, e.g. <xmlLog><somTag></somTag></xmlLog>. This will
cause problems with my log reader, because it assumes that <xmlLog/>
contains non-XML data.

My question is, is there a way to treat the data in the <xmlLog/>
element as non-XML data? Something I can do that would treat anything
this element contains as a literal?

Any help or suggestions would be greatly appreciated.

Regards,
Mark
Jul 20 '05 #1
Mark Van Orman <ma**@icsaccess.com> wrote:
> My question is, is there a way to treat the data in the
> <xmlLog/> element as non xml data. Something I can do that
> would treat anything this element contains as a literal?

Modify your "log reader". If the remote host can send any ASCII, then why
does the log reader assume a particular format? '<somTag></somTag>' is
just an ASCII string to me.
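
A minimal sketch of that suggestion, assuming the reader uses Python's standard xml.etree.ElementTree: instead of reading only the element's text, rebuild whatever markup ended up inside <xmlLog> as a string. The helper name inner_xml is hypothetical. It only helps when the foreign data happens to be well-formed, and the round trip is not byte-for-byte (an empty <somTag></somTag> comes back as <somTag />).

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def inner_xml(elem: ET.Element) -> str:
    """Return everything inside elem (text and child markup) as one string.

    Hypothetical helper: it re-serializes child elements, so it only works
    if the logged data was well-formed enough to parse in the first place.
    """
    # Text before the first child; escape it the same way ElementTree
    # escapes the children's tail text below.
    parts = [escape(elem.text or "")]
    for child in elem:
        # tostring() also emits the child's tail text.
        parts.append(ET.tostring(child, encoding="unicode"))
    return "".join(parts)


if __name__ == "__main__":
    log = ET.fromstring("<xmlLog><somTag></somTag></xmlLog>")
    print(inner_xml(log))  # <somTag />
```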

--
William Park <op**********@yahoo.ca>
Open Geometry Consulting, Toronto, Canada
Jul 20 '05 #2
On Mon, 13 Sep 2004 23:51:39 -0500, Mark Van Orman
<ma**@icsaccess.com> wrote:
> In this element the app logs anything it gets from foreign hosts.
Your problem is to map "input" to well-formed character data according
to the rules of
http://www.w3.org/TR/2004/REC-xml11-20040204/#syntax

This is a task as old as computer programming with input files. There
are several techniques to solve it, broadly by "escaping" or by
"wrapping".

Your example of
<xmlLog><somTag></somTag></xmlLog>
is quite easy, and could indeed be stored and read back, then treated
as ASCII.

However, a foreign host that sends "<notATag<><>>" will break things,
because
<xmlLog><notATag<><>></xmlLog>
isn't well-formed XML, so parsers will choke on it.
The main problem is to handle the mapping of arbitrary characters into
"character data" (this is a term carefully defined in the XML spec).
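
To see both failure modes, here is a short Python check using xml.etree.ElementTree. The helpers naive_log and try_parse are hypothetical; naive_log just pastes the foreign data between the tags. Well-formed input silently changes the document's structure, and malformed input makes the parser raise ParseError.

```python
import xml.etree.ElementTree as ET


def naive_log(payload: str) -> str:
    # Hypothetical logger that pastes foreign data in without any escaping.
    return "<xmlLog>" + payload + "</xmlLog>"


def try_parse(doc: str):
    """Parse doc, returning the root element, or None if it is not well-formed."""
    try:
        return ET.fromstring(doc)
    except ET.ParseError as err:
        print("not well-formed:", err)
        return None


# Parses, but <xmlLog> now has a child element and no text at all.
root = try_parse(naive_log("<somTag></somTag>"))
print(root.text, [child.tag for child in root])  # None ['somTag']

# Does not parse at all.
try_parse(naive_log("<notATag<><>>"))
```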

The "escaping" way to do this is quite simple, and can be done with a
handful of character substitutions (from the XML spec):

:>The ampersand character (&) and the left angle bracket (<) MUST NOT
:> appear in their literal form, [...] they MUST be escaped using
:> either numeric character references or the strings "&amp;" and "&lt;"
:> respectively. The right angle bracket (>) MAY be represented using
:> the string "&gt;", and MUST, for compatibility, be escaped using
:> either "&gt;" or a character reference when it appears in the string
:> "]]>" in content, when that string is not marking the end of a
:> CDATA section.

So your example of
<xmlLog><somTag></somTag></xmlLog>
becomes
<xmlLog>&lt;somTag&gt;&lt;/somTag&gt;</xmlLog>
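
The substitutions quoted above can be sketched in Python. escape_char_data is a hypothetical name; the standard library's xml.sax.saxutils.escape performs the same three replacements and is shown for comparison. The & must be replaced first, otherwise the & in a newly inserted &lt; would itself get escaped.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def escape_char_data(text: str) -> str:
    # Replace & first, so the entities added below are not double-escaped.
    text = text.replace("&", "&amp;")
    text = text.replace("<", "&lt;")
    # > is only mandatory inside "]]>", but escaping it everywhere is simplest.
    text = text.replace(">", "&gt;")
    return text


payload = "<somTag></somTag>"
doc = "<xmlLog>" + escape_char_data(payload) + "</xmlLog>"
print(doc)  # <xmlLog>&lt;somTag&gt;&lt;/somTag&gt;</xmlLog>

# The standard library helper gives the same result.
assert escape_char_data(payload) == escape(payload)

# A parser hands the original string back as the element's text.
assert ET.fromstring(doc).text == payload
```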

You could also use a "CDATA section", which would be the "wrapping"
approach. This takes the dubious input content and places it between
two markers that say "Between these points is CDATA, not XML markup".

The markers are <![CDATA[ and ]]>

Your example of
<xmlLog><somTag></somTag></xmlLog>
becomes
<xmlLog><![CDATA[<somTag></somTag>]]></xmlLog>

Be warned that you'll still need escaping in case the input contains a
copy of the end marker! (Read the XML spec, or ask again.)
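
A minimal Python sketch of the naive wrapping approach (cdata_wrap_naive and is_well_formed are hypothetical names). It round-trips the example above, but an input that contains ]]> ends the section early and the resulting document is no longer well-formed.

```python
import xml.etree.ElementTree as ET


def cdata_wrap_naive(payload: str) -> str:
    # Wrap the data in a single CDATA section; "]]>" in the data is NOT handled.
    return "<xmlLog><![CDATA[" + payload + "]]></xmlLog>"


def is_well_formed(doc: str) -> bool:
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False


print(ET.fromstring(cdata_wrap_naive("<somTag></somTag>")).text)  # <somTag></somTag>
print(is_well_formed(cdata_wrap_naive("x]]>y")))  # False
```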

The second problem is to define "input". This is important because in
today's world we're really having to face up to internationalization,
character sets and encodings. It's likely that you can redefine input
from "anything" to "anything that is in UTF-8", which will make your
life easier, but be aware you _have_ made a deliberate choice here.
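
One way to make the "anything that is in UTF-8" decision explicit, sketched in Python under the assumption that the logger receives raw bytes from the foreign host: decode as UTF-8, replacing invalid bytes with U+FFFD, then drop code points that XML 1.0 does not allow anywhere in a document (most C0 control characters, such as NUL), since no amount of escaping makes those legal. to_char_data is a hypothetical name, and dropping those characters rather than, say, hex-encoding them is this sketch's choice. The result still needs the escaping or CDATA wrapping discussed above.

```python
import re

# Anything outside the XML 1.0 "Char" production: C0 controls other than
# tab, LF and CR, surrogates, and U+FFFE/U+FFFF.
_NOT_XML_CHAR = re.compile("[^\t\n\r\u0020-\ud7ff\ue000-\ufffd\U00010000-\U0010ffff]")


def to_char_data(raw: bytes) -> str:
    # Invalid UTF-8 sequences become U+FFFD instead of raising an error.
    text = raw.decode("utf-8", errors="replace")
    # Remove code points that cannot appear in XML 1.0 even when escaped.
    return _NOT_XML_CHAR.sub("", text)


print(repr(to_char_data(b"caf\xc3\xa9 \x00 \xff")))  # 'café  \ufffd'
```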

It's OK to write code that breaks in Japanese - just be aware that
you've done so, and know what would need changing if you needed to
remedy this.

You'll find that RSS has this same problem when embedding HTML content
within it. Some RSS versions handle this better than others, and
there's an excellent overview here:
http://diveintomark.org/archives/200...compatible-rss

--
Smert' spamionam
Jul 20 '05 #3
Andy Dingley wrote:

> It's OK to write code that breaks in Japanese - just be aware that
> you've done so, and know what would need changing if you needed to
> remedy this.

Andy,

Why would code break only in Japanese and why is that ok?

Regards,
Kenneth
Jul 20 '05 #4
On Tue, 14 Sep 2004 12:51:49 GMT, Kenneth Stephen
<ma**********************@gmail.com> wrote:
> Why would code break only in Japanese and why is that ok?


That was just an example. Most European-written XML code fails in
CJKV countries (China, Japan, Korea, Vietnam). Most American-written
XML fails in France. Just look at how many RSS feeds choke when they meet
é, or more usually &eacute; without the entity having been declared.
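
A quick Python illustration of that RSS failure, using xml.etree.ElementTree (parses is a hypothetical helper). &eacute; is an HTML entity, not one of XML's five predefined entities, so using it undeclared is a fatal error, while a literal é or the numeric reference &#233; is fine. Raw bytes must also match the declared encoding, or UTF-8 when nothing is declared.

```python
import xml.etree.ElementTree as ET


def parses(doc) -> bool:
    """True if doc (str or bytes) is well-formed enough for ElementTree."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False


print(parses("<title>caf&eacute;</title>"))  # False: undefined entity
print(parses("<title>caf&#233;</title>"))    # True: numeric character reference
print(parses("<title>café</title>"))         # True: literal character

# Latin-1 bytes with no declaration are read as UTF-8 and rejected...
print(parses("<title>café</title>".encode("latin-1")))  # False
# ...but parse once the encoding is declared.
print(parses(b'<?xml version="1.0" encoding="ISO-8859-1"?><title>caf\xe9</title>'))  # True
```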

XML _itself_ (and the major tools) are very good at supporting a wide
range of character sets and encodings, but there are rules you have to
follow. For most _applications_, coders don't bother to do this. If
you _know_ your app will never receive something outside ASCII, then
that's all you need - but you should still be aware of what you've
built.

--
Smert' spamionam
Jul 20 '05 #5
In article <3f********************************@4ax.com>,
Andy Dingley <di*****@codesmiths.com> wrote:

[...]

% The markers are <![CDATA[ and ]]>
%
% Your example of
% <xmlLog><somTag></somTag></xmlLog>
% becomes
% <xmlLog><![CDATA[<somTag></somTag>]]></xmlLog>
%
% be warned that you'll still need escaping in case the input contains a
% copy of the end marker! (read the XML spec, or ask again)

You don't need escaping so much as you need to end and restart the
CDATA section:

<xmlLog><![CDATA[<somTag><![CDATA[with a CDATA section]]]]><![CDATA[></somTag>]]></xmlLog>

The first ]]> ends the first CDATA section, so the ]] in front of it are
data; the > of the embedded marker becomes the first character of the
second section.
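
The end-and-restart technique can be sketched in Python (cdata_wrap is a hypothetical name, and splitting between ]] and > is this sketch's choice): each ]]> in the data becomes ]]]]><![CDATA[>, so no literal ]]> is left inside a section or in plain character data. The parser reports adjacent CDATA sections as one run of text, so the original string comes back unchanged.

```python
import xml.etree.ElementTree as ET


def cdata_wrap(payload: str) -> str:
    # Close the section between "]]" and ">" and reopen it, so the data
    # never contains a literal "]]>" that the parser would misread.
    body = payload.replace("]]>", "]]]]><![CDATA[>")
    return "<xmlLog><![CDATA[" + body + "]]></xmlLog>"


payload = "<somTag><![CDATA[with a CDATA section]]></somTag>"
doc = cdata_wrap(payload)
print(doc)
# <xmlLog><![CDATA[<somTag><![CDATA[with a CDATA section]]]]><![CDATA[></somTag>]]></xmlLog>
assert ET.fromstring(doc).text == payload
```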
--

Patrick TJ McPhee
East York Canada
pt**@interlog.com
Jul 20 '05 #6