On 7 Nov 2003 07:11:26 -0800,
Au*******@bigfoot.com (Austin) wrote:
> I am wondering if anyone knows if there is a way to store string
> literals within an XML tag.
Yes, but the definition of "string" has issues with angle brackets.
> For instance I would like to store HTML formatting data for an
> attribute but it keeps getting picked up as a tag by the XML parser.
There are three ways: namespacing, entity encoding and CDATA sections.
I'd do it by entity encoding.
Namespacing is the easiest and "cleanest" in an XML sense. It's
particularly good for mixing XML elements from multiple schemas. It's
also quite easy to work with from XSLT.
Some people, mainly old SGML hands, have arguments against
namespacing. Try Googling comp.infosystems.www.authoring.html for
comments from Arjun Ray.
The biggest problem with namespacing is that it requires all
components to be well-formed XML. Fragments must also be balanced
fragments. This is no problem with XHTML, but it's a minor hassle
with HTML and it can be very difficult if you have to accept any HTML
(which can be badly malformed) from other sources.
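A minimal Python sketch of the namespaced approach, using the standard library's xml.etree.ElementTree. The person/prettyName structure and the is_well_formed helper are hypothetical stand-ins, and XHTML is used because it is already well-formed. The last two lines show why ordinary HTML, such as an unclosed <br>, cannot be embedded this way.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical record: prettyName holds balanced XHTML markup in the
# XHTML namespace, so the parser sees real elements rather than a string.
doc = f"""<person xmlns:h="{XHTML_NS}">
  <name>John</name>
  <prettyName><h:b>John &amp; Jane</h:b></prettyName>
</person>"""

root = ET.fromstring(doc)
pretty = root.find("prettyName")
# ElementTree reports namespaced tags in {uri}localname form.
bold = pretty.find(f"{{{XHTML_NS}}}b")
print(bold.tag)   # {http://www.w3.org/1999/xhtml}b
print(bold.text)  # John & Jane


def is_well_formed(fragment: str) -> bool:
    """Return True if the fragment parses as a single XML element."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


print(is_well_formed("<p>John<br/>Jane</p>"))  # True  (XHTML style)
print(is_well_formed("<p>John<br>Jane</p>"))   # False (HTML style, unbalanced)
```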
Entity encoding is how it's done in RSS, so looking at some RSS feeds
would probably be useful here. Characters which are awkward as "string"
characters in XML [<, >, &] are represented by their entity
equivalents [&lt;, &gt;, &amp;].
Your example would look like this:
<name>John</name>
<prettyName>&lt;HTML&gt;&lt;BR&gt;John &amp; Jane&lt;/BR&gt;&lt;/HTML&gt;</prettyName>
The main advantage of entity encoding is that it's simple to do,
although it requires some string-handling tools, like regexes. You
can't practically do this in XSLT, but you can do it easily by
calling JavaScript extensions from within XSLT.
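Outside XSLT, the encoding really is just a few string replacements. A minimal Python sketch follows; encode_entities and decode_entities are hypothetical helper names, and the only subtlety is the order of the replacements.

```python
import xml.etree.ElementTree as ET


def encode_entities(text: str) -> str:
    """Replace the three characters that are awkward in XML text."""
    # "&" must go first, or the "&" in "&lt;" would itself be re-encoded.
    return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")


def decode_entities(text: str) -> str:
    """Undo exactly one round of encode_entities."""
    # "&amp;" must go last, so "&amp;lt;" decodes to "&lt;" rather than "<".
    return text.replace("&lt;", "<").replace("&gt;", ">").replace("&amp;", "&")


html = "<HTML><BR>John & Jane</BR></HTML>"
element = "<prettyName>" + encode_entities(html) + "</prettyName>"
print(element)
# <prettyName>&lt;HTML&gt;&lt;BR&gt;John &amp; Jane&lt;/BR&gt;&lt;/HTML&gt;</prettyName>

# An XML parser decodes the entities itself when it reads the element text.
print(ET.fromstring(element).text == html)             # True
print(decode_entities(encode_entities(html)) == html)  # True
```

Python's standard library also has xml.sax.saxutils.escape and unescape, which handle the same three characters and can replace the hand-written helpers.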
Be careful to track what is encoded and what isn't. You can safely
double-encode HTML (each "&" simply expands to "&amp;" again, so "&lt;"
becomes "&amp;lt;"), but you must decode it afterwards the _same_
number of times.
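A small Python illustration of the double-encoding rule, using xml.sax.saxutils.escape and unescape from the standard library. The decode_times helper is hypothetical; it just applies unescape a given number of times.

```python
from xml.sax.saxutils import escape, unescape

original = "John & Jane <b>"
once = escape(original)  # 'John &amp; Jane &lt;b&gt;'
twice = escape(once)     # 'John &amp;amp; Jane &amp;lt;b&amp;gt;'


def decode_times(text: str, n: int) -> str:
    """Apply unescape n times; n should match the number of escape passes."""
    for _ in range(n):
        text = unescape(text)
    return text


print(decode_times(twice, 2) == original)  # True
print(decode_times(twice, 1))              # John &amp; Jane &lt;b&gt;
```

Decoding too few times leaves visible entities in the output; decoding too many times is worse, because any literal "&lt;" that was genuinely part of the original text gets turned into a real "<".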
CDATA sections are perhaps "The SGML Way", but personally I find entity
encoding easier to work with. You wrap your literal string in a
marker that says "This is not XML, just treat it literally".
Your example would look like this:
<name>John</name>
<prettyName><![CDATA[ <HTML><BR>John</BR></HTML>]]></prettyName>
Remember to also replace the sequence "]]>" inside the string with
"]]]]><![CDATA[>". You can't "escape" this sequence, but you can
split it across two adjacent CDATA sections. It's a rare problem to
encounter, but if you ever handle the content of comp.text.xml through
XML tools, then you're going to meet it!
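A minimal Python sketch of CDATA wrapping with the "]]>" split applied; to_cdata is a hypothetical helper name. Parsing the result back with xml.etree.ElementTree shows that the parser joins the adjacent CDATA sections into the original string.

```python
import xml.etree.ElementTree as ET


def to_cdata(text: str) -> str:
    """Wrap text in a CDATA section, splitting any embedded "]]>"."""
    # "]]>" would end the section early, so close the section after "]]"
    # and open a new one just before ">"; the parser concatenates them.
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


tricky = "<HTML><BR>John</BR></HTML> and a stray ]]> sequence"
doc = "<prettyName>" + to_cdata(tricky) + "</prettyName>"
print(doc)

# Round trip: the element text comes back exactly as it went in.
print(ET.fromstring(doc).text == tricky)  # True
```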
--
Die Gotterspammerung - Junkmail of the Gods