JohnAD wrote:
> Hello NG,
> I am getting some information from DB, and that data has mix html and
> XML tags in the content (e.g. detail on country).
> Basically CDATA types are mixed with regular string. Also, html tags are
> in escape form (e.g. &gt; is >). When I display that string I see those
> tags.

This sounds like someone has interfered with the file.

> Basically I am getting all this data as xml form and I want to find out
> how can I change those html tags into regular tags,

What does that mean? Change &lt;p&gt; back into <p>?

> and also how to
> remove CDATA or any instructions in the string. Is there a quick way to
> do that? My problem is increased as I don't know XML.

It sounds like whoever supplied you with the file doesn't know any XML
either.
a) Best move is to ask them for valid (or at least well-formed) XML to
start with. Unless you're working with well-formed data at the very
least, you don't stand much chance of using XML. If you don't know
if what you've got is well-formed or not, install a reliable
standalone XML parser like rxp and use it to test the file[s].
b) To change the escaped pointy brackets back into real ones you'll need
to write and run some non-XML script, but the risk is that they were
escaped for a reason (usually ignorance, sometimes laziness) and that
by putting them back the way they were, you'll break the data model.
By restoring them, you are essentially adding new elements to a file
which wasn't designed to hold them (which is why they were escaped to
begin with). It *is* possible to repair the damage with XSLT, but its
string-handling isn't very sophisticated.
c) CDATA markup is used along with HTML escapement to allow the remains
of the elements to be embedded in XML, in the (usually) forlorn hope
that someone (you) will struggle to restore them at a later stage in
the process. This is often done by people with little understanding
of markup or XML (your supplier). Running the document through any
parsing XML processor will automatically remove the CDATA markup and
pass the content through to whatever the next stage is. However, if
doing so reveals pointy-bracket markup that doesn't fit the document
model (DTD, Schema,...) then the process will halt (as it's supposed
to).
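
To make (b) and (c) concrete, here is a minimal sketch in Python (my choice
of "non-XML script" language, using only its standard library). The sample
record, its element names, and the variable names are all made up for
illustration; your real data will differ. It shows that a parser drops the
CDATA wrapper by itself, that escaped brackets come back as plain text rather
than as elements, and that turning that text into real elements only works if
the embedded HTML happens to be well-formed.

```python
import html
import xml.etree.ElementTree as ET

# Hypothetical sample record: one field wrapped in CDATA, one with escaped HTML.
SAMPLE = (
    "<country>"
    "<name><![CDATA[Ireland & Co.]]></name>"
    "<detail>&lt;p&gt;Capital: &lt;b&gt;Dublin&lt;/b&gt;&lt;/p&gt;</detail>"
    "</country>"
)

root = ET.fromstring(SAMPLE)

# (c) The parser has already removed the CDATA markup; only the content is left.
name = root.findtext("name")

# (b) The parser resolves &lt; and &gt; too, but the result is character data:
# the <detail> element still has no child elements.
detail = root.findtext("detail")
detail_children = len(root.find("detail"))

# If the supplier escaped the markup twice (&amp;lt;p&amp;gt;), the parser only
# undoes one layer, and a second unescape pass is needed.
double = ET.fromstring("<detail>&amp;lt;p&amp;gt;hi&amp;lt;/p&amp;gt;</detail>").text
restored = html.unescape(double)

# Promoting the text to real elements means parsing it again, which succeeds
# only when the fragment is well-formed XML.
fragment = ET.fromstring(detail)
city = fragment.find("b").text

# Typical HTML (an unclosed <br>) is not well-formed, so the parse halts.
try:
    ET.fromstring("<p>one<br>two</p>")
    sloppy_html_parsed = True
except ET.ParseError:
    sloppy_html_parsed = False
```

The last step is the risky one: it is where restored markup either fits or
breaks the document, so it is worth testing on a copy of the data first.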
Have a look at
http://xml.silmaril.ie/authors/cdata/ and
http://xml.silmaril.ie/authors/html/
And do try (a) if at all possible: it will make your life, your
supplier's life, and the life of the information very much easier.
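
If rxp isn't to hand, the well-formedness test in (a) can be roughly
approximated with the parser in Python's standard library. This is a sketch
under that assumption, not a substitute for a proper standalone parser; the
function name is_well_formed is hypothetical, and the check covers
well-formedness only, not validity against a DTD or Schema.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return (True, None) if text parses as XML, else (False, error message)."""
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        # The message includes the line and column where parsing stopped.
        return False, str(err)
    return True, None


ok, _ = is_well_formed("<a><b/></a>")
bad, reason = is_well_formed("<a><b></a>")  # <b> is never closed
```

For a file on disk, ET.parse(path) raises the same ParseError when the
document is not well-formed.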
///Peter
--
XML FAQ:
http://xml.silmaril.ie/