XPath and CDATA

Hi,

Is it possible to retrieve CDATA Sections from an XML document? If so,
could someone give me a syntax example?

Cheers

Aidy

Mar 17 '06 #1
15 Replies


aidy wrote:
Is it possible to retrieve CDATA Sections from an XML document?


Yes, just parse it up as a textfile with Perl, or somesuch. This is
probably not what you want.

If you want to work with XML though, you'll probably work through some
tool with a DOM interface. You can't see a CDATA section through this
because there just _isn't_ one. CDATA is not part of XML-Infoset, it's
solely an artefact of the particular serialisation of that instance of
that file.
http://www.w3.org/TR/2004/REC-xml-in...40204/#omitted

<a>foo</a>
and
<a><![CDATA[foo]]></a>
are not only "indistinguishable" when viewed through the DOM, they are
absolutely _the_same_thing_. Either of them is an equally valid
serialisation of the same underlying XML content.

<a><![CDATA[<foo>]]></a> can of course be analysed by looking at its
text and you could set a flag for
"some_encoding_maybe_a_cdata_is_needed", but that's a question of
writing, not reading.
It's fundamental to XML (or at least to good XML design) that you can't
see CDATA and similar issues, and you don't care about them either.
_Use_ the tools, don't fight them. Transparency is good, you don't care
about whether there's a CDATA in there or not. Your app works equally
well either way and doesn't need to know. If it does, then you're doing
something badly wrong.

Mar 17 '06 #2

The XPath data model considers CDATA to be just an alternative markup of
text, so no, you can't distinguish CDATA sections from any other form of
text. You can, of course, retrieve their text value...

--
Joe Kesselman / Beware the fury of a patient man. -- John Dryden
Mar 17 '06 #3



aidy wrote:
Is it possible to retrieve CDATA Sections from an XML document? If so,
could someone give me a syntax example?


The XPath data model does not distinguish between normal text nodes and
CDATA section nodes the way the W3C DOM does. In the XPath 1.0 data
model there are only text nodes, which you select with
text()
So for XPath it does not matter whether you have e.g.
<element>Kibo &amp; Xibo</element>
or
<element><![CDATA[Kibo & Xibo]]></element>
you would use e.g.
/element/text()
to select the text node with XPath and its string value is
Kibo & Xibo
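
The same equivalence can be checked with any XML parser. The following is a minimal sketch in Python using the standard library's xml.etree.ElementTree (a tool chosen here for illustration, not one used in this thread). It parses both serialisations and compares the text value of the element.

```python
import xml.etree.ElementTree as ET

# The same content, once with an entity reference and once in a CDATA section.
escaped = "<element>Kibo &amp; Xibo</element>"
cdata = "<element><![CDATA[Kibo & Xibo]]></element>"

# ElementTree exposes only the resulting character data, not how it was escaped.
text_from_escaped = ET.fromstring(escaped).text
text_from_cdata = ET.fromstring(cdata).text

print(text_from_escaped)                     # Kibo & Xibo
print(text_from_escaped == text_from_cdata)  # True
```

Nothing in the parsed tree records which of the two forms the input used.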

--

Martin Honnen
http://JavaScript.FAQTs.com/
Mar 17 '06 #4

aidy wrote:
Hi,

Is it possible to retrieve CDATA Sections from an XML document? If so,
could someone give me a syntax example?


See http://xml.silmaril.ie/authors/cdata/

If you're using XML software, the answer is no. The CDATA markup
simply prevents its contents being parsed for more markup: the
result is passed through to the processor as text, untouched. So
an XML application never sees the CDATA markup and is unaware that
it ever existed.

If you want to get at it via a non-XML method, write a script in
your favourite language to do so. Warning: this is non-trivial.
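
As a rough illustration of the non-XML route, here is a sketch in Python that pulls CDATA sections out of the raw text with a regular expression. The function name raw_cdata_sections is made up for this example. It deliberately ignores the hard cases (the string "<![CDATA[" appearing inside a comment or processing instruction, character encodings, and so on), which is where the "non-trivial" part comes in.

```python
import re

# A CDATA section cannot contain "]]>", so a lazy match up to the first
# "]]>" captures exactly one section body.
CDATA_RE = re.compile(r"<!\[CDATA\[(.*?)\]\]>", re.DOTALL)

def raw_cdata_sections(xml_text):
    """Return the contents of every CDATA section found in the raw text."""
    return CDATA_RE.findall(xml_text)

sample = "<a><![CDATA[<foo>]]></a><b>bar</b><c><![CDATA[x & y]]></c>"
print(raw_cdata_sections(sample))  # ['<foo>', 'x & y']
```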

It would help if you could tell us why you want to do this. There
may be another way around the problem.

///Peter
--
XML FAQ: http://xml.silmaril.ie/
Mar 17 '06 #5

Andy Dingley <di*****@codesmiths.com> wrote:
aidy wrote:

Is it possible to retrieve CDATA Sections from an XML document?

Yes, just parse it up as a textfile with Perl, or somesuch. This is
probably not what you want.

If you want to work with XML though, you'll probably work through some
tool with a DOM interface. You can't see a CDATA section through this
because there just _isn't_ one.


CDATAs are in the DOM
http://java.sun.com/j2se/1.4.2/docs/...TASection.html

but CDATAs are not part of the data model, so, as Andy explained, it is
useless to deal with them; what your application needs is to retrieve
some text content.
High-level tools such as XPath don't let you handle CDATAs; all you will
find is text. Sibling text nodes (for example a mix of CDATA and
non-CDATA text) are supplied as a single text item.

CDATAs are just a convenient way to escape characters in XML, but an
application doesn't care how the text content was written.
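
To make the DOM point concrete, here is a small sketch using Python's xml.dom.minidom (chosen for illustration; the link above is to the Java DOM API). With its default settings minidom keeps the CDATA section as a separate node, yet the text an application normally wants is just the concatenation of the sibling nodes.

```python
from xml.dom import minidom, Node

doc = minidom.parseString("<a>one <![CDATA[<two>]]> three</a>")
children = doc.documentElement.childNodes

# The DOM keeps the CDATA section as its own node type...
node_types = [child.nodeType for child in children]
has_cdata_node = Node.CDATA_SECTION_NODE in node_types

# ...but the text content is simply the character data of all siblings joined.
text_content = "".join(child.data for child in children)

print(has_cdata_node)  # True
print(text_content)    # one <two> three
```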

CDATA is not part of XML-Infoset, it's solely an artefact of the particular serialisation of that instance of
that file.
http://www.w3.org/TR/2004/REC-xml-in...40204/#omitted

<a>foo</a>
and
<a><![CDATA[foo]]></a>
are not only "indistinguishable" when viewed through the DOM, they are
absolutely _the_same_thing_. Either of them is an equally valid
serialisation of the same underlying XML content.

<a><![CDATA[<foo>]]></a> can of course be analysed by looking at its
text and you could set a flag for
"some_encoding_maybe_a_cdata_is_needed", but that's a question of
writing, not reading.
It's fundamental to XML (or at least to good XML design) that you can't
see CDATA and similar issues, and you don't care about them either.
_Use_ the tools, don't fight them. Transparency is good, you don't care
about whether there's a CDATA in there or not. Your app works equally
well either way and doesn't need to know. If it does, then you're doing
something badly wrong.

--
Cordialement,

///
(. .)
--------ooO--(_)--Ooo--------
| Philippe Poulard |
-----------------------------
http://reflex.gforge.inria.fr/
Have the RefleX !
Mar 17 '06 #6

Hi,

<SNIP>

<?xml version="1.0" ?>
<SAFS_LOG>
<LOG_OPENED date="16-03-2006" time="15:50:19" />
<LOG_VERSION major="1" minor="1" />
<LOG_MESSAGE type="GENERIC" date="16-03-2006" time="15:50:19">
<MESSAGE_TEXT>
<![CDATA[ getTrimmedField result 'TESTID_10' assigned to variable '^TestData'.]]>
</MESSAGE_TEXT>
</LOG_MESSAGE>
<LOG_MESSAGE type="GENERIC" date="16-03-2006" time="15:50:20">
<MESSAGE_TEXT>
<![CDATA[ TESTID_10]]>
</MESSAGE_TEXT>
</LOG_MESSAGE>

<SNIP>
This is a snippet of the XML document. What I am struggling with is how
to retrieve, through XPath, all the CDATA where the text includes
'TESTID' (third line from the bottom).

So in the HTML I will be producing I will have

TESTID

TestID_10
TestID_20
TestID_30

is it possible to conditionally extract text?

Aidy

Mar 24 '06 #7

As far as XPath is concerned CDATA is just text. So write an XPath which
searches for text nodes (or, probably more accurately for your case,
<MESSAGE_TEXT> elements) whose text value contains "TESTID".
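
Outside XSLT the same idea looks like this. This is a minimal sketch in Python with xml.etree.ElementTree; the sample log is a made-up reduction of the snippet posted above. ElementTree's XPath subset has no contains() function, so the substring test is done in Python rather than in the path expression.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample modelled on the log snippet in this thread.
LOG = """<SAFS_LOG>
  <LOG_MESSAGE type="GENERIC"><MESSAGE_TEXT><![CDATA[ TESTID_10]]></MESSAGE_TEXT></LOG_MESSAGE>
  <LOG_MESSAGE type="GENERIC"><MESSAGE_TEXT><![CDATA[ some other message]]></MESSAGE_TEXT></LOG_MESSAGE>
  <LOG_MESSAGE type="GENERIC"><MESSAGE_TEXT>TESTID_20</MESSAGE_TEXT></LOG_MESSAGE>
</SAFS_LOG>"""

root = ET.fromstring(LOG)

# Select on the element's text value; CDATA or not makes no difference here.
matches = [
    el.text.strip()
    for el in root.iter("MESSAGE_TEXT")
    if el.text and "TESTID" in el.text
]
print(matches)  # ['TESTID_10', 'TESTID_20']
```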

--
Joe Kesselman / Beware the fury of a patient man. -- John Dryden
Mar 24 '06 #8

> What I am struggling with is how
to retrieve, through XPath, all the CDATA where the text includes
'TESTID' (third line from the bottom).


Try this
select="//MESSAGE_TEXT[contains(translate(., 'tesid', 'TESID'), 'TESTID')]"
(It might work. But it's Friday, so don't be surprised if it's buggy)

Mar 24 '06 #9

aidy wrote:
Hi,

<SNIP>

<?xml version="1.0" ?>
<SAFS_LOG>
<LOG_OPENED date="16-03-2006" time="15:50:19" />
<LOG_VERSION major="1" minor="1" />
<LOG_MESSAGE type="GENERIC" date="16-03-2006" time="15:50:19">
<MESSAGE_TEXT>
<![CDATA[ getTrimmedField result 'TESTID_10' assigned to variable '^TestData'.]]>
</MESSAGE_TEXT>
</LOG_MESSAGE>
<LOG_MESSAGE type="GENERIC" date="16-03-2006" time="15:50:20">
<MESSAGE_TEXT>
<![CDATA[ TESTID_10]]>
</MESSAGE_TEXT>
</LOG_MESSAGE>

<SNIP>
This is a snippet of the XML document. What I am struggling with is how
to retrieve, through XPath, all the CDATA where the text includes
'TESTID' (third line from the bottom).

So in the HTML I will be producing I will have

TESTID

TestID_10
TestID_20
TestID_30

is it possible to conditionally extract text?


Yes, but by the time your application receives the information from
the parser, there won't be any evidence of there having been CDATA
markup, so what you really mean is "can I conditionally extract the
text in MESSAGE_TEXT elements?"

<xsl:template match="MESSAGE_TEXT[contains(.,'TESTID')]">
<xsl:value-of select="."/>
</xsl:template>

///Peter
--
XML FAQ: http://xml.silmaril.ie/
Mar 25 '06 #10

Below is the XSL I have got. I am trying to assign the value of the
MESSAGE_TEXT node to a variable, then do a contains to gather whether
that text has a substring of 'TESTID'. Then I wanna write that value to
the HTML. However when I run the transformation, I receive this error:
'A string literal was expected, but no opening quote character was
found. <xsl:if test=contains("$host1","TESTID")', so at the moment I
am a bit stuck.

<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:template match="/">
<HTML>
<xsl:for-each select="SAFS_LOG/LOG_MESSAGE">
<P>
<!-- <xsl:value-of select="MESSAGE_TEXT"/> -->

<xsl:variable name="host1" select="MESSAGE_TEXT"/>
<xsl:if test=contains("$host1","TESTID")
<xsl:value-of select="$host1"/>
</P>
</xsl:if>
</xsl:for-each>
</HTML>
</xsl:template>
</xsl:stylesheet>

Cheers

Aidy

Mar 25 '06 #11

That sounds like you're trying to style HTML. HTML is not XML -- it's
SGML -- and permits some things that XML doesn't, such as attribute
values without quotes around them.

If that's what you're trying to do, you need to get a parser that can
read HTML (I believe the W3C's "tidy" tool can be persuaded to do this
for you, or try the NekoHTML parser based on the Apache Xerces system)
and do a bit of simple API programming to use that to feed the document
to the stylesheet.

--
() ASCII Ribbon Campaign | Joe Kesselman
/\ Stamp out HTML e-mail! | System architexture and kinetic poetry
Mar 25 '06 #12

aidy wrote:
Below is the XSL I have got. I am trying to assign the value of the
MESSAGE_TEXT node to a variable, then do a contains to gather whether
that text has a substring of 'TESTID'. Then I wanna write that value to
the HTML. However when I run the transformation, I receive this error:
'A string literal was expected, but no opening quote character was
found. <xsl:if test=contains("$host1","TESTID")', so at the moment I
am a bit stuck.


I think you need to remove the quotes around "$host1".
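
The quoted error message reads like an XML parser complaining about an attribute value that is not enclosed in quotes, so the stylesheet itself is probably not well-formed. That reading is an assumption based on the message text, not something confirmed in this thread. The sketch below uses Python's xml.etree.ElementTree purely as a well-formedness check on the posted xsl:if line and on a quoted variant of it.

```python
import xml.etree.ElementTree as ET

XSL_NS = 'xmlns:xsl="http://www.w3.org/1999/XSL/Transform"'

# As posted: the attribute value has no surrounding quotes and the start tag is never closed.
broken = f'<xsl:if {XSL_NS} test=contains("$host1","TESTID")</xsl:if>'
# Quoted attribute value, single quotes for the XPath string, $host1 left bare.
fixed = f"<xsl:if {XSL_NS} test=\"contains($host1,'TESTID')\"></xsl:if>"

def is_well_formed(xml_text):
    """Return True if an XML parser accepts the text."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

print(is_well_formed(broken))  # False
print(is_well_formed(fixed))   # True
```

Leaving $host1 unquoted inside the XPath expression matters as well: written as "$host1" it would be a string literal rather than a variable reference.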

///Peter
Mar 26 '06 #13

Hi,

This is a snippet of the XML

<SAFS_LOG>
<LOG_MESSAGE type="GENERIC" date="27-03-2006" time="10:56:06" >
<MESSAGE_TEXT><![CDATA[.TESTID_10]]></MESSAGE_TEXT>
</LOG_MESSAGE>
<LOG_MESSAGE type="FAILED" date="27-03-2006" time="10:56:08" >
<MESSAGE_TEXT><![CDATA[Country Code: GB <> AU ]]></MESSAGE_TEXT>
</LOG_MESSAGE>
<LOG_MESSAGE type="GENERIC" date="27-03-2006" time="10:56:10" >
<MESSAGE_TEXT><![CDATA[.TESTID_20]]></MESSAGE_TEXT>
</LOG_MESSAGE>
<LOG_MESSAGE type="PASSED" date="27-03-2006" time="10:56:13" >
<MESSAGE_TEXT><![CDATA[AddressServiceWin DOES EXIST as
expected.]]></MESSAGE_TEXT>
</LOG_MESSAGE>

</SAFS_LOG>

I have managed to extract the TESTID's from the MESSAGE_TEXT by using
this xsl

<xsl:template match="/">
<HTML>
<head><title>Address Service Test Log </title></head>

<body>
<h2>Test Summary</h2>
<tr><th><B> TEST ID</B></th></tr>
<xsl:for-each select="SAFS_LOG/LOG_MESSAGE">
<xsl:variable name="host1" select="MESSAGE_TEXT"/>
<xsl:if test="(contains($host1,'.TESTID'))">
<table border="1">
<td> <xsl:value-of select="$host1"/> </td>
</table>
</xsl:if>
</xsl:for-each>
</body>
</HTML>
</xsl:template>
</xsl:stylesheet>

In the HTML I get something like this

Test Summary
TEST ID

.TESTID_10
.TESTID_20

Now I want to extract from the xml whether these tests have passed or
failed - as we can see above we have got a 'FAILED' on the TESTID_10
and a 'PASSED' on TESTID_20

The code I have added is enclosed in asterisks

<xsl:for-each select="SAFS_LOG/LOG_MESSAGE">
<xsl:variable name="host1" select="MESSAGE_TEXT"/>
<xsl:if test="(contains($host1,'.TESTID'))">
<table border="1">
<td> <xsl:value-of select="$host1"/> </td>

************************************************** ************
<xsl:variable name="host2" select="@type"/>
<xsl:if test="(contains($host2,'FAILED'))">
<td> <xsl:value-of select="MESSAGE_TEXT"/> </td>
</xsl:if>
************************************************** ************

</table>
</xsl:if>
</xsl:for-each>

I don't seem to be returning any 'FAILED' even though they are in
<LOG_MESSAGE>. Does anyone know why?
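
One thing visible in the sample above: the LOG_MESSAGE elements that carry a TESTID have type="GENERIC", and the FAILED or PASSED type sits on a later, sibling LOG_MESSAGE. Inside the xsl:if that matched a TESTID message, @type therefore can never be 'FAILED'. The sketch below shows the pairing logic in Python with xml.etree.ElementTree. It assumes each result belongs to the nearest preceding TESTID message, and that the second CDATA section is properly closed with "]]>". The function name results_by_test is made up for this example.

```python
import xml.etree.ElementTree as ET

# Sample based on the log above, with the missing "]]>" restored (an assumption).
LOG = """<SAFS_LOG>
<LOG_MESSAGE type="GENERIC"><MESSAGE_TEXT><![CDATA[.TESTID_10]]></MESSAGE_TEXT></LOG_MESSAGE>
<LOG_MESSAGE type="FAILED"><MESSAGE_TEXT><![CDATA[Country Code: GB <> AU ]]></MESSAGE_TEXT></LOG_MESSAGE>
<LOG_MESSAGE type="GENERIC"><MESSAGE_TEXT><![CDATA[.TESTID_20]]></MESSAGE_TEXT></LOG_MESSAGE>
<LOG_MESSAGE type="PASSED"><MESSAGE_TEXT><![CDATA[AddressServiceWin DOES EXIST as expected.]]></MESSAGE_TEXT></LOG_MESSAGE>
</SAFS_LOG>"""

def results_by_test(xml_text):
    """Map each TESTID message to the result types of the messages that follow it."""
    results = {}
    current = None
    for msg in ET.fromstring(xml_text).iter("LOG_MESSAGE"):
        text = (msg.findtext("MESSAGE_TEXT") or "").strip()
        if ".TESTID" in text:
            # A new test starts here; later results are filed under it.
            current = text.lstrip(".")
            results[current] = []
        elif current is not None and msg.get("type") in ("PASSED", "FAILED"):
            results[current].append(msg.get("type"))
    return results

print(results_by_test(LOG))  # {'TESTID_10': ['FAILED'], 'TESTID_20': ['PASSED']}
```

In XSLT the equivalent would likely involve the following-sibling axis from the TESTID message rather than its own @type.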

Cheers

Aidy

Mar 27 '06 #14

I need to extract data from a CDATA section of an XML document. What the
CDATA section contains is not an issue for me; I just need that section
as a string, a file, or whatever. Please help.


*** Sent via Developersdex http://www.developersdex.com ***
Mar 31 '06 #15

In article <er**************@news.uswest.net>,
Rajesh Kochhar <ra************@abnamro.com> wrote:
I need to extract data from a CDATA section of an XML document. What the
CDATA section contains is not an issue for me; I just need that section
as a string, a file, or whatever. Please help.


You can't extract a CDATA section with XPath. You will have to specify
the characters you want by some other means, such as the element they
are contained in.

-- Richard
Mar 31 '06 #16
