
Using XSLT to Filter out elements with CDATA matching a regular expression

Hi,

I'm new to XSLT and I'm having a hard time figuring out whether XSLT
will do what I need it to do.

I have an XML file with a whole bunch of <message> elements. These
<message> elements have <![CDATA[...]]> in them. I would like to use
XSLT to remove the <message> elements whose CDATA (the "...") matches
a particular regular expression.

Could anyone help me determine if XSLT can do this? If it can, could
someone at least point me in the right direction so I can write the
transform?

Thanks!

--Edwin G. Castro
Jul 20 '05 #1
3 Replies


Edwin G. Castro wrote:
> Hi,
>
> I'm new to XSLT and I'm having a hard time figuring out whether XSLT
> will do what I need it to do.
>
> I have an XML file with a whole bunch of <message> elements. These
> <message> elements have <![CDATA[...]]> in them. I would like to use
> XSLT to remove the <message> elements whose CDATA (the "...") matches
> a particular regular expression.


Edwin,

The trick is to match on whatever text the CDATA would evaluate to. For
example, here is a test case I wrote.

input :

<?xml version="1.0"?>
<root>
  <a>some data</a>
  <a><![CDATA[Some CDATA]]></a>
</root>

XSL :

<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <xsl:value-of select="//text()[contains(.,'CDATA')]" />
  </xsl:template>

</xsl:stylesheet>

...and when the XSL is run on the input, I get:

<?xml version="1.0" encoding="UTF-8"?>
Some CDATA

Regards,
Kenneth
Jul 20 '05 #2

P: n/a
In article <d9*************************@posting.google.com> ,
Edwin G. Castro <ec*****@hp.com> wrote:

% I have an XML file with a whole bunch of <message> elements. These
% <message> elements have <![CDATA[...]]> in them. I would like to use
% XSLT to remove the <message> elements whose CDATA (the "...") matches
% a particular regular expression.

XSLT can't detect the CDATA sections as such. They're just (parts of) text
nodes so far as XPath and XSLT are concerned. Assuming that's acceptable,
the next obstacle is that XSLT 1.0 doesn't provide any regular expression
support.
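
To make the first point concrete outside XSLT, here is a minimal sketch in
Python using the standard library's xml.etree.ElementTree. Python is not part
of the original question and the sample document is made up; the only point
is that a typical XML parser hands back the same text whether or not it was
wrapped in a CDATA section.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the same text once as plain character data
# and once inside a CDATA section.
DOC = "<root><a>Some text</a><a><![CDATA[Some text]]></a></root>"

plain, cdata = ET.fromstring(DOC).findall("a")

# After parsing, the CDATA markers are gone; both elements expose
# ordinary text, so nothing downstream can tell them apart.
print(repr(plain.text), repr(cdata.text))
print(plain.text == cdata.text)
```

So a filter can only be written against the text content, not against
"being CDATA".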

When you run into a basic limitation like that, scoot over to
http://exslt.org and see what's being proposed there. In this
case, you'll find a little number they like to call
http://exslt.org/regular-expressions, which can be used like this:

<xsl:stylesheet xmlns:xsl = 'http://www.w3.org/1999/XSL/Transform'
                version = '1.0'
                xmlns:re = 'http://exslt.org/regular-expressions'>
  <!-- identity transform: copy every node and attribute unchanged -->
  <xsl:template match='node()|@*'>
    <xsl:copy>
      <xsl:apply-templates select='node()|@*'/>
    </xsl:copy>
  </xsl:template>

  <!-- drop messages dated January or February of 2003 or 2004 -->
  <xsl:template match = 'message[re:test(string(.),
                         "(Janvier|Fevrier) 200[34]")]'/>
</xsl:stylesheet>

This interface is marked experimental, so it might not work with your
XSLT processor, even if it supports the EXSLT extensions. Also, some
processors require you to have extension-element-prefixes='re' on
the stylesheet element, but I believe they shouldn't.

If that doesn't work for you, you could see if there are other
extension functions or language interfaces available to you. On
the other hand, if all you want to do is drop out elements whose
content matches an RE, this may be a job for another language.
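
As one illustration of the "another language" route, here is a minimal
Python sketch using only the standard library. The sample document, the
<log> wrapper element and the helper name drop_matching are made up for
illustration; the pattern is the one from the stylesheet above. Treat it as
a sketch under those assumptions, not a drop-in tool.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical input: two messages use CDATA, one uses plain text.
SAMPLE = """<log>
  <message><![CDATA[12 Janvier 2003: disk full]]></message>
  <message><![CDATA[3 Mars 2004: backup done]]></message>
  <message>7 Fevrier 2004: restart</message>
</log>"""

PATTERN = re.compile(r"(Janvier|Fevrier) 200[34]")


def drop_matching(root, pattern):
    """Remove every <message> whose text content matches the pattern."""
    # Snapshot both loops with list() so removal is safe while iterating.
    for parent in list(root.iter()):
        for child in list(parent):
            text = "".join(child.itertext())
            # search() looks for a match anywhere in the text.
            if child.tag == "message" and pattern.search(text):
                parent.remove(child)
    return root


root = drop_matching(ET.fromstring(SAMPLE), PATTERN)
remaining = [m.text for m in root.iter("message")]
print(remaining)
```

Only the message that does not match the pattern should be left in the
tree; ET.tostring(root) would then serialise the filtered document.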
--

Patrick TJ McPhee
East York Canada
pt**@interlog.com
Jul 20 '05 #3

> If that doesn't work for you, you could see if there are other
> extension functions or language interfaces available to you. On
> the other hand, if all you want to do is drop out elements whose
> content matches an RE, this may be a job for another language.


Sounds like my current solution of using C# to process the file is a
much better option. The processor I'm using (via NAnt) is the one
provided by .NET, so I'm fairly sure it doesn't support the extensions.

On the other hand, going through this thought process has provided
some better ideas on how to organize my code. Thanks for the
responses.

--Edwin
Jul 20 '05 #4
