XSL problem

Hi,

I'm stuck with an XSL problem - can anyone give me any hints?

I have some XML with nested formatting tags like this:

<text>
this is plain
<bold>
this is bold
<italic>
this is bold-italic
</italic>
</bold>
this is plain
</text>

which I need to 'flatten out' into something like this:

<text>this is plain</text>
<text bold="true">this is bold</text>
<text bold="true" italic="true">this is bold-italic</text>
<text>this is plain</text>

It doesn't have to work with any arbitrary tags - there are only a few
possible ones - but I'm not sure how to "remember" the outer level
formatting nodes when processing the text inside. It seems to be
crying out for some kind of state variable

Andy
Jul 20 '05 #1
7 Replies


Modes can help to give some kind of state in which a node is to be
processed, here is my attempt at using them to solve the problem:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" encoding="UTF-8" indent="yes" />

  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()" />
    </xsl:copy>
  </xsl:template>

  <xsl:template match="text">
    <xsl:apply-templates select="node()" mode="flatten" />
  </xsl:template>

  <xsl:template match="text()" mode="flatten">
    <text><xsl:value-of select="." /></text>
  </xsl:template>

  <xsl:template match="bold" mode="flatten">
    <xsl:apply-templates select="node()" mode="flattenBold" />
  </xsl:template>

  <xsl:template match="text()" mode="flattenBold">
    <text bold="true"><xsl:value-of select="." /></text>
  </xsl:template>

  <xsl:template match="italic" mode="flattenBold">
    <xsl:apply-templates select="node()" mode="flattenBoldItalic" />
  </xsl:template>

  <xsl:template match="text()" mode="flattenBoldItalic">
    <text bold="true" italic="true"><xsl:value-of select="." /></text>
  </xsl:template>

</xsl:stylesheet>

The result is not quite what you want, but apart from a whitespace-only
text node showing up, it has the right structure (note that I wrapped your
source above in a <doc> element, since otherwise the flattened result
wouldn't have a single root element):

<doc>
<text>
this is plain
</text>
<text bold="true">
this is bold
</text>
<text italic="true" bold="true">
this is bold-italic
</text>
<text bold="true">
</text>
<text>
this is plain
</text>
</doc>

Now, to solve the whitespace text node issue, I think the following
should help:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" encoding="UTF-8" indent="yes" />

  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()" />
    </xsl:copy>
  </xsl:template>

  <xsl:template match="text">
    <xsl:apply-templates select="node()" mode="flatten" />
  </xsl:template>

  <!-- In every mode: skip whitespace-only text nodes and emit the
       normalized text, so the output has no stray line breaks -->
  <xsl:template match="text()" mode="flatten">
    <xsl:variable name="normalizedText" select="normalize-space(.)" />
    <xsl:if test="$normalizedText">
      <text><xsl:value-of select="$normalizedText" /></text>
    </xsl:if>
  </xsl:template>

  <xsl:template match="bold" mode="flatten">
    <xsl:apply-templates select="node()" mode="flattenBold" />
  </xsl:template>

  <xsl:template match="text()" mode="flattenBold">
    <xsl:variable name="normalizedText" select="normalize-space(.)" />
    <xsl:if test="$normalizedText">
      <text bold="true"><xsl:value-of select="$normalizedText" /></text>
    </xsl:if>
  </xsl:template>

  <xsl:template match="italic" mode="flattenBold">
    <xsl:apply-templates select="node()" mode="flattenBoldItalic" />
  </xsl:template>

  <xsl:template match="text()" mode="flattenBoldItalic">
    <xsl:variable name="normalizedText" select="normalize-space(.)" />
    <xsl:if test="$normalizedText">
      <text bold="true" italic="true"><xsl:value-of select="$normalizedText" /></text>
    </xsl:if>
  </xsl:template>

</xsl:stylesheet>
--

Martin Honnen
http://JavaScript.FAQTs.com/

Jul 20 '05 #2

Hi Andy,

Here's a brute force version that uses XPath to look at the ancestor
axis. Perhaps not as elegant as Martin's solution, but shorter and
easier to extend (Martin's script will grow as the factorial of the
number of options, I think, whereas this is only linear).

This transformation:

<xsl:stylesheet
    version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output indent="yes"/>

  <xsl:template match="/text">
    <doc>
      <xsl:apply-templates select="text()|node()"/>
    </doc>
  </xsl:template>

  <!-- For each non-blank text node, look up the ancestor axis to decide
       which formatting attributes to emit -->
  <xsl:template match="text()">
    <xsl:if test="normalize-space(.)">
      <text>
        <xsl:if test="ancestor::bold">
          <xsl:attribute name="bold">true</xsl:attribute>
        </xsl:if>
        <xsl:if test="ancestor::italic">
          <xsl:attribute name="italic">true</xsl:attribute>
        </xsl:if>
        <xsl:value-of select="normalize-space(.)"/>
      </text>
    </xsl:if>
  </xsl:template>

</xsl:stylesheet>

with this input

<text>
this is plain
<bold>
this is bold
<italic>
this is bold-italic
</italic>
</bold>
this is plain
</text>

gives this output

<?xml version="1.0"?>
<doc>
<text>this is plain</text>
<text bold="true">this is bold</text>
<text bold="true" italic="true">this is bold-italic</text>
<text>this is plain</text>
</doc>

Ben

--
Ben Edgington
Mail to the address above is discarded.
Mail to ben at that address might be read.
http://www.edginet.org/
Jul 20 '05 #3
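
The ancestor-axis approach above can be shortened further when several formatting elements are involved. The following is only a sketch, not part of the original post: it assumes a third, hypothetical element named underline, and it assumes each output attribute should carry the same name as the formatting element. The xsl:for-each visits every matching ancestor and names the attribute after it.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output indent="yes"/>

  <xsl:template match="/text">
    <doc>
      <xsl:apply-templates select="node()"/>
    </doc>
  </xsl:template>

  <xsl:template match="text()">
    <xsl:if test="normalize-space(.)">
      <text>
        <!-- "underline" is a hypothetical extra element; extend this
             union to cover whichever formatting elements are in use -->
        <xsl:for-each select="ancestor::bold | ancestor::italic | ancestor::underline">
          <!-- Inside for-each the context node is the ancestor element,
               so local-name() yields "bold", "italic" or "underline" -->
          <xsl:attribute name="{local-name()}">true</xsl:attribute>
        </xsl:for-each>
        <!-- Back outside for-each, "." is the text node again -->
        <xsl:value-of select="normalize-space(.)"/>
      </text>
    </xsl:if>
  </xsl:template>

</xsl:stylesheet>
```

Adding a formatting element then means adding one step to the union. Attributes are created in document order of the ancestors, so their serialized order may differ from the output shown above; attribute order carries no meaning in XML.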

Ben Edgington <us****@edginet.org> writes:
(Martin's script will grow as the factorial of the
number of options, I think, whereas this is only linear).


Sorry, not factorial, but exponential: there will be 2^n - 1
formatted cases (one per non-empty combination) for n elements.

Jul 20 '05 #4
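
To make the growth concrete, here is a sketch of what the mode-based stylesheet might look like once italic is also allowed outside bold. It is an illustration only: the mode name flattenItalic is an assumption that follows Martin's naming, and the input is assumed to be wrapped in a <doc> element as in his post. With n = 2 elements there are 2^2 - 1 = 3 formatted modes plus the plain one, and one transition template for each way of entering a mode.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" encoding="UTF-8" indent="yes"/>

  <!-- Identity copy for everything outside <text> -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="text">
    <xsl:apply-templates select="node()" mode="flatten"/>
  </xsl:template>

  <!-- Mode transitions: one per (current mode, element) pair -->
  <xsl:template match="bold" mode="flatten">
    <xsl:apply-templates select="node()" mode="flattenBold"/>
  </xsl:template>
  <xsl:template match="italic" mode="flatten">
    <xsl:apply-templates select="node()" mode="flattenItalic"/>
  </xsl:template>
  <xsl:template match="italic" mode="flattenBold">
    <xsl:apply-templates select="node()" mode="flattenBoldItalic"/>
  </xsl:template>
  <xsl:template match="bold" mode="flattenItalic">
    <xsl:apply-templates select="node()" mode="flattenBoldItalic"/>
  </xsl:template>

  <!-- One text() template per mode: plain, plus three formatted -->
  <xsl:template match="text()" mode="flatten">
    <xsl:if test="normalize-space(.)">
      <text><xsl:value-of select="normalize-space(.)"/></text>
    </xsl:if>
  </xsl:template>
  <xsl:template match="text()" mode="flattenBold">
    <xsl:if test="normalize-space(.)">
      <text bold="true"><xsl:value-of select="normalize-space(.)"/></text>
    </xsl:if>
  </xsl:template>
  <xsl:template match="text()" mode="flattenItalic">
    <xsl:if test="normalize-space(.)">
      <text italic="true"><xsl:value-of select="normalize-space(.)"/></text>
    </xsl:if>
  </xsl:template>
  <xsl:template match="text()" mode="flattenBoldItalic">
    <xsl:if test="normalize-space(.)">
      <text bold="true" italic="true"><xsl:value-of select="normalize-space(.)"/></text>
    </xsl:if>
  </xsl:template>

</xsl:stylesheet>
```

A third element would roughly double the number of modes again, which is the exponential growth described above.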

Thanks to both you and Martin for these two solutions.

Although there are "only a few" formatting elements, I certainly don't fancy
having to enumerate every possible combination of them.

Fortunately performance will not be an issue, so I can use your idea.

Andy

Jul 20 '05 #5

"Andy Fish" <aj****@blueyonder.co.uk> writes:
although there are "only a few" formatting elements, I certainly don't fancy
having to enumerate every possible combination of them.

Fortunately performance will not be an issue so I can use your idea.

You don't need to choose!

Whilst watching television with my two-year-old this morning I
realised that, of course, recursion is the proper way to maintain
state-information in XSLT. (The challenge presented by the Fimbles is
limited, you understand.)

This solution combines the best of Martin's and mine: it maintains
state information without recalculating it from scratch every time,
and its size is only linear in the number of options considered.

This XSLT,

<xsl:stylesheet
    version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output indent="yes"/>

  <!-- Start with both flags switched off -->
  <xsl:template match="/text">
    <doc>
      <xsl:apply-templates select="text()|node()">
        <xsl:with-param name="italic" select="0"/>
        <xsl:with-param name="bold" select="0"/>
      </xsl:apply-templates>
    </doc>
  </xsl:template>

  <!-- Emit one <text> per non-blank text node, using the flags passed down -->
  <xsl:template match="text()">
    <xsl:param name="italic"/>
    <xsl:param name="bold"/>
    <xsl:if test="normalize-space(.)">
      <text>
        <xsl:if test="$bold">
          <xsl:attribute name="bold">true</xsl:attribute>
        </xsl:if>
        <xsl:if test="$italic">
          <xsl:attribute name="italic">true</xsl:attribute>
        </xsl:if>
        <xsl:value-of select="normalize-space(.)"/>
      </text>
    </xsl:if>
  </xsl:template>

  <!-- Entering <bold>: switch bold on, pass italic through unchanged -->
  <xsl:template match="bold">
    <xsl:param name="italic"/>
    <xsl:apply-templates select="text()|node()">
      <xsl:with-param name="italic" select="$italic"/>
      <xsl:with-param name="bold" select="1"/>
    </xsl:apply-templates>
  </xsl:template>

  <!-- Entering <italic>: switch italic on, pass bold through unchanged -->
  <xsl:template match="italic">
    <xsl:param name="bold"/>
    <xsl:apply-templates select="text()|node()">
      <xsl:with-param name="italic" select="1"/>
      <xsl:with-param name="bold" select="$bold"/>
    </xsl:apply-templates>
  </xsl:template>

</xsl:stylesheet>

with this XML

<text>
this is plain
<bold>
this is bold
<italic>
this is bold-italic
</italic>
</bold>
this is plain
<italic>this is italic
<bold>this is italic-bold</bold>
</italic>
</text>

gives this output

<?xml version="1.0"?>
<doc>
<text>this is plain</text>
<text bold="true">this is bold</text>
<text bold="true" italic="true">this is bold-italic</text>
<text>this is plain</text>
<text italic="true">this is italic</text>
<text bold="true" italic="true">this is italic-bold</text>
</doc>
Hope you enjoy it.

Ben

Jul 20 '05 #6
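
In the parameter-passing stylesheet above, each new formatting element needs its own template and also one more forwarded parameter in every existing template. One possible way to keep that growth down is to handle all formatting elements in a single template. The following is a sketch under stated assumptions rather than part of the original post: underline is a hypothetical third element, and the flags are booleans that default to false() instead of the 0/1 values used above.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output indent="yes"/>

  <xsl:template match="/text">
    <doc>
      <!-- No parameters passed: every flag takes its false() default -->
      <xsl:apply-templates select="node()"/>
    </doc>
  </xsl:template>

  <!-- One template for all formatting elements ("underline" is hypothetical).
       Each flag stays on if it was already on, or turns on when the
       current element is the matching one. -->
  <xsl:template match="bold | italic | underline">
    <xsl:param name="bold" select="false()"/>
    <xsl:param name="italic" select="false()"/>
    <xsl:param name="underline" select="false()"/>
    <xsl:apply-templates select="node()">
      <xsl:with-param name="bold" select="$bold or self::bold"/>
      <xsl:with-param name="italic" select="$italic or self::italic"/>
      <xsl:with-param name="underline" select="$underline or self::underline"/>
    </xsl:apply-templates>
  </xsl:template>

  <!-- Emit one <text> per non-blank text node, using the flags passed down -->
  <xsl:template match="text()">
    <xsl:param name="bold" select="false()"/>
    <xsl:param name="italic" select="false()"/>
    <xsl:param name="underline" select="false()"/>
    <xsl:if test="normalize-space(.)">
      <text>
        <xsl:if test="$bold">
          <xsl:attribute name="bold">true</xsl:attribute>
        </xsl:if>
        <xsl:if test="$italic">
          <xsl:attribute name="italic">true</xsl:attribute>
        </xsl:if>
        <xsl:if test="$underline">
          <xsl:attribute name="underline">true</xsl:attribute>
        </xsl:if>
        <xsl:value-of select="normalize-space(.)"/>
      </text>
    </xsl:if>
  </xsl:template>

</xsl:stylesheet>
```

With this layout a new element costs one name in the match pattern, one parameter with its with-param line, and one attribute test. Note that in XSLT 1.0 the built-in template rules do not forward parameters, so an element not listed in the match pattern would reset the flags for its descendants.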

"Ben Edgington" <us****@edginet.org> wrote in message
news:87************@edginet.org...

Whilst watching television with my two-year-old this morning I
realised that, of course, recursion is the proper way to maintain
state-information in XSLT. (The challenge presented by the Fimbles is
limited, you understand.)

This solution combines the best of Martin's and mine: it maintains
state information without recalculating it from scratch every time,
and its size is only linear in the number of options considered.


Ah, of course - I'd forgotten about apply-templates...with-param.

It's very interesting how different these three approaches are. I had
already thought about something like Martin's idea but I figured it would
get out of hand so I didn't take it all the way to completion. Your original
was the lateral thinking solution I was hoping someone would come up with.
But this last one is the one I'm kicking myself for not thinking of - the
one I didn't think existed.

Andy

Jul 20 '05 #7

"Andy Fish" <aj****@blueyonder.co.uk> writes:
ah of course - I'd forgotten about apply-templates...with-param

It's very interesting how different these three approaches are. I had
already thought about something like Martin's idea but I figured it would
get out of hand so I didn't take it all the way to completion. Your original
was the lateral thinking solution I was hoping someone would come up with.
But this last one is the one I'm kicking myself for not thinking of - the
one I didn't think existed.


Exactly - I had a feeling when I submitted the original version that
there was a better way, which is why it kept bugging me. It's an
example of one of those things which are *obvious* when you spend a
couple of days thinking about them 8^)

Ben

Jul 20 '05 #8
