format that can be imported into FrameMaker. The challenge, it turns
out, is correctly transforming the flat HTML header tags (<H1>, <H2>,
etc.) into nested sections inside the XML. I have made significant
progress, but have run into a roadblock.
Here is an example of my input HTML:
<html><body>
<p>abc abc</p>
<h1 class='header'>A</h1>
<p>A abc abc</p>
<h2 class='header'>B</h2>
<p>B abc abc</p>
<h3 class='header'>C</h3>
<p>C abc abc</p>
<h2 class='header'>D</h2> <!-- this is missing in the output -->
<p>D abc abc</p> <!-- this is missing in the output -->
<h1 class='header'>E</h1>
<p>E abc abc</p>
</body></html>
Here is an example of the output. You'll notice that the section for
<h2>D</h2> is missing:
<?xml version="1.0" encoding="UTF-8"?>
<article>
<title/>
<para>abc abc</para>
<section depth="1" id="A">
<title>A</title>
<para>A abc abc</para>
<section depth="2" id="B">
<title>B</title>
<para>B abc abc</para>
<section depth="3" id="C">
<title>C</title>
<para>C abc abc</para>
</section>
</section>
</section>
<section depth="1" id="E">
<title>E</title>
<para>E abc abc</para>
</section>
</article>
The problem is that my code currently applies templates to all
nodes following a header whose nearest preceding header is that same
header. For this reason, when content follows a header that isn't
its own header (like an <h2> following an <h3>), it doesn't get shown.
What I don't understand is how to fix it. Any help would be much
appreciated. I'm not really an XSL guru, so I'm doing the best I can
to get through this.
Here is the relevant code from my XSL:
<xsl:template match="body">
<article>
<title>
<xsl:value-of select="$docTitle" />
</title>
<xsl:for-each select='child::*[not(preceding-sibling::*[@class="header"])][not(@class="header")]'>
<xsl:apply-templates select="."/>
</xsl:for-each>
<xsl:variable name='depth'
select='substring(name(child::*[@class="header"][1]),2)'/>
<xsl:for-each select='child::*[@class="header"][substring(name(),2)&lt;=$depth]'>
<xsl:apply-templates select="."/>
</xsl:for-each>
</article>
</xsl:template>
<xsl:template match="h1 | h2 | h3 | h4 | h5">
<xsl:call-template name="header">
<xsl:with-param name="depth" select="substring(name(),2)"/>
</xsl:call-template>
</xsl:template>
<xsl:template name="header">
<xsl:param name="depth"/>
<section>
<xsl:attribute name="depth">
<xsl:value-of select="$depth"/>
</xsl:attribute>
<xsl:attribute name="id">
<xsl:value-of select="translate(.,' ','')" />
</xsl:attribute>
<title><xsl:value-of select="."/></title>
<xsl:variable name='thisHeader' select='generate-id(.)'/>
<xsl:for-each select='following-sibling::*[$thisHeader=generate-id(preceding-sibling::*[@class="header"][last()])]
[not(@class="header") or (@class="header" and substring(name(),2)>=$depth)]'>
<xsl:apply-templates select="."/>
</xsl:for-each>
</section>
</xsl:template>
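One possible direction, offered as an untested sketch rather than a
confirmed fix: instead of asking whether a node's nearest preceding
header is the current header, work out for each node the header that
should contain it, meaning the nearest preceding header that is strictly
shallower (every header counts as shallower than a non-header). Under
that rule the <h2>D</h2> in the sample input is owned by A instead of
being skipped. The template names owner-id and owned-by, the sentinel
level 99, the empty default for $docTitle, the root template, and the p
template are assumptions made for illustration and are not part of the
stylesheet above. The sketch also assumes every header is an h1 to h5
child of body carrying class='header'.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>
  <!-- Assumed to be supplied by the caller; empty when omitted. -->
  <xsl:param name="docTitle"/>

  <!-- Assumption: the input is html/body, as in the sample. -->
  <xsl:template match="/">
    <xsl:apply-templates select="html/body"/>
  </xsl:template>

  <!-- Hypothetical helper: writes the generate-id() of the header that
       should contain the context node, or nothing for top-level nodes. -->
  <xsl:template name="owner-id">
    <xsl:variable name="level">
      <xsl:choose>
        <xsl:when test="@class='header'">
          <xsl:value-of select="substring(name(),2)"/>
        </xsl:when>
        <!-- Sentinel: non-headers sit below every header level. -->
        <xsl:otherwise>99</xsl:otherwise>
      </xsl:choose>
    </xsl:variable>
    <!-- preceding-sibling is a reverse axis, so [1] is the nearest
         preceding header that is strictly shallower than this node. -->
    <xsl:value-of select="generate-id(preceding-sibling::*[@class='header']
        [number(substring(name(),2)) &lt; number($level)][1])"/>
  </xsl:template>

  <!-- Hypothetical helper: applies templates to each candidate whose
       owner is the header identified by $owner ('' means top level). -->
  <xsl:template name="owned-by">
    <xsl:param name="owner"/>
    <xsl:param name="candidates"/>
    <xsl:for-each select="$candidates">
      <xsl:variable name="mine">
        <xsl:call-template name="owner-id"/>
      </xsl:variable>
      <xsl:if test="string($mine) = $owner">
        <xsl:apply-templates select="."/>
      </xsl:if>
    </xsl:for-each>
  </xsl:template>

  <xsl:template match="body">
    <article>
      <title><xsl:value-of select="$docTitle"/></title>
      <xsl:call-template name="owned-by">
        <xsl:with-param name="owner" select="''"/>
        <xsl:with-param name="candidates" select="*"/>
      </xsl:call-template>
    </article>
  </xsl:template>

  <xsl:template match="h1 | h2 | h3 | h4 | h5">
    <section depth="{substring(name(),2)}" id="{translate(.,' ','')}">
      <title><xsl:value-of select="."/></title>
      <xsl:call-template name="owned-by">
        <xsl:with-param name="owner" select="generate-id(.)"/>
        <xsl:with-param name="candidates" select="following-sibling::*"/>
      </xsl:call-template>
    </section>
  </xsl:template>

  <!-- Assumed mapping for paragraphs, matching the sample output. -->
  <xsl:template match="p">
    <para><xsl:apply-templates/></para>
  </xsl:template>

</xsl:stylesheet>
```

Each candidate re-scans its preceding siblings, so this approach is
quadratic in the number of body children. For large documents, a
grouping based on xsl:key would probably scale better.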