Has anyone solved the problem of translating lists in Word 2003 (WordML)
into XHTML? I have been trying for a while now to get the nested-list code
in my XSLT to work, but I can't find a way to select the collection of
nodes that I need.
To begin, I am using the xsltproc that comes with Cygwin as my processor.
I have no particular affinity for this processor except that it is open
source and standards-compliant. I don't like M$, but if using a M$
processor will fix this transformation, then I will use it.
A Windows build of xsltproc is available here:
http://www.zlatkovic.com/libxml.en.html
ftp://ftp.zlatkovic.com/pub/libxml/
(This is a Windows port of libxslt, the XSLT library that comes with GNOME.)
The problem is this:
As those of you who have worked on this type of problem know, WordML is
a flat structure whose focus is visual formatting. So, instead of a
nicely nested list structure like HTML's, WordML has a linear sequence
of w:p elements, each with a <w:pPr> child that contains a <w:listPr>
element holding the list information. A typical Word paragraph that
represents a list item is shown here:
<w:p>
<w:pPr>
<w:listPr>
<w:ilvl w:val="0"/>
<w:ilfo w:val="2"/>
<wx:t wx:val="·" wx:wTabBefore="360" wx:wTabAfter="240"/>
<wx:font wx:val="Symbol"/>
</w:listPr>
</w:pPr>
<w:r>
<w:t>Bulleted item 1</w:t>
</w:r>
</w:p>
The element <w:ilvl w:val="0"/> tells me that the nesting level of this
item is "0", i.e. the first level (levels are counted from zero).
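As a quick way to see the flat sequence that any stylesheet has to rebuild, here is a minimal sketch that prints each list paragraph's w:ilvl value and text, indented two spaces per level. It assumes the Word 2003 WordML namespace used in the sample document below, and that every list paragraph carries a w:ilvl element, as the sample's do.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch: dump the zero-based level and text of every WordML list
     paragraph, one per line, indented two spaces per level. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml">
  <xsl:output method="text" encoding="UTF-8"/>

  <xsl:template match="/">
    <!-- A paragraph is a list item when its w:pPr contains a w:listPr -->
    <xsl:for-each select="//w:p[w:pPr/w:listPr]">
      <xsl:variable name="level"
          select="number(w:pPr/w:listPr/w:ilvl/@w:val)"/>
      <!-- Indent by level: take 2 * level characters of a padding string -->
      <xsl:value-of select="substring('                    ', 1, 2 * $level)"/>
      <xsl:value-of select="$level"/>
      <xsl:text>: </xsl:text>
      <!-- A paragraph's text can be split across several runs -->
      <xsl:for-each select="w:r/w:t">
        <xsl:value-of select="."/>
      </xsl:for-each>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

It can be run as `xsltproc levels.xsl test.xml`, where both file names are placeholders. On the sample document, the level-1 items appear indented between "Bulleted item 2" and "Bulleted item 3", and between "Bulleted item 4" and "Bulleted item 5". Nothing in the flat WordML itself groups them under a parent.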
My model for processing this list was this: when I encounter the first
<w:p> that is a list item, matched by the XPath pattern
match="w:p[descendant-or-self::w:pPr/w:listPr][1]", I grab the
entire collection of following-sibling paragraphs that contain a
w:pPr/w:listPr. This is "grabbing the list". I then call a template and
pass this collection to it.
The template itself is recursive. Whenever I encounter a
"transitional list item" (one at a level greater than the level the
template is currently processing), I want to grab the sub-collection of
list items deeper than my current level, enclose them in <ul></ul>, and
then call the template again with the new collection.
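A recursive template in this spirit can be sketched as follows. It is a minimal illustration under two assumptions: each list is a contiguous run of sibling w:p elements that carry w:pPr/w:listPr, and each nested level is exactly one deeper than its parent (paragraphs that skip a level are not handled). Each call emits one <ul> for the level of its first item. For every item at that level, it recurses only on the deeper paragraphs whose nearest preceding list item at that level or shallower is the item itself, and it places the nested <ul> inside that item's <li>. The names listProcessor, items, level, me and children are illustrative.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch: rebuild nested XHTML lists from flat WordML list paragraphs.
     Assumes contiguous runs of list paragraphs and one-step nesting. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml"
    xmlns="http://www.w3.org/1999/xhtml"
    exclude-result-prefixes="w">
  <xsl:output method="xml" encoding="UTF-8" indent="yes"/>

  <xsl:template match="/">
    <html>
      <body>
        <xsl:apply-templates select="//w:body//w:p"/>
      </body>
    </html>
  </xsl:template>

  <!-- Ordinary paragraphs become <p>; copying only w:t text skips the
       whitespace-only text nodes between WordML elements -->
  <xsl:template match="w:p">
    <p><xsl:apply-templates select="w:r/w:t"/></p>
  </xsl:template>

  <!-- List paragraphs (default priority 0.5 beats plain w:p). Only the first
       paragraph of a run starts a list; the rest produce nothing here. -->
  <xsl:template match="w:p[w:pPr/w:listPr]">
    <xsl:if test="not(preceding-sibling::w:p[1]/w:pPr/w:listPr)">
      <xsl:call-template name="listProcessor">
        <!-- The run: this paragraph plus the following list paragraphs
             that share its nearest preceding non-list paragraph -->
        <xsl:with-param name="items"
            select=". | following-sibling::w:p[w:pPr/w:listPr][generate-id(preceding-sibling::w:p[not(w:pPr/w:listPr)][1]) = generate-id(current()/preceding-sibling::w:p[not(w:pPr/w:listPr)][1])]"/>
      </xsl:call-template>
    </xsl:if>
  </xsl:template>

  <!-- Emits one <ul> for the level of the first item in $items and
       recurses into each item's own children -->
  <xsl:template name="listProcessor">
    <xsl:param name="items"/>
    <xsl:variable name="level"
        select="number($items[1]/w:pPr/w:listPr/w:ilvl/@w:val)"/>
    <ul>
      <xsl:for-each select="$items[number(w:pPr/w:listPr/w:ilvl/@w:val) = $level]">
        <xsl:variable name="me" select="generate-id()"/>
        <!-- Children: deeper items whose nearest preceding list item at this
             level or shallower is the current item. [1] on the reverse
             preceding-sibling step picks the NEAREST such item. -->
        <xsl:variable name="children"
            select="$items[number(w:pPr/w:listPr/w:ilvl/@w:val) &gt; $level][generate-id(preceding-sibling::w:p[w:pPr/w:listPr][number(w:pPr/w:listPr/w:ilvl/@w:val) &lt;= $level][1]) = $me]"/>
        <li>
          <xsl:apply-templates select="w:r/w:t"/>
          <!-- The nested list goes inside the parent's <li> -->
          <xsl:if test="$children">
            <xsl:call-template name="listProcessor">
              <xsl:with-param name="items" select="$children"/>
            </xsl:call-template>
          </xsl:if>
        </li>
      </xsl:for-each>
    </ul>
  </xsl:template>

  <xsl:template match="w:t">
    <xsl:value-of select="."/>
  </xsl:template>
</xsl:stylesheet>
```

The document-wide `//w:body//w:p` selection is a simplification. It would also reach paragraphs inside tables, which a fuller stylesheet would route through the section templates instead.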
So... what is my problem? Let us pretend that my list looks like this:
* Bulleted item 1
* Bulleted item 2
o First level nesting, bulleted item 2-1
o First level nesting, bulleted item 2-2
* Bulleted item 3
* Bulleted item 4
o First level nesting, bulleted item 4-1
o First level nesting, bulleted item 4-2
o First level nesting, bulleted item 4-3
o First level nesting, bulleted item 4-4
* Bulleted item 5
* Bulleted item 6
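In XHTML terms, the outline above corresponds to roughly the following target nesting. This is a hand-written sketch of the goal, not the output of any particular stylesheet. Each sublist sits inside the <li> of the item it belongs to, because the XHTML 1.0 DTDs allow only <li> elements as children of <ul>.

```xml
<ul>
  <li>Bulleted item 1</li>
  <li>Bulleted item 2
    <ul>
      <li>First level nesting, bulleted item 2-1</li>
      <li>First level nesting, bulleted item 2-2</li>
    </ul>
  </li>
  <li>Bulleted item 3</li>
  <li>Bulleted item 4
    <ul>
      <li>First level nesting, bulleted item 4-1</li>
      <li>First level nesting, bulleted item 4-2</li>
      <li>First level nesting, bulleted item 4-3</li>
      <li>First level nesting, bulleted item 4-4</li>
    </ul>
  </li>
  <li>Bulleted item 5</li>
  <li>Bulleted item 6</li>
</ul>
```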
When I am processing level 0, I have no issues until I grab the items
on level 1. When I do, I get not only items 2-1 and 2-2, but also 4-1,
4-2, 4-3, and 4-4. I have tried tweaking the XPath for this selection,
but to no avail. With my method, the output looks like this:
* Bulleted item 1
* Bulleted item 2
o First level nesting, bulleted item 2-1
o First level nesting, bulleted item 2-2
o First level nesting, bulleted item 4-1
o First level nesting, bulleted item 4-2
o First level nesting, bulleted item 4-3
o First level nesting, bulleted item 4-4
* Bulleted item 3
* Bulleted item 4
o First level nesting, bulleted item 4-1
o First level nesting, bulleted item 4-2
o First level nesting, bulleted item 4-3
o First level nesting, bulleted item 4-4
* Bulleted item 5
* Bulleted item 6
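The duplication is consistent with the subCollection expression in the stylesheet below. Its predicate sets only a lower bound (ilvl greater than the current level), so after item 2 it keeps collecting deeper paragraphs past items 3 and 4. When the outer loop reaches item 4-1, that item triggers the transitional branch a second time. The unused attempToGetTheRightSetIntoAVariable variable is close to a fix. However, generate-id() of a node-set uses the node that is first in document order, so its preceding-sibling::w:p[...] argument names the farthest earlier level-0 item rather than the nearest one.

The minimal sketch below tests a bounded selection. It assumes level 0 as the current level, evaluates the predicate with the parent item as context, and uses the sample's namespace. Putting [1] directly on the reverse preceding-sibling step selects the nearest item.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch: for each level-0 list item, count the deeper items that belong
     to it, using a following-sibling selection with an upper bound. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml">
  <xsl:output method="text" encoding="UTF-8"/>

  <xsl:template match="/">
    <xsl:for-each select="//w:p[w:pPr/w:listPr/w:ilvl/@w:val = 0]">
      <!-- Stand-in for the template's current level -->
      <xsl:variable name="myCurrentListLevel" select="0"/>
      <!-- Deeper following siblings whose NEAREST preceding item at the
           current level or shallower is this item (current()) -->
      <xsl:variable name="subCollection"
          select="following-sibling::w:p[w:pPr/w:listPr/w:ilvl/@w:val &gt; $myCurrentListLevel][generate-id(preceding-sibling::w:p[w:pPr/w:listPr/w:ilvl/@w:val &lt;= $myCurrentListLevel][1]) = generate-id(current())]"/>
      <xsl:for-each select="w:r/w:t">
        <xsl:value-of select="."/>
      </xsl:for-each>
      <xsl:text>: </xsl:text>
      <xsl:value-of select="count($subCollection)"/>
      <xsl:text> nested item(s)&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

On the sample document, this should report two nested items for "Bulleted item 2", four for "Bulleted item 4", and zero for every other level-0 item.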
The following two items are the stripped-down XSLT and the stripped-down
WordML for this transformation. I trimmed them to keep this posting from
being insanely long. If anyone can contribute to this problem or has
already solved it, I would be most grateful for feedback.
Cliff
********************************************************************************
XSLT for processing the WordML
********************************************************************************
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE xsl:stylesheet [
<!ENTITY tab "&#9;">
<!ENTITY sp "&#32;">
<!ENTITY crlf "&#13;&#10;">
<!ENTITY nbsp "&#160;">
<!ENTITY bullet "&#8226;">
]>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml"
xmlns:v="urn:schemas-microsoft-com:vml"
xmlns:w10="urn:schemas-microsoft-com:office:word"
xmlns:sl="http://schemas.microsoft.com/schemaLibrary/2003/core"
xmlns:aml="http://schemas.microsoft.com/aml/2001/core"
xmlns:wx="http://schemas.microsoft.com/office/word/2003/auxHint"
xmlns:o="urn:schemas-microsoft-com:office:office"
xmlns:dt="uuid:C2F41010-65B3-11d1-A29F-00AA00C14882"
xmlns:st1="urn:schemas-microsoft-com:office:smarttags" version="1.0"
exclude-result-prefixes="w v w10 sl aml wx o dt st1">
<!-- START stylesheet commands -->
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"
doctype-system="fubar.dtd" />
<xsl:strip-space elements="*" />
<xsl:preserve-space elements="w:binData w:tab" />
<!-- End stylesheet commands -->
<!-- START variable declarations -->
<!-- null value for text comparisons -->
<xsl:variable name="null"></xsl:variable>
<!-- single space character -->
<xsl:variable name="space">&sp;</xsl:variable>
<!-- bullet character used in the wx:t/@wx:val hint (U+00B7) -->
<xsl:variable name="bullet">&#183;</xsl:variable>
<!-- END variable declarations -->
<!-- START template declarations -->
<xsl:template match="/w:wordDocument" >
<html>
<!-- Process the head information -->
<xsl:apply-templates select="//o:DocumentProperties" mode="head" />
<!-- Process the body information -->
<xsl:apply-templates select="//w:body" mode="body" />
</html>
</xsl:template>
<xsl:template match="w:body" mode="body">
<body>
<xsl:apply-templates select="*" mode="body" />
</body>
</xsl:template>
<xsl:template match="wx:sect" mode="body">
<xsl:apply-templates mode="body" />
</xsl:template>
<xsl:template match="wx:sub-section" mode="body">
<xsl:apply-templates mode="body" />
</xsl:template>
<xsl:template match="w:p[descendant-or-self::w:pPr/w:listPr][1]"
mode="body">
<!-- <xsl:comment> w:p[1] template match found... </xsl:comment> -->
<xsl:call-template name="listProcessor">
<xsl:with-param name="myCollectionOfSiblingListItems"
select=".|following-sibling::w:p[descendant-or-self::w:pPr/w:listPr]" />
</xsl:call-template>
</xsl:template>
<xsl:template name="listProcessor">
<xsl:param name="myCollectionOfSiblingListItems" />
<xsl:variable name="myCurrentListLevel"
select="$myCollectionOfSiblingListItems[1]/w:pPr/w:listPr/w:ilvl/@w:val" />
<ul>
<xsl:for-each select="$myCollectionOfSiblingListItems">
<xsl:variable name="previousSiblingListLevel"
select="preceding-sibling::w:p[position() =
1]/w:pPr/w:listPr/w:ilvl/@w:val" />
<xsl:variable name="myOwnCurrentListLevel"
select="descendant-or-self::w:p/w:pPr/w:listPr/w:ilvl/@w:val" />
<xsl:variable name="nextSiblingListLevel"
select="following-sibling::w:p[position() =
1]/w:pPr/w:listPr/w:ilvl/@w:val" />
<xsl:variable name="attempToGetTheRightSetIntoAVariable"
select="following-sibling::w:p[child::w:pPr/w:listPr/w:ilvl/@w:val][generate-id(preceding-sibling::w:p[child::w:pPr/w:listPr/w:ilvl/@w:val
= 0]) = generate-id(current())]" />
<!-- <xsl:comment> current contents: <xsl:value-of
select="current()" /><xsl:text> </xsl:text></xsl:comment> -->
<!-- <xsl:comment><xsl:text> *****Found a collection of this many
items: </xsl:text><xsl:value-of
select="count($attempToGetTheRightSetIntoAVariable)" /><xsl:text>
</xsl:text></xsl:comment> -->
<xsl:choose>
<xsl:when
test="number(descendant-or-self::w:pPr/w:listPr/w:ilvl/@w:val) =
number($myCurrentListLevel)">
<li>
<xsl:call-template name="processParagraphAsListItemContents" />
</li>
</xsl:when>
<xsl:when test="(
number(descendant-or-self::w:pPr/w:listPr/w:ilvl/@w:val) >
number($myCurrentListLevel) ) and (
number(descendant-or-self::w:pPr/w:listPr/w:ilvl/@w:val) >
number($previousSiblingListLevel))">
<xsl:variable name="nextListItemIndexOnOrBelowMyLevel"
select="following-sibling::w:p[w:pPr/w:listPr/w:ilvl/@w:val &lt;=
number($myCurrentListLevel)]" />
<xsl:variable name="subCollection"
select=".|following-sibling::w:p[descendant-or-self::w:pPr/w:listPr/w:ilvl/@w:val
> number($myCurrentListLevel)]"></xsl:variable>
<!-- <xsl:comment> My current list level for recursive call:
<xsl:value-of select="number($myCurrentListLevel)" /> , with current
contents: <xsl:value-of select="." /><xsl:text>
</xsl:text></xsl:comment> -->
<li>
<xsl:call-template name="listProcessor">
<xsl:with-param name="myCollectionOfSiblingListItems"
select="$subCollection" />
</xsl:call-template>
</li>
</xsl:when>
<xsl:otherwise>
<!-- Do nothing! -->
</xsl:otherwise>
</xsl:choose>
</xsl:for-each>
</ul>
</xsl:template>
<xsl:template name="processParagraphAsListItemContents">
<xsl:if test="descendant-or-self::text()">
<xsl:apply-templates mode="body" />
</xsl:if>
</xsl:template>
<xsl:template match="w:t" mode="body">
<xsl:value-of select="." />
</xsl:template>
<xsl:template match="w:r|w:b| w:u|w:i" mode="body">
<xsl:apply-templates mode="body" />
</xsl:template>
<xsl:template match="*" mode="body">
<!-- Do nothing... drop content here... -->
</xsl:template>
<!-- END template declarations -->
</xsl:stylesheet>
********************************************************************************
Sample stripped-down WordML
********************************************************************************
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<?mso-application progid="Word.Document"?>
<w:wordDocument
xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml"
xmlns:v="urn:schemas-microsoft-com:vml"
xmlns:w10="urn:schemas-microsoft-com:office:word"
xmlns:sl="http://schemas.microsoft.com/schemaLibrary/2003/core"
xmlns:aml="http://schemas.microsoft.com/aml/2001/core"
xmlns:wx="http://schemas.microsoft.com/office/word/2003/auxHint"
xmlns:o="urn:schemas-microsoft-com:office:office"
xmlns:dt="uuid:C2F41010-65B3-11d1-A29F-00AA00C14882"
w:macrosPresent="no" w:embeddedObjPresent="no" w:ocxPresent="no"
xml:space="preserve">
<w:body>
<wx:sect>
<wx:sub-section>
<w:p>
<w:pPr>
<w:pStyle w:val="Heading1"/></w:pPr>
<w:r>
<w:t>Test #9</w:t></w:r></w:p>
<w:p>
<w:r>
<w:t>Here is a bulleted test list with 2 levels deep
nesting:</w:t></w:r></w:p>
<w:p>
<w:pPr>
<w:listPr>
<w:ilvl w:val="0"/>
<w:ilfo w:val="2"/>
<wx:t wx:val="·" wx:wTabBefore="360" wx:wTabAfter="240"/>
<wx:font wx:val="Symbol"/></w:listPr></w:pPr>
<w:r>
<w:t>Bulleted item 1</w:t></w:r></w:p>
<w:p>
<w:pPr>
<w:listPr>
<w:ilvl w:val="0"/>
<w:ilfo w:val="2"/>
<wx:t wx:val="·" wx:wTabBefore="360" wx:wTabAfter="240"/>
<wx:font wx:val="Symbol"/></w:listPr></w:pPr>
<w:r>
<w:t>Bulleted item 2</w:t></w:r></w:p>
<w:p>
<w:pPr>
<w:listPr>
<w:ilvl w:val="1"/>
<w:ilfo w:val="2"/>
<wx:t wx:val="o" wx:wTabBefore="1080" wx:wTabAfter="210"/>
<wx:font wx:val="Courier New"/></w:listPr></w:pPr>
<w:r>
<w:t>First level nesting, bulleted item 2-1</w:t></w:r></w:p>
<w:p>
<w:pPr>
<w:listPr>
<w:ilvl w:val="1"/>
<w:ilfo w:val="2"/>
<wx:t wx:val="o" wx:wTabBefore="1080" wx:wTabAfter="210"/>
<wx:font wx:val="Courier New"/></w:listPr></w:pPr>
<w:r>
<w:t>First level nesting, bulleted item 2-2</w:t></w:r></w:p>
<w:p>
<w:pPr>
<w:listPr>
<w:ilvl w:val="0"/>
<w:ilfo w:val="2"/>
<wx:t wx:val="·" wx:wTabBefore="360" wx:wTabAfter="240"/>
<wx:font wx:val="Symbol"/></w:listPr></w:pPr>
<w:r>
<w:t>Bulleted item 3</w:t></w:r></w:p>
<w:p>
<w:pPr>
<w:listPr>
<w:ilvl w:val="0"/>
<w:ilfo w:val="2"/>
<wx:t wx:val="·" wx:wTabBefore="360" wx:wTabAfter="240"/>
<wx:font wx:val="Symbol"/></w:listPr></w:pPr>
<w:r>
<w:t>Bulleted item 4</w:t></w:r></w:p>
<w:p>
<w:pPr>
<w:listPr>
<w:ilvl w:val="1"/>
<w:ilfo w:val="2"/>
<wx:t wx:val="o" wx:wTabBefore="1080" wx:wTabAfter="210"/>
<wx:font wx:val="Courier New"/></w:listPr></w:pPr>
<w:r>
<w:t>First level nesting, bulleted item 4-1</w:t></w:r></w:p>
<w:p>
<w:pPr>
<w:listPr>
<w:ilvl w:val="1"/>
<w:ilfo w:val="2"/>
<wx:t wx:val="o" wx:wTabBefore="1080" wx:wTabAfter="210"/>
<wx:font wx:val="Courier New"/></w:listPr></w:pPr>
<w:r>
<w:t>First level nesting, bulleted item 4-2</w:t></w:r></w:p>
<w:p>
<w:pPr>
<w:listPr>
<w:ilvl w:val="1"/>
<w:ilfo w:val="2"/>
<wx:t wx:val="o" wx:wTabBefore="1080" wx:wTabAfter="210"/>
<wx:font wx:val="Courier New"/></w:listPr></w:pPr>
<w:r>
<w:t>First level nesting, bulleted item 4-3</w:t></w:r></w:p>
<w:p>
<w:pPr>
<w:listPr>
<w:ilvl w:val="1"/>
<w:ilfo w:val="2"/>
<wx:t wx:val="o" wx:wTabBefore="1080" wx:wTabAfter="210"/>
<wx:font wx:val="Courier New"/></w:listPr></w:pPr>
<w:r>
<w:t>First level nesting, bulleted item 4-4</w:t></w:r></w:p>
<w:p>
<w:pPr>
<w:listPr>
<w:ilvl w:val="0"/>
<w:ilfo w:val="2"/>
<wx:t wx:val="·" wx:wTabBefore="360" wx:wTabAfter="240"/>
<wx:font wx:val="Symbol"/></w:listPr></w:pPr>
<w:r>
<w:t>Bulleted item 5</w:t></w:r></w:p>
<w:p>
<w:pPr>
<w:listPr>
<w:ilvl w:val="0"/>
<w:ilfo w:val="2"/>
<wx:t wx:val="·" wx:wTabBefore="360" wx:wTabAfter="240"/>
<wx:font wx:val="Symbol"/></w:listPr></w:pPr>
<w:r>
<w:t>Bulleted item 6</w:t></w:r></w:p>
<w:p>
<w:r>
<w:t>Here is some following text...</w:t></w:r></w:p></wx:sub-section>
<wx:sub-section>
<w:p>
<w:pPr>
<w:pStyle w:val="Heading1"/></w:pPr>
<w:r>
<w:t>Test #10</w:t></w:r></w:p>
<w:p>
<w:r>
<w:t>Here is another bulleted test list for testing
purposes:</w:t></w:r></w:p>
<w:p>
<w:pPr>
<w:listPr>
<w:ilvl w:val="0"/>
<w:ilfo w:val="2"/>
<wx:t wx:val="·" wx:wTabBefore="360" wx:wTabAfter="240"/>
<wx:font wx:val="Symbol"/></w:listPr></w:pPr>
<w:r>
<w:t>Another list entirely, Bulleted item 1</w:t></w:r></w:p>
<w:p>
<w:pPr>
<w:listPr>
<w:ilvl w:val="0"/>
<w:ilfo w:val="2"/>
<wx:t wx:val="·" wx:wTabBefore="360" wx:wTabAfter="240"/>
<wx:font wx:val="Symbol"/></w:listPr></w:pPr>
<w:r>
<w:t>Another list entirely, Bulleted item 2</w:t></w:r></w:p>
<w:p/>
<w:sectPr>
<w:pgSz w:w="12240" w:h="15840"/>
<w:pgMar w:top="1440" w:right="1800" w:bottom="1440" w:left="1800"
w:header="720" w:footer="720" w:gutter="0"/>
<w:cols w:space="720"/>
<w:docGrid
w:line-pitch="360"/></w:sectPr></wx:sub-section></wx:sect></w:body></w:wordDocument>