HTML Via XSLT to Plain Text output

Hi All.
I'm trying to transform an HTML document into plain text via XSLT.
Simple, you say! (I hope.)
I have got it working by using the magnificent <xsl:value-of select="."/>.
This returns the text of the whole document, and <xsl:output method="text"/> ensures that the output I get is plain text.
Problem:
The HTML I am transforming has a table with headings and data. Whilst the output contains all the data from the table, it does not preserve any formatting, and it concatenates all the data within the table.
Can you suggest how I could extract the data from the table and present it in plain text? The only formatting I require is that the spacing between the columns is somewhat preserved.
I am, as I'm sure you can tell, still an XSLT noob, even with many years of application development under my belt!
Thanks for all your help!

XSLT:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <xsl:value-of select="."/>
    <xsl:value-of select="normalize-space(.)"/>
  </xsl:template>
</xsl:stylesheet>

HTML:

<HTML xmlns="http://www.w3.org/1999/xhtml">
<HEAD></HEAD>
<BODY>
<p>Dear person,</p>
<p>The following are columns are required to preserve the formatting.</p>
<table cellpadding="0" cellspacing="0" width="50%">
<tr>
<td width="20%">Column 1</td>
<td width="25%">Column 2</td>
<td width="20%">Column 3</td>
<td width="20%">Column 4</td>
</tr>
<tr>
<td><font size="4">01/06/2008</font></td>
<td><font size="4">34.2</font></td>
<td><font size="4">A Name</font></td>
<td><font size="4">42.00</font></td>
</tr>
</table>
</BODY>
</HTML>


Result:
Something like...

Dear person, The following are columns are required to preserve the formatting. Column 1Column 2Column 3Column 401/06/200834.2A Name42.00
Any suggestions would be welcome :)
Oct 3 '07 #1
Hi, I'm also a noob, but I managed to find this code for inserting a space.

You can try adding this within the template:
<xsl:value-of select="'&#x20;'" />
or
<xsl:text> </xsl:text>

Hope it helps.

Cheers,
Gaiason

Oct 3 '07 #2
jkmyoung
Instead of using <xsl:value-of select="."/>
I suggest using <xsl:apply-templates/>
Then have 2 templates like so:
<xsl:template match="tr">
  <xsl:apply-templates/>
  <xsl:text>&#10;</xsl:text><!-- add a newline -->
</xsl:template>

<xsl:template match="td">
  <xsl:text> </xsl:text>
  <xsl:apply-templates/>
  <xsl:text> </xsl:text>
</xsl:template>
Oct 3 '07 #3
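
One thing to watch with the sample HTML above: it declares xmlns="http://www.w3.org/1999/xhtml", and in XSLT 1.0 an unprefixed name in a match pattern only matches elements in no namespace, so match="tr" and match="td" would not fire against that document as posted. The following is a minimal sketch of a complete stylesheet built on the apply-templates idea, not a definitive solution. The h prefix, the pad variable, and the 20-character column width are assumptions chosen for illustration, and cells longer than the width are truncated.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:h="http://www.w3.org/1999/xhtml"
                version="1.0">
  <xsl:output method="text"/>

  <!-- Drop whitespace-only text nodes (the line breaks between tags). -->
  <xsl:strip-space elements="*"/>

  <!-- Padding string of 20 spaces; its length is the assumed column width. -->
  <xsl:variable name="pad" select="'                    '"/>

  <!-- Put each paragraph on its own line. -->
  <xsl:template match="h:p">
    <xsl:value-of select="normalize-space(.)"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

  <!-- Put each table row on its own line. -->
  <xsl:template match="h:tr">
    <xsl:apply-templates select="h:td | h:th"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

  <!-- Pad (or truncate) each cell to the fixed column width. -->
  <xsl:template match="h:td | h:th">
    <xsl:value-of select="substring(concat(normalize-space(.), $pad), 1, string-length($pad))"/>
  </xsl:template>
</xsl:stylesheet>
```

Run against the sample document, this should emit the two paragraphs on separate lines, followed by one line per table row with every cell occupying 20 characters, so the columns line up in a monospaced font. If the input HTML carries no namespace declaration, the h: prefixes would need to be dropped from the match patterns.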
Thanks for your replies, I will give it a go.
I might just use a regex to strip out all the HTML tags; that seems to work OK for our needs.
Oct 4 '07 #4
