Converting Word to XML

prabha
Hello everybody,
I have to convert a Word document into multiple HTML files, according to the templates in the Word document.

I have converted the Word document to XML, and with EXSLT I have produced the multiple output HTML files.

The problem is that while reading through the Word document's paragraphs, the special characters are not recognized.

So in the XML file they are just stored as "?", and I am not able to retrieve the characters in my output HTML files.
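
One common way for special characters to end up as "?" is that the text was written through an encoding that cannot represent them. The Python sketch below only illustrates that failure mode; it is an assumption about the cause, not a diagnosis of this particular converter, and the sample string and the `write_lossy` helper are made up for the example.

```python
def write_lossy(text: str, encoding: str) -> bytes:
    """Encode text, substituting '?' for every character the encoding lacks."""
    return text.encode(encoding, errors="replace")


sample = "Welcome ¢£¤ ©"

# ASCII has no slot for the currency or copyright signs, so each becomes '?'.
ascii_bytes = write_lossy(sample, "ascii")

# UTF-8 can represent every character, so nothing is lost.
utf8_bytes = write_lossy(sample, "utf-8")

print(ascii_bytes.decode("ascii"))
print(utf8_bytes.decode("utf-8"))
```

If this is what is happening, the fix belongs at the point where the Word text is read or the XML file is written, since the "?" cannot be turned back into the original character afterwards.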




Nov 12 '05 #1
8 Replies


prabha
Using CDATA, the XML identifies the special characters, but while parsing through XSL they are not identified.
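
One possible reading of this symptom is that CDATA is only an input notation: the parser hands over plain text, and whatever serializes the result escapes it again. The sketch below shows that round trip with Python's standard `xml.etree.ElementTree` standing in for the XSLT processor, which is an assumption made for illustration; the sample element is made up.

```python
import xml.etree.ElementTree as ET

source = "<Text><![CDATA[<LI>Welcome ¢]]></Text>"
element = ET.fromstring(source)

# After parsing, the CDATA wrapper is gone and the text is held verbatim.
in_memory = element.text

# On output the serializer escapes markup characters in text nodes again.
serialized = ET.tostring(element, encoding="unicode")

print(in_memory)
print(serialized)
```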

Nov 12 '05 #2

Oleg Tkachenko
prabha wrote:
> Using CDATA, the XML identifies the special characters, but while parsing through XSL they are not identified.

What do you mean?

--
Oleg Tkachenko
XmlInsider
http://blog.tkachenko.com
Nov 12 '05 #3

prabha

I am reading from tables in the Word document, which include special characters and HTML tags. When I convert the Word document to structured XML, "<" is converted to "&lt;" and special characters like "¢" to "&cent;". When I pass the XML element text, I use CDATA, which stores the value unchanged.
In the XSL, I used EXSLT to convert the XML to multiple HTML files.
The XSLT processor outputs "<" as "&lt;". I tried disable-output-escaping and xsl:copy-of, but I wasn't able to convert it that way. Then I used the HttpUtility.HtmlDecode method to convert "&lt;" back to "<".
Now my problem is solved. Also, for indenting the HTML files I used the Tidy component.
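
The decoding step described above can be sketched as follows. The thread uses .NET's HttpUtility.HtmlDecode; this sketch uses Python's `html.unescape` as a rough analogue, and the escaped sample string is made up for illustration.

```python
import html

# Escaped markup as it might come out of the transformation step.
escaped = "&lt;UL&gt;&lt;LI&gt;Welcome &cent;&pound;&lt;/UL&gt;"

# Turn entity references back into the characters they stand for.
decoded = html.unescape(escaped)

print(decoded)
```

A blanket decode like this also converts any "&lt;" that was meant to appear as literal text on the page, so it is safest when the whole string is known to be markup.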

-- This is my XML output --
<Screen>
  <Id>4_1_0</Id>
  <Graphics>40</Graphics>
  <Textcontent>60</Textcontent>
  <TitleText><![CDATA[Refinance for a Higher Rate? / Welcome¢£¤]]></TitleText>
  <Text><![CDATA[<UL><LI>fdfdgdgg¢£¤<LI>In this module, you will learn about situations where refinancing for a higher rate is a great benefit for your customers<UL><LI>Rtertert¢£¤<LI>Fseretert<UL><LI>Fsdgj¢£¤ <UL><LI>Ereret¢£¤<LI>Etetewt<UL><LI>dfndgdfg¢£¤<LI>dfsdfdsg<LI>dsfgdsg<UL><LI>dfhdejthjrt©<LI>rrtrttrret<LI>ygfhgfhh<UL><LI>fgfdgdh<LI>d gdfggdfg<LI>dfdhjfdfjkgf</UL></UL></UL><LI>etetretry©</UL><LI>Dgkf<LI>Ldfldg©</UL></UL><LI>DOES THE CLIENT WANT TO ADD TARGETED OBJECTIVES FOR THIS MODULE? (do not code this) ©]]></Text>
</Screen>

Anyway, thanks for your response.

Nov 12 '05 #4

Oleg Tkachenko
prabha wrote:
> I am reading from tables in the Word document, which include special
> characters and HTML tags. When I convert the Word document to
> structured XML, "<" is converted to "&lt;" and special characters
> like "¢" to "&cent;".

While the former is OK, the latter means your output encoding doesn't allow the ¢
character to be written natively.

> When I pass the XML element text, I use CDATA, which stores the value
> unchanged. In the XSL, I used EXSLT to convert the XML to multiple
> HTML files. The XSLT processor outputs "<" as "&lt;". I tried
> disable-output-escaping and xsl:copy-of, but I wasn't able to
> convert it that way.

That's a known limitation of the EXSLT.NET implementation of the exsl:document
extension element. disable-output-escaping is ignored, as it always is when you
are transforming to an XmlWriter.
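
The encoding point can be illustrated with a small sketch. It uses Python's standard `xml.etree.ElementTree` rather than the .NET classes discussed here, so it shows the general serializer behaviour only, and the element name and text are made up.

```python
import xml.etree.ElementTree as ET

title = ET.Element("TitleText")
title.text = "Welcome ¢"

# US-ASCII cannot hold the cent sign, so the serializer falls back to a
# numeric character reference.
as_ascii = ET.tostring(title, encoding="us-ascii")

# UTF-8 can hold it, so the character is written natively as two bytes.
as_utf8 = ET.tostring(title, encoding="utf-8")

print(as_ascii)
print(as_utf8)
```

Both outputs describe the same document. Picking an output encoding that covers the characters in use simply keeps the references out of the file.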
--
Oleg Tkachenko
XmlInsider
http://blog.tkachenko.com
Nov 12 '05 #5

prabha
Can you tell me whether my approach is OK or not?
Is there a more efficient approach for my requirements?


Nov 12 '05 #6

Oleg Tkachenko
prabha wrote:
> Can you tell me whether my approach is OK or not?

I think it's OK.

> Is there a more efficient approach for my requirements?

The best way would be to avoid escaped HTML markup (make it XHTML, for
instance). As you can see now, escaped markup is almost always trouble.
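
A minimal sketch of that suggestion, in Python with the standard `xml.etree.ElementTree` rather than the XSLT pipeline from this thread: the list is stored as real, well-formed child elements instead of escaped text, so it comes back out as markup with no decoding step. The element names follow the sample XML posted earlier; the list content is made up.

```python
import xml.etree.ElementTree as ET

# Build the screen with the list as real child elements, not escaped text.
screen = ET.Element("Screen")
text = ET.SubElement(screen, "Text")
ul = ET.SubElement(text, "ul")
first = ET.SubElement(ul, "li")
first.text = "Refinance for a higher rate?"
second = ET.SubElement(ul, "li")
second.text = "Welcome ¢"

# Serializing the subtree yields ready-to-use markup.
fragment = ET.tostring(screen.find("Text/ul"), encoding="unicode")

print(fragment)
```

The catch is that every `<li>` must be closed, which the sample output in this thread does not do, so the source markup would need cleaning up first.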
--
Oleg Tkachenko
XmlInsider
http://blog.tkachenko.com
Nov 12 '05 #7

memorex
prabha wrote:
> Hello everybody,
> I have to convert a Word document into multiple HTML files, according to the
> templates in the Word document.
>
> I have converted the Word document to XML, and with EXSLT I have produced the
> multiple output HTML files.
>
> The problem is that while reading through the Word document's paragraphs, the
> special characters are not recognized.
>
> So in the XML file they are just stored as "?", and I am not able
> to retrieve the characters in my output HTML files.

Hi, could you please let me know how you were able to convert Word (is
it 2003) to XML? I am looking for something that can parse out
customer-supplied XML tags from a Word 2003 document. Any help is appreciated.

Thanks,
Me


Nov 12 '05 #8

Oleg Tkachenko [MVP]
memorex wrote:
> Hi, could you please let me know how you were able to convert Word (is
> it 2003) to XML? I am looking for something that can parse out
> customer-supplied XML tags from a Word 2003 document. Any help is appreciated.

If your customer can save Word documents as XML, then this "something"
is an XML parser.
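
As a rough sketch of what that could look like, the Python below reads a hand-written, heavily simplified document in the Word 2003 XML style and collects the text inside custom tags. The custom namespace `urn:example:customer`, its `name` and `rate` elements, and the `custom_fields` helper are hypothetical, and a real file saved by Word contains much more structure than this.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.microsoft.com/office/word/2003/wordml"
CUSTOM = "urn:example:customer"  # hypothetical customer schema namespace

# Simplified stand-in for a document saved from Word as XML.
document = f"""<w:wordDocument xmlns:w="{W}" xmlns:c="{CUSTOM}">
  <w:body>
    <c:name><w:p><w:r><w:t>Refinance</w:t></w:r></w:p></c:name>
    <c:rate><w:p><w:r><w:t>4.5</w:t></w:r><w:r><w:t>%</w:t></w:r></w:p></c:rate>
  </w:body>
</w:wordDocument>"""


def custom_fields(xml_text: str) -> dict:
    """Map each custom element's local name to the text of the runs inside it."""
    root = ET.fromstring(xml_text)
    fields = {}
    for element in root.iter():
        if element.tag.startswith("{" + CUSTOM + "}"):
            # Word splits text across runs, so join every w:t below the tag.
            runs = [t.text or "" for t in element.iter("{" + W + "}t")]
            local_name = element.tag.split("}", 1)[1]
            fields[local_name] = "".join(runs)
    return fields


fields = custom_fields(document)
print(fields)
```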

--
Oleg Tkachenko [XML MVP, XmlInsider]
http://blog.tkachenko.com
Nov 12 '05 #9
