Bytes | Software Development & Data Engineering Community

converting word to xml

Hello Everybody,
I have to convert a Word document into multiple HTML files, according to the templates in the Word document.

I have converted the Word document to XML, and using EXSL I have already produced the multiple output HTML files.
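For readers without EXSLT, the "one output file per template" step can be sketched in Python. This is a minimal sketch, not the poster's XSLT: it assumes the converted XML holds one `<Screen>` element per template inside a root element (the `<Screens>` wrapper is hypothetical), and it names each output file after the `<Id>` value, which is also an assumption.

```python
import xml.etree.ElementTree as ET
from html import escape


def split_screens(xml_text):
    """Return {filename: html}, one HTML page per <Screen> element.

    Assumes each <Screen> has an <Id> and a <TitleText> child.
    """
    root = ET.fromstring(xml_text)
    pages = {}
    for screen in root.iter("Screen"):
        screen_id = screen.findtext("Id")
        # Re-escape the title, since it is inserted into HTML as text.
        title = escape(screen.findtext("TitleText", default=""))
        pages[screen_id + ".html"] = (
            "<html><head><title>" + title + "</title></head>"
            "<body><h1>" + title + "</h1></body></html>"
        )
    return pages


# Hypothetical input with two screens.
sample = (
    "<Screens>"
    "<Screen><Id>4_1_0</Id><TitleText>Welcome</TitleText></Screen>"
    "<Screen><Id>4_1_1</Id><TitleText>Rates &amp; fees</TitleText></Screen>"
    "</Screens>"
)
pages = split_screens(sample)
for name, page in pages.items():
    print(name, page)
```

Writing each entry to disk (for example with `pathlib.Path(name).write_text(page, encoding="utf-8")`) would then give one file per screen, much as `exsl:document` does inside the stylesheet.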

The problem is that while reading through the Word document's paragraphs, the special characters are not recognized.

So the XML file just stores them as "?", and I can't retrieve the characters in my output HTML files.
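One common way to end up with "?" is writing the text through an encoding that cannot represent those characters, with a replace-on-error policy. The thread does not say how the Word text was read, so treat this as an assumption. The Python sketch below reproduces the symptom and shows that an encoding covering the characters (Windows-1252 or UTF-8 here) keeps them.

```python
def save_text(text, encoding):
    """Round-trip text through an encoding, replacing unencodable characters.

    errors="replace" turns every character the encoding lacks into "?".
    """
    return text.encode(encoding, errors="replace").decode(encoding)


title = "Welcome¢£¤"
print(save_text(title, "ascii"))   # Welcome???
print(save_text(title, "cp1252"))  # Welcome¢£¤
print(save_text(title, "utf-8"))   # Welcome¢£¤
```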


Nov 12 '05 #1
Using CDATA, the XML preserves the special characters, but when it is processed through XSL they are not recognized.
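A CDATA section only changes how the text is written in the source file; once parsed, it is ordinary character data, not markup. That is why an XSLT stylesheet sees the `<UL><LI>` inside it as text. A small illustration with Python's standard-library parser (ElementTree stands in here for whatever parser feeds the XSLT processor):

```python
import xml.etree.ElementTree as ET

text = ET.fromstring("<Text><![CDATA[<UL><LI>Item ¢£¤</UL>]]></Text>")

# The CDATA content became plain text: no <UL> or <LI> child elements exist.
print(len(text))   # 0
print(text.text)   # <UL><LI>Item ¢£¤</UL>
```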

Nov 12 '05 #2
prabha wrote:
Using CDATA, the XML preserves the special characters, but when it is processed through XSL they are not recognized.


What do you mean?

--
Oleg Tkachenko
XmlInsider
http://blog.tkachenko.com
Nov 12 '05 #3

I am reading from the Word document's tables, which include special characters and HTML tags. When I convert the Word document to structured XML, it converts < to &lt; and special characters like "¢" to &cent;. When I write the XML element text, I use CDATA, so the value is stored unchanged inside it.
In the XSL, I use EXSL to convert the XML into multiple HTML files. The XSL processor outputs < as &lt;. I tried disable-output-escaping and xsl:copy-of, but neither converted it. Then I used HttpUtility's HtmlDecode method to convert "&lt;" back to "<".
Now my problem is solved. I also used the Tidy component to indent the HTML files.

-- This is my XML output --
<Screen><Id>4_1_0</Id><Graphics>40</Graphics><Textcontent>60</Textcontent>
<TitleText><![CDATA[Refinance for a Higher Rate? / Welcome¢£¤]]></TitleText>
<Text><![CDATA[<UL><LI>fdfdgdgg¢£¤<LI>In this module, you will learn about situations where refinancing for a higher rate is a great benefit for your customers<UL><LI>Rtertert¢£¤<LI>Fseretert<UL><LI>Fsdgj¢£¤ <UL><LI>Ereret¢£¤<LI>Etetewt<UL><LI>dfndgdfg¢£¤<LI>dfsdfdsg<LI>dsfgdsg<UL><LI>dfhdejthjrt©<LI>rrtrttrret<LI>ygfhgfhh<UL><LI>fgfdgdh<LI>dgdfggdfg<LI>dfdhjfdfjkgf</UL></UL></UL><LI>etetretry©</UL><LI>Dgkf<LI>Ldfldg©</UL></UL><LI>DOES THE CLIENT WANT TO ADD TARGETED OBJECTIVES FOR THIS MODULE? (do not code this) ©]]></Text></Screen>

Anyway, thanks for your response.
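The post-processing fix described in this post (decoding the escaped markup after the transform) can be sketched in Python with `html.unescape`, which plays roughly the role `HttpUtility.HtmlDecode` plays in .NET. The escaped sample string is hypothetical.

```python
import html

# Hypothetical fragment as it came out of the XSLT step: markup escaped.
escaped = "&lt;UL&gt;&lt;LI&gt;Item &cent;&#163;&lt;/UL&gt;"

# Named (&cent;) and numeric (&#163;) references are decoded too.
decoded = html.unescape(escaped)
print(decoded)  # <UL><LI>Item ¢£</UL>
```

Decoding the whole output this way also turns text that was deliberately escaped to show a literal "<" into real markup, so it is safest when the escaped fields contain only trusted HTML.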
Nov 12 '05 #4
prabha wrote:
I am reading from the Word document's tables, which include special characters and HTML tags. When I convert the Word document to structured XML, it converts < to &lt; and special characters like "¢" to &cent;.

While the former is OK, the latter means your output encoding doesn't allow the ¢ character to be written natively.
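To make the encoding point concrete: when the target encoding has no byte for "¢", a serializer has to fall back to a character reference. Python's `xmlcharrefreplace` error handler shows the numeric form. Note that a named reference such as `&cent;` is an HTML entity; plain XML without a DTD predefines only `&amp;`, `&lt;`, `&gt;`, `&quot;` and `&apos;`.

```python
import html

title = "Welcome¢£¤"

# ASCII has no byte for these characters, so they become numeric references.
ascii_bytes = title.encode("ascii", errors="xmlcharrefreplace")
print(ascii_bytes)  # b'Welcome&#162;&#163;&#164;'

# UTF-8 can hold them natively, so no references are needed.
print(title.encode("utf-8").decode("utf-8"))  # Welcome¢£¤

# The reference and the native character are the same text once decoded.
print(html.unescape(ascii_bytes.decode("ascii")) == title)  # True
```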

When I write the XML element text, I use CDATA, so the value is stored unchanged inside it. In the XSL, I use EXSL to convert the XML into multiple HTML files. The XSL processor outputs < as &lt;. I tried disable-output-escaping and xsl:copy-of, but neither converted it.

That's a known limitation of the EXSLT.NET implementation of the exsl:document extension element. As always, disable-output-escaping is ignored when you are transforming to an XmlWriter.
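The reason disable-output-escaping has no effect in that setup is that a writer-style serializer receives a text node and must escape it; it has no way to be told "write this string raw". Python's ElementTree serializer behaves the same way, which this sketch (an analogy, not EXSLT.NET itself) illustrates:

```python
import xml.etree.ElementTree as ET

elem = ET.Element("Text")
elem.text = "<UL><LI>Item</UL>"  # markup stored as a string, not as elements

# The serializer escapes "<" and ">" in text nodes unconditionally.
print(ET.tostring(elem, encoding="unicode"))
# <Text>&lt;UL&gt;&lt;LI&gt;Item&lt;/UL&gt;</Text>
```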
--
Oleg Tkachenko
XmlInsider
http://blog.tkachenko.com
Nov 12 '05 #5
Can you just tell me whether my approach is OK or not?
Is there a more efficient approach for my requirements?
Nov 12 '05 #6
prabha wrote:
Can you just tell me whether my approach is OK or not?

I think it's OK.

Is there a more efficient approach for my requirements?

The best way would be to avoid escaped HTML markup (make it XHTML, for instance). As you can see now, escaped markup is almost always trouble.
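As a sketch of what "make it XHTML" buys: if the `<Text>` content is stored as well-formed (lowercase, properly closed) markup instead of an escaped string, any XML tool, XSLT included, can walk and copy it as real elements. The fragment below is a hypothetical rewrite of the poster's output, and the XHTML namespace is omitted for brevity.

```python
import xml.etree.ElementTree as ET

screen = ET.fromstring(
    "<Screen><Id>4_1_0</Id>"
    "<Text><ul><li>First item ¢£¤</li><li>Second item</li></ul></Text>"
    "</Screen>"
)

# The list items are real elements now, not escaped text.
items = [li.text for li in screen.iter("li")]
print(items)  # ['First item ¢£¤', 'Second item']

# Copying the markup to the output needs no decoding step.
print(ET.tostring(screen.find("Text/ul"), encoding="unicode"))
# <ul><li>First item ¢£¤</li><li>Second item</li></ul>
```

Note that the original `<LI>` items without closing tags are not well-formed XML, so the Word-to-XML step would have to emit closed elements for this to work.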
--
Oleg Tkachenko
XmlInsider
http://blog.tkachenko.com
Nov 12 '05 #7

Hi, could you please let me know how you were able to convert Word (is it 2003?) to XML? I am looking for something that can parse out customer-supplied XML tags from a Word 2003 document. Any help is appreciated.

Thanks,
Me

Nov 12 '05 #8
memorex wrote:
Hi, could you please let me know how you were able to convert Word (is it 2003?) to XML? I am looking for something that can parse out customer-supplied XML tags from a Word 2003 document. Any help is appreciated.


If your customer can save Word documents as XML, then this "something" is an XML parser.
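As a rough illustration of that answer, here is a Python sketch that reads a document saved in Word 2003's XML format with a standard XML parser. It assumes the customer's tags end up as elements in their own namespace in the saved file; `urn:example:customer` and the `Title` tag are hypothetical, and only the WordprocessingML namespace belongs to Word itself.

```python
import xml.etree.ElementTree as ET

# Namespace Word 2003 uses for its own XML (WordprocessingML) elements.
W = "http://schemas.microsoft.com/office/word/2003/wordml"
# Hypothetical namespace of the customer's schema attached to the document.
CUSTOM = "urn:example:customer"


def custom_tags(xml_text):
    """Return (tag name, text) pairs for every customer-defined element."""
    root = ET.fromstring(xml_text)
    found = []
    for elem in root.iter():
        if elem.tag.startswith("{" + CUSTOM + "}"):
            local_name = elem.tag.split("}", 1)[1]
            # Word keeps visible text in <w:t> runs nested inside the element.
            text = "".join(t.text or "" for t in elem.iter("{" + W + "}t"))
            found.append((local_name, text))
    return found


sample = (
    '<w:wordDocument xmlns:w="' + W + '" xmlns:c="' + CUSTOM + '">'
    "<w:body><c:Title><w:p><w:r><w:t>Refinance</w:t></w:r></w:p></c:Title>"
    "</w:body></w:wordDocument>"
)
print(custom_tags(sample))  # [('Title', 'Refinance')]
```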

--
Oleg Tkachenko [XML MVP, XmlInsider]
http://blog.tkachenko.com
Nov 12 '05 #9