Re: How to use win32com to convert a MS WORD doc to HTML ?

Lave wrote:
Hi, all !

I'm a totally newbie huh:)

I want to convert MS WORD docs to HTML, I found python windows
extension win32com can make this. But I can't find the method, and I
can't find any document helpful.

You have broadly two approaches here, both
involving automating Word (i.e. using the
COM object model it exposes, referred to
in another post in this thread).

1) Use the COM model to have Word load your
doc, and SaveAs it in HTML format. Advantage:
it's relatively straightforward. Disadvantage:
you're at the mercy of whatever HTML Word emits.

2) Use the COM model to iterate over the paragraphs
in your document, emitting your own HTML. Advantage:
you get control. Disadvantage: the more complex your
doc, the more work you have to do. (What do you do with
images, for example? Internal links?)

To do the first, just record a macro in Word to
do what you want and then reproduce the macro
in Python. Something like this:

<code>
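import win32com.client

# GetObject opens the file through its registered application (Word)
doc = win32com.client.GetObject ("c:/data/temp/songs.doc")
# FileFormat=8 is Word's wdFormatHTML constant
doc.SaveAs (FileName="c:/data/temp/songs.html", FileFormat=8)
doc.Close ()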

</code>
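If you'd rather not rely on GetObject and a bare magic number, something along these lines might be a starting point. It's only a sketch: the helper names html_target and convert_to_html are made up for illustration, DispatchEx is used on the assumption that you want a separate Word instance which is safe to Quit, and win32com is imported inside the function so the path helper can be used on its own.

```python
import os

WD_FORMAT_HTML = 8            # Word's wdFormatHTML
WD_FORMAT_FILTERED_HTML = 10  # Word's wdFormatFilteredHTML (less Office-specific markup)

def html_target(doc_path):
    """Return the .html path sitting next to doc_path (hypothetical helper)."""
    root, _ext = os.path.splitext(doc_path)
    return root + ".html"

def convert_to_html(doc_path, file_format=WD_FORMAT_HTML):
    """Have Word open doc_path and save it as HTML; return the output path."""
    import win32com.client  # pywin32, Windows only

    # DispatchEx asks for a fresh Word instance, so the Quit() below
    # should not close a Word window you already have open.
    word = win32com.client.DispatchEx("Word.Application")
    word.Visible = False
    try:
        # Word wants absolute paths; relative ones resolve against its own cwd
        doc = word.Documents.Open(os.path.abspath(doc_path))
        try:
            target = os.path.abspath(html_target(doc_path))
            doc.SaveAs(FileName=target, FileFormat=file_format)
        finally:
            doc.Close(False)  # False: don't prompt to save changes
    finally:
        word.Quit()
    return target
```

Passing file_format=WD_FORMAT_FILTERED_HTML is worth trying if the default output carries more Office-specific markup than you want.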

To do the second, you have to roll your own HTML
doc. Crudely, this would do it:

<code>
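import codecs
import win32com.client

# Open the document through its registered application (Word)
doc = win32com.client.GetObject ("c:/data/temp/songs.doc")
with codecs.open ("c:/data/temp/s2.html", "w", encoding="utf8") as f:
    f.write ("<html><body>\n")
    for para in doc.Paragraphs:
        # Range.Text ends with the paragraph mark (\r), so strip it
        text = para.Range.Text.rstrip ("\r")
        style = para.Style.NameLocal
        f.write ('<p class="%(style)s">%(text)s</p>\n' % locals ())
    # Close the tags opened before the loop
    f.write ("</body></html>\n")

doc.Close ()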

</code>
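One thing the crude loop glosses over is that paragraph text can contain characters such as < and &, and that style names like "Heading 1" contain spaces. A minimal sketch of a helper which deals with both, assuming Python 3's html module; the names css_class and paragraph_to_html are hypothetical and not part of win32com, and the class-name cleanup only copes with ASCII style names.

```python
import html
import re

def css_class(style_name):
    """Turn a Word style name such as 'Heading 1' into a single CSS class."""
    # Replace each run of characters other than ASCII letters, digits,
    # underscore or hyphen with one hyphen, then trim hyphens at the ends.
    cleaned = re.sub(r"[^A-Za-z0-9_-]+", "-", style_name).strip("-")
    # Fall back to "normal" if nothing usable is left (an arbitrary choice)
    return cleaned.lower() or "normal"

def paragraph_to_html(text, style_name):
    """Render one paragraph's text as an escaped <p> element."""
    # Drop the trailing paragraph mark (\r) and end-of-cell mark (\x07)
    text = text.rstrip("\r\x07")
    return '<p class="%s">%s</p>' % (css_class(style_name), html.escape(text))
```

Inside the loop you would then write something like f.write (paragraph_to_html (para.Range.Text, para.Style.NameLocal) + "\n") in place of the %-formatted string.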

TJG
Aug 19 '08 #1