Opening MS Word files via Python

Here comes another small question from me :-)

I am curious how I should approach this. For now I just want to parse
simple text, and perhaps tables in the future. Would I have to save the
Word file and open it in a text editor? That would kind of... suck. Has
anyone else tackled this?

Thanks,
Jul 18 '05 #1
The win32 extensions for Python (pywin32) let you get at the COM objects
for applications like Word, and that would let you get the text and tables.
Google: win32 python.

import win32com.client

word = win32com.client.Dispatch('Word.Application')
doc = word.Documents.Open('C:\\myfile.doc')

But I don't know the best way to find out the methods and properties of
the "word" object.
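
One way to explore such an object from the Python side is sketched below. It assumes pywin32's early binding (gencache.EnsureDispatch), which generates wrapper classes that can be inspected; the helper names list_com_members and describe_word are made up for illustration. A late-bound object from plain Dispatch() exposes little to this kind of inspection, so treat this as a rough aid, not a full listing.

```python
def list_com_members(obj):
    """Return (methods, properties) for a makepy-generated COM wrapper.

    Methods are the public callables on the wrapper class; readable
    properties are the keys of the class's _prop_map_get_ table.
    """
    cls = type(obj)
    methods = sorted(
        name for name in dir(cls)
        if not name.startswith("_") and callable(getattr(cls, name))
    )
    properties = sorted(getattr(cls, "_prop_map_get_", {}))
    return methods, properties


def describe_word():
    """Start Word with early binding and list the application's members."""
    # Windows only: needs pywin32 and an installed copy of Word.
    from win32com.client import gencache

    word = gencache.EnsureDispatch("Word.Application")
    try:
        return list_com_members(word)
    finally:
        word.Quit()
```

Calling describe_word() on a Windows machine with Word installed should return two sorted lists of names, which can then be looked up in the Word VBA documentation.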

Rob

Jul 18 '05 #2
See http://aspn.activestate.com/ASPN/Coo.../Recipe/279003

Cheers,
Simon B.
Jul 18 '05 #3
You can use the VBA documentation for Word and, using dot notation and
the normal Python way of calling functions, play with its various
objects, methods and attributes...
Here's some pretty straightforward code along these lines:
#************************
import win32com.client
from tkinter import filedialog  # named tkFileDialog in Python 2

# Launch Word without showing its window
MSWord = win32com.client.Dispatch("Word.Application")
MSWord.Visible = False
# Ask for a file and open it
myWordDoc = filedialog.askopenfilename()
doc = MSWord.Documents.Open(myWordDoc)
# Get the textual content (Content is a Range object; .Text is its string)
docText = doc.Content.Text
# Get the collection of tables
listTables = doc.Tables
#************************
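
Building on the snippet above, here is a minimal sketch of turning the document text and tables into plain Python values. It assumes pywin32 and a local Word installation; the helper names (clean_cell_text, table_to_rows, read_word_file) are hypothetical. Word ends each cell's text with a carriage return plus a bell character, so the sketch strips those. Tables with merged cells can make Cell(row, column) fail, which this sketch does not handle.

```python
def clean_cell_text(raw):
    """Strip the end-of-cell marker (CR + BEL) that Word appends to cell text."""
    return raw.rstrip("\r\x07")


def table_to_rows(table):
    """Convert a Word Table COM object into a list of rows of strings.

    Word's Cell(row, column) indices start at 1. Assumes a regular grid
    with no merged cells.
    """
    rows = []
    for r in range(1, table.Rows.Count + 1):
        row = []
        for c in range(1, table.Columns.Count + 1):
            row.append(clean_cell_text(table.Cell(r, c).Range.Text))
        rows.append(row)
    return rows


def read_word_file(path):
    """Open a Word file in a hidden Word instance; return (text, tables)."""
    # Windows only; imported here so the helpers above work without pywin32.
    import win32com.client

    word = win32com.client.Dispatch("Word.Application")
    word.Visible = False
    try:
        doc = word.Documents.Open(path)
        try:
            text = doc.Content.Text
            # Collections are 1-based when called like a function.
            tables = [table_to_rows(doc.Tables(i))
                      for i in range(1, doc.Tables.Count + 1)]
        finally:
            doc.Close(False)  # False: do not save changes
        return text, tables
    finally:
        word.Quit()
```

On Windows, read_word_file(r"C:\myfile.doc") would return the full text and one list of rows per table. Note that Quit() closes the Word instance the script is talking to, so this is best run when no other Word work is open.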

Happy parsing,

Jean-Marc
Jul 18 '05 #4
That is Awesome! Thanks!

How would I save something in Word format? I am guessing
MSWord.Documents.Save(myWordDoc) or something along those lines? Where can I
find more documentation? Thanks.
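
For the saving question, a minimal sketch under a few assumptions: saving is done on the Document object returned by Documents.Open (not on the Documents collection), through its SaveAs method, whose first two arguments are the file name and a WdSaveFormat number. The helper name save_document_as and the constant names are made up here; the numeric values are those of Word's wdFormatDocument and wdFormatText.

```python
import os

# Values of Word's WdSaveFormat constants
WD_FORMAT_DOCUMENT = 0  # wdFormatDocument: classic .doc
WD_FORMAT_TEXT = 2      # wdFormatText: plain text


def save_document_as(doc, path, file_format=WD_FORMAT_DOCUMENT):
    """Save an open Word Document COM object to `path` in the given format.

    The path is made absolute first, since Word may resolve a relative
    path against its own working folder instead of the script's.
    """
    full_path = os.path.abspath(path)
    doc.SaveAs(full_path, file_format)  # arguments: FileName, FileFormat
    return full_path
```

With a document opened as doc = MSWord.Documents.Open(myWordDoc), calling save_document_as(doc, "copy.doc") would write a copy in Word format, while doc.Save() saves the document in place under its existing name.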
Jul 18 '05 #5
Fazer wrote...
jm*********@cvm.qc.ca (jmdeschamps) wrote in message news:<3d**************************@posting.google. com>...
Rob Nikander <rn*************@adelphia.net> wrote in message news:<i7********************@adelphia.com>... <snip>

But I don't know the best way to find out the methods and properties of
the "word" object.
<snip>
How would I save something in Word format? I am guessing
MSWord.Documents.Save(myWordDoc) or something along those lines? Where can I
find more documentation? Thanks.


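Open MS Word and press Alt+F11 to open the Visual Basic editor, then F2
to open the Object Browser, which lists Word's objects, methods and
properties.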

Jul 18 '05 #6
