"MrBill" <no****@nospam.com> writes:
I would like to be able to open, read, and extract data from a report that
is produced in MS Word. The doc seems to contain embedded spreadsheets. I
would like to extract some of the data from the spreadsheets and feed it
into another application. I've been reading a little bit about OLE and MS
Word and sure would like to find a module that hides some of this so-called
innovation from me.
:-) Yeah, isn't all that baroque complexity wonderful?
1. Alex Martelli's suggestion on this list: use RTF. Word can import
and export to it. You can automate that from VB or Python in the
usual COM ways (see 3.). I don't know whether you'll get useful
RTF out of embedded Excel sheets, though.
2. Use OpenOffice via PyUNO.
3. As you already know, use the MS Office object models, with the
Python for Windows extensions (or ctypes, if you're brave). Perhaps ADO
is what you're looking for? IIRC, ADO isn't too complicated and
can treat Excel sheets as data sources just as it does for
relational databases.
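
To make the ADO idea in 3. a bit more concrete, here is a minimal sketch, not a tested recipe. It assumes the sheet has already been saved out as a standalone .xls file (ADO won't reach into a sheet embedded in a Word doc), that the Jet OLE DB provider is installed, and that pywin32 is available. The function names and the default sheet name "Sheet1" are made up for illustration.

```python
def excel_connection_string(xls_path, header_row=True):
    """Build a Jet OLE DB connection string for an Excel workbook.

    Assumes the (32-bit) Microsoft.Jet.OLEDB.4.0 provider is installed.
    """
    hdr = "Yes" if header_row else "No"
    return (
        "Provider=Microsoft.Jet.OLEDB.4.0;"
        f"Data Source={xls_path};"
        f'Extended Properties="Excel 8.0;HDR={hdr}"'
    )


def sheet_query(sheet_name):
    """ADO addresses a worksheet as a table named [SheetName$]."""
    return f"SELECT * FROM [{sheet_name}$]"


def read_sheet(xls_path, sheet_name="Sheet1"):
    """Return the rows of one worksheet as a list of lists (Windows only)."""
    import win32com.client  # pywin32; imported here so the helpers above work anywhere

    conn = win32com.client.Dispatch("ADODB.Connection")
    conn.Open(excel_connection_string(xls_path))
    try:
        rs = win32com.client.Dispatch("ADODB.Recordset")
        rs.Open(sheet_query(sheet_name), conn)
        rows = []
        while not rs.EOF:
            rows.append([field.Value for field in rs.Fields])
            rs.MoveNext()
        rs.Close()
        return rows
    finally:
        conn.Close()
```

Something like read_sheet(r"C:\reports\sheet.xls") would then hand you plain Python lists to feed into the other application. The path is just a placeholder.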
For simpler Word docs (no embedded stuff), there are other tools out
there, but they'd be no use in this case.
A useful tip for 3. is to record a VB macro in Word, then edit it to
something sane. You can keep it in VB, or do the relatively trivial
edits required to convert it to Python. Here's an example of
automating RTF generation:
http://www.google.com/groups?q=autho...box.com&rnum=1
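
For a rough idea of what a recorded "Save As RTF" macro might look like once converted to Python, here is a sketch that assumes pywin32 and an installed copy of Word. The helper names are hypothetical; the value 6 is Word's wdFormatRTF constant, spelled out so the script doesn't depend on generated constants.

```python
import os

WD_FORMAT_RTF = 6  # numeric value of Word's wdFormatRTF constant


def rtf_path_for(doc_path):
    """Return an absolute path next to doc_path with an .rtf extension."""
    base, _ext = os.path.splitext(os.path.abspath(doc_path))
    return base + ".rtf"


def word_to_rtf(doc_path):
    """Open doc_path in Word via COM and save a copy as RTF (Windows only)."""
    import win32com.client  # pywin32; imported here so rtf_path_for works anywhere

    word = win32com.client.Dispatch("Word.Application")
    word.Visible = False
    try:
        # Word wants absolute paths; relative ones resolve against its own cwd.
        doc = word.Documents.Open(os.path.abspath(doc_path))
        out = rtf_path_for(doc_path)
        doc.SaveAs(out, WD_FORMAT_RTF)  # arguments: FileName, FileFormat
        doc.Close(False)                # False: don't save changes to the original
        return out
    finally:
        word.Quit()
```

The VB original would be roughly ActiveDocument.SaveAs with FileName and FileFormat arguments; the Python version is nearly a line-for-line translation. Whether the resulting RTF contains anything usable for the embedded sheets is still an open question.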
John