#1
September 4th, 2008, 08:55 PM
patrick.waldo@gmail.com
Converting .doc to .txt in Linux

Hi Everyone,

I had previously asked a similar question,
http://groups.google.com/group/comp....c901da63d8d059

but at that point I was using Windows and now I am using Linux.
Basically, I have some .doc files that I need to convert into txt
files encoded in utf-8. However, win32com.client doesn't work in
Linux.

It's been giving me quite a headache all day. Any ideas would be
greatly appreciated.

Best,
Patrick

#Windows Code:
import glob, os, shutil
from win32com.client import Dispatch

input_dir = '/home/pwaldo2/work/workbench/current_documents/'
input_glob = os.path.join(input_dir, '*.doc')
outpath = '/home/pwaldo2/work/workbench/current_documents/TXT/'

# Open each .doc in Word and re-save it under the same name in
# file format 7 (Word's Unicode text format).
WordApp = Dispatch("Word.Application")
WordApp.Visible = 1
for doc in glob.glob(input_glob):
    WordApp.Documents.Open(doc)
    WordApp.ActiveDocument.SaveAs(doc, 7)
    WordApp.ActiveDocument.Close()
WordApp.Quit()

# Copy each converted file into the TXT directory with a .txt extension.
# glob.glob() returns full paths, so only the base name is joined to outpath.
for doc in glob.glob(input_glob):
    txt_doc = os.path.splitext(os.path.basename(doc))[0] + '.txt'
    txt_doc_path = os.path.join(outpath, txt_doc)
    shutil.copy(doc, txt_doc_path)
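
A note on the encoding step: file format 7 is Word's "Unicode text" format, and the sketch below assumes that this produces UTF-16 files with a byte-order mark rather than UTF-8. If that assumption holds, a small re-encoding pass along these lines would give the UTF-8 output asked for. The function name reencode_to_utf8 is made up for illustration.

```python
import io


def reencode_to_utf8(src_path, dst_path, src_encoding="utf-16"):
    """Read src_path as src_encoding and write the same text as UTF-8.

    Assumption: the source file is UTF-16 with a byte-order mark, which
    Python's "utf-16" codec detects and strips automatically.
    """
    with io.open(src_path, "r", encoding=src_encoding) as src:
        text = src.read()
    with io.open(dst_path, "w", encoding="utf-8") as dst:
        dst.write(text)
    return dst_path
```

It could be called once per converted file, for example reencode_to_utf8('report.txt', 'report_utf8.txt'), where both file names are placeholders.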
#2
September 4th, 2008, 09:25 PM
Chris Rebert
Re: Converting .doc to .txt in Linux

I'd recommend using one of the Word->txt converters for Linux and just
running it in a shell script:
* http://wvware.sourceforge.net/
* http://www.winfield.demon.nl/

No compelling reason to use Python in this instance. Right tool for
the right job and all that.
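
For anyone who would still rather drive the converter from Python, here is a rough sketch. It assumes the second link above refers to antiword, that antiword is installed and on the PATH, and that its UTF-8.txt mapping file is available. The helper names (txt_path_for, antiword_command, convert_all) are made up for this example.

```python
import glob
import os
import subprocess


def txt_path_for(doc_path, out_dir):
    """Return out_dir/<base name of doc_path>.txt."""
    base = os.path.splitext(os.path.basename(doc_path))[0]
    return os.path.join(out_dir, base + ".txt")


def antiword_command(doc_path):
    """Build the antiword command line for one document.

    -m selects the character mapping file; UTF-8.txt asks for UTF-8 output.
    """
    return ["antiword", "-m", "UTF-8.txt", doc_path]


def convert_all(input_dir, out_dir):
    """Convert every .doc file in input_dir to a .txt file in out_dir."""
    os.makedirs(out_dir, exist_ok=True)
    for doc_path in sorted(glob.glob(os.path.join(input_dir, "*.doc"))):
        # antiword writes the converted text to standard output.
        text = subprocess.check_output(antiword_command(doc_path))
        with open(txt_path_for(doc_path, out_dir), "wb") as out:
            out.write(text)
```

With the directories from the original post, the call would be convert_all('/home/pwaldo2/work/workbench/current_documents/', '/home/pwaldo2/work/workbench/current_documents/TXT/').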

- Chris


--
Follow the path of the Iguana...
http://rebertia.com
#3
September 4th, 2008, 09:25 PM
Tommy Nordgren
Re: Converting .doc to .txt in Linux


You can do it manually with OpenOffice <http://www.openoffice.org/>,
a free office suite.
-------------------------------------
This sig is dedicated to the advancement of Nuclear Power
Tommy Nordgren
tommy.nordgren@comhem.se




#4
September 5th, 2008, 05:35 AM
Carl Banks
Re: Converting .doc to .txt in Linux

On Sep 4, 4:18 pm, Tommy Nordgren <tommy.nordg...@comhem.se> wrote:
Quote:
You can do it manually with Open Office. <http://www.openoffice.org/>
A free office suite.

On Debian there is a package called "unoconv"--written in Python--that
can do the conversions from the command line. It requires a running
instance of Open Office. However, the doc-to-txt conversion of Open
Office isn't that good. (It wasn't as good as Word's formatted text
converter, last time I used it.)
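
A minimal sketch of calling unoconv from a script, assuming it is installed and can reach an Open Office instance. The -f option selects the output format, and "txt" is assumed here to be the format name for plain text. As far as I know the result is written next to the input file with the new extension. The function names are made up for illustration.

```python
import subprocess


def unoconv_command(doc_path, fmt="txt"):
    """Build the unoconv command line for converting one document."""
    # -f chooses the output format (assumed: "txt" for plain text).
    return ["unoconv", "-f", fmt, doc_path]


def convert(doc_path):
    """Run unoconv on doc_path; raises CalledProcessError on failure."""
    subprocess.check_call(unoconv_command(doc_path))
```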


Carl Banks
 
