Converting .doc to .txt in Linux

Hi Everyone,

I had previously asked a similar question,
http://groups.google.com/group/comp....c901da63d8d059

but at that point I was using Windows and now I am using Linux.
Basically, I have some .doc files that I need to convert into txt
files encoded in utf-8. However, win32com.client doesn't work in
Linux.

It's been giving me quite a headache all day. Any ideas would be
greatly appreciated.

Best,
Patrick

#Windows Code:
import glob, os, shutil
from win32com.client import Dispatch

input_glob = '/home/pwaldo2/work/workbench/current_documents/*.doc'
outpath = '/home/pwaldo2/work/workbench/current_documents/TXT/'

# Open each .doc in Word and re-save it in place using save format 7
# (Word's encoded/Unicode text format).
for doc in glob.glob(input_glob):
    WordApp = Dispatch("Word.Application")
    WordApp.Visible = 1
    WordApp.Documents.Open(doc)
    WordApp.ActiveDocument.SaveAs(doc, 7)
    WordApp.ActiveDocument.Close()
    WordApp.Quit()

# Copy each converted file into the TXT directory with a .txt extension.
for doc in glob.glob(input_glob):
    txt_doc = os.path.splitext(os.path.basename(doc))[0] + '.txt'
    txt_doc_path = os.path.join(outpath, txt_doc)
    shutil.copy(doc, txt_doc_path)
Sep 4 '08 #1
I'd recommend using one of the Word->txt converters for Linux and just
running it in a shell script:
* http://wvware.sourceforge.net/
* http://www.winfield.demon.nl/

No compelling reason to use Python in this instance. Right tool for
the right job and all that.

- Chris
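
If the conversion still has to be driven from Python (for example, to keep the directory handling from the original script), one of the converters above can be called through subprocess. The following is only a rough Python 3 sketch, assuming antiword is installed and on PATH and that its UTF-8.txt mapping file is available for the -m option; the helper names and the `run` parameter are hypothetical and exist only for illustration.

```python
import glob
import os
import subprocess


def txt_path_for(doc_path, out_dir):
    """Return the .txt path inside out_dir for a given .doc path."""
    base = os.path.splitext(os.path.basename(doc_path))[0]
    return os.path.join(out_dir, base + ".txt")


def antiword_command(doc_path):
    # Assumption: antiword is on PATH and ships the UTF-8.txt mapping file.
    return ["antiword", "-m", "UTF-8.txt", doc_path]


def convert_all(pattern, out_dir, run=subprocess.run):
    """Convert every file matching pattern; return the list of written paths."""
    os.makedirs(out_dir, exist_ok=True)
    written = []
    for doc_path in sorted(glob.glob(pattern)):
        out_path = txt_path_for(doc_path, out_dir)
        # antiword writes the extracted text to stdout, so capture it as bytes.
        result = run(antiword_command(doc_path), stdout=subprocess.PIPE, check=True)
        with open(out_path, "wb") as handle:
            handle.write(result.stdout)
        written.append(out_path)
    return written
```

Calling convert_all('/home/pwaldo2/work/workbench/current_documents/*.doc', '/home/pwaldo2/work/workbench/current_documents/TXT/') would then write one .txt file per .doc file. The `run` argument is there only so the external call can be swapped out when testing without antiword installed.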

--
Follow the path of the Iguana...
http://rebertia.com
Sep 4 '08 #2

You can do it manually with Open Office. <http://www.openoffice.org/>
A free office suite.
-------------------------------------
This sig is dedicated to the advancement of Nuclear Power
Tommy Nordgren
to************@comhem.se


Sep 4 '08 #3
On Sep 4, 4:18 pm, Tommy Nordgren <tommy.nordg...@comhem.se> wrote:
> You can do it manually with Open Office. <http://www.openoffice.org/>
> A free office suite.

On Debian there is a package called "unoconv"--written in Python--that
can do the conversions from the command line. It requires a running
instance of Open Office. However, the doc-to-txt conversion of Open
Office isn't that good. (It wasn't as good as Word's formatted text
converter, last time I used it.)
Carl Banks
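
For reference, the unoconv route could be wrapped in Python 3 roughly as follows. This is a sketch under assumptions: unoconv is on PATH, an Open Office instance is available to it as described above, and unoconv writes the converted file next to the input with the same base name. The function names and the `run` parameter are hypothetical. The encoding of the output depends on Open Office's text filter, so it should be checked before relying on UTF-8.

```python
import os
import shutil
import subprocess


def unoconv_command(doc_path):
    # Assumption: unoconv is on PATH; "-f txt" selects plain-text output.
    return ["unoconv", "-f", "txt", doc_path]


def convert_with_unoconv(doc_path, out_dir, run=subprocess.run):
    """Convert one .doc and move the resulting .txt into out_dir."""
    run(unoconv_command(doc_path), check=True)
    # Assumption: unoconv writes the result next to the input, same base name.
    produced = os.path.splitext(doc_path)[0] + ".txt"
    os.makedirs(out_dir, exist_ok=True)
    target = os.path.join(out_dir, os.path.basename(produced))
    shutil.move(produced, target)
    return target
```

Looping this over glob.glob() on the *.doc pattern from the original post would reproduce the directory layout of the Windows script.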
Sep 5 '08 #4
