Converting .doc to .txt in Linux

Hi Everyone,

I had previously asked a similar question,
http://groups.google.com/group/comp....c901da63d8d059

but at that point I was using Windows and now I am using Linux.
Basically, I have some .doc files that I need to convert into txt
files encoded in utf-8. However, win32com.client doesn't work in
Linux.

It's been giving me quite a headache all day. Any ideas would be
greatly appreciated.

Best,
Patrick

#Windows Code:
import glob, os, codecs, shutil, win32com.client
from win32com.client import Dispatch

input = '/home/pwaldo2/work/workbench/current_documents/*.doc'
input_dir = '/home/pwaldo2/work/workbench/current_documents/'
outpath = '/home/pwaldo2/work/workbench/current_documents/TXT/'

# Have Word re-save each document in place using SaveAs format code 7
for doc in glob.glob(input):
    WordApp = Dispatch("Word.Application")
    WordApp.Visible = 1
    WordApp.Documents.Open(doc)
    WordApp.ActiveDocument.SaveAs(doc, 7)
    WordApp.ActiveDocument.Close()
    WordApp.Quit()

# Copy each re-saved file to a matching name with a .txt extension
for doc in glob.glob(input):
    txt_split = os.path.splitext(doc)
    txt_doc = txt_split[0] + '.txt'
    txt_doc_path = os.path.join(outpath, txt_doc)
    doc_path = os.path.join(input_dir, doc)
    shutil.copy(doc_path, txt_doc_path)
Sep 4 '08 #1
I'd recommend using one of the Word->txt converters for Linux and just
running it in a shell script:
* http://wvware.sourceforge.net/
* http://www.winfield.demon.nl/

No compelling reason to use Python in this instance. Right tool for
the right job and all that.
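
If the batch step has to stay in Python anyway, a thin wrapper around one of these converters is enough. The following is only a minimal sketch, assuming Python 3, that antiword (the tool behind the second link) is installed and on PATH, and that its `-m UTF-8.txt` mapping file is available for UTF-8 output. The function names are made up for illustration.

```python
import os
import subprocess


def txt_path_for(doc_path, out_dir):
    """Return out_dir/<basename>.txt for a given .doc path."""
    base = os.path.splitext(os.path.basename(doc_path))[0]
    return os.path.join(out_dir, base + ".txt")


def antiword_command(doc_path):
    """Build the antiword call; antiword prints the text to stdout.

    Assumption: the UTF-8.txt mapping file shipped with antiword is installed.
    """
    return ["antiword", "-m", "UTF-8.txt", doc_path]


def convert_with_antiword(doc_path, out_dir, run=subprocess.run):
    """Convert one .doc file and write the converter's output to a .txt file.

    `run` is injectable so the function can be tried without antiword present.
    """
    result = run(antiword_command(doc_path), stdout=subprocess.PIPE, check=True)
    out_path = txt_path_for(doc_path, out_dir)
    with open(out_path, "wb") as fh:
        fh.write(result.stdout)  # raw bytes exactly as antiword emitted them
    return out_path
```

Calling `convert_with_antiword(doc, outpath)` for each `doc` in `glob.glob(input)` would replace both loops of the Windows script. wvWare's `wvText` could be swapped in the same way.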

- Chris



--
Follow the path of the Iguana...
http://rebertia.com
Sep 4 '08 #2

You can do it manually with Open Office. <http://www.openoffice.org/>
A free office suite.
-------------------------------------
This sig is dedicated to the advancement of Nuclear Power
Tommy Nordgren
to************@comhem.se


Sep 4 '08 #3
On Sep 4, 4:18 pm, Tommy Nordgren <tommy.nordg...@comhem.se> wrote:
You can do it manually with Open Office. <http://www.openoffice.org/>
A free office suite.

On Debian there is a package called "unoconv"--written in Python--that
can do the conversions from the command line. It requires a running
instance of Open Office. However, the doc-to-txt conversion of Open
Office isn't that good. (It wasn't as good as Word's formatted text
converter, last time I used it.)
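
For anyone who wants to drive unoconv from a script, here is a rough sketch in Python 3. It assumes unoconv is on PATH and accepts `-f txt` for the output format and `-o` for the output directory; check `unoconv --help` on your system before relying on either. The function names are invented for the example.

```python
import glob
import subprocess


def unoconv_command(doc_paths, out_dir):
    """Build one unoconv call that converts the given files to plain text.

    Assumption: `-f txt` selects text output and `-o` names the output
    directory; neither flag is confirmed anywhere in this thread.
    """
    return ["unoconv", "-f", "txt", "-o", out_dir] + list(doc_paths)


def convert_all(pattern, out_dir, run=subprocess.run):
    """Hand every file matching `pattern` to unoconv in a single call.

    Returns the list of files passed to the converter. `run` is injectable
    so the function can be tried without unoconv installed.
    """
    docs = sorted(glob.glob(pattern))
    if docs:  # skip the call entirely when nothing matches
        run(unoconv_command(docs, out_dir), check=True)
    return docs
```

With the paths from the original post this would be `convert_all(input, outpath)`. The encoding of the resulting files depends on the Open Office text filter, so check that the output really is UTF-8.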
Carl Banks
Sep 5 '08 #4
