
Convert Word .doc to Acrobat .pdf files

Hi all,

Background:
I need some help. I am trying to streamline a process for one of our
technical writers. He is using Perforce (version control system), and
is constantly changing his word documents, and then converts them to
both .pdf and "Web page" format (to publish to the web). He has a
licensed copy of Adobe Acrobat Professional (7.x).

Questions:
Does Acrobat Pro have a command-line interface? (I tried searching,
but couldn't find anything.) Is there any other good way to script
Word to PDF conversion?

Note: The word documents do contain images, and lots of stuff besides
just text.

Mar 23 '06 #1
I wrote a script which uses OpenOffice. It can
convert and read a lot of formats.

#!/usr/bin/env python
#Old: !/optlocal/OpenOffice.org/program/python
# (c) 2003-2006 Thomas Guettler http://www.tbz-pariv.de/

# OpenOffice1.1 comes with its own python interpreter.
# This Script needs to be run with the python from OpenOffice1:
# /opt/OpenOffice.org/program/python
# Start the Office before connecting:
# soffice "-accept=socket,host=localhost,port=2002;urp;"
#
# With OpenOffice2 you can use the default Python-Interpreter (at least on SuSE)
#

# Python Imports
import os
import re
import sys
import getopt

default_path="/usr/lib/ooo-2.0/program"
sys.path.insert(0, default_path)

# pyUNO Imports
try:
    import uno
    from com.sun.star.beans import PropertyValue
except ImportError:
    print "This Script needs to be run with the python from OpenOffice.org"
    print "Example: /opt/OpenOffice.org/program/python %s" % (
        os.path.basename(sys.argv[0]))
    print "Or you need to insert the right path at the top, where uno.py is."
    print "Default: %s" % default_path

    # Re-raise so the original import error stays visible.
    raise

extension=None
format=None

def usage():
    scriptname=os.path.basename(sys.argv[0])
    print """Usage: %s [--extension pdf --format writer_pdf_Export] files
By default, all files or directories will be converted to HTML.

You must start the office with this line before starting
this script:
soffice "-accept=socket,host=localhost,port=2002;urp;"

If you want to export to something else, you need to give the extension *and*
the format.

For a list of possible export formats see
http://framework.openoffice.org/file...scription.html

or

/opt/OpenOffice.org/share/registry/data/org/openoffice/Office/TypeDetection.xcu

or

grep -ri MYEXTENSION /usr/lib/ooo-2.0/share/registry/modules/org/openoffice/TypeDetection/
the format is <node oor:name="FORMAT" ...

Attention: Calc (.xls) needs a different export format than Writer (.doc)
Example: calc_pdf_Export instead of writer_pdf_Export
""" % (scriptname)

def do_dir(dir, desktop):
    # Load File
    dir=os.path.abspath(dir)
    if os.path.isfile(dir):
        files=[dir]
    else:
        files=os.listdir(dir)
        files.sort()
    for file in files:
        if file.startswith("."):
            continue
        file=os.path.join(dir, file)
        if os.path.isdir(file):
            do_dir(file, desktop)
        else:
            do_file(file, desktop)

def do_file(file, desktop):
    file_l=file.lower()

    global format
    if extension=="html":
        if file_l.endswith(".xls"):
            format="HTML (StarCalc)"
        elif file_l.endswith(".doc"):
            format="HTML (StarWriter)"
        else:
            print "%s: unknown extension" % file
            return

    assert(format)
    assert(extension)

    # Output name is the input name plus the new extension (test.doc.pdf)
    file_save="%s.%s" % (file, extension)
    properties=[]
    p=PropertyValue()
    p.Name="Hidden"
    p.Value=True
    properties.append(p)
    doc=desktop.loadComponentFromURL(
        "file://%s" % file, "_blank", 0, tuple(properties))
    if not doc:
        print "Failed to open '%s'" % file
        return
    # Save File
    properties=[]
    p=PropertyValue()
    p.Name="Overwrite"
    p.Value=True
    properties.append(p)
    p=PropertyValue()
    p.Name="FilterName"
    p.Value=format
    properties.append(p)
    try:
        doc.storeToURL(
            "file://%s" % file_save, tuple(properties))
        print "Created %s" % file_save
    except ValueError:
        import traceback
        import cStringIO
        (exc_type, exc_value, tb) = sys.exc_info()
        error_file = cStringIO.StringIO()
        traceback.print_exception(exc_type, exc_value, tb,
                                  file=error_file)
        stacktrace=error_file.getvalue()
        print "Failed while writing: '%s'" % file_save
        print stacktrace
    doc.dispose()

def init_openoffice():
    # Init: Connect to running soffice process
    context = uno.getComponentContext()
    resolver=context.ServiceManager.createInstanceWithContext(
        "com.sun.star.bridge.UnoUrlResolver", context)
    try:
        ctx = resolver.resolve(
            "uno:socket,host=localhost,port=2002;urp;StarOffice.ComponentContext")
    except:
        print "Could not connect to running openoffice."
        usage()
        sys.exit(1)
    smgr=ctx.ServiceManager
    desktop = smgr.createInstanceWithContext("com.sun.star.frame.Desktop",ctx)
    return desktop

def main():
    try:
        opts, args = getopt.getopt(sys.argv[1:], "", [
            "extension=", "format="])
    except getopt.GetoptError, e:
        print e
        usage()
        sys.exit(1)

    global extension
    global format
    for o, a in opts:
        if o=="--extension":
            extension=a
            assert(not extension.startswith("."))
        elif o=="--format":
            format=a
        else:
            raise ValueError("Internal Error, undone option: %s %s" % (
                o, a))
    if (not extension) and (not format):
        extension="html"
    elif extension and format:
        pass
    else:
        print "You need to set format and extension."
        usage()
        sys.exit(1)

    if not args:
        usage()
        sys.exit(1)

    desktop=init_openoffice()
    for file in args:
        do_dir(file, desktop)

if __name__=="__main__":
    main()
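
The script builds its URLs with "file://%s" % file and names the output by appending the new extension (test.doc becomes test.doc.pdf). A minimal sketch of two helpers that could tighten that up, written for Python 3 rather than the Python 2 of the script, and assuming absolute POSIX paths. The names to_file_url and output_path are made up for this example and are not part of the script:

```python
import os
from urllib.parse import quote


def to_file_url(path):
    """Turn an absolute POSIX path into a percent-encoded file:// URL."""
    # quote() leaves "/" alone and escapes spaces and other unsafe characters.
    return "file://" + quote(path)


def output_path(source, extension):
    """Swap the source extension for the target one (report.doc -> report.pdf)."""
    base, _old_extension = os.path.splitext(source)
    return "%s.%s" % (base, extension)
```

With these, a path containing spaces still yields a valid URL, and the converted file sits next to the source under the same base name.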

Mar 23 '06 #2
kbperry wrote:
Hi all,

Background:
I need some help. I am trying to streamline a process for one of our
technical writers. He is using Perforce (version control system), and
is constantly changing his word documents, and then converts them to
both .pdf and "Web page" format (to publish to the web). He has a
licensed copy of Adobe Acrobat Professional (7.x).

Questions:
Does Acrobat Pro have a command-line interface? (I tried searching,
but couldn't find anything.) Is there any other good way to script
Word to PDF conversion?


As I remember, Acrobat monitors a directory and converts anything it
finds there, so you don't need to script Acrobat at all, just script
printing the documents. However, it sounds as though you are talking
about running Acrobat on a server and his license probably doesn't
permit that.

Alternatively use OpenOffice: it will convert word documents to
pdf or html and can be scripted in Python.
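
As a rough illustration of the scripted route: later LibreOffice releases (not the 2006 OpenOffice versions discussed in this thread) ship a headless converter on the command line. The sketch below assumes such a release is installed and that soffice is on the PATH; the function names are invented for the example.

```python
import subprocess


def build_convert_command(source, target_format="pdf", outdir=".",
                          soffice="soffice"):
    """Build the argument list for a headless soffice conversion."""
    return [soffice, "--headless", "--convert-to", target_format,
            "--outdir", outdir, source]


def convert(source, target_format="pdf", outdir="."):
    """Run the conversion; raises CalledProcessError on a non-zero exit."""
    subprocess.run(build_convert_command(source, target_format, outdir),
                   check=True)
```

Calling convert("manual.doc") would then write manual.pdf into the current directory, and convert("manual.doc", "html") the web page version.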
Mar 23 '06 #3
Thanks for the replies!

I need to stick with Word (not my choice, but I would rather keep
everything like he has it).

Duncan,
I was just trying the printing thing. When installing Adobe Acrobat,
it installs a printer called "Adobe PDF," and I have been trying to
print to there, but the "Save" window keeps popping up. I need to
figure out a way to keep it in the background.

Mar 23 '06 #4
kbperry wrote:
Thanks for the replies!

I need to stick with Word (not my choice, but I would rather keep
everything like he has it).
That shouldn't be a problem: you can stick with Word for editing the
documents and just use OpenOffice to do the conversion.

Duncan,
I was just trying the printing thing. When installing Adobe Acrobat,
it installs a printer called "Adobe PDF," and I have been trying to
print to there, but the "Save" window keeps popping up. I need to
figure out a way to keep it in the background.

I'm afraid it's been a while since I used Acrobat to generate PDF files. I
think there are configuration options to tell it to do the conversion
automatically and not prompt you, but I can't remember where.
Mar 23 '06 #5
Thanks again Duncan!

I will use the OpenOffice solution as a last resort. It isn't the
standard office suite at my corp. I would like the code to be as
portable as possible, and it would seem like a pain in the arse to have
the end user install OpenOffice just to run my script. Sure it would
just be a one time deal, but I can hear the groans already.
If you happen to come across a way to suppress the "Save" window when
printing, please let me know.

Mar 23 '06 #6
## this creates a postscript file which you can then convert to PDF
## using Acrobat Distiller
##
## BTW, I spent an hour trying to get this working with
## win32com.client.Dispatch
## (the save file dialog still appeared)
## Then I remembered win32com.client.dynamic.Dispatch
##
## Can somebody please explain why this happened using
## win32com.client.Dispatch?
import win32com.client.dynamic

if __name__ == '__main__':
    printer = "Adobe PDF on NE03:"
    docpath = r"E:\Documents and Settings\justin\Desktop\test.doc"
    pspath = r"E:\Documents and Settings\justin\Desktop\test.ps"

    word = win32com.client.dynamic.Dispatch("Word.Application")
    try:
        document = word.Documents.Open(docpath, 0, -1)
        try:
            remember = word.ActivePrinter
            word.ActivePrinter = printer
            try:
                document.PrintOut(Background=0, OutputFileName=pspath,
                                  PrintToFile=-1)
            finally:
                word.ActivePrinter = remember
        finally:
            document.Close(0)
            del document
    finally:
        word.Quit(0)
        del word
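
The same steps can be packaged as a function so they are easy to call for many documents. This is only a sketch: the function name print_to_postscript is invented here, and it expects the caller to pass in the Word.Application COM object (for example the result of win32com.client.dynamic.Dispatch("Word.Application")) and to quit Word afterwards.

```python
def print_to_postscript(word, docpath, pspath, printer):
    """Print docpath to the PostScript file pspath through the given printer.

    word is a Word.Application COM object supplied by the caller.
    """
    # Arguments: file name, ConfirmConversions=0 (no prompt), ReadOnly=-1 (true)
    document = word.Documents.Open(docpath, 0, -1)
    try:
        remember = word.ActivePrinter
        word.ActivePrinter = printer
        try:
            # Background=0 makes PrintOut block until the file is written.
            document.PrintOut(Background=0, OutputFileName=pspath,
                              PrintToFile=-1)
        finally:
            # Always restore the user's default printer.
            word.ActivePrinter = remember
    finally:
        document.Close(0)  # 0 = do not save changes
```

Keeping the COM object outside the function means one Word instance can be reused for a whole batch instead of being started once per file.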

Mar 24 '06 #7
Justin,
While I was salivating when reading your post, it doesn't work for me,
but I am not sure why.

I keep getting an error:

Titled: Adobe PDF
"When you create a PostScript file you have to send the host fonts.
Please go to the printer properties, "Adobe PDF Settings" page and turn
OFF the option "Do not send fonts to Distiller"."

Keith
www.301labs.com

Mar 28 '06 #8

kbperry wrote:
Questions:
Does Acrobat Pro have a command-line interface? (I tried searching,
but couldn't find anything.) Is there any other good way to script
Word to PDF conversion?

Note: The word documents do contain images, and lots of stuff besides
just text.


Acrobat Distiller installs (or can install) a Word VBA macro which
allows Word to Save As .PDF. It's easy to call from Python:

doc = "somefile.doc"
import win32com.client

# Create COM-object
wordapp = win32com.client.gencache.EnsureDispatch("Word.Application")

wordapp.Documents.Open(doc)
# the name of the macro for Acrobat 6.0
wordapp.Run("'!CreatePDFAndCloseDoc")
wordapp.ActiveDocument.Close()
wordapp.Quit()

You'll probably wrap this in more logic, but it works.
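
One possible shape for that extra logic is a try/finally so that Word is shut down even when the macro fails. A minimal sketch, with an invented function name; the macro name is passed in because it depends on the Acrobat version installed, and wordapp is the COM object from the snippet above.

```python
def convert_with_macro(wordapp, doc, macro_name):
    """Open doc, run the named Word macro, and always quit Word afterwards."""
    try:
        wordapp.Documents.Open(doc)
        wordapp.Run(macro_name)
    finally:
        # 0 = quit without saving, so a failed macro does not leave a
        # hidden Word process or a save prompt behind.
        wordapp.Quit(0)
```

Any exception from Open or Run still propagates to the caller, so a batch script can log the failing document and move on.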

Mar 28 '06 #9
"When you create a PostScript file you have to send the host fonts.
Please go to the printer properties, "Adobe PDF Settings" page and turn
OFF the option "Do not send fonts to Distiller"."

kbperry,

sorry about that.
go to "Printers and Faxes"
go to properties for the "Adobe PDF" printer
go to the "General" tab, "Printing Preferences" button
"Adobe PDF Settings" tab
uncheck the "Do not send fonts..." box

rune,
"'!CreatePDFAndCloseDoc"


I had Adobe 6 once and recalled it had Word macros you could call.
However, when I installed Adobe 7, I could not find the macros.
Perhaps it has something to do with the naming.
Thanks for the post. Will check it out.

Mar 29 '06 #10
Wow...thanks again for the replies!

I will try both of these out at work tomorrow. (I only work 3 days a
week because of school).

Thanks,

Keith

www.301labs.com

Mar 29 '06 #11
Justin,
Your way appeared to work great, but I just realized that going through
the printer destroys the table of contents and bookmark links.

Rune's way would be perfect, but I don't see a macro created like that.
I tried to create one from scratch, but it didn't work.

I am now trying to see if there is a way to call PDFMWord.dll through a
command-line using rundll32.exe.
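
For readers hitting the same bookmark problem on a newer setup: Word 2007 (with the Save as PDF add-in) and later expose Document.ExportAsFixedFormat, which is not available with the Word and Acrobat 7 combination in this thread. The sketch below assumes such a Word version; the function name is invented, and whether table of contents links survive should be checked on a real document.

```python
WD_EXPORT_FORMAT_PDF = 17               # WdExportFormat.wdExportFormatPDF
WD_EXPORT_CREATE_HEADING_BOOKMARKS = 1  # WdExportCreateBookmarks


def export_pdf_with_bookmarks(word, docpath, pdfpath):
    """Export docpath to PDF, asking Word to build bookmarks from headings.

    word is a Word.Application COM object supplied by the caller.
    """
    # Arguments: file name, ConfirmConversions=0 (no prompt), ReadOnly=-1 (true)
    document = word.Documents.Open(docpath, 0, -1)
    try:
        document.ExportAsFixedFormat(
            OutputFileName=pdfpath,
            ExportFormat=WD_EXPORT_FORMAT_PDF,
            CreateBookmarks=WD_EXPORT_CREATE_HEADING_BOOKMARKS)
    finally:
        document.Close(0)  # 0 = do not save changes
```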

Mar 30 '06 #12
If you can find some API documentation for PDFMWord.dll, you can call
its methods with the ctypes python module.
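
A minimal sketch of what the ctypes side could look like, under heavy assumptions: nothing here is known about what PDFMWord.dll actually exports, and it may turn out to be a COM add-in with no plain C entry points at all. The helper names are invented; any export name you probe for would have to come from Adobe's documentation.

```python
import ctypes


def load_library(path):
    """Load a shared library, or return None if the loader rejects it."""
    # WinDLL (stdcall) exists only on Windows; fall back to CDLL elsewhere.
    loader = getattr(ctypes, "WinDLL", ctypes.CDLL)
    try:
        return loader(path)
    except OSError:
        return None


def find_export(library, name):
    """Return the exported function called name, or None if it is missing."""
    # ctypes raises AttributeError for unknown symbols; getattr's default
    # turns that into None.
    return getattr(library, name, None)
```

If find_export returns a function, its argtypes and restype still need to be set from the real API documentation before calling it.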

Caleb

Mar 30 '06 #13
The question is where is the API?

Mar 31 '06 #14
