can't open word document after string replacements

Antoine De Groote

Hi there,

I have a Word document containing pictures and text. This document
holds several 'ABCDEF' strings which serve as placeholders for names.
Now I want to replace these occurrences with names in a list (members). I
open both input and output file in binary mode and do the
transformation. However, I can't open the resulting file; Word just
reports that there was an error. Does anybody know what I am doing wrong?

Oh, and is this approach pythonic anyway? (I have a strong Java background.)

Regards,
antoine

import os

members = somelist

os.chdir(somefolder)

doc = file('ttt.doc', 'rb')
docout = file('ttt1.doc', 'wb')

counter = 0

for line in doc:
    while line.find('ABCDEF') > -1:
        try:
            line = line.replace('ABCDEF', members[counter], 1)
            docout.write(line)
            counter += 1
        except:
            docout.write(line.replace('ABCDEF', '', 1))
    else:
        docout.write(line)

doc.close()
docout.close()

Oct 24 '06 #1


Daniel Dittmar

Antoine De Groote wrote:

> I have a Word document containing pictures and text. This document
> holds several 'ABCDEF' strings which serve as placeholders for names.
> Now I want to replace these occurrences with names in a list (members). I
> open both input and output file in binary mode and do the
> transformation. However, I can't open the resulting file; Word just
> reports that there was an error. Does anybody know what I am doing wrong?

The Word document format probably contains some length information about
paragraphs etc. If you change a string to another one of a different
length, this length information will no longer match the data and the
document structure will be hosed.
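
To make the point about length information concrete, here is a toy sketch in Python 3. The record layout (a 2-byte length prefix followed by the text) is invented purely for illustration and is not the real .doc layout; `pack_record` and `read_record` are hypothetical helpers.

```python
import struct

def pack_record(text):
    """Toy format: 2-byte little-endian length, then the text bytes."""
    return struct.pack('<H', len(text)) + text

def read_record(blob):
    """Read back exactly as many bytes as the length prefix announces."""
    (length,) = struct.unpack_from('<H', blob, 0)
    return blob[2:2 + length]

original = pack_record(b'Dear ABCDEF,')

# Naive byte-level replacement: the text grows, the stored length does not.
patched = original.replace(b'ABCDEF', b'Antoine De Groote')

intact = read_record(original)     # the full text
truncated = read_record(patched)   # cut short: the prefix still says 12 bytes
```

A reader that trusts the stored length now sees a truncated record, and whatever follows it in the file is misread as well.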

Possible solutions:
1. Use OLE automation (in the python win32 package) to open the file in
Word and use Word search and replace. Your script could then directly
print the document, which you probably have to do anyway.

2. Export the template document to RTF. This is a text format and can be
more easily manipulated with Python.
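
Option 2 might look like the following Python 3 sketch. It assumes the template has already been saved as RTF from Word and that each placeholder survives as one contiguous run of characters in the file, which is not guaranteed. `fill_placeholders`, `fill_rtf_file` and the sample text are hypothetical, and names containing anything beyond plain ASCII would need RTF escaping first.

```python
import re

def fill_placeholders(text, names, placeholder='ABCDEF'):
    """Replace each placeholder, in order, with the next name.

    Placeholders left over once the names run out are replaced by an
    empty string, as the original script tried to do.
    """
    remaining = iter(names)

    def next_name(match):
        # A function replacement is inserted literally, so backslashes
        # in the names are not treated as regex escapes.
        return next(remaining, '')

    return re.sub(re.escape(placeholder), next_name, text)


def fill_rtf_file(src, dst, names):
    """Hypothetical helper: read src, fill placeholders, write dst."""
    # latin-1 maps every byte to one character, so the file round-trips
    # unchanged apart from the replacements.
    with open(src, encoding='latin-1', newline='') as f:
        text = f.read()
    with open(dst, 'w', encoding='latin-1', newline='') as f:
        f.write(fill_placeholders(text, names))


sample = r'{\rtf1\ansi Dear ABCDEF, meet ABCDEF and ABCDEF.}'
filled = fill_placeholders(sample, ['Ann', 'Bob'])
```

Because RTF is plain text, a longer or shorter name does not invalidate any stored offsets the way it does in the binary format.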

> for line in doc:

I don't think that what you get here is actually a line of your document.
Due to the binary nature of the format, it is an arbitrary chunk.
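
A small Python 3 sketch of what "arbitrary chunk" means here: iterating over a binary stream simply cuts it after every 0x0A byte, wherever those happen to fall. The byte string below is made-up stand-in data, not a real .doc file.

```python
import io

# Made-up binary data; 0x0A bytes occur wherever the format needs them,
# not at the ends of visible lines of text.
blob = b'\xd0\xcf\x11\xe0\x0a\x00ABCDEF\x0a\x0a\xff\x01'

# Iterating a binary stream yields pieces that each end after a 0x0A byte.
chunks = list(io.BytesIO(blob))

for chunk in chunks:
    print(repr(chunk))
```

Joining the chunks gives back the original bytes, so nothing is lost by reading this way; the pieces just have no relation to the lines you see in Word.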

Daniel

Oct 24 '06 #2

Bruno Desthuilliers

Antoine De Groote wrote:

> Hi there,
>
> I have a Word document containing pictures and text. This document
> holds several 'ABCDEF' strings which serve as placeholders for names.
> Now I want to replace these occurrences with names in a list (members).

Do you know that MS Word already provides this kind of feature?

> I open both input and output file in binary mode and do the
> transformation. However, I can't open the resulting file; Word just
> reports that there was an error. Does anybody know what I am doing wrong?

Hand-editing an undocumented binary format may lead to undesirable
results...

> Oh, and is this approach pythonic anyway?

The pythonic approach is usually to start by looking for existing
solutions... In this case, using Word's built-in features and Python/COM
integration would be a better choice IMHO.

> (I have a strong Java background.)

Nobody's perfect !-)

> Regards,
> antoine
>
> import os
>
> members = somelist
>
> os.chdir(somefolder)
>
> doc = file('ttt.doc', 'rb')
> docout = file('ttt1.doc', 'wb')
>
> counter = 0
>
> for line in doc:

Since you opened the file as binary, you should use file.read() instead.
Ever wondered what your 'lines' look like ?-)

>     while line.find('ABCDEF') > -1:

.doc is a binary format. You may find such a byte sequence in its
content in places that are *not* text content.

>         try:
>             line = line.replace('ABCDEF', members[counter], 1)
>             docout.write(line)

You're writing back the whole chunk on each iteration. No surprise the
resulting document is corrupted.

>             counter += 1

seq = list("abcd")
for indice, item in enumerate(seq):
    print "%02d : %s" % (indice, item)

>         except:
>             docout.write(line.replace('ABCDEF', '', 1))
>     else:
>         docout.write(line)
>
> doc.close()
> docout.close()
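
The remark about writing the whole chunk on each iteration can be checked with a short Python 3 sketch. The archive stripped the indentation of the posted code, so the nesting used in `buggy_copy` (the write inside the `while`, the `else` attached to the `while`) is an assumption; both function names and the sample data are hypothetical.

```python
import io

def buggy_copy(chunks, names, placeholder=b'ABCDEF'):
    """Mimics the posted loop: one write per replacement, plus the else."""
    out = io.BytesIO()
    counter = 0
    for chunk in chunks:
        while placeholder in chunk:
            chunk = chunk.replace(placeholder, names[counter], 1)
            out.write(chunk)      # the whole chunk, every time round
            counter += 1
        else:
            out.write(chunk)      # runs once the while condition is false
    return out.getvalue()

def fixed_copy(chunks, names, placeholder=b'ABCDEF'):
    """Finish all replacements in a chunk, then write it exactly once."""
    out = io.BytesIO()
    counter = 0
    for chunk in chunks:
        while placeholder in chunk and counter < len(names):
            chunk = chunk.replace(placeholder, names[counter], 1)
            counter += 1
        out.write(chunk.replace(placeholder, b''))  # blank any leftovers
    return out.getvalue()

chunks = [b'Hi ABCDEF and ABCDEF\n']
names = [b'Ann', b'Bob']
buggy = buggy_copy(chunks, names)   # the chunk appears three times
fixed = fixed_copy(chunks, names)   # the chunk appears once
```

This only removes the duplicated output. It does nothing about the binary format itself, so on a real .doc file the result would most likely still be unreadable.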

--
bruno desthuilliers
python -c "print '@'.join(['.'.join([w[::-1] for w in p.split('.')]) for
p in 'o****@xiludom.gro'.split('@')])"

Oct 24 '06 #3

Jon Clements

Antoine De Groote wrote:

> Hi there,
>
> I have a Word document containing pictures and text. This document
> holds several 'ABCDEF' strings which serve as placeholders for names.
> Now I want to replace these occurrences with names in a list (members). I
> open both input and output file in binary mode and do the
> transformation. However, I can't open the resulting file; Word just
> reports that there was an error. Does anybody know what I am doing wrong?
>
> Oh, and is this approach pythonic anyway? (I have a strong Java background.)


Errr.... I wouldn't even attempt to do this; how do you know each
'line' isn't going to be split arbitrarily, and that 'ABCDEF' doesn't
happen to be part of an image? As you've noted, this is binary data so
you can't assume anything about it. Doing it this way is a Bad Idea
(tm).

If you want to do something like this, why not use templated HTML, or
possibly templated PDFs? Or heaven forbid, Word's mail-merge facility?
(I think MS Office documents are effectively self-contained file
systems, so there is probably some module out there which can
read/write them).
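
On the "self-contained file system" remark: binary .doc files are stored in the OLE2 compound file container, which starts with a fixed 8-byte signature. The Python 3 sketch below only checks that signature; `looks_like_ole2` is a hypothetical helper, and actually walking the streams inside the container would need a dedicated third-party module.

```python
# Signature found at the start of every OLE2 compound file.
OLE2_MAGIC = bytes.fromhex('d0cf11e0a1b11ae1')

def looks_like_ole2(header):
    """Return True if the first 8 bytes match the compound file signature."""
    return header[:8] == OLE2_MAGIC

def is_ole2_file(path):
    """Hypothetical convenience wrapper: test a file on disk."""
    with open(path, 'rb') as f:
        return looks_like_ole2(f.read(8))
```

A file that passes this check is a container of streams, which is one more reason why treating it as lines of text goes wrong.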

Jon.

Oct 24 '06 #4

Antoine De Groote

Thank you all for your comments.

I ended up saving the Word document as XML and then using (a slightly
modified version of) my script from the OP. For those interested, there
was also a problem with encodings.
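
For readers who try the same route, a minimal Python 3 sketch of replacing placeholders in an XML export while handling the encoding explicitly. It is not the script actually used here: it assumes the file is UTF-8 and that each placeholder sits in one piece inside a text node. `fill_xml` and the sample fragment are hypothetical.

```python
from xml.sax.saxutils import escape

def fill_xml(data, names, placeholder='ABCDEF', encoding='utf-8'):
    """Decode the bytes, fill placeholders in order, and re-encode."""
    text = data.decode(encoding)
    parts = text.split(placeholder)
    out = [parts[0]]
    for index, part in enumerate(parts[1:]):
        # Escape &, < and > so a name cannot break the XML markup;
        # placeholders without a matching name are left empty.
        out.append(escape(names[index]) if index < len(names) else '')
        out.append(part)
    return ''.join(out).encode(encoding)

sample = '<w:t>Dear ABCDEF</w:t><w:t>ABCDEF</w:t>'.encode('utf-8')
filled = fill_xml(sample, ['Zoë & Co'])
```

Decoding before replacing and encoding afterwards keeps accented names intact, which is the kind of encoding problem that is easy to run into when mixing byte strings and text.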

Regards,
antoine

Oct 24 '06 #5

Antoine De Groote

Bruno Desthuilliers wrote:

> Antoine De Groote wrote:
>> Hi there,
>>
>> I have a Word document containing pictures and text. This document
>> holds several 'ABCDEF' strings which serve as placeholders for names.
>> Now I want to replace these occurrences with names in a list (members).
>
> Do you know that MS Word already provides this kind of feature?

No, I don't. Sounds interesting... What is this feature called?

Oct 24 '06 #6

Bruno Desthuilliers

Antoine De Groote wrote:

> Bruno Desthuilliers wrote:
>> Antoine De Groote wrote:
>>> Hi there,
>>>
>>> I have a Word document containing pictures and text. This document
>>> holds several 'ABCDEF' strings which serve as placeholders for names.
>>> Now I want to replace these occurrences with names in a list (members).
>>
>> Do you know that MS Word already provides this kind of feature?
>
> No, I don't. Sounds interesting... What is this feature called?

I don't know what it's called in English, but in French it's (well, it
was the last time I used MS Word, which is quite some time ago) "fusion
de documents".

--
bruno desthuilliers
python -c "print '@'.join(['.'.join([w[::-1] for w in p.split('.')]) for
p in 'o****@xiludom.gro'.split('@')])"

Oct 24 '06 #7

Steve Holden

Antoine De Groote wrote:

> Bruno Desthuilliers wrote:
>> Antoine De Groote wrote:
>>> Hi there,
>>>
>>> I have a Word document containing pictures and text. This document
>>> holds several 'ABCDEF' strings which serve as placeholders for names.
>>> Now I want to replace these occurrences with names in a list (members).
>>
>> Do you know that MS Word already provides this kind of feature?
>
> No, I don't. Sounds interesting... What is this feature called?

Mail-merge, I believe.

However, if your document can be adequately represented in RTF
(rich-text format) then you could consider doing string replacements on
that. I invoice the PyCon sponsors using this rather inelegant technique.
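
One caveat with string replacement in RTF is that backslashes and braces are markup, and non-ASCII characters are normally written as `\uN?` escapes. The Python 3 sketch below shows one way to escape a name before inserting it; `rtf_escape` is a hypothetical helper, not part of any invoicing script mentioned here, and it assumes the reader accepts a plain `?` as the fallback character.

```python
def rtf_escape(text):
    """Escape a string so it can be pasted into RTF body text."""
    out = []
    for ch in text:
        if ch in '\\{}':
            out.append('\\' + ch)          # markup characters
        elif ord(ch) < 128:
            out.append(ch)                 # plain ASCII passes through
        else:
            # RTF \uN takes a signed 16-bit value per UTF-16 code unit,
            # followed by a fallback character for older readers.
            data = ch.encode('utf-16-le')
            for i in range(0, len(data), 2):
                unit = int.from_bytes(data[i:i + 2], 'little')
                if unit > 32767:
                    unit -= 65536
                out.append('\\u%d?' % unit)
    return ''.join(out)

escaped = rtf_escape('Zoë {Jr}')
```

The escaped name can then be substituted for the placeholder with an ordinary string replace on the RTF text.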

regards
Steve
--
Steve Holden +44 150 684 7255 +1 800 494 3119
Holden Web LLC/Ltd http://www.holdenweb.com
Skype: holdenweb http://holdenweb.blogspot.com
Recent Ramblings http://del.icio.us/steve.holden

Oct 24 '06 #8

Richie Hindle

[Antoine]

> I have a Word document containing pictures and text. This document
> holds several 'ABCDEF' strings which serve as placeholders for names.
> Now I want to replace these occurrences with names in a list (members).

[Bruno]

> I don't know what it's called in English, but in French it's (well, it
> was the last time I used MS Word, which is quite some time ago) "fusion
> de documents".

"Mail Merge"?

--
Richie Hindle
ri****@entrian.com

Oct 24 '06 #9

Frederic Rentsch

Antoine De Groote wrote:

> Hi there,
>
> I have a Word document containing pictures and text. This document
> holds several 'ABCDEF' strings which serve as placeholders for names.
> Now I want to replace these occurrences with names in a list (members). I
> open both input and output file in binary mode and do the
> transformation. However, I can't open the resulting file; Word just
> reports that there was an error. Does anybody know what I am doing wrong?
>
> Oh, and is this approach pythonic anyway? (I have a strong Java background.)


DOC files contain housekeeping info which becomes inconsistent if you
change text. Possibly you can exchange stuff of equal length but that
wouldn't serve your purpose. RTF files let you do substitutions and they
save a lot of space too. But I kind of doubt whether RTF files can
contain pictures.

Frederic

Oct 24 '06 #10

Duncan Booth

Frederic Rentsch <an***********@vtxmail.ch> wrote:

> DOC files contain housekeeping info which becomes inconsistent if you
> change text. Possibly you can exchange stuff of equal length but that
> wouldn't serve your purpose. RTF files let you do substitutions and they
> save a lot of space too. But I kind of doubt whether RTF files can
> contain pictures.

They wouldn't be much use as a document file format if they couldn't
contain pictures. RTF files can contain just about anything; they can even
embed other non-RTF objects. Whether RTF applications apart from Word can
actually handle all of the tags is, of course, another question.

Oct 25 '06 #11
