can't open word document after string replacements

Hi there,

I have a Word document containing pictures and text. This document
holds several 'ABCDEF' strings which serve as placeholders for names.
Now I want to replace these occurrences with names from a list (members). I
open both the input and output files in binary mode and do the
transformation. However, I can't open the resulting file; Word just
tells me that there was an error. Does anybody know what I am doing wrong?

Oh, and is this approach pythonic anyway? (I have a strong Java background.)

Regards,
antoine

    import os

    members = somelist

    os.chdir(somefolder)

    doc = file('ttt.doc', 'rb')
    docout = file('ttt1.doc', 'wb')

    counter = 0

    for line in doc:
        while line.find('ABCDEF') > -1:
            try:
                line = line.replace('ABCDEF', members[counter], 1)
                docout.write(line)
                counter += 1
            except:
                docout.write(line.replace('ABCDEF', '', 1))
        else:
            docout.write(line)

    doc.close()
    docout.close()

Oct 24 '06 #1
Antoine De Groote wrote:
> I have a Word document containing pictures and text. This document
> holds several 'ABCDEF' strings which serve as placeholders for names.
> Now I want to replace these occurrences with names from a list (members). I
> open both the input and output files in binary mode and do the
> transformation. However, I can't open the resulting file; Word just
> tells me that there was an error. Does anybody know what I am doing wrong?

The Word document format probably contains some length information about
paragraphs etc. If you change a string to another one of a different
length, this length information will no longer match the data and the
document structure will be hosed.

Possible solutions:
1. Use OLE automation (in the python win32 package) to open the file in
Word and use Word search and replace. Your script could then directly
print the document, which you probably have to do anyway.

2. Export the template document to RTF. This is a text format and can be
more easily manipulated with Python.
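
Option 2 might look roughly like this in present-day Python 3 (the thread itself uses Python 2). This is only a sketch, assuming the template has been saved as RTF and read into a string. `rtf_escape` and `fill_placeholders` are hypothetical helper names, and the `?` after each `\uN` assumes the RTF default of one fallback character per Unicode escape.

```python
import re


def rtf_escape(text):
    """Escape a plain string so it can be pasted into RTF source."""
    out = []
    for ch in text:
        if ch in '\\{}':
            # Backslash and braces are RTF syntax and must be escaped.
            out.append('\\' + ch)
        elif ord(ch) < 128:
            out.append(ch)
        else:
            # \uN takes one signed 16-bit decimal per UTF-16 code unit,
            # followed by a fallback character for old readers.
            data = ch.encode('utf-16-le')
            for i in range(0, len(data), 2):
                unit = int.from_bytes(data[i:i + 2], 'little')
                if unit > 32767:
                    unit -= 65536
                out.append('\\u%d?' % unit)
    return ''.join(out)


def fill_placeholders(rtf_text, names, placeholder='ABCDEF'):
    """Replace each placeholder, in order, with the next name.

    Placeholders left over once the names run out are replaced
    with an empty string.
    """
    remaining = iter(names)

    def next_name(match):
        return rtf_escape(next(remaining, ''))

    return re.sub(re.escape(placeholder), next_name, rtf_text)
```

Because the whole text is processed in one pass and written once, none of the length bookkeeping of the binary format is involved.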

> for line in doc:

I don't think that what you get here is actually a line of your document.
Due to the binary nature of the format, it is an arbitrary chunk.
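
The "arbitrary chunk" effect is easy to see on a few made-up bytes. The sketch below uses Python 3 and an in-memory stream; `blob` is hypothetical data, not a real .doc fragment, and `chunks_as_seen_by_for_loop` is just an illustrative name.

```python
import io


def chunks_as_seen_by_for_loop(data):
    """Return the pieces a `for line in f` loop would yield for binary data."""
    return list(io.BytesIO(data))


# Hypothetical bytes: 0x0A appears here as ordinary data, not as a line ending.
blob = b'\x00\x01AB\x0aCD\x0a\x0a\xff\x10tail'

for chunk in chunks_as_seen_by_for_loop(blob):
    print(repr(chunk))
```

The loop simply cuts after every 0x0A byte, wherever it happens to occur, so the pieces have nothing to do with lines of text in the document.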

Daniel
Oct 24 '06 #2
Antoine De Groote wrote:
> Hi there,
>
> I have a Word document containing pictures and text. This document
> holds several 'ABCDEF' strings which serve as placeholders for names.
> Now I want to replace these occurrences with names from a list (members).

Do you know that MS Word already provides this kind of feature?

> I open both the input and output files in binary mode and do the
> transformation. However, I can't open the resulting file; Word just
> tells me that there was an error. Does anybody know what I am doing wrong?

Hand-editing an undocumented binary format may lead to undesirable
results...

> Oh, and is this approach pythonic anyway?

The pythonic approach is usually to start by looking for existing
solutions... In this case, using Word's built-in features and Python/COM
integration would be a better choice IMHO.

> (I have a strong Java background.)

Nobody's perfect !-)

> Regards,
> antoine
> import os
>
> members = somelist
>
> os.chdir(somefolder)
>
> doc = file('ttt.doc', 'rb')
> docout = file('ttt1.doc', 'wb')
>
> counter = 0
>
> for line in doc:

Since you opened the file as binary, you should use file.read() instead.
Ever wondered what your 'lines' look like ?-)

>     while line.find('ABCDEF') > -1:

.doc is a binary format. You may find such a byte sequence in its
content in places that are *not* text content.

>         try:
>             line = line.replace('ABCDEF', members[counter], 1)
>             docout.write(line)

You're writing back the whole chunk on each iteration. No surprise the
resulting document is corrupted.
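
The duplicated writes can be reproduced in memory. The following Python 3 sketch mirrors the write pattern of the quoted loop (`buggy_copy`) next to a version that writes each chunk once (`fixed_copy`). Both function names are hypothetical, the try/except of the original is replaced by a default of `b''` when the names run out, and the fix only addresses the duplication, not the problem of editing a binary format.

```python
import io


def buggy_copy(data, members, placeholder=b'ABCDEF'):
    """Same write pattern as the quoted loop: one write per replacement, plus one."""
    out = io.BytesIO()
    names = iter(members)
    for chunk in io.BytesIO(data):
        while placeholder in chunk:
            chunk = chunk.replace(placeholder, next(names, b''), 1)
            out.write(chunk)   # written after every single replacement
        else:
            out.write(chunk)   # and once more when the while loop ends
    return out.getvalue()


def fixed_copy(data, members, placeholder=b'ABCDEF'):
    """Finish all replacements in a chunk, then write it exactly once."""
    out = io.BytesIO()
    names = iter(members)
    for chunk in io.BytesIO(data):
        while placeholder in chunk:
            chunk = chunk.replace(placeholder, next(names, b''), 1)
        out.write(chunk)
    return out.getvalue()
```

For a chunk with two placeholders, `buggy_copy` emits the chunk three times: once half-replaced and twice fully replaced.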

>             counter += 1

    seq = list("abcd")
    for indice, item in enumerate(seq):
        print "%02d : %s" % (indice, item)

>         except:
>             docout.write(line.replace('ABCDEF', '', 1))
>     else:
>         docout.write(line)
>
> doc.close()
> docout.close()

--
bruno desthuilliers
python -c "print '@'.join(['.'.join([w[::-1] for w in p.split('.')]) for
p in 'o****@xiludom.gro'.split('@')])"
Oct 24 '06 #3

Errr.... I wouldn't even attempt to do this; how do you know each
'line' isn't going to be split arbitrarily, and that 'ABCDEF' doesn't
happen to be part of an image? As you've noted, this is binary data, so
you can't assume anything about it. Doing it this way is a Bad Idea
(tm).

If you want to do something like this, why not use templated HTML, or
possibly templated PDFs? Or heaven forbid, Word's mail-merge facility?
(I think MS Office documents are effectively self-contained file
systems, so there is probably some module out there which can
read/write them).
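
The templated-HTML idea could be sketched like this with the standard library only (Python 3). The letter text, the `$name` field and the function name `render_letters` are made up for illustration.

```python
from html import escape
from string import Template

# Hypothetical template; $name marks where each member's name goes.
LETTER = Template(
    '<html><body><p>Dear $name,</p>'
    '<p>Welcome to the club.</p></body></html>'
)


def render_letters(members):
    """Return one HTML document per member, with the name safely escaped."""
    return [LETTER.substitute(name=escape(member)) for member in members]
```

Since HTML is plain text, there is no hidden length information to keep in sync, and `escape` keeps characters such as `&` or `<` in a name from breaking the markup.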

Jon.

Oct 24 '06 #4
Thank you all for your comments.

I ended up saving the Word document as XML and then using (a slightly
modified version of) my script from the original post. For those
interested, there was also a problem with encodings.
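
The modified script was not posted, so the following is only a guess at what such an approach could look like in Python 3. It assumes the XML export is UTF-8 encoded and that each 'ABCDEF' placeholder survives as one unbroken piece of text in the file; `fill_text` and `fill_file` are hypothetical names.

```python
from xml.sax.saxutils import escape


def fill_text(text, members, placeholder='ABCDEF'):
    """Replace placeholders in order; leftovers become empty strings."""
    parts = text.split(placeholder)
    names = iter(members)
    out = [parts[0]]
    for part in parts[1:]:
        # Escape &, < and > so a name cannot break the XML.
        out.append(escape(next(names, '')))
        out.append(part)
    return ''.join(out)


def fill_file(in_path, out_path, members, encoding='utf-8'):
    """Decode the file explicitly, substitute, and encode it back."""
    # newline='' keeps the original line endings untouched.
    with open(in_path, 'r', encoding=encoding, newline='') as src:
        text = src.read()
    with open(out_path, 'w', encoding=encoding, newline='') as dst:
        dst.write(fill_text(text, members))
```

Decoding and encoding with an explicit `encoding` is one way to avoid the kind of encoding trouble mentioned above when names contain non-ASCII characters.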

Regards,
antoine
Oct 24 '06 #5
Bruno Desthuilliers wrote:
> Antoine De Groote wrote:
>> Hi there,
>>
>> I have a Word document containing pictures and text. This document
>> holds several 'ABCDEF' strings which serve as placeholders for names.
>> Now I want to replace these occurrences with names from a list (members).
>
> Do you know that MS Word already provides this kind of feature?

No, I don't. Sounds interesting... What is this feature called?
Oct 24 '06 #6
Antoine De Groote wrote:
> Bruno Desthuilliers wrote:
>> Antoine De Groote wrote:
>>> Hi there,
>>>
>>> I have a Word document containing pictures and text. This document
>>> holds several 'ABCDEF' strings which serve as placeholders for names.
>>> Now I want to replace these occurrences with names from a list (members).
>>
>> Do you know that MS Word already provides this kind of feature?
>
> No, I don't. Sounds interesting... What is this feature called?

I don't know what it's called in English, but in French it's (well, it
was the last time I used MS Word, which is quite some time ago) "fusion
de documents".

--
bruno desthuilliers
python -c "print '@'.join(['.'.join([w[::-1] for w in p.split('.')]) for
p in 'o****@xiludom.gro'.split('@')])"
Oct 24 '06 #7
Antoine De Groote wrote:
> Bruno Desthuilliers wrote:
>> Antoine De Groote wrote:
>>> Hi there,
>>>
>>> I have a Word document containing pictures and text. This document
>>> holds several 'ABCDEF' strings which serve as placeholders for names.
>>> Now I want to replace these occurrences with names from a list (members).
>>
>> Do you know that MS Word already provides this kind of feature?
>
> No, I don't. Sounds interesting... What is this feature called?

Mail-merge, I believe.

However, if your document can be adequately represented in RTF
(rich-text format) then you could consider doing string replacements on
that. I invoice the PyCon sponsors using this rather inelegant technique.

regards
Steve
--
Steve Holden +44 150 684 7255 +1 800 494 3119
Holden Web LLC/Ltd http://www.holdenweb.com
Skype: holdenweb http://holdenweb.blogspot.com
Recent Ramblings http://del.icio.us/steve.holden

Oct 24 '06 #8

[Antoine]
> I have a Word document containing pictures and text. This document
> holds several 'ABCDEF' strings which serve as placeholders for names.
> Now I want to replace these occurrences with names from a list (members).

[Bruno]
> I don't know what it's called in English, but in French it's (well, it
> was the last time I used MS Word, which is quite some time ago) "fusion
> de documents".

"Mail Merge"?

--
Richie Hindle
ri****@entrian.com
Oct 24 '06 #9
DOC files contain housekeeping info which becomes inconsistent if you
change text. Possibly you can exchange stuff of equal length but that
wouldn't serve your purpose. RTF files let you do substitutions and they
save a lot of space too. But I kind of doubt whether RTF files can
contain pictures.

Frederic

Oct 24 '06 #10
