
Regular expression worries

Folks,
I am new to Python, so excuse me if I am asking stupid questions.

I have a text file, and here are some lines from it:

Document<Keyword<date:2006-08-19Keyword<time:11:00:43>
Keyword<username:YOURBOTNICKKeyword<data:localhost .localdomain>
Keyword<logon:localhost.localdomain
Keyword<date:2006-08-19Keyword<time:11:00:44Keyword<sender:>
Keyword<receiver:Keyword<data::+iwxKeyword<mode::+ iwx

I am writing a Python program to replace the tags and the word Document
with Doc.

Here is my Python program:

#! /usr/local/bin/python

import sys
import string
import re

def replace():
    filename='/root/Desktop/project/chatlog_20060819_110043.xml.txt'
    try:
        fh=open(filename,'r')
    except:
        print 'file not opened'
        sys.exit(1)
    for l in open('/root/Desktop/project/chatlog_20060819_110043.xml.txt'):
        l=l.replace("Document", "DOC")
        fh.close()

if __name__=="__main__":
    replace()

But it does not replace Document with Doc in the text file.

Is there anything wrong with what I am doing?

Thanks

Oct 11 '06 #1


You are opening the same file twice, reading its contents line-by-line
into memory, replacing "Document" with "Doc" *in memory*, never writing
that to disk, and then discarding the line you just read into memory.

If your file is short, you could read the entire thing into memory as
one string using the .read() method of fh (your file object). Then,
call .replace on the string, and then write to disk.
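
For the short-file case, a minimal sketch of that read / replace / write sequence might look like the following. It is written for Python 3 (the code in this thread is Python 2), and the function name, the demo file name, and the sample lines are made up for illustration.

```python
import os
import tempfile

def replace_in_small_file(path, old, new):
    """Read the whole file into one string, replace, and write it back."""
    with open(path, "r") as fh:
        text = fh.read()              # entire file in memory as one string
    with open(path, "w") as fh:       # reopening with "w" truncates the file
        fh.write(text.replace(old, new))

# Demo on a throwaway file (hypothetical name and contents).
small_dir = tempfile.mkdtemp()
small_path = os.path.join(small_dir, "chatlog_demo.txt")
with open(small_path, "w") as fh:
    fh.write("Document<Keyword<date:2006-08-19\nKeyword<logon:localhost.localdomain\n")

replace_in_small_file(small_path, "Document", "DOC")

with open(small_path) as fh:
    small_result = fh.read()
print(small_result)
```

Note that the file is truncated before the new text is written, so a crash between the two steps would lose data; that is one reason to prefer a second file for anything important.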

If your file is long, then you want to do the replace line by line,
writing as you go to a second file. You can later rename that file to
the original file's name and delete the original.
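
For the long-file case, the line-by-line approach could be sketched roughly as follows (Python 3; the ".new" suffix for the second file and the demo file are assumptions, not anything from the original program). `os.replace` renames the second file onto the original's name, which drops the original in the same step.

```python
import os
import tempfile

def replace_line_by_line(path, old, new):
    """Copy path to a second file with old replaced by new, then swap it in."""
    side_path = path + ".new"  # hypothetical name for the second file
    with open(path, "r") as infile, open(side_path, "w") as outfile:
        for line in infile:    # only one line is held in memory at a time
            outfile.write(line.replace(old, new))
    # Rename the second file onto the original's name, replacing the original.
    os.replace(side_path, path)

# Demo on a throwaway file (hypothetical name and contents).
stream_dir = tempfile.mkdtemp()
stream_path = os.path.join(stream_dir, "chatlog_demo.txt")
with open(stream_path, "w") as fh:
    fh.write("Document<Keyword<date:2006-08-19\nKeyword<sender:\n")

replace_line_by_line(stream_path, "Document", "DOC")

with open(stream_path) as fh:
    streamed = fh.read()
print(streamed)
```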

Also, you aren't using regular expressions at all. You do not
therefore need the re module.

Oct 11 '06 #2

> for l in open('/root/Desktop/project/chatlog_20060819_110043.xml.txt'):
>     l=l.replace("Document", "DOC")
>     fh.close()
>
> But it does not replace Document with Doc in the txt file

In addition to closing the file handle for the loop *within* the
loop, you're changing "l" (side note: a bad choice of names, as
in most fonts, it's difficult to visually discern from the number
"1"), but you're not writing it back out any place. One would do
something like

outfile = open('out.txt', 'w')
infile = open(filename)
for line in infile:
    outfile.write(line.replace("Document", "DOC"))
outfile.close()
infile.close()

You could even let garbage collection take care of the file
handle for you:

outfile = open('out.txt', 'w')
for line in open(filename):
    outfile.write(line.replace("Document", "DOC"))
outfile.close()

If needed, you can then move the 'out.txt' overtop of the
original file.

Or, you could just use

sed 's/Document/DOC/g' $FILENAME > out.txt

or, with a version of sed that accepts the -i option, do it in-place with

sed -i 's/Document/DOC/g' $FILENAME

if you have sed available on your system.

Oh...and it doesn't look like your code is using regexps for
anything, despite the subject-line of your email :) I suspect
they'll come in later for the "replace the tags" portion you
mentioned, but that ain't in the code.
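
As a rough illustration of where the re module would start to earn its keep: str.replace is enough for a fixed string, but a pattern is needed once the match depends on context. The thread never spells out the rules for the tags, so the sketch below only shows a whole-word replacement; the pattern, the function name, and the sample line are assumptions (Python 3).

```python
import re

# \b matches a word boundary, so "Documents" and "DocumentRoot" are left alone.
WORD_DOCUMENT = re.compile(r"\bDocument\b")

def replace_whole_word(line):
    """Replace the standalone word Document with DOC."""
    return WORD_DOCUMENT.sub("DOC", line)

sample = "Document<Keyword<date:2006-08-19 Documents DocumentRoot"
replaced = replace_whole_word(sample)
print(replaced)
```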

-tkc

Oct 11 '06 #3

CSUIDL PROGRAMMEr wrote:
> folks
> I am new to python, so excuse me if i am asking stupid questions.

From what I see, you seem to be new to programming in general !-)

> I have a txt file and here are some lines of it
>
> Document<Keyword<date:2006-08-19Keyword<time:11:00:43>
> Keyword<username:YOURBOTNICKKeyword<data:localhost .localdomain>
> Keyword<logon:localhost.localdomain
> Keyword<date:2006-08-19Keyword<time:11:00:44Keyword<sender:>
> Keyword<receiver:Keyword<data::+iwxKeyword<mode::+ iwx
>
> I am writing a python program to replace the tags and word Document
> with Doc.
>
> Here is my python program
>
> #! /usr/local/bin/python
>
> import sys
> import string
> import re
>
> def replace():
>     filename='/root/Desktop/project/chatlog_20060819_110043.xml.txt'
>     try:
>         fh=open(filename,'r')
>     except:
>         print 'file not opened'
>         sys.exit(1)

You open your file a first time, and bind the reference to the file
object to fh.

>     for l in open('/root/Desktop/project/chatlog_20060819_110043.xml.txt'):

And then you open the file a second time...

>         l=l.replace("Document", "DOC")

This builds a new string from the one referenced by l (talk about a bad
name) and rebinds the same name to it. Strings are immutable, so neither
the original line nor the file is changed.

>         fh.close()

Then you close fh... and discard the modifications to l.

> if __name__=="__main__":
>     replace()
>
> But it does not replace Document with Doc in the txt file

Why should it? You didn't ask for it !-)

> Is there anything wrong i am doing

Yes.

The canonical way to modify a text file is to read from original / do
transformations / *write modifications to a tmp file* / replace the
original with the tmp file.
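
One possible shape for that read / transform / write-to-tmp / replace cycle, as a hedged Python 3 sketch rather than a definitive recipe: the helper name rewrite_file, the transform callable, and the demo file are all invented here. The temporary file is created next to the original so that the final rename stays on one filesystem, and it is removed again if anything fails, leaving the original untouched.

```python
import os
import tempfile

def rewrite_file(path, transform):
    """Apply transform to every line of path via a temp file, then swap it in."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file in the same directory as the original.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as tmp, open(path, "r") as src:
            for line in src:
                tmp.write(transform(line))
        # Only replace the original once the temp file is complete.
        os.replace(tmp_path, path)
    except BaseException:
        # On any failure, clean up the temp file and keep the original as it was.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise

# Demo on a throwaway file (hypothetical name and contents).
tmp_demo_dir = tempfile.mkdtemp()
tmp_demo_path = os.path.join(tmp_demo_dir, "chatlog_demo.txt")
with open(tmp_demo_path, "w") as fh:
    fh.write("Document<Keyword<date:2006-08-19\nKeyword<receiver:\n")

rewrite_file(tmp_demo_path, lambda line: line.replace("Document", "DOC"))

with open(tmp_demo_path) as fh:
    rewritten = fh.read()
print(rewritten)
```

One caveat: mkstemp creates the file readable and writable only by the current user, so the replaced file may end up with different permissions than the original unless they are copied over.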
--
bruno desthuilliers
python -c "print '@'.join(['.'.join([w[::-1] for w in p.split('.')]) for
p in 'o****@xiludom.gro'.split('@')])"
Oct 11 '06 #4
