Seeking assistance - string processing.

I've been working on some code to search for specific text strings and
act upon them in some way. I've got the conversion sorted; however, there
is one problem remaining.

I am trying to work out how to make it find a string like this "==="
and when it has found it, I want it to add "===" to the end of the
line.

For example.

The text file contains this:

===Heading

and I am trying to make it be processed and outputted as a .dat file
with the contents

===Heading===

Here's the code I have got so far.

import string
import glob
import os

mydir = os.getcwd()
newdir = mydir#+"\\Test\\";

for filename in glob.glob1(newdir,"*.txt"):
    #print "This is an input file: " + filename
    fileloc = newdir+"\\"+filename
    #print fileloc

    outputname = filename
    outputfile = string.replace(outputname,'.txt','.dat')
    #print filename
    #print a

    print "This is an input file: " + filename + ". Output file: "+outputfile

    #temp = newdir + "\\" + outputfile
    #print temp
    fpi = open(fileloc);
    fpo = open(outputfile,"w+");

    output_lines = []
    lines = fpi.readlines()

    for line in lines:
        if line.rfind("--------------------") is not -1:
            new = line.replace("--------------------","----")
        elif line.rfind("img:") is not -1:
            new = line.replace("img:","[[Image:")
        elif line.rfind(".jpg") is not -1:
            new = line.replace(".jpg",".jpg]]")
        elif line.rfind(".gif") is not -1:
            new = line.replace(".gif",".gif]]")
        else:
            output_lines.append(line);
            continue
        output_lines.append(new);

    for line in output_lines:
        fpo.write(line)

    fpi.close()
    fpo.flush()
    fpo.close()

I hope this gets formatted correctly :-p

Cheers, hope you can help.

Nov 14 '06 #1
bi**************@googlemail.com wrote:
> I am trying to work out how to make it find a string like this "==="
> and when it has found it, I want it to add "===" to the end of the
> line.

how about

if line.startswith("==="):
    line = line + "==="

or

if "===" in line: # anywhere
    line = line + "==="

?

> if line.rfind("--------------------") is not -1:
>     new = line.replace("--------------------","----")

it's not an error to use replace on a string that doesn't contain the
pattern, so that rfind is rather unnecessary.
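
A minimal sketch of the two points above, written for Python 3 (the thread itself uses Python 2 print statements). The helper name close_heading is hypothetical and only there for illustration.

```python
def close_heading(line):
    """Append "===" to a line that starts with "===" (hypothetical helper)."""
    if line.startswith("==="):
        return line + "==="
    return line

# No trailing newline in this sample, so the marker lands on the same line.
print(close_heading("===Heading"))

# replace() returns an equal string when the pattern is absent,
# so no find()/rfind() guard is needed before calling it.
unchanged = "plain text".replace("-" * 20, "----")
print(unchanged == "plain text")
```

Note that a line read from a file normally still carries its newline, which matters for where the appended "===" ends up.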

(and for cases where you need to look first, searching from the left
is usually faster than searching backwards; use "pattern in line" or
"line.find(pattern)" instead of rfind.)

</F>

Nov 14 '06 #2
Thanks so much, a really elegant solution indeed.

I have another question actually which I'm praying you can help me
with:

with regards to the .jpg conversion to .jpg]] and .gif -> .gif]]

this works, but only when .jpg/.gif is on its own line.

i.e:

.jpg

will get converted to:

.jpg]]

but

Image:test.jpg

gets converted to:

[[Image:test.jpg

rather than

[[Image:test.jpg]]

------------------

Hope you can help again! Cheers

Nov 14 '06 #3
It does not do the right thing in all cases, but maybe you can get away with

for line in lines:
    if line.startswith("==="):
        line = line.rstrip() + "===\n"
    line = line.replace("--------------------","----")
    line = line.replace("img:","[[Image:")
    line = line.replace(".jpg",".jpg]]")
    line = line.replace(".gif",".gif]]")
    output_lines.append(line)
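
To make the "not in all cases" caveat concrete, here is a rough Python 3 sketch of the same chain wrapped in a function. The name convert_line and the sample lines are made up for illustration, and the separator is assumed to be 20 hyphens as in the original post.

```python
def convert_line(line):
    """Apply every substitution unconditionally, as in the loop above."""
    if line.startswith("==="):
        line = line.rstrip() + "===\n"
    line = line.replace("-" * 20, "----")   # assumed: 20 hyphens in the input
    line = line.replace("img:", "[[Image:")
    line = line.replace(".jpg", ".jpg]]")
    line = line.replace(".gif", ".gif]]")
    return line

# A line with both "img:" and ".jpg" gets the opening and the closing part.
print(repr(convert_line("img:test.jpg\n")))

# A ".jpg" that is not preceded by "img:" still gets a closing "]]".
print(repr(convert_line("see photo.jpg\n")))
```

The second call shows one of the cases where the chain is too eager: the closing brackets are added even though nothing opened them.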

Peter

Nov 14 '06 #4
Cheers for the reply.

But I'm still having a spot of bother with the === addition

it would seem that if there is no whitespace after the ===test
then the new === gets added to the next line

e.g file contains:

===test (and then no whitespace/carriage returns or anything)

and the result is:

===test
===

I tried fiddling around trying to make it add whitespace but it didn't
work.

What do you think I should do?

Cheers

Nov 14 '06 #5
bi**************@googlemail.com wrote:
> But I'm still having a spot of bother with the === addition
>
> it would seem that if there is no whitespace after the ===test
> then the new === gets added to the next line
>
> e.g file contains:
>
> ===test (and then no whitespace/carriage returns or anything)
>
> and the result is:
>
> ===test
> ===

that's probably because it *does* contain a newline. try printing the
line with

print repr(line)

before and after you make the change, to see what's going on.
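
For anyone following along in Python 3 (where print is a function), a small sketch of that check; the sample string is an assumption standing in for a line read from the file.

```python
line = "===test\n"        # assumed sample: a line as read from a text file
print(repr(line))         # repr makes the trailing newline visible

changed = line + "==="
print(repr(changed))      # the "===" now sits after the newline
```

Seen this way, the "===" is not jumping to the next line by itself; it is simply appended after a newline that was already there.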

> I tried fiddling around trying to make it add whitespace but it didn't
> work.

peter's complete example contains one way to solve that:

if line.startswith("==="):
    line = line.rstrip() + "===\n"

> What do you think I should do?

reading the chapter on strings in your favourite Python tutorial once
again might help, I think. python has plenty of powerful tools for
string processing, and most of them are quite easy to learn and use; a
quick read of the tutorial and a little more trial and error before
posting should be all you need.

</F>

Nov 14 '06 #6
bi**************@googlemail.com wrote:
> Cheers for the reply.
>
> But I'm still having a spot of bother with the === addition
>
> it would seem that if there is no whitespace after the ===test
> then the new === gets added to the next line
>
> e.g file contains:
>
> ===test (and then no whitespace/carriage returns or anything)
>
> and the result is:
>
> ===test
> ===

You'd get the above with Fredrik's solution if there is a newline. That's
why I put in the rstrip() method call (which removes trailing whitespace)
and added an explicit "\n" (the Python way to spell newline). With my
approach

if line.startswith("==="):
    line = line.rstrip() + "===\n"

you should always get

===test===(and then a newline)
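
A quick way to convince yourself, sketched in Python 3; the function name close_heading_stripped and the three sample strings are hypothetical.

```python
def close_heading_stripped(line):
    """Strip trailing whitespace first, then add "===" and one newline."""
    if line.startswith("==="):
        return line.rstrip() + "===\n"
    return line

# Terminated, unterminated, and padded with spaces plus a Windows line ending.
samples = ["===test\n", "===test", "===test   \r\n"]
results = [close_heading_stripped(s) for s in samples]
for result in results:
    print(repr(result))
```

All three samples come out identical because rstrip() with no argument removes any trailing whitespace, newlines and carriage returns included.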

Peter
Nov 14 '06 #7
bi**************@googlemail.com wrote:
> I've been working on some code to search for specific text strings and
> act upon them in some way. I've got the conversion sorted

What does that mean? There is no sort in the computer sense, and if you
mean as in "done" ...

> however there
> is 1 problem remaining.
>
> I am trying to work out how to make it find a string like this "==="
> and when it has found it, I want it to add "===" to the end of the
> line.

The answer is at the end. Now take a deep breath, and read on carefully
and calmly:

> For example.
>
> The text file contains this:
>
> ===Heading
>
> and I am trying to make it be processed and outputted as a .dat file
> with the contents
>
> ===Heading===
>
> Here's the code I have got so far.
>
> import string

Not needed for this task. In fact the string module has only minimal
use these days. From what book or tutorial did you get the idea to use
result = string.replace(source_string, old, new) instead of result =
source_string.replace(old, new) sometimes? You should be using the
result = source_string.replace(old, new) way all the time.

What version of Python are you using?

> import glob
> import os
>
> mydir = os.getcwd()
> newdir = mydir#+"\\Test\\";

Try and make a real comment obvious; don't do what you did -- *delete*
unwanted code; alternatively if it may be wanted in the future, put in
a real comment to say why.

What was the semicolon for?

Consider using os.path.join() -- it's portable. Don't say "But my code
will only ever be run on Windows". If you write code like that, it will
be a self-fulfilling prophecy -- no-one will want to try to run it
anywhere else.
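
As a small illustration of the os.path.join() suggestion, here is a Python 3 sketch; the file name example.txt is made up.

```python
import os

base_dir = os.getcwd()
filename = "example.txt"   # hypothetical file name, for illustration only

# os.path.join inserts whatever separator the current platform uses,
# so there is no hard-coded "\\" in the program.
file_location = os.path.join(base_dir, filename)
print(file_location)
```

The same line then works unchanged on Windows, Linux and macOS.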
>
> for filename in glob.glob1(newdir,"*.txt"):
>     #print "This is an input file: " + filename

No it isn't; it's a *name* of a file

>     fileloc = newdir+"\\"+filename
>     #print fileloc
>
>     outputname = filename
>     outputfile = string.replace(outputname,'.txt','.dat')

No again, it's not a file.

Try outputname = filename.replace('.txt', '.dat')
Also consider what happens if the name of the input file is foo.txt.txt
[can happen]
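
One possible way to handle the foo.txt.txt case is to swap only the final extension with os.path.splitext. This is a sketch in Python 3, not something proposed in the thread, and the helper name output_name is hypothetical.

```python
import os

def output_name(input_name, new_ext=".dat"):
    """Swap only the final extension of a file name (hypothetical helper)."""
    stem, _old_ext = os.path.splitext(input_name)
    return stem + new_ext

print(output_name("foo.txt"))
print(output_name("foo.txt.txt"))

# For comparison: str.replace changes every ".txt" it finds.
print("foo.txt.txt".replace(".txt", ".dat"))
```

Whether foo.txt.txt should become foo.txt.dat is a judgment call, but at least the behaviour is predictable.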

>     #print filename
>     #print a
>
>     print "This is an input file: " + filename + ". Output file: "+outputfile

No it isn't.

>     #temp = newdir + "\\" + outputfile
>     #print temp
>     fpi = open(fileloc);
>     fpo = open(outputfile,"w+");

Why the "+"?
Semi-colons?

>     output_lines = []

Why not just write as you go? What happens with a 1GB file? How much
memory do you have on your computer?

>     lines = fpi.readlines()

Whoops. That's now 2GB min of memory you need

>     for line in lines:

No, use "for line in fpi"
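
A rough Python 3 sketch of the "write as you go" idea. The function name convert_file is hypothetical, the per-line transformation is left out, and the temporary files exist only to make the example self-contained.

```python
import os
import tempfile

def convert_file(in_path, out_path):
    """Copy in_path to out_path one line at a time (transformation omitted)."""
    with open(in_path) as fpi, open(out_path, "w") as fpo:
        for line in fpi:        # the file object hands out lines lazily
            fpo.write(line)     # written immediately, nothing accumulates

# Small demonstration in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "demo.txt")
    dst = os.path.join(tmp, "demo.dat")
    with open(src, "w") as f:
        f.write("first\nsecond\n")
    convert_file(src, dst)
    with open(dst) as f:
        copied = f.read()

print(repr(copied))
```

Only one line is held in memory at a time, and the with blocks close both files even if something goes wrong halfway through.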

>         if line.rfind("--------------------") is not -1:

Quick, somebody please count the "-" signs in there; we'd really like
to know what this program is doing. If there are more identical
characters than you have fingers on your hand, don't do that. Use
character * count. Then consider giving it a name. Consider
putting in a comment to explain what your code is doing, if you can:
for example, why use rfind instead of find? Both will give the same
result if there are 0 or 1 occurrences of the sought string, and you
aren't using the position if there are 1 or more occurrences. Then
consider that if you need a comment for code like that, then maybe your
variable names are not very meaningful.
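
Something along these lines, as a Python 3 sketch. The constant names are made up, and the count of 20 hyphens is an assumption taken from how the separator appears in the original post.

```python
LONG_RULE = "-" * 20    # assumed: the separator in the input is 20 hyphens
SHORT_RULE = "-" * 4    # what it should become in the output

line = LONG_RULE + "\n"
if LONG_RULE in line:   # reads better than rfind(...) compared against -1
    line = line.replace(LONG_RULE, SHORT_RULE)
print(repr(line))
```

As a side note, "is not -1" compares object identity rather than value; it happens to work in CPython for small integers, but "!= -1" (or "in", as here) is the dependable spelling.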

>             new = line.replace("------------------","----")

Is that the same number of "-"? Are you sure?

>         elif line.rfind("img:") is not -1:
>             new = line.replace("img:","[[Image:")
>         elif line.rfind(".jpg") is not -1:
>             new = line.replace(".jpg",".jpg]]")

That looks like a pattern to me. Consider setting up a list of (old,
new) tuples and looping over it.
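
A sketch of what that could look like in Python 3. The names REPLACEMENTS and apply_replacements are hypothetical, and the 20-hyphen separator is assumed from the original post.

```python
# (old, new) pairs, applied in this order.
REPLACEMENTS = [
    ("-" * 20, "----"),       # assumed: 20 hyphens in the input files
    ("img:", "[[Image:"),
    (".jpg", ".jpg]]"),
    (".gif", ".gif]]"),
]

def apply_replacements(line, replacements=REPLACEMENTS):
    """Run every (old, new) substitution over the line, one after another."""
    for old, new in replacements:
        line = line.replace(old, new)
    return line

print(repr(apply_replacements("img:test.gif\n")))
```

Unlike the if/elif chain, this applies every pair to every line instead of stopping at the first match, which is what lets a single line get both the opening and the closing brackets.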

>         elif line.rfind(".gif") is not -1:
>             new = line.replace(".gif",".gif]]")
>         else:
>             output_lines.append(line);
>             continue
>         output_lines.append(new);

Try this:

else:
    new = line
fpo.write(new)

>     for line in output_lines:
>         fpo.write(line)
>
>     fpi.close()
>     fpo.flush()

News to me that close() doesn't automatically do flush() on a file
that's been open for writing.

>     fpo.close()
> I hope this gets formatted correctly :-p
>
> Cheers, hope you can help.

Answer to your question:

string1 in string2 beats string2.[r]find(string1) for readability and
(maybe) for speed too

elif "===" in line: # should be safe to assume your audience can count to 3
    new = line[:-1] + "===\n"

HTH,
John

Nov 14 '06 #8

John Machin wrote:
> new = line[:-1] + "===\n"

To allow for cases where the last line in the file is not terminated
[can happen], this should be:

new = line.rstrip("\n") + "===\n"
# assuming you want to fix the unterminated problem.
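
The difference between the two versions only shows up on a line with no newline at the end. A short Python 3 sketch, with made-up sample strings:

```python
terminated = "===Heading\n"
unterminated = "===Heading"   # e.g. the last line of a file with no final newline

sliced_ok = terminated[:-1] + "===\n"
sliced_bad = unterminated[:-1] + "===\n"         # the slice eats the final "g"
stripped = unterminated.rstrip("\n") + "===\n"   # only newlines are removed

print(repr(sliced_ok))
print(repr(sliced_bad))
print(repr(stripped))
```

rstrip("\n") removes only trailing newline characters, so any other trailing whitespace on the line is left where it was.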

Cheers,
John

Nov 14 '06 #9
Thanks Fredrik, Peter and John for your help.

John, I especially enjoyed your line-by-line assassination of my code,
keep it up.

I'm not a programmer, I dislike programming, I'm bad at it. I just
agreed to do this to help someone out, I didn't even know what python
was 3 days ago.

In case you were wondering about all the craziness with the -------'s -
it's because I am trying to batch convert 1600 files into new versions
with slightly altered syntax.

It all works for now, hurrah, now it's time to break it again.

Cheerio fellas (for now, I'll be back I'm sure ;-D)

Nov 14 '06 #10
Here's a suggestion:

>>> import SE
>>> Editor = SE.SE ('--------------------==---- img:=[[Image: .jpg=.jpg]] .gif=.gif]]')
>>> Editor ('-------------------- img: .jpg .gif') # See if it works
'------------------------ [[Image: .jpg]] .gif]]'

It works. (Add in other replacements if the need arises.)

Works linewise

>>> for line in f:
...     new_line = Editor (line)
...

Or filewise, which comes in handy in your case:

>>> for in_filename in glob.glob (newdir+'/*.txt'):
...     out_filename = in_filename.replace ('.txt','.dat')
...     Editor (in_filename, out_filename)

See if that helps. Find SE here: http://cheeseshop.python.org/pypi/SE/2.3

Frederic
Nov 14 '06 #11

bi**************@googlemail.com wrote:
> Thanks Fredrik, Peter and John for your help.
>
> John, I especially enjoyed your line-by-line assassination of my code,
> keep it up.
>
> I'm not a programmer, I dislike programming, I'm bad at it. I just
> agreed to do this to help someone out, I didn't even know what python
> was 3 days ago.

I would have to disagree strongly with the "I'm bad at it". Everything
is relative. I've seen mind-bogglingly fugly incoherent messes produced
by people who claim to be professional programmers with 3 *years*
experience in a language (only rarely in Python). To have produced what
you did -- it was clear enough what you were trying to do, and it
"worked" well enough for a one-off job -- with 3 *days* experience with
Python was a remarkable achievement IMHO.

What's this "dislike programming" business? No such concept :-)

Cheers,
John

Nov 14 '06 #12
