re sub help

hi

i have a string:
a = "this\nis\na\nsentence[startdelim]this\nis\nanother[enddelim]this\nis\n"

inside the string, there are "\n". I don't want to substitute the '\n'
in between the [startdelim] and [enddelim] to ''. I only want to get rid
of the '\n' everywhere else.

i have read the tutorial and came across negative/positive lookahead
and i think it can solve the problem, but am confused on how to use it.
can anyone give me some advice? or is there a better way other than
lookaheads... thanks.

Nov 5 '05 #1


s9************@yahoo.com writes:
> hi
>
> i have a string:
> a = "this\nis\na\nsentence[startdelim]this\nis\nanother[enddelim]this\nis\n"
>
> inside the string, there are "\n". I don't want to substitute the '\n'
> in between the [startdelim] and [enddelim] to ''. I only want to get rid
> of the '\n' everywhere else.


Well, I'm not an expert on re's - I've only been using them for three
decades - but I'm not sure this can be done with a single re, as the
pattern you're interested in depends on context, and re's don't handle
that well.

On the other hand, this is fairly straightforward with simple string
operations:
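>>> a = "this\nis\na\nsentence[startdelim]this\nis\nanother[enddelim]this\nis\n"
>>> sd = '[startdelim]'
>>> ed = '[enddelim]'
>>> s, r = a.split(sd, 1)   # s: text before the start delimiter
>>> m, e = r.split(ed, 1)   # m: text between the delimiters, e: text after
>>> # strip newlines outside the delimiters, leave the middle untouched
>>> a = s.replace('\n', '') + sd + m + ed + e.replace('\n', '')
>>> a
'thisisasentence[startdelim]this\nis\nanother[enddelim]thisis'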
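
The split above handles a single delimiter pair. Here is a minimal sketch of how the same string-only idea might extend to several pairs, removing newlines only outside the delimited regions as the original question asks. The function name `strip_newlines_outside` is made up for illustration, and it assumes the delimiters are never nested; an unterminated region is simply left untouched to the end of the string.

```python
def strip_newlines_outside(text, start="[startdelim]", end="[enddelim]"):
    """Remove newlines everywhere except between start/end delimiter pairs."""
    out = []
    rest = text
    while True:
        # Everything up to the next start delimiter is "outside" text.
        before, sd, rest = rest.partition(start)
        out.append(before.replace("\n", ""))
        if not sd:  # no further start delimiter: we are done
            break
        # Everything up to the matching end delimiter is kept as-is.
        inside, ed, rest = rest.partition(end)
        out.append(sd + inside + ed)
        if not ed:  # unterminated region: the remainder stays untouched
            break
    return "".join(out)


a = "this\nis\na\nsentence[startdelim]this\nis\nanother[enddelim]this\nis\n"
print(repr(strip_newlines_outside(a * 2)))
```

`str.partition` is used instead of `split(..., 1)` because it always returns three parts, so a missing delimiter does not raise on unpacking.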


<mike

--
Mike Meyer <mw*@mired.org> http://www.mired.org/home/mwm/
Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.
Nov 5 '05 #3

thanks for the reply.

i am still interested in using re, i find it useful. am still
learning its uses.
so i did something like this for a start, trying to get everything in
between [startdelim] and [enddelim]:

a = "this\nis\na\nsentence[startdelim]this\nis\nanother[enddelim]this\nis\n"

t = re.compile(r"\[startdelim\](.*)\[enddelim\]")

t.findall(a)

but it gives me []. it's the "\n" that prevents the results.
why can't (.*) work in this case? Or am i missing some steps to "read"
in the "\n"..?
thanks.

Nov 5 '05 #4

<s9************@yahoo.com> wrote:
> i am still interested in using re, i find it useful. am still
> learning its uses.
> so i did something like this for a start, trying to get everything in
> between [startdelim] and [enddelim]
>
> a = "this\nis\na\nsentence[startdelim]this\nis\nanother[enddelim]this\nis\n"
>
> t = re.compile(r"\[startdelim\](.*)\[enddelim\]")

"*" is greedy (it grabs as much as it can and only backs off from the
right end when it has to), so that won't do the right thing if you have
multiple delimiters.

to fix this, use "*?" instead.

> t.findall(a)
> but it gives me []. it's the "\n" that prevents the results.
> why can't (.*) work in this case? Or am i missing some steps to "read"
> in the "\n"..?


http://docs.python.org/lib/re-syntax.html

    "."
        (Dot.) In the default mode, this matches any character except
        a newline. If the DOTALL flag has been specified, this matches any
        character including a newline.

to fix this, pass in re.DOTALL or re.S as the flag argument, or
prepend (?s) to the expression.
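
Both points can be seen side by side in a small sketch. The sample string `text` and the names `no_flag`, `greedy` and `lazy` are invented for this illustration; only `re.compile`, `findall` and `re.DOTALL` come from the re module.

```python
import re

# Two delimited regions, each containing a newline.
text = ("x\n[startdelim]one\ntwo[enddelim]y\n"
        "[startdelim]three\nfour[enddelim]z\n")

# Without DOTALL, "." stops at the newline inside each region.
no_flag = re.compile(r"\[startdelim\](.*?)\[enddelim\]")
# With DOTALL but a greedy "*", the match runs to the LAST [enddelim].
greedy = re.compile(r"\[startdelim\](.*)\[enddelim\]", re.DOTALL)
# With DOTALL and a non-greedy "*?", each region is matched on its own.
lazy = re.compile(r"\[startdelim\](.*?)\[enddelim\]", re.DOTALL)

print(no_flag.findall(text))  # []
print(greedy.findall(text))   # ['one\ntwo[enddelim]y\n[startdelim]three\nfour']
print(lazy.findall(text))     # ['one\ntwo', 'three\nfour']
```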

</F>

Nov 5 '05 #5

s9************@yahoo.com writes:
> i am still interested in using re, i find it useful. am still
> learning its uses.
> so i did something like this for a start, trying to get everything in
> between [startdelim] and [enddelim]
>
> a = "this\nis\na\nsentence[startdelim]this\nis\nanother[enddelim]this\nis\n"
>
> t = re.compile(r"\[startdelim\](.*)\[enddelim\]")
>
> t.findall(a)
> but it gives me []. it's the "\n" that prevents the results.
> why can't (.*) work in this case? Or am i missing some steps to "read"
> in the "\n"..?
> thanks.


Newlines get special treatment in regular expressions: by default, .
matches any character except a newline. You use the flags in re to
change that. In this case, you want . to match newlines too, so you use
the DOTALL flag:

>>> import re
>>> a = "this\nis\na\nsentence[startdelim]this\nis\nanother[enddelim]this\nis\n"
>>> t = re.compile(r"\[startdelim\](.*)\[enddelim\]", re.DOTALL)
>>> t.findall(a)
['this\nis\nanother']


<mike
--
Mike Meyer <mw*@mired.org> http://www.mired.org/home/mwm/
Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.
Nov 5 '05 #6

s9************@yahoo.com wrote:
> hi
>
> i have a string:
> a = "this\nis\na\nsentence[startdelim]this\nis\nanother[enddelim]this\nis\n"
>
> inside the string, there are "\n". I don't want to substitute the '\n'
> in between the [startdelim] and [enddelim] to ''. I only want to get rid
> of the '\n' everywhere else.


Here is a solution using re.sub and a class that maintains state. It works when the input text contains multiple startdelim/enddelim pairs.

import re

a = "this\nis\na\nsentence[startdelim]this\nis\nanother[enddelim]this\nis\n" * 2

class subber(object):
    def __init__(self):
        # True while we are between a startdelim and its enddelim
        self.delimiterSeen = False

    def __call__(self, m):
        text = m.group()
        if text == 'startdelim':
            self.delimiterSeen = True
            return text

        if text == 'enddelim':
            self.delimiterSeen = False
            return text

        # Otherwise the match is a newline: keep it between the
        # delimiters, drop it everywhere else.
        if self.delimiterSeen:
            return text

        return ''

delimRe = re.compile('\n|startdelim|enddelim')

newText = delimRe.sub(subber(), a)
print(repr(newText))
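
For comparison, a stateless variant is also possible; this is only a sketch, not part of the solution above. It assumes the delimiters are properly paired and not nested, and the names `protect_or_newline` and `strip_outside` are made up here. The idea is to let the pattern consume each whole delimited region first, so the bare-newline alternative can only ever match outside one.

```python
import re

# Alternative 1 (group 1): a whole protected region, matched non-greedily.
# Alternative 2: a bare newline. Regions are consumed whole by the first
# alternative, so this one only ever fires outside of them.
protect_or_newline = re.compile(
    r"(\[startdelim\].*?\[enddelim\])|\n", re.DOTALL
)


def strip_outside(text):
    # A matched region is put back unchanged; a matched newline
    # (group 1 is None) is replaced by the empty string.
    return protect_or_newline.sub(lambda m: m.group(1) or "", text)


a = "this\nis\na\nsentence[startdelim]this\nis\nanother[enddelim]this\nis\n" * 2
print(repr(strip_outside(a)))
```

Note that a [startdelim] with no closing [enddelim] protects nothing in this sketch: the first alternative fails there, so the newlines after it are removed.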
Kent
Nov 5 '05 #7

On 4 Nov 2005 22:49:03 -0800, s9************@yahoo.com wrote:
> hi
>
> i have a string:
> a = "this\nis\na\nsentence[startdelim]this\nis\nanother[enddelim]this\nis\n"
>
> inside the string, there are "\n". I don't want to substitute the '\n'
> in between the [startdelim] and [enddelim] to ''. I only want to get rid
> of the '\n' everywhere else.
>
> i have read the tutorial and came across negative/positive lookahead
> and i think it can solve the problem, but am confused on how to use it.
> can anyone give me some advice? or is there a better way other than
> lookaheads... thanks.


Sometimes splitting and processing the pieces selectively can be a solution, e.g.,
if delimiters are properly paired, splitting (with parens to keep matches) should
give you a repeating pattern modulo 4 of
<"everywhere else" as you said><first delim><between><second delim> ...

>>> a = "this\nis\na\nsentence[startdelim]this\nis\nanother[enddelim]this\nis\n"
>>> import re
>>> splitter = re.compile(r'(?s)(\[startdelim\]|\[enddelim\])')
>>> sp = splitter.split(a)
>>> sp
['this\nis\na\nsentence', '[startdelim]', 'this\nis\nanother', '[enddelim]', 'this\nis\n']
>>> ''.join([(lambda s:s, lambda s:s.replace('\n',''))[not i%4](s) for i,s in enumerate(sp)])
'thisisasentence[startdelim]this\nis\nanother[enddelim]thisis'
>>> print(''.join([(lambda s:s, lambda s:s.replace('\n',''))[not i%4](s) for i,s in enumerate(sp)]))
thisisasentence[startdelim]this
is
another[enddelim]thisis

I haven't checked for corner cases, but HTH
Maybe I'll try two pairs of delimiters:

>>> a += "2222\n33\n4\n55555555[startdelim]6666\n77\n8888888[enddelim]9999\n00\n"
>>> sp = splitter.split(a)
>>> print(''.join([(lambda s:s, lambda s:s.replace('\n',''))[not i%4](s) for i,s in enumerate(sp)]))
thisisasentence[startdelim]this
is
another[enddelim]thisis222233455555555[startdelim]6666
77
8888888[enddelim]999900

which came from

>>> sp
['this\nis\na\nsentence', '[startdelim]', 'this\nis\nanother', '[enddelim]', 'this\nis\n2222\n33\n4\n55555555', '[startdelim]', '6666\n77\n8888888', '[enddelim]', '9999\n00\n']

Which had the replacing done where not i%4 was true:

>>> for i,s in enumerate(sp): print('%6s: %r' % (not i%4, s))
...
  True: 'this\nis\na\nsentence'
 False: '[startdelim]'
 False: 'this\nis\nanother'
 False: '[enddelim]'
  True: 'this\nis\n2222\n33\n4\n55555555'
 False: '[startdelim]'
 False: '6666\n77\n8888888'
 False: '[enddelim]'
  True: '9999\n00\n'
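
On the lookahead question quoted at the top of this post: a single pattern can apparently do it too, though it is harder to read than the split. This is a minimal sketch that assumes the delimiters are properly paired and never nested; the names `outside_newline` and `strip_with_lookahead` are invented for it. A newline counts as "inside" when, reading forward, an [enddelim] turns up before any [startdelim], and the negative lookahead rejects exactly those newlines.

```python
import re

# \n                        a newline ...
# (?! ... \[enddelim\])     ... NOT followed by an [enddelim] that can be
#                           reached without first crossing a [startdelim].
# (?:(?!\[startdelim\]).)*? steps forward one character at a time, refusing
#                           to step onto the start of a [startdelim].
outside_newline = re.compile(
    r"\n(?!(?:(?!\[startdelim\]).)*?\[enddelim\])",
    re.DOTALL,
)


def strip_with_lookahead(text):
    # Only newlines outside the delimited regions match, so only they go.
    return outside_newline.sub("", text)


a = "this\nis\na\nsentence[startdelim]this\nis\nanother[enddelim]this\nis\n"
print(repr(strip_with_lookahead(a)))
```

Every newline triggers its own forward scan, so on long inputs this will likely be slower than splitting once and rejoining.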

Regards,
Bengt Richter
Nov 6 '05 #8
