Regex - where do I make a mistake?

I have
string="""<span class="test456">55</span>.
<td><span class="test123">128</span>
<span class="test789">170</span>
"""

where I need to replace
<span class="test456">55</span>.
<span class="test789">170</span>

with a space.
So I tried

#############
import re
string="""<td><span class="test456">55</span>.<span
class="test123">128</span><span class="test789">170</span>
"""
Newstring=re.sub(r'<span class="test(?!123)">.*</span>'," ",string)
###########

But it does NOT work.
Can anyone explain why?
Thank you
L.

Feb 16 '07 #1
"(?!123)" is a negative "lookahead assertion", i.e. it ensures that "test"
is not followed by "123", but /doesn't/ consume any characters. For your
regex to match, "test" must be /immediately/ followed by a '"'.

Regular expressions are too low-level to use on HTML directly. Go with
BeautifulSoup instead of trying to fix the above.

Peter
Feb 16 '07 #2
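
To make the "use a parser" advice concrete, here is a minimal sketch. It deliberately swaps in the standard library's html.parser as a stand-in for BeautifulSoup so that it runs without extra installs. The names SpanBlanker and blank_spans are hypothetical, and the sketch ignores self-closing tags, comments and entity re-escaping. Note that it blanks every span whose class is not the one to keep, including spans with no class at all.

```python
from html.parser import HTMLParser


class SpanBlanker(HTMLParser):
    """Rebuild the markup, replacing unwanted <span> elements with a space."""

    def __init__(self, keep_class):
        super().__init__()
        self.keep_class = keep_class
        self.out = []
        self.depth = 0  # > 0 while we are inside a span that is being dropped

    def handle_starttag(self, tag, attrs):
        if self.depth:
            # Already dropping: only track nested spans so we find the right end tag.
            if tag == "span":
                self.depth += 1
            return
        if tag == "span" and dict(attrs).get("class") != self.keep_class:
            self.out.append(" ")  # the whole element becomes one space
            self.depth = 1
            return
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.depth:
            if tag == "span":
                self.depth -= 1
            return
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.depth:
            self.out.append(data)


def blank_spans(html, keep_class="test123"):
    parser = SpanBlanker(keep_class)
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.out)


html = '''<span class="test456">55</span>.
<td><span class="test123">128</span>
<span class="test789">170</span>
'''
result = blank_spans(html)
print(repr(result))
```

Because the parser reads attributes as key/value pairs, attribute order, extra whitespace or a line break inside the tag no longer matter, which is the main point of the advice above.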
On Feb 16, 2:14 pm, Peter Otten <__pete...@web.de> wrote:
"(?!123)" is a negative "lookahead assertion", i.e. it ensures that "test"
is not followed by "123", but /doesn't/ consume any characters. For your
regex to match, "test" must be /immediately/ followed by a '"'.

Regular expressions are too low-level to use on HTML directly. Go with
BeautifulSoup instead of trying to fix the above.

Peter

Yes, I know "(?!123)" is a negative "lookahead assertion",
but I do not know exactly why it does not work. I thought that

(?!...)
Matches if ... doesn't match next. For example, Isaac (?!Asimov) will
match 'Isaac ' only if it's not followed by 'Asimov'.

Feb 16 '07 #3
Johny wrote:
Yes, I know "(?!123)" is a negative "lookahead assertion",
but I do not know exactly why it does not work. I thought that

(?!...)
Matches if ... doesn't match next. For example, Isaac (?!Asimov) will
match 'Isaac ' only if it's not followed by 'Asimov'.

The problem is that your regex does not end with the lookahead assertion, and
there is nothing to consume the '456' or '789'. To illustrate:

>>> import re
>>> for example in ["before123after", "before234after", "beforeafter"]:
...     re.findall("before(?!123)after", example)
...
[]
[]
['beforeafter']
>>> for example in ["before123after", "before234after", "beforeafter"]:
...     re.findall(r"before(?!123)\d\d\dafter", example)
...
[]
['before234after']
[]

Peter
Feb 16 '07 #4
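
Applying the same idea to the pattern from the original question might look like the sketch below. It assumes the class suffix is always a run of digits and that the spans are never nested; both are assumptions about the data, not something the thread confirms. The lookahead is written as (?!123") so that a class such as test1234 is not rejected by accident, and .*? is non-greedy so that two spans on one line are not swallowed by a single match.

```python
import re

html = '''<span class="test456">55</span>.
<td><span class="test123">128</span>
<span class="test789">170</span>
'''

# (?!123") rejects exactly class="test123"; \d+ then consumes the digits
# that the lookahead only inspected.
pattern = re.compile(r'<span class="test(?!123")\d+">.*?</span>')

new_html = pattern.sub(" ", html)
print(repr(new_html))
```

On the sample string this leaves the test123 span untouched and replaces the other two spans with a single space each. It still depends on the exact attribute layout, so the earlier warning about regular expressions and HTML continues to apply.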
On Fri, 2007-02-16 at 05:34 -0800, Johny wrote:
Yes, I know "(?!123)" is a negative "lookahead assertion",
but I do not know exactly why it does not work.

It *does* work, it just doesn't do what you think it does.

The lookahead assertion is a zero-width match that doesn't match any
actual characters from the subject. It matches an imaginary vertical
line between two consecutive characters of the subject.

Nothing in your pattern matches the string of digits that follows
"test", hence the subject fails to match the pattern.

Also, please note Peter's advice that Regular Expressions are almost
always the wrong tool for working with HTML. It may work in very limited
cases, and maybe you have such a limited case, but you'd better make
sure that you'll never ever have to handle anything beyond this limited
case.

-Carsten
Feb 16 '07 #5
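
The zero-width behaviour described above can be checked directly on a match object. This is a small sketch; the sample string and the variable names are chosen here purely for illustration.

```python
import re

subject = 'class="test456"'

# The lookahead succeeds (the next characters are not "123"),
# yet the match itself covers only the literal "test".
m = re.search(r'test(?!123)', subject)
matched_text = m.group()
digits_left = subject[m.end():]  # everything the pattern has not consumed

# The original pattern demands a '"' right after "test",
# but "456" is still sitting there, so the match fails.
fails = re.search(r'test(?!123)"', subject)

print(matched_text, digits_left, fails)
```

The match ends immediately after "test", so the digits are still waiting to be matched by whatever comes next in the pattern.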
