
regex module: or (|) doesn't work as expected

Howdy,
I have the following regex: "iface lo[\w\t\n\s]+(?=(iface)|$)"

If no "iface" follows the part matched by "iface lo[\w\t\n\s]", the rest of
the text should be selected.
But the "?=(iface)" lookahead is ignored; the whole text is always selected.
What is wrong?
Many thanks

greetings

Fabian
Jul 4 '06 #1


In <44**********************@read.cnntp.org>, Fabian Holler wrote:
> Howdy,
> I have the following regex: "iface lo[\w\t\n\s]+(?=(iface)|$)"
>
> If no "iface" follows the part matched by "iface lo[\w\t\n\s]", the rest of
> the text should be selected.
> But the "?=(iface)" lookahead is ignored; the whole text is always selected.
> What is wrong?
The ``+`` after the character class means one or more of the characters
in the class. If you have a text like:

    iface lox iface

then it also matches the space and the word ``iface``, because the space
(``\s``) and word characters (``\w``) are part of the character class and
``+`` is "greedy". It consumes as many characters as possible, and the
rest of the regex is only tried after that; the engine backs off one
character at a time only if the rest fails to match.

If you want to match non-greedily, put a ``?`` after the ``+``::

    iface lo[\w\t\n\s]+?(?=(iface)|$)

Now only "iface lox " is matched in the example above.
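A minimal sketch of the difference, using Python's standard ``re`` module on the example string above; the variable names are only illustrative:

```python
import re

text = "iface lox iface"

greedy = r"iface lo[\w\t\n\s]+(?=(iface)|$)"
lazy = r"iface lo[\w\t\n\s]+?(?=(iface)|$)"

# Greedy: the class also matches spaces and letters, so "+" runs to the end
# of the string, where the "$" branch of the lookahead succeeds.
print(repr(re.search(greedy, text).group()))  # 'iface lox iface'

# Non-greedy: "+?" grows one character at a time and stops at the first
# position where the lookahead sees "iface" (or the end of the string).
print(repr(re.search(lazy, text).group()))  # 'iface lox '
```

Note that ``\s`` already includes ``\t`` and ``\n``, so ``[\w\s]`` would behave the same as ``[\w\t\n\s]`` here.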

Ciao,
Marc 'BlackJack' Rintsch
Jul 4 '06 #2

Hello Marc,

Thank you for your answer.

Marc 'BlackJack' Rintsch wrote:
> In <44**********************@read.cnntp.org>, Fabian Holler wrote:
>> I have the following regex: "iface lo[\w\t\n\s]+(?=(iface)|$)"
>>
>> If no "iface" follows the part matched by "iface lo[\w\t\n\s]", the rest of
>> the text should be selected.
>> But the "?=(iface)" lookahead is ignored; the whole text is always selected.
>> What is wrong?
>
> The ``+`` after the character class means one or more of the characters
> in the class. If you have a text like:

Yes, that's right, but that isn't my problem.
The problem is in the "(?=(iface)|$)" part.

For example, I have the text:

"auto lo eth0
<MATCH START>iface lo inet loopback
bla
blub

<MATCH END>iface eth0 inet dhcp
hostname debian"
My regex should match the marked text,
but it matches the whole text starting from "iface".
If there is only one iface entry, the whole text starting from "iface"
should be matched.
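A sketch of the non-greedy pattern suggested in this thread applied to both cases, written with Python's ``re`` module. The two-entry sample reproduces the text above as a plain string (without the MATCH markers); the single-entry sample is made up for illustration:

```python
import re

# Non-greedy version of the pattern from this thread.
PATTERN = re.compile(r"iface lo[\w\t\n\s]+?(?=(iface)|$)")

two_entries = (
    "auto lo eth0\n"
    "iface lo inet loopback\n"
    "bla\n"
    "blub\n"
    "\n"
    "iface eth0 inet dhcp\n"
    "hostname debian"
)
one_entry = "auto lo\niface lo inet loopback\nbla\nblub"

# A second "iface" is present, so the match stops right before it.
print(repr(PATTERN.search(two_entries).group()))
# 'iface lo inet loopback\nbla\nblub\n\n'

# Only one entry: no later "iface", so the "$" branch ends the match.
print(repr(PATTERN.search(one_entry).group()))
# 'iface lo inet loopback\nbla\nblub'
```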

greetings

Fabian
Jul 4 '06 #3

Fabian Holler wrote:
> Yes, that's right, but that isn't my problem.
> The problem is in the "(?=(iface)|$)" part.
no, the problem is that you're thinking "procedural string matching from
left to right", but that's not how regular expressions work.
> For example, I have the text:
>
> "auto lo eth0
> <MATCH START>iface lo inet loopback
> bla
> blub
>
> <MATCH END>iface eth0 inet dhcp
> hostname debian"
> My regex should match the marked text,
> but it matches the whole text starting from "iface".
which is perfectly valid, since a plain "+" is greedy, and you've asked
for "iface lo" followed by some text followed by *either* the end of the
string or another "iface". running all the way to the end of the string
satisfies that, so it's a perfectly valid match.

if you want a non-greedy match, use "+?" instead.

however, if you just want the text between two string literals, it's
often more efficient to just split the string twice:

text = text.split("iface lo", 1)[1].split("iface", 1)[0]
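The one-liner above raises ``IndexError`` when "iface lo" is missing. A minimal sketch of the same idea using ``str.partition`` instead (a swapped-in variant, not the approach posted above); ``lo_block`` is a hypothetical helper name:

```python
def lo_block(text):
    """Return the text between "iface lo" and the next "iface" (or the end).

    Hypothetical helper: returns None instead of raising when "iface lo"
    does not occur in the text.
    """
    _, found, rest = text.partition("iface lo")
    if not found:
        return None
    # If no further "iface" exists, partition() puts all of `rest` in the
    # first element, which covers the single-entry case.
    block, _, _ = rest.partition("iface")
    return block


config = (
    "auto lo eth0\n"
    "iface lo inet loopback\n"
    "bla\n"
    "blub\n"
    "\n"
    "iface eth0 inet dhcp\n"
    "hostname debian"
)
print(repr(lo_block(config)))  # ' inet loopback\nbla\nblub\n\n'
```

Unlike the regex, both the one-liner and this helper drop the leading "iface lo" itself.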

</F>

Jul 4 '06 #4
