473,725 Members | 2,118 Online
Bytes | Software Development & Data Engineering Community
+ Post

Home Posts Topics Members FAQ

How to get the "longest possible" match with Python's RE module?

Basically, the problem is this:
>>p = re.compile("do| dolittle")
p.match("doli ttle").group()
'do'

Python's NFA regexp engine trys only the first option, and happily
rests on that. There's another example:
>>p = re.compile("one (self)?(selfsuf ficient)?")
p.match("ones elfsufficient") .group()
'oneself'

The Python regular expression engine doesn't exaust all the
possibilities, but in my application I hope to get the longest possible
match, starting from a given point.

Is there a way to do this in Python?

Sep 12 '06 #1
32 14682
Licheng Fang wrote:
Basically, the problem is this:
>p = re.compile("do| dolittle")
p.match("dolit tle").group()
'do'
>From what I understand, this isn't python specific, it is the expected
behavior of that pattern in any implementation. You are using
alternation, which means "either, or", and you have the shorter
subexpression first, so the condition is satisfied by just 'do' and the
matching terminates.
There's another example:
>p = re.compile("one (self)?(selfsuf ficient)?")
p.match("onese lfsufficient"). group()
'oneself'
Again, I don't think this has anything to do with python. You pattern
basically means "match 'one' whether it is followed by 'self' or not,
and whether it is followed by 'selfsufficient ' or not". For this
particular example, you'd want something like
"one(self)?(suf ficient)?".

I think you could construct a pattern that would do what you want in
python without any problem. If you post a (short) example of your data,
I'm sure someone could help you with it.

Regards,
Jordan

Sep 12 '06 #2
Licheng Fang wrote:
Basically, the problem is this:
>p = re.compile("do| dolittle")
p.match("dolit tle").group()
'do'

Python's NFA regexp engine trys only the first option, and happily
rests on that. There's another example:
>p = re.compile("one (self)?(selfsuf ficient)?")
p.match("onese lfsufficient"). group()
'oneself'

The Python regular expression engine doesn't exaust all the
possibilities, but in my application I hope to get the longest possible
match, starting from a given point.

Is there a way to do this in Python?
This is the way the regexp works python doesn't has anything to do with
it. It starts parsing the data with the pattern given. It returns the
matched string acording the pattern and doesn't go back to find the
other combinations.

So to get all the combinations you would probably require to give
different patterns each time.

Sep 12 '06 #3

MonkeeSage wrote:
Licheng Fang wrote:
Basically, the problem is this:
>>p = re.compile("do| dolittle")
>>p.match("doli ttle").group()
'do'
From what I understand, this isn't python specific, it is the expected
behavior of that pattern in any implementation. You are using
alternation, which means "either, or", and you have the shorter
subexpression first, so the condition is satisfied by just 'do' and the
matching terminates.
There's another example:
>>p = re.compile("one (self)?(selfsuf ficient)?")
>>p.match("ones elfsufficient") .group()
'oneself'

Again, I don't think this has anything to do with python. You pattern
basically means "match 'one' whether it is followed by 'self' or not,
and whether it is followed by 'selfsufficient ' or not". For this
particular example, you'd want something like
"one(self)?(suf ficient)?".

I think you could construct a pattern that would do what you want in
python without any problem. If you post a (short) example of your data,
I'm sure someone could help you with it.

Regards,
Jordan
Hi, according to these regexp engine discussions, it's NOT a behavior
true to any implementation.

http://www.softec.st/en/OpenSource/D...onEngines.html
http://www.softec.st/en/OpenSource/D...onEngines.html

Python's NFA engine reads along the input string, matching it to the
pattern, and backtracking when needed. By contrast a DFA engine, to my
understanding, constructs a DFA and uses it to munch as many characters
as possible. Maybe it's like this:

Pattern: one(self)?(self sufficient)?

PYTHON'S NFA ENGINE:

one self, none selfsufficient, none
(start)------->((1))------------>((2))----------------------->((3))

DFA ENGINE:

one self
(start)------->((123))------------>((23))
|
|
| selfsufficient
--------------->((3))

I want to know if there is some way to make Python RE behave like grep
does, or do I have to change to another engine?

Sep 12 '06 #4
>
I want to know if there is some way to make Python RE behave like grep
does, or do I have to change to another engine?
Maybe if you posted a (tested) grep example and data, that does as you
want, the group could better understand what you are asking for?

- Paddy.

Sep 12 '06 #5
[Licheng Fang]
...
I want to know if there is some way to make Python RE behave like grep
does,
Not in general, no. The matching strategies couldn't be more
different, and that's both deep and intentional. See Friedl's book
for details:

http://regex.info/
or do I have to change to another engine?
Yes, if POSIX regexp semantics are what you require. Several years
ago there was an extension module for Python supplying POSIX
semantics, but I couldn't find anything current just now in a minute
of searching. You may be more motivated to search longer ;-)
Sep 12 '06 #6
Licheng Fang wrote:
Hi, according to these regexp engine discussions, it's NOT a behavior
true to any implementation.
[snip]
Well, I just double-checked in ruby (oniguruma regexp engine):

r = Regexp.new("do| dolittle")
puts r.match("dolitt le")[0]
# do

r = Regexp.new("one (self)?(suffici ent)?")
puts r.match("onesel fsufficient")[0]
# oneself

And perl:

if ("doolittle" =~
/(do|dolittle)/) {
print "$1\n";
# do
}

if ("oneselfsuffic ient" =~
/(one(self)?(sel fsufficient)?)/) {
print "$1\n";
# oneself
}

And Javascript (whatever regexp engine Spidermonkey uses):

var r = new RegExp(/do|dolittle/);
alert("dolittle ".match(r)[0]);

var r = new RegExp(/one(self)?(self sufficient)?/);
alert("oneselfs ufficient".matc h(r)[0]);

So, it seems they are all broken, or python is correct as well.

Regards,
Jordan

Sep 12 '06 #7
MonkeeSage wrote:
So, it seems they are all broken, or python is correct as well.
Aha, sorry about that Licheng (re: Tim's post). I guess "broken"
depends on if you are expecting perl-compatible behavior or otherwise.
I have my own scripts I use to do (f)grep and sed-like operations, so I
almost never use those programs and forgot about the different pattern
semantics (part of the reason I made my own scripts).

Regards,
Jordan

Sep 12 '06 #8
Oh, please do have a look at the second link I've posted. There's a
table comparing the regexp engines. The engines you've tested probably
all use an NFA implementation.

MonkeeSage wrote:
Licheng Fang wrote:
Hi, according to these regexp engine discussions, it's NOT a behavior
true to any implementation.
[snip]

Well, I just double-checked in ruby (oniguruma regexp engine):

r = Regexp.new("do| dolittle")
puts r.match("dolitt le")[0]
# do

r = Regexp.new("one (self)?(suffici ent)?")
puts r.match("onesel fsufficient")[0]
# oneself

And perl:

if ("doolittle" =~
/(do|dolittle)/) {
print "$1\n";
# do
}

if ("oneselfsuffic ient" =~
/(one(self)?(sel fsufficient)?)/) {
print "$1\n";
# oneself
}

And Javascript (whatever regexp engine Spidermonkey uses):

var r = new RegExp(/do|dolittle/);
alert("dolittle ".match(r)[0]);

var r = new RegExp(/one(self)?(self sufficient)?/);
alert("oneselfs ufficient".matc h(r)[0]);

So, it seems they are all broken, or python is correct as well.

Regards,
Jordan
Sep 12 '06 #9
Licheng Fang wrote:
Oh, please do have a look at the second link I've posted. There's a
table comparing the regexp engines. The engines you've tested probably
all use an NFA implementation.
Sorry! *blush* I admit I skipped over your links. I'll have a look now.

BTW, just an idea that may or may not work. What about finding all
matches that meet the absolute baseline pattern, then taking the
longest of them...somethin g like this mabye:

def matcher(strings , pattern):
out = ''
reg = re.compile(patt ern)
for string in strings:
match = reg.match(strin g).group()
if (len(match) >= len(out)): # should this use or >= ?
out = match
return out # empty is no matches, else longest match

p = ['dodad', 'dolittle', 'dodaday']
print matcher(p, r'do.*')
# dolittle

Just a thought...

Regards,
Jordan

Sep 12 '06 #10

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

5
5086
by: Chris Stromberger | last post by:
When issuing updates in mysql (in the console window), mysql will tell you if any rows matched and how many rows were updated (see below). I know how to get number of rows udpated using MySQLdb, but is there any way to get the number of rows matched? I want to find out, when rows updated = 0, if there were no updates because the row wasn't found (rows matched will = 0) or because the update would not have changed any data (rows matched =...
33
9168
by: Jim Hill | last post by:
I've done some Googling around on this and it seems like creating a here document is a bit tricky with Python. Trivial via triple-quoted strings if there's no need for variable interpolation but requiring a long, long formatted arglist via (%s,%s,%s,ad infinitum) if there is. So my question is: Is there a way to produce a very long multiline string of output with variables' values inserted without having to resort to this wacky """v...
3
1757
by: Mark Shroyer | last post by:
I guess this sort of falls under the "shameless plug" category, but here it is: Recently I used a custom metaclass in a Python program I've been working on, and I ended up doing a sort of write-up on it, as an example of what a "real life" __metaclass__ might do for those who may never have seen such a thing themselves. http://markshroyer.com/blog/2007/11/09/tilting-at-metaclass-windmills/ So what's the verdict? Incorrect? Missed the...
3
1515
by: Evan | last post by:
Hello, one of my PC is window system, and in "control panel -Network Connections", I can see some network connections such as PPPOE or VPN which I created by click "create a new connection". My question is, is it possible to create a new connection by using Python script? which means I do not want to use Window UI (via "control panel"), if it is possible, I can save so many time to create various network connection when I want to do...
15
14776
by: Kurda Yon | last post by:
Hi, I try to "build" and "install" pysqlite? After I type "python setup.py build" I get a lot of error messages? The first error is "src/ connection.h:33:21: error: sqlite3.h: No such file or directory". So, I assume that the absence of the "sqlite3.h" is the origin of the problem. I found on the web, that this file should be either in "/usr/local/ include" or in "/usr/local/lib". I check this directories and I really
0
8888
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However, people are often confused as to whether an ONU can Work As a Router. In this blog post, we’ll explore What is ONU, What Is Router, ONU & Router’s main usage, and What is the difference between ONU and Router. Let’s take a closer look ! Part I. Meaning of...
0
8752
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can effortlessly switch the default language on Windows 10 without reinstalling. I'll walk you through it. First, let's disable language synchronization. With a Microsoft account, language settings sync across devices. To prevent any complications,...
0
9257
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven tapestry of website design and digital marketing. It's not merely about having a website; it's about crafting an immersive digital experience that captivates audiences and drives business growth. The Art of Business Website Design Your website is...
0
8097
agi2029
by: agi2029 | last post by:
Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing, and deployment—without human intervention. Imagine an AI that can take a project description, break it down, write the code, debug it, and then launch it, all on its own.... Now, this would greatly impact the work of software developers. The idea...
0
4519
by: TSSRALBI | last post by:
Hello I'm a network technician in training and I need your help. I am currently learning how to create and manage the different types of VPNs and I have a question about LAN-to-LAN VPNs. The last exercise I practiced was to create a LAN-to-LAN VPN between two Pfsense firewalls, by using IPSEC protocols. I succeeded, with both firewalls in the same network. But I'm wondering if it's possible to do the same thing, with 2 Pfsense firewalls...
0
4784
by: adsilva | last post by:
A Windows Forms form does not have the event Unload, like VB6. What one acts like?
1
3221
by: 6302768590 | last post by:
Hai team i want code for transfer the data from one system to another through IP address by using C# our system has to for every 5mins then we have to update the data what the data is updated we have to send another system
2
2635
muto222
by: muto222 | last post by:
How can i add a mobile payment intergratation into php mysql website.
3
2157
bsmnconsultancy
by: bsmnconsultancy | last post by:
In today's digital era, a well-designed website is crucial for businesses looking to succeed. Whether you're a small business owner or a large corporation in Toronto, having a strong online presence can significantly impact your brand's success. BSMN Consultancy, a leader in Website Development in Toronto offers valuable insights into creating effective websites that not only look great but also perform exceptionally well. In this comprehensive...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.