Basically, the problem is this:
>>p = re.compile("do| dolittle") p.match("doli ttle").group()
'do'
Python's NFA regexp engine trys only the first option, and happily
rests on that. There's another example:
>>p = re.compile("one (self)?(selfsuf ficient)?") p.match("ones elfsufficient") .group()
'oneself'
The Python regular expression engine doesn't exaust all the
possibilities, but in my application I hope to get the longest possible
match, starting from a given point.
Is there a way to do this in Python? 32 14682
Licheng Fang wrote:
Basically, the problem is this:
>p = re.compile("do| dolittle") p.match("dolit tle").group()
'do'
>From what I understand, this isn't python specific, it is the expected
behavior of that pattern in any implementation. You are using
alternation, which means "either, or", and you have the shorter
subexpression first, so the condition is satisfied by just 'do' and the
matching terminates.
There's another example:
>p = re.compile("one (self)?(selfsuf ficient)?") p.match("onese lfsufficient"). group()
'oneself'
Again, I don't think this has anything to do with python. You pattern
basically means "match 'one' whether it is followed by 'self' or not,
and whether it is followed by 'selfsufficient ' or not". For this
particular example, you'd want something like
"one(self)?(suf ficient)?".
I think you could construct a pattern that would do what you want in
python without any problem. If you post a (short) example of your data,
I'm sure someone could help you with it.
Regards,
Jordan
Licheng Fang wrote:
Basically, the problem is this:
>p = re.compile("do| dolittle") p.match("dolit tle").group()
'do'
Python's NFA regexp engine trys only the first option, and happily
rests on that. There's another example:
>p = re.compile("one (self)?(selfsuf ficient)?") p.match("onese lfsufficient"). group()
'oneself'
The Python regular expression engine doesn't exaust all the
possibilities, but in my application I hope to get the longest possible
match, starting from a given point.
Is there a way to do this in Python?
This is the way the regexp works python doesn't has anything to do with
it. It starts parsing the data with the pattern given. It returns the
matched string acording the pattern and doesn't go back to find the
other combinations.
So to get all the combinations you would probably require to give
different patterns each time.
MonkeeSage wrote:
Licheng Fang wrote:
Basically, the problem is this:
>>p = re.compile("do| dolittle")
>>p.match("doli ttle").group()
'do'
From what I understand, this isn't python specific, it is the expected
behavior of that pattern in any implementation. You are using
alternation, which means "either, or", and you have the shorter
subexpression first, so the condition is satisfied by just 'do' and the
matching terminates.
There's another example:
>>p = re.compile("one (self)?(selfsuf ficient)?")
>>p.match("ones elfsufficient") .group()
'oneself'
Again, I don't think this has anything to do with python. You pattern
basically means "match 'one' whether it is followed by 'self' or not,
and whether it is followed by 'selfsufficient ' or not". For this
particular example, you'd want something like
"one(self)?(suf ficient)?".
I think you could construct a pattern that would do what you want in
python without any problem. If you post a (short) example of your data,
I'm sure someone could help you with it.
Regards,
Jordan
Hi, according to these regexp engine discussions, it's NOT a behavior
true to any implementation. http://www.softec.st/en/OpenSource/D...onEngines.html http://www.softec.st/en/OpenSource/D...onEngines.html
Python's NFA engine reads along the input string, matching it to the
pattern, and backtracking when needed. By contrast a DFA engine, to my
understanding, constructs a DFA and uses it to munch as many characters
as possible. Maybe it's like this:
Pattern: one(self)?(self sufficient)?
PYTHON'S NFA ENGINE:
one self, none selfsufficient, none
(start)------->((1))------------>((2))----------------------->((3))
DFA ENGINE:
one self
(start)------->((123))------------>((23))
|
|
| selfsufficient
--------------->((3))
I want to know if there is some way to make Python RE behave like grep
does, or do I have to change to another engine?
>
I want to know if there is some way to make Python RE behave like grep
does, or do I have to change to another engine?
Maybe if you posted a (tested) grep example and data, that does as you
want, the group could better understand what you are asking for?
- Paddy.
[Licheng Fang]
...
I want to know if there is some way to make Python RE behave like grep
does,
Not in general, no. The matching strategies couldn't be more
different, and that's both deep and intentional. See Friedl's book
for details: http://regex.info/
or do I have to change to another engine?
Yes, if POSIX regexp semantics are what you require. Several years
ago there was an extension module for Python supplying POSIX
semantics, but I couldn't find anything current just now in a minute
of searching. You may be more motivated to search longer ;-)
Licheng Fang wrote:
Hi, according to these regexp engine discussions, it's NOT a behavior
true to any implementation.
[snip]
Well, I just double-checked in ruby (oniguruma regexp engine):
r = Regexp.new("do| dolittle")
puts r.match("dolitt le")[0]
# do
r = Regexp.new("one (self)?(suffici ent)?")
puts r.match("onesel fsufficient")[0]
# oneself
And perl:
if ("doolittle" =~
/(do|dolittle)/) {
print "$1\n";
# do
}
if ("oneselfsuffic ient" =~
/(one(self)?(sel fsufficient)?)/) {
print "$1\n";
# oneself
}
And Javascript (whatever regexp engine Spidermonkey uses):
var r = new RegExp(/do|dolittle/);
alert("dolittle ".match(r)[0]);
var r = new RegExp(/one(self)?(self sufficient)?/);
alert("oneselfs ufficient".matc h(r)[0]);
So, it seems they are all broken, or python is correct as well.
Regards,
Jordan
MonkeeSage wrote:
So, it seems they are all broken, or python is correct as well.
Aha, sorry about that Licheng (re: Tim's post). I guess "broken"
depends on if you are expecting perl-compatible behavior or otherwise.
I have my own scripts I use to do (f)grep and sed-like operations, so I
almost never use those programs and forgot about the different pattern
semantics (part of the reason I made my own scripts).
Regards,
Jordan
Oh, please do have a look at the second link I've posted. There's a
table comparing the regexp engines. The engines you've tested probably
all use an NFA implementation.
MonkeeSage wrote:
Licheng Fang wrote:
Hi, according to these regexp engine discussions, it's NOT a behavior
true to any implementation.
[snip]
Well, I just double-checked in ruby (oniguruma regexp engine):
r = Regexp.new("do| dolittle")
puts r.match("dolitt le")[0]
# do
r = Regexp.new("one (self)?(suffici ent)?")
puts r.match("onesel fsufficient")[0]
# oneself
And perl:
if ("doolittle" =~
/(do|dolittle)/) {
print "$1\n";
# do
}
if ("oneselfsuffic ient" =~
/(one(self)?(sel fsufficient)?)/) {
print "$1\n";
# oneself
}
And Javascript (whatever regexp engine Spidermonkey uses):
var r = new RegExp(/do|dolittle/);
alert("dolittle ".match(r)[0]);
var r = new RegExp(/one(self)?(self sufficient)?/);
alert("oneselfs ufficient".matc h(r)[0]);
So, it seems they are all broken, or python is correct as well.
Regards,
Jordan
Licheng Fang wrote:
Oh, please do have a look at the second link I've posted. There's a
table comparing the regexp engines. The engines you've tested probably
all use an NFA implementation.
Sorry! *blush* I admit I skipped over your links. I'll have a look now.
BTW, just an idea that may or may not work. What about finding all
matches that meet the absolute baseline pattern, then taking the
longest of them...somethin g like this mabye:
def matcher(strings , pattern):
out = ''
reg = re.compile(patt ern)
for string in strings:
match = reg.match(strin g).group()
if (len(match) >= len(out)): # should this use or >= ?
out = match
return out # empty is no matches, else longest match
p = ['dodad', 'dolittle', 'dodaday']
print matcher(p, r'do.*')
# dolittle
Just a thought...
Regards,
Jordan This thread has been closed and replies have been disabled. Please start a new discussion. Similar topics |
by: Chris Stromberger |
last post by:
When issuing updates in mysql (in the console window), mysql will tell
you if any rows matched and how many rows were updated (see below). I
know how to get number of rows udpated using MySQLdb, but is there any
way to get the number of rows matched? I want to find out, when rows
updated = 0, if there were no updates because the row wasn't found
(rows matched will = 0) or because the update would not have changed
any data (rows matched =...
|
by: Jim Hill |
last post by:
I've done some Googling around on this and it seems like creating a here
document is a bit tricky with Python. Trivial via triple-quoted strings
if there's no need for variable interpolation but requiring a long, long
formatted arglist via (%s,%s,%s,ad infinitum) if there is. So my
question is:
Is there a way to produce a very long multiline string of output with
variables' values inserted without having to resort to this wacky
"""v...
|
by: Mark Shroyer |
last post by:
I guess this sort of falls under the "shameless plug" category, but
here it is: Recently I used a custom metaclass in a Python program
I've been working on, and I ended up doing a sort of write-up on it,
as an example of what a "real life" __metaclass__ might do for those
who may never have seen such a thing themselves.
http://markshroyer.com/blog/2007/11/09/tilting-at-metaclass-windmills/
So what's the verdict? Incorrect? Missed the...
|
by: Evan |
last post by:
Hello,
one of my PC is window system, and in "control panel -Network
Connections", I can see some network connections such as PPPOE or VPN
which I created by click "create a new connection".
My question is, is it possible to create a new connection by using
Python script? which means I do not want to use Window UI (via
"control panel"), if it is possible, I can save so many time to
create various network connection when I want to do...
|
by: Kurda Yon |
last post by:
Hi,
I try to "build" and "install" pysqlite? After I type "python setup.py
build" I get a lot of error messages? The first error is "src/
connection.h:33:21: error: sqlite3.h: No such file or directory". So,
I assume that the absence of the "sqlite3.h" is the origin of the
problem.
I found on the web, that this file should be either in "/usr/local/
include" or in "/usr/local/lib". I check this directories and I really
| |
by: marktang |
last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However, people are often confused as to whether an ONU can Work As a Router. In this blog post, we’ll explore What is ONU, What Is Router, ONU & Router’s main usage, and What is the difference between ONU and Router. Let’s take a closer look !
Part I. Meaning of...
|
by: Hystou |
last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can effortlessly switch the default language on Windows 10 without reinstalling. I'll walk you through it.
First, let's disable language synchronization. With a Microsoft account, language settings sync across devices. To prevent any complications,...
|
by: jinu1996 |
last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven tapestry of website design and digital marketing. It's not merely about having a website; it's about crafting an immersive digital experience that captivates audiences and drives business growth.
The Art of Business Website Design
Your website is...
|
by: agi2029 |
last post by:
Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing, and deployment—without human intervention. Imagine an AI that can take a project description, break it down, write the code, debug it, and then launch it, all on its own....
Now, this would greatly impact the work of software developers. The idea...
|
by: TSSRALBI |
last post by:
Hello
I'm a network technician in training and I need your help.
I am currently learning how to create and manage the different types of VPNs and I have a question about LAN-to-LAN VPNs.
The last exercise I practiced was to create a LAN-to-LAN VPN between two Pfsense firewalls, by using IPSEC protocols.
I succeeded, with both firewalls in the same network. But I'm wondering if it's possible to do the same thing, with 2 Pfsense firewalls...
|
by: adsilva |
last post by:
A Windows Forms form does not have the event Unload, like VB6. What one acts like?
| |
by: 6302768590 |
last post by:
Hai team
i want code for transfer the data from one system to another through IP address by using C# our system has to for every 5mins then we have to update the data what the data is updated we have to send another system
|
by: muto222 |
last post by:
How can i add a mobile payment intergratation into php mysql website.
|
by: bsmnconsultancy |
last post by:
In today's digital era, a well-designed website is crucial for businesses looking to succeed. Whether you're a small business owner or a large corporation in Toronto, having a strong online presence can significantly impact your brand's success. BSMN Consultancy, a leader in Website Development in Toronto offers valuable insights into creating effective websites that not only look great but also perform exceptionally well. In this comprehensive...
| |