Hello,
The program given below returns the lines:
a
ab
Is there a way to use python regular expressions such that the program
would return the following lines?
ab
ab
############### ############### ############### ############### ############
import re
rx1 = re.compile("(a| ab)")
rx2 = re.compile("(ab |a)")
out1 = rx1.search("ab" )
out2 = rx2.search("ab" )
print out1.group(1)
print out2.group(1)
############### ############### ############### ############### ############
Jörg 12 2974
Joerg Schuster wrote: The program given below returns the lines:
a ab
Is there a way to use python regular expressions such that the program would return the following lines?
ab ab
############### ############### ############### ############### ############
import re
rx1 = re.compile("(a| ab)") rx2 = re.compile("(ab |a)")
Have you checked the documentation for "re"?
It reads:
"|" A|B, where A and B can be arbitrary REs, creates a regular expression
that will match either A or B. An arbitrary number of REs can be separated
by the "|" in this way. This can be used inside groups (see below) as well.
As the target string is scanned, REs separated by "|" are tried from left to
right. When one pattern completely matches, that branch is accepted. This
means that once A matches, B will not be tested further, even if it would
produce a longer overall match. In other words, the "|" operator is never
greedy.
------
Seems pretty clear and explicit to me. Your example is basically a working
proof of the above code, so I'm not sure what you were expecting differently.
-Peter
Peter Hansen <pe***@engcorp. com> writes: produce a longer overall match. In other words, the "|" operator is never greedy.
O.k. Thanks for pointing this out. Maybe I should have formulated my
question differently: Is there a trick (be it dirty or not) to make
"|" greedy in python?
Jörg
Peter Hansen <pe***@engcorp. com> writes: produce a longer overall match. In other words, the "|" operator is never greedy.
O.k. Thanks for pointing this out. Maybe I should have formulated my
question differently: Is there a trick (be it dirty or not) to make
"|" greedy in python?
Jörg
Joerg Schuster wrote: Peter Hansen <pe***@engcorp. com> writes:
produce a longer overall match. In other words, the "|" operator is never greedy.
O.k. Thanks for pointing this out. Maybe I should have formulated my question differently: Is there a trick (be it dirty or not) to make "|" greedy in python?
sort the re by size first?
Pádraig. Pa*****@Linux.i e writes: Joerg Schuster wrote: Peter Hansen <pe***@engcorp. com> writes:
produce a longer overall match. In other words, the "|" operator is never greedy. O.k. Thanks for pointing this out. Maybe I should have formulated my question differently: Is there a trick (be it dirty or not) to make "|" greedy in python?
sort the re by size first?
The point is not to get the match of the longest part of the
disjunction, but to get the match of that part of the disjunction
which is the longest one. (The match of ".*" may be much longer than
the match of "abc", although the latter regex contains more
characters.)
Jörg
Joerg Schuster <js@cis.uni-muenchen.de> writes: Pa*****@Linux.i e writes:
Joerg Schuster wrote: Peter Hansen <pe***@engcorp. com> writes:
> produce a longer overall match. In other words, the "|" operator > is never greedy. O.k. Thanks for pointing this out. Maybe I should have formulated my question differently: Is there a trick (be it dirty or not) to make "|" greedy in python?
sort the re by size first?
The point is not to get the match of the longest part of the disjunction, but to get the match of that part of the disjunction which is the longest one. (The match of ".*" may be much longer than the match of "abc", although the latter regex contains more characters.)
Jörg
I should have read your mail more carefully. You wrote about sorting
the *re*, not about sorting the regex. Sorry.
Jörg
Joerg Schuster wrote: Peter Hansen <pe***@engcorp. com> writes:
produce a longer overall match. In other words, the "|" operator is never greedy.
O.k. Thanks for pointing this out. Maybe I should have formulated my question differently: Is there a trick (be it dirty or not) to make "|" greedy in python?
I don't think so - as regexps are finite state automata, their capabilities
are limited to what these machines can do. So they can't backtrack to a
part of the disjunction that was a shorter match if the longer match
suddenly will fail. Consider this example:
rex: a|abc
string: abd
If the state-machine was greedy, it would follow the second path and fail.
The only thing you can do to avoid is is to split your disjunctions and
search separately, and afterwards takt the longest part. That could be done
using a small grammar, that allows you to detect the disjunctions and
create separate rexes for it.
Diez
Joerg Schuster wrote: I should have read your mail more carefully. You wrote about sorting the *re*, not about sorting the regex. Sorry.
what's the difference?
</F>
Joerg Schuster wrote: O.k. Thanks for pointing this out. Maybe I should have formulated my question differently: Is there a trick (be it dirty or not) to make "|" greedy in python?
sort the re by size first?
The point is not to get the match of the longest part of the disjunction, but to get the match of that part of the disjunction which is the longest one. (The match of ".*" may be much longer than the match of "abc", although the latter regex contains more characters.)
you can use "sre_parse.pars e(x).getwidth() " on a subexpression, to
get the shortest/longest possible match. from sre_parse import parse parse("a?").get width()
(0, 1) parse("ab").get width()
(2, 2) parse(".+").get width()
(1, 65535)
(where >=65535 should be interpreted as "any number")
</F> This thread has been closed and replies have been disabled. Please start a new discussion. Similar topics |
by: alexk |
last post by:
Hi,
I would like to request your help.
My problem is as follows. I want to match urls, and therefore I have a
group
of long valid domain names in my regex:
..... (?:com|org|net|biz|info|ac|cc|gs|ms|
sh|st|tc|tf|tj|to|vg|ad|ae|af|ag|
|
by: Keith Chadwick |
last post by:
XML
<data>
<option>this is test 1</option>
<option>this is test 11</option>
<option>this is test 111</option>
<option>this is test 1111</option>
<option>this is test 11111</option>
</data>
|
by: Horizon |
last post by:
Hi,
I would like to build a regex that returns me the table list of a SQL SELECT
request like :
-SELECT * FROM tab1, tab2, tab3
-SELECT * FROM tab1, tab2 WHERE col="abc"
-SELECT * FROM tab1, tab2 LIMIT 1
I tried something like that :
---
|
by: Brian Henry |
last post by:
I have a messaging application that has a data grid with information like an
email list would have (from, subject, time sent, size) but the subject
could be very long in theory, and then it would word wrap.. the subject is a
variable width column (resizes to fit space) while the others are fixed
size... the problem is that i dont want the text to ever word wrap.. but
truncate the visible subject to match the current cell width... (kinda...
|
by: Licheng Fang |
last post by:
Basically, the problem is this:
'do'
Python's NFA regexp engine trys only the first option, and happily
rests on that. There's another example:
'oneself'
The Python regular expression engine doesn't exaust all the
| |
by: EXI-Andrews, Jack |
last post by:
the '*' and '+' don't seem to be greedy.. they will consume less in
order to match a string:
('aa', 'ab')
this is the sort of behaviour i'd expect from
'(a+?)(ab)'
a+ should greedily consume a's at the expense of the string not matching
|
by: hn.ft.pris |
last post by:
"Forward maximum match" mean that there is a dictionary contains
thousands of items which are single words. For example, "kid", "nice",
"best"...
And here get a string "kidnicebs", the forward maximum match algorithm
need to find the longest string that matches the string in dictionary.
Because the longest string in the dictionary has a length of 4, so the
algoritm pick the first four characters of the string(that's why it's
called...
|
by: Thomas Mlynarczyk |
last post by:
Hello,
I have two arrays like this:
$aSearch = array( 'A', 'B', 'C.D', 'E', 'F' );
$aSubject = array( 'A' =0, 'A.B' =1, 'X' =2, 'C.D.E' =3, 'A.B.C' =>
4 );
Now I want to search $aSubject for the longest key that consists of the
first x elements of $aSearch concatenated by "." (where x = 1...count(
|
by: Marek Kubica |
last post by:
Hi,
I am trying to get this stuff working, but I still fail.
I have a format which consists of three elements:
\d{4}M?-\d (4 numbers, optional M, dash, another number)
EMPTY (the <EMPTYtoken)
(the <PAGEBREAKtoken. The line may contain whitespaces,
but nothing else)
|
by: Hystou |
last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can effortlessly switch the default language on Windows 10 without reinstalling. I'll walk you through it.
First, let's disable language synchronization. With a Microsoft account, language settings sync across devices. To prevent any complications,...
|
by: jinu1996 |
last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven tapestry of website design and digital marketing. It's not merely about having a website; it's about crafting an immersive digital experience that captivates audiences and drives business growth.
The Art of Business Website Design
Your website is...
| |
by: Hystou |
last post by:
Overview:
Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows Update option using the Control Panel or Settings app; it automatically checks for updates and installs any it finds, whether you like it or not. For most users, this new feature is actually very convenient. If you want to control the update process,...
|
by: agi2029 |
last post by:
Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing, and deployment—without human intervention. Imagine an AI that can take a project description, break it down, write the code, debug it, and then launch it, all on its own....
Now, this would greatly impact the work of software developers. The idea...
|
by: isladogs |
last post by:
The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM).
In this session, we are pleased to welcome a new presenter, Adolph Dupré who will be discussing some powerful techniques for using class modules.
He will explain when you may want to use classes instead of User Defined Types (UDT). For example, to manage the data in unbound forms.
Adolph will...
|
by: conductexam |
last post by:
I have .net C# application in which I am extracting data from word file and save it in database particularly. To store word all data as it is I am converting the whole word file firstly in HTML and then checking html paragraph one by one.
At the time of converting from word file to html my equations which are in the word document file was convert into image.
Globals.ThisAddIn.Application.ActiveDocument.Select();...
|
by: TSSRALBI |
last post by:
Hello
I'm a network technician in training and I need your help.
I am currently learning how to create and manage the different types of VPNs and I have a question about LAN-to-LAN VPNs.
The last exercise I practiced was to create a LAN-to-LAN VPN between two Pfsense firewalls, by using IPSEC protocols.
I succeeded, with both firewalls in the same network. But I'm wondering if it's possible to do the same thing, with 2 Pfsense firewalls...
|
by: muto222 |
last post by:
How can i add a mobile payment intergratation into php mysql website.
| |
by: bsmnconsultancy |
last post by:
In today's digital era, a well-designed website is crucial for businesses looking to succeed. Whether you're a small business owner or a large corporation in Toronto, having a strong online presence can significantly impact your brand's success. BSMN Consultancy, a leader in Website Development in Toronto offers valuable insights into creating effective websites that not only look great but also perform exceptionally well. In this comprehensive...
| |