Capturing repeating group matches in regular expressions

Is it possible to capture the results of repeating group matches in the
python regular expression module?

To illustrate, what I want is:
re1 = re.compile("([a-z]W)([a-z]X)+([a-z]Y)")
mo1 = re1.match("aWbXcXdXeXfY")
print mo1.groupsButNotAsWeKnowIt()
('aW', 'bX', 'cX', 'dX', 'eX', 'fY')

instead of
print mo1.groups()

('aW', 'eX', 'fY')

.... which captures only the last match from the second group.

Of course, one option is to break out the substring containing the
repeating group and then use split() or findall() within the substring,
but, but, but ... I'd like to do it in one hit if possible.
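
For reference, the two-step workaround described above might look roughly like the sketch below. It is written for Python 3 rather than the Python 2 used elsewhere in this thread, and the names `OUTER`, `INNER`, and `capture_all` are made up for illustration. The only change to the original pattern is that the repeated group is wrapped so the whole run is captured as a single substring, which `findall()` then splits.

```python
import re

# Outer pattern: the second group now wraps the whole run of repeated items,
# so the match keeps the full substring instead of only the last repetition.
OUTER = re.compile(r"([a-z]W)((?:[a-z]X)+)([a-z]Y)")

# Inner pattern: a single repeated item, applied to the captured run.
INNER = re.compile(r"[a-z]X")


def capture_all(text):
    """Return every token, including each repetition of the middle group.

    Returns None when the outer pattern does not match at the start of text.
    """
    mo = OUTER.match(text)
    if mo is None:
        return None
    head, run, tail = mo.groups()
    # Step two: split the captured run into its individual repetitions.
    return (head, *INNER.findall(run), tail)


print(capture_all("aWbXcXdXeXfY"))
# ('aW', 'bX', 'cX', 'dX', 'eX', 'fY')
```

This is still two passes over the repeated part, not the single call asked for, but it keeps the change small and uses only the standard `re` module.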

I believe someone has raised a similar question before, but I can't find
a definitive answer. It may be so stunningly obvious that nobody ever
bothers to answer - if so, could some kind soul please humour me and at
least point me to what I'm not seeing in the Fine Manual.

Thanks,
James.
Jul 18 '05 #1
James Collier <ja***********@xtra.co.nz> writes:
Is it possible to capture the results of repeating group matches in
the python regular expression module?


Not easily; there's a small discussion on python-dev at the moment
about this. Erik Heneryd hacked up something that might be useful:

http://mail.python.org/pipermail/pyt...structmatch.py

And there's always the "use a real parser" option :-)

Cheers,
mwh

--
All obscurity will buy you is time enough to contract venereal
diseases. -- Tim Peters, python-dev
Jul 18 '05 #2
It's a bit wordy, but perhaps the ability to easily structure and retrieve
your returned tokens may sway you.

Download pyparsing at http://pyparsing.sourceforge.net

-- Paul
from pyparsing import Word,OneOrMore

# define parse grammar
lowers = "abcdefghijklmnopqrstuvwxyz"
endsWithW = Word(lowers,"W",exact=2)
endsWithX = Word(lowers,"X",exact=2)
endsWithY = Word(lowers,"Y",exact=2)

patt = endsWithW.setResultsName("W") + \
       OneOrMore( endsWithX ).setResultsName("X") + \
       endsWithY.setResultsName("Y")

# extract tokens from input string
tokens = patt.parseString("aWbXcXdXeXfY")

# tokens can be accessed as a list
print "tokens:",tokens

# tokens can be coerced to be a true list
print "tokens.asList():",tokens.asList()

# tokens can be a dictionary, if results names specified
print "tokens.keys():",tokens.keys()
print "tokens['W']:",tokens['W']
print "tokens['X']:",tokens['X']
print "tokens['Y']:",tokens['Y']

# if results names are valid attribute names, tokens can even be accessed like attributes
print "tokens.W:",tokens.W
print "tokens.X:",tokens.X
print "tokens.Y:",tokens.Y
Gives:

tokens: ['aW', 'bX', 'cX', 'dX', 'eX', 'fY']
tokens.asList(): ['aW', 'bX', 'cX', 'dX', 'eX', 'fY']
tokens.keys(): ['Y', 'X', 'W']
tokens['W']: aW
tokens['X']: ['bX', 'cX', 'dX', 'eX']
tokens['Y']: fY
tokens.W: aW
tokens.X: ['bX', 'cX', 'dX', 'eX']
tokens.Y: fY

Jul 18 '05 #3
Michael Hudson [ mwh at python.net ] writes:
James Collier <ja***********@xtra.co.nz> writes:
Is it possible to capture the results of repeating group matches in
the python regular expression module?


Not easily; there's a small discussion on python-dev at the moment
about this. Erik Heneryd hacked up something that might be useful:

http://mail.python.org/pipermail/pyt...structmatch.py

And there's always the "use a real parser" option :-)

Cheers,
mwh


Many thanks for the answer, Michael - I take your point on the "real parser"
option, but I don't feel that the nut I'm cracking has that thick a shell.

It is some coincidence that this should be under current discussion on
python-dev. For what it's worth, I'd support Mike Coleman's PEP.

To give some more background, I'm tweaking someone else's code and therefore
I want to keep the change as concise as is reasonable. structmatch() is exactly
what I'm looking for - but for now I'll just split the task into two parts.

Thanks again -- James.
Jul 18 '05 #4
James Collier wrote:
Michael Hudson [ mwh at python.net ] writes:

James Collier <ja***********@xtra.co.nz> writes:

Is it possible to capture the results of repeating group matches in
the python regular expression module?


Not easily; there's a small discussion on python-dev at the moment
about this. Erik Heneryd hacked up something that might be useful:

http://mail.python.org/pipermail/pyt...structmatch.py


It should be added that this is a hack in the true sense of the word. I
wouldn't use it for anything other than what it was written for -
experimenting.
Erik
Jul 18 '05 #5
