Hello,
Is there a simple flag to set to allow overlapping matches
for the findall() regular expression method? In other words,
if a string contains five occurrences of the string pattern
"cat", calling findall on the string returns a list
containing five "cat" strings. Is it possible for findall()
to just return one "cat" string?
Thanks
"Mystilleef" wrote: Is there a simple flag to set to allow overlapping matches for the findall() regular expression method? In other words, if a string contains five occurrences of the string pattern "cat", calling findall on the string returns a list containing five "cat" strings. Is it possible for findall() to just return one "cat" string?
your definition of "overlapping" seems to be a bit odd, but assuming
your description is correct, the answer is no.
on the other hand, if you only want one hit, why not use "search"
instead of "findall" ?
</F>
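For reference, a minimal sketch of how the two calls differ. The sample string is made up for illustration and is not from the thread:

```python
import re

text = "cat dog cat bird cat"

# search() stops at the first match and returns a match object (or None).
first = re.search(r"cat", text)
first_hit = first.group() if first else None

# findall() scans the whole string and returns every non-overlapping match.
all_hits = re.findall(r"cat", text)

print(first_hit)
print(all_hits)
```

Neither call removes duplicates on its own: search() simply gives up after the first hit, and findall() reports every hit it finds.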
Hello,
Thanks for your response. I was going by the definition in
the manual. I believe a search only returns the first
match of a regular expression pattern in a string and then
stops further searches if one is found. That's not what I
want.
I want a pattern that scans the entire string but avoids
returning duplicate matches. For example "cat", "cate",
"cater" may all well be valid matches, but I don't want
duplicate matches of any of them. I know I can filter the
list containing found matches myself, but that is somewhat
expensive for a list containing thousands of matches.
Thanks
On 15 Dec 2005 12:26:07 -0800, Mystilleef <my********@gmail.com> wrote: I want a pattern that scans the entire string but avoids returning duplicate matches. For example "cat", "cate", "cater" may all well be valid matches, but I don't want duplicate matches of any of them. I know I can filter the list containing found matches myself, but that is somewhat expensive for a list containing thousands of matches.
Probably the cheapest way of de-duping the list would be to dump it
straight into a set, provided that you aren't concerned about the
order.
--
Cheers,
Simon B, si***@brunningonline.net, http://www.brunningonline.net/simon/blog/
On 15 Dec 2005 12:26:07 -0800, Mystilleef <my********@gmail.com> wrote: I want a pattern that scans the entire string but avoids returning duplicate matches. For example "cat", "cate", "cater" may all well be valid matches, but I don't want duplicate matches of any of them. I know I can filter the list containing found matches myself, but that is somewhat expensive for a list containing thousands of matches.
On Thu, 15 Dec 2005 20:33:42 +0000, Simon Brunning <si***@brunningonline.net> wrote: Probably the cheapest way of de-duping the list would be to dump it straight into a set, provided that you aren't concerned about the order.
Or if concerned, maybe try a combination like:
    >>> s = """\
    ... I want a pattern that scans the entire string but avoids
    ... returning duplicate matches. For example "cat", "cate",
    ... "cater" may all well be valid matches, but I don't want
    ... duplicate matches of any of them. I know I can filter the
    ... list containing found matches myself, but that is somewhat
    ... expensive for a list containing thousands of matches.
    ... """
    >>> import re
    >>> rxo = re.compile(r'cat(?:er|e)?')
    >>> rxo.findall(s)
    ['cate', 'cat', 'cate', 'cater', 'cate']
    >>> seen = set()
    >>> [w for w in (m.group(0) for m in rxo.finditer(s)) if w not in seen and not seen.add(w)]
    ['cate', 'cat', 'cater']
BTW, when one alternative is a prefix of another, put the longer one first in the re, e.g., not r'cat(?:e|er)?' for the above.
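A small sketch of why the order matters, using a made-up three-word sample string (an assumption for illustration, not text from the thread). Python's re tries alternatives left to right and takes the first one that lets the match succeed, so a short alternative listed first hides a longer one that starts the same way:

```python
import re

text = "cat cate cater"

# Longer alternative first: 'er' is tried before 'e'.
longest_first = re.compile(r'cat(?:er|e)?')
# Shorter alternative first: 'e' succeeds, so 'er' is never used.
shortest_first = re.compile(r'cat(?:e|er)?')

longest_first_hits = longest_first.findall(text)
shortest_first_hits = shortest_first.findall(text)

print(longest_first_hits)   # ['cat', 'cate', 'cater']
print(shortest_first_hits)  # ['cat', 'cate', 'cate']
```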
Regards,
Bengt Richter
Mystilleef wrote: Thanks for your response. I was going by the definition in the manual.
"non-overlapping" in that context means that if you e.g. search for "(ba)+"
in the string "bababa", you get one match ("bababa"), not three or six.
in your case, it sounds like you want a search for "ba" to return only one
match.
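A short sketch of what "non-overlapping" means in practice. The "aaaa" case is an extra illustration that is not from the thread, and finditer() is used for the first pattern because findall() with a capturing group returns the group's text rather than the whole match:

```python
import re

# One greedy match covers the whole string; scanning resumes after it.
repeated = [m.group() for m in re.finditer(r"(ba)+", "bababa")]

# A plain "ba" pattern matches three times, each starting where the last ended.
plain = re.findall(r"ba", "bababa")

# "aa" in "aaaa": matches start at offsets 0 and 2; offset 1 would overlap.
starts = [m.start() for m in re.finditer(r"aa", "aaaa")]

print(repeated)  # ['bababa']
print(plain)     # ['ba', 'ba', 'ba']
print(starts)    # [0, 2]
```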
Mystilleef wrote: I know I can filter the list containing found matches myself, but that is somewhat expensive for a list containing thousands of matches.
if the order doesn't matter, you don't have to build a list:
    >>> text = "cat catched catnip cat catatonic cat cat cat kat"
    >>> set(m.group() for m in re.finditer("cat\w*", text))
    set(['catatonic', 'catnip', 'catched', 'cat'])
if you need to preserve the order, you could use a combination of a
list and a set (or a dictionary):
    >>> s = set(); w = []
    >>> for m in re.finditer("cat\w*", text):
    ...     m = m.group()
    ...     if m not in s:
    ...         s.add(m); w.append(m)
    ...
    >>> w
    ['cat', 'catched', 'catnip', 'catatonic']
</F>
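On current Python (3.7 and later, where dicts are guaranteed to keep insertion order), the list-plus-set approach can probably be shortened with dict.fromkeys(). This is a sketch under that version assumption, not something proposed in the thread; the pattern is written as a raw string so that \w is not treated as a string escape:

```python
import re

text = "cat catched catnip cat catatonic cat cat cat kat"

# dict keys are unique and keep insertion order (Python 3.7+),
# so this drops duplicates while preserving first-seen order.
unique_in_order = list(
    dict.fromkeys(m.group() for m in re.finditer(r"cat\w*", text))
)

print(unique_in_order)  # ['cat', 'catched', 'catnip', 'catatonic']
```

Because finditer() yields matches lazily, the full list of duplicate matches is never built.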