Regexps and lists

Paddy

I don't know enough to write an R.E. engine, so forgive me if I am
being naive.
I have had to match text involving lists in the past. These are usually
comma-separated words such as
"egg,beans,ham,spam,spam"
You can match that with:
r"(\w+)(,\w+)*"
and when you look at the groups you get the following:

>>> import re
>>> re.match(r"(\w+)(,\w+)*", "egg,beans,ham,spam,spam").groups()
('egg', ',spam')
>>>

Notice how you only get the last match as the second group's value.

It would be nice if a repeat operator acting on a group turned that
group into a sequence returning every match, in order (or an empty
sequence for no matches).

The above example would become:

>>> import re
>>> re.newmatch(r"(\w+)(,\w+)*", "egg,beans,ham,spam,spam").groups()
('egg', ('beans', 'ham', 'spam', ',spam'))
>>>
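
Until something like re.newmatch exists, the behaviour asked for above can be approximated with the standard re module. The following is only a minimal sketch: the helper name groups_with_repeats is hypothetical, and it assumes the whole string must be a comma-separated word list (re.fullmatch, Python 3.4+), which is stricter than the prefix match that re.match performs in the post. Each repetition keeps its leading comma because the comma is inside the repeated group.

```python
import re

def groups_with_repeats(text):
    """Hypothetical helper: return (first_word, (every repetition of ',word')).

    Returns None when text is not a comma-separated list of words.
    """
    # Same shape as the pattern in the post, but the repeated part is
    # wrapped in one outer group so the whole tail is captured at once.
    m = re.fullmatch(r"(\w+)((?:,\w+)*)", text)
    if m is None:
        return None
    first, tail = m.groups()
    # A second pass collects every repetition, in order.
    repeats = tuple(re.findall(r",\w+", tail))
    return (first, repeats)

print(groups_with_repeats("egg,beans,ham,spam,spam"))
# ('egg', (',beans', ',ham', ',spam', ',spam'))
print(groups_with_repeats("egg"))
# ('egg', ())
```

The two-pass approach trades a little speed for working with the unmodified re module; it does not change how groups() itself behaves.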

1. Is it possible? Do any other RE engines do this?
2. Should it be added to Python?

- Paddy.

Feb 11 '07 #1

John Machin

On Feb 12, 9:08 am, "Paddy" <paddy3...@googlemail.com> wrote:

> I don't know enough to write an R.E. engine, so forgive me if I am
> being naive.
> I have had to match text involving lists in the past. These are usually
> comma-separated words such as
> "egg,beans,ham,spam,spam"
> You can match that with:
> r"(\w+)(,\w+)*"

You *can*, but why do that? What are you trying to achieve? What is
the point of distinguishing the first element from the remainder?

See if any of the following do what you want:

| >>> s = "egg,beans,ham,spam,spam"
| >>> s.split(',')
| ['egg', 'beans', 'ham', 'spam', 'spam']
| >>> import re
| >>> re.split(r",", s)
| ['egg', 'beans', 'ham', 'spam', 'spam']
| >>> re.split(r"(,)", s)
| ['egg', ',', 'beans', ',', 'ham', ',', 'spam', ',', 'spam']
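
If the regexp was there to check the shape of the input rather than to pull it apart, the two jobs can be kept separate. This is a minimal sketch under that assumption; the names WORD_LIST and parse_word_list are made up for illustration, and re.fullmatch needs Python 3.4 or later.

```python
import re

# Hypothetical pattern: one word, then zero or more ",word" repetitions.
WORD_LIST = re.compile(r"\w+(?:,\w+)*")

def parse_word_list(text):
    """Validate text as a comma-separated word list, then split it."""
    if WORD_LIST.fullmatch(text) is None:
        raise ValueError("not a comma-separated list of words: %r" % (text,))
    # Validation is done by the regexp; extraction is a plain split.
    return text.split(",")

print(parse_word_list("egg,beans,ham,spam,spam"))
# ['egg', 'beans', 'ham', 'spam', 'spam']
```

The regexp only answers yes or no here, so no capturing groups are needed at all.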

> and when you look at the groups you get the following:
>
> >>> import re
> >>> re.match(r"(\w+)(,\w+)*", "egg,beans,ham,spam,spam").groups()
> ('egg', ',spam')
>
> Notice how you only get the last match as the second group's value.
>
> It would be nice if a repeat operator acting on a group turned that
> group into a sequence returning every match, in order (or an empty
> sequence for no matches).
>
> The above example would become:
>
> >>> import re
> >>> re.newmatch(r"(\w+)(,\w+)*", "egg,beans,ham,spam,spam").groups()
> ('egg', ('beans', 'ham', 'spam', ',spam'))

And then what are you going to do with the answer? Something like
this, maybe:

| >>> actual_answer = ('egg', ('beans', 'ham', 'spam', ',spam'))
| >>> [actual_answer[0]] + list(actual_answer[1])
| ['egg', 'beans', 'ham', 'spam', ',spam']

> 1. Is it possible?

Maybe, but I doubt the utility ...

> Do any other RE engines do this?

If your Google is not working, then mine isn't either.

> 2. Should it be added to Python?

No.

HTH,

John

Feb 11 '07 #2
