Bytes | Software Development & Data Engineering Community

Regexps and lists

I don't know enough to write an R.E. engine, so forgive me if I am
being naive.
I have had to match text involving lists in the past. These are usually
comma-separated words such as
"egg,beans,ham,spam,spam"
You can match that with:
r"(\w+)(,\w+)*"
and when you look at the groups you get the following:
>>> import re
>>> re.match(r"(\w+)(,\w+)*", "egg,beans,ham,spam,spam").groups()
('egg', ',spam')
>>>
Notice that you only get the last match as the second group's value.

It would be nice if a repeat operator acting on a group turned that
group into a sequence returning every match, in order (or an empty
sequence for no matches).

The above example would become:
>>> import re
>>> re.newmatch(r"(\w+)(,\w+)*", "egg,beans,ham,spam,spam").groups()
('egg', ('beans', 'ham', 'spam', ',spam'))
>>>
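A minimal sketch of how the proposed behaviour can be approximated with the existing re module, assuming list items are \w+ words separated by single commas. The names match_list, LIST_RE and ITEM_RE are hypothetical, not part of re: the whole string is validated with fullmatch, then each repetition is collected with finditer. Unlike a literal (,\w+) group, which would capture ',beans', ',ham' and so on, the inner group here leaves the comma out.

```python
import re

# Hypothetical helper, not part of Python's re module.
# Assumption: items are \w+ words separated by single commas, no spaces.
LIST_RE = re.compile(r"(\w+)((?:,\w+)*)")  # first item, then all ",item" repeats as one span
ITEM_RE = re.compile(r",(\w+)")            # one repetition; the group excludes the comma


def match_list(text):
    """Return (first, (rest...)) for a well-formed list, or None."""
    m = LIST_RE.fullmatch(text)
    if m is None:
        return None
    first, rest = m.groups()
    # Collect every repetition in order; an empty tuple when there are none.
    return first, tuple(im.group(1) for im in ITEM_RE.finditer(rest))


print(match_list("egg,beans,ham,spam,spam"))
# ('egg', ('beans', 'ham', 'spam', 'spam'))
```

Validating first and iterating second keeps the two concerns separate: a malformed string such as "egg," is rejected outright instead of yielding a partial list.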
1. Is it possible? Do any other RE engines do this?
2. Should it be added to Python?

- Paddy.

Feb 11 '07 #1
On Feb 12, 9:08 am, "Paddy" <paddy3...@googlemail.com> wrote:
> I don't know enough to write an R.E. engine, so forgive me if I am
> being naive.
> I have had to match text involving lists in the past. These are usually
> comma-separated words such as
> "egg,beans,ham,spam,spam"
> you can match that with:
> r"(\w+)(,\w+)*"
You *can*, but why do that? What are you trying to achieve? What is
the point of distinguishing the first element from the remainder?

See if any of the following do what you want:

| >>> s = "egg,beans,ham,spam,spam"
| >>> s.split(',')
| ['egg', 'beans', 'ham', 'spam', 'spam']
| >>> import re
| >>> re.split(r",", s)
| ['egg', 'beans', 'ham', 'spam', 'spam']
| >>> re.split(r"(,)", s)
| ['egg', ',', 'beans', ',', 'ham', ',', 'spam', ',', 'spam']
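Note that split() alone does not check that the input really is a well-formed list. A minimal sketch, assuming the same comma-separated \w+ format, that validates the shape with re.fullmatch before splitting; parse_list and LIST_SHAPE are hypothetical names used only for illustration:

```python
import re

# Assumption: items are \w+ words separated by single commas, no spaces.
LIST_SHAPE = re.compile(r"\w+(?:,\w+)*")


def parse_list(text):
    """Return the items of a comma-separated word list, or None if malformed."""
    # fullmatch rejects empty items ("egg,,ham") and trailing commas ("egg,").
    if LIST_SHAPE.fullmatch(text) is None:
        return None
    return text.split(",")


print(parse_list("egg,beans,ham,spam,spam"))
```

The regex only answers "is this a list?", and str.split does the extraction, so no repeated-group captures are needed.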
> and when you look at the groups you get the following:
> >>> import re
> >>> re.match(r"(\w+)(,\w+)*", "egg,beans,ham,spam,spam").groups()
> ('egg', ',spam')
>
> Notice that you only get the last match as the second group's value.
>
> It would be nice if a repeat operator acting on a group turned that
> group into a sequence returning every match, in order (or an empty
> sequence for no matches).
>
> The above example would become:
> >>> import re
> >>> re.newmatch(r"(\w+)(,\w+)*", "egg,beans,ham,spam,spam").groups()
> ('egg', ('beans', 'ham', 'spam', ',spam'))

And then what are you going to do with the answer? Something like
this, maybe:

| >>> actual_answer = ('egg', ('beans', 'ham', 'spam', ',spam'))
| >>> [actual_answer[0]] + list(actual_answer[1])
| ['egg', 'beans', 'ham', 'spam', ',spam']
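If a flat list is the goal, the nested tuple can also be flattened with iterable unpacking (Python 3.5+), or the nesting can be skipped entirely with re.findall, which returns every non-overlapping match of a pattern that has no groups. This sketch assumes each captured item is a bare word such as 'spam', without its comma; note that findall by itself does not validate the list's structure.

```python
import re

s = "egg,beans,ham,spam,spam"

# Assumed nested (first, rest) shape, as the proposed newmatch() might return it,
# flattened with iterable unpacking.
first, rest = "egg", ("beans", "ham", "spam", "spam")
flattened = [first, *rest]

# re.findall with a group-free pattern returns each \w+ match in order.
direct = re.findall(r"\w+", s)

print(flattened == direct)  # True
```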

> 1. Is it possible?
Maybe, but I doubt the utility...
> Do any other RE engines do this?
If your Google is not working, then mine isn't either.
> 2. Should it be added to Python?
No.

HTH,

John

Feb 11 '07 #2
