Bytes | Developer Community

Regexps and lists

I don't know enough to write an R.E. engine, so forgive me if I am
being naive.
I have had to match text involving lists in the past. These are usually
comma-separated words such as
"egg,beans,ham,spam,spam"
You can match that with:
r"(\w+)(,\w+)*"
and when you look at the groups you get the following:
>>> import re
>>> re.match(r"(\w+)(,\w+)*", "egg,beans,ham,spam,spam").groups()
('egg', ',spam')
>>>
Notice how you only get the last match as the second group's value.

It would be nice if a repeat operator acting on a group turned that
group into a sequence returning every match, in order. (or an empty
sequence for no matches).

The above example would become:
>>> import re
>>> re.newmatch(r"(\w+)(,\w+)*", "egg,beans,ham,spam,spam").groups()
('egg', ('beans', 'ham', 'spam', ',spam'))
>>>
1, Is it possible? do any other RE engines do this?
2, Should it be added to Python?

- Paddy.

Feb 11 '07 #1
On Feb 12, 9:08 am, "Paddy" <paddy3...@googlemail.com> wrote:
> I don't know enough to write an R.E. engine, so forgive me if I am
> being naive.
> I have had to match text involving lists in the past. These are usually
> comma-separated words such as
> "egg,beans,ham,spam,spam"
> You can match that with:
> r"(\w+)(,\w+)*"
You *can*, but why do that? What are you trying to achieve? What is
the point of distinguishing the first element from the remainder?

See if any of the following do what you want:

| >>> s = "egg,beans,ham,spam,spam"
| >>> s.split(',')
| ['egg', 'beans', 'ham', 'spam', 'spam']
| >>> import re
| >>> re.split(r",", s)
| ['egg', 'beans', 'ham', 'spam', 'spam']
| >>> re.split(r"(,)", s)
| ['egg', ',', 'beans', ',', 'ham', ',', 'spam', ',', 'spam']
> and when you look at the groups you get the following:
> >>> import re
> >>> re.match(r"(\w+)(,\w+)*", "egg,beans,ham,spam,spam").groups()
> ('egg', ',spam')
>
> Notice how you only get the last match as the second group's value.
>
> It would be nice if a repeat operator acting on a group turned that
> group into a sequence returning every match, in order. (or an empty
> sequence for no matches).
>
> The above example would become:
> >>> import re
> >>> re.newmatch(r"(\w+)(,\w+)*", "egg,beans,ham,spam,spam").groups()
> ('egg', ('beans', 'ham', 'spam', ',spam'))
And then what are you going to do with the answer? Something like
this, maybe:

| >>> actual_answer = ('egg', ('beans', 'ham', 'spam', ',spam'))
| >>> [actual_answer[0]] + list(actual_answer[1])
| ['egg', 'beans', 'ham', 'spam', ',spam']
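
For comparison, the nested result asked for above can be approximated today with the standard `re` module in two passes: one match to validate the whole string and isolate the repeated tail, then `re.findall` to pull out each repetition. This is only a sketch under some assumptions: the helper name `match_list` is made up for illustration, it uses `re.fullmatch` (stricter than the `re.match` in the thread), and it drops the leading comma from every item.

```python
import re

def match_list(text):
    """Return (first, (rest...)) for a comma-separated word list, or None.

    Hypothetical helper, not part of the `re` module.
    """
    # Capture the first word, then the whole repeated tail as ONE group.
    # The inner group is non-capturing because `re` would keep only its
    # last repetition anyway.
    m = re.fullmatch(r"(\w+)((?:,\w+)*)", text)
    if m is None:
        return None
    first, tail = m.groups()
    # Second pass: collect every repetition from the tail, without commas.
    rest = tuple(re.findall(r",(\w+)", tail))
    return first, rest

print(match_list("egg,beans,ham,spam,spam"))
print(match_list("egg"))
```

The first call prints `('egg', ('beans', 'ham', 'spam', 'spam'))` and the second prints `('egg', ())`, which is the "empty sequence for no matches" case from the original question.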

> 1, Is it possible?
Maybe, but I doubt the utility ...
> do any other RE engines do this?
If your Google is not working, then mine isn't either.
> 2, Should it be added to Python?
No.

HTH,

John

Feb 11 '07 #2
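
On the "do any other RE engines do this?" question: the third-party `regex` package on PyPI (a separate project, not the standard `re` module) keeps every repetition of a group and exposes them through a `captures()` method on its match objects. The sketch below assumes that package may or may not be installed, so the import is guarded and the hypothetical helper `repeated_captures` returns `None` when it is missing.

```python
import re

try:
    import regex  # third-party package from PyPI, not in the standard library
except ImportError:
    regex = None

PATTERN = r"(\w+)(,\w+)*"
TEXT = "egg,beans,ham,spam,spam"

def repeated_captures(text):
    """Return every capture of group 2, or None if `regex` is unavailable."""
    if regex is None:
        return None
    m = regex.match(PATTERN, text)
    # captures(2) lists each repetition of the second group, in order.
    return m.captures(2) if m else []

# The standard module keeps only the last repetition of the group.
last_only = re.match(PATTERN, TEXT).group(2)
print(last_only)

caps = repeated_captures(TEXT)
print(caps)
```

With `regex` installed, `caps` holds all four repetitions, each still carrying its leading comma because the comma sits inside the group; with only the standard library, `last_only` is `',spam'`, as shown in the thread.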
