Bytes | Software Development & Data Engineering Community

Why does re.sub('.*?', '-', 'abc') return '-a-b-c-' instead of '-------'?

1 New Member
These are the results from Python 2.7.

>>> re.sub('.*?', '-', 'abc')
'-a-b-c-'
I expected the result to be as follows.

>>> re.sub('.*?', '-', 'abc')
'-------'
But it's not. Why?
Aug 1 '18 #1
zmbd
5,501 Recognized Expert Moderator Expert
Reference: Regular expression operations

Breaking down the expression
. - the dot matches any single character except a newline
* - match zero or more repetitions of the preceding RE
such that "ab*" will match "a" "ab" "abb" "abbbb" etc...
? - match zero or one repetition of the preceding RE
such that "ab?" will match "a" or "ab"
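
A quick way to check the pieces above in the interpreter. This is only a sketch; the variable names are made up for illustration, and re.match is used so that each result shows what the pattern takes from the start of the string:

```python
import re

# "*" takes zero or more of the preceding "b", as many as it can find.
star_none = re.match("ab*", "a").group()       # 'a'
star_many = re.match("ab*", "abbbb").group()   # 'abbbb'

# "?" takes zero or one of the preceding "b", never more.
optional = re.match("ab?", "abbbb").group()    # 'ab'

# "." matches any character except a newline (unless re.DOTALL is passed).
dot_letter = re.match(".", "x").group()        # 'x'
dot_newline = re.match(".", "\n")              # None, no match
```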

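So we now have the construct *?
Adding the ? after the * makes the quantifier non-greedy (lazy).
That is to say * by itself will match as many characters as possible - it's greedy.
*? is not greedy and matches as few as possible, and the fewest possible is zero.
So for string="abc"; rex=".*?"; every match is the empty string: one before each of "a", "b" and "c", and one at the end of the string.
re.sub puts a "-" in place of each empty match and, in Python 2.7, copies the character after it unchanged before searching again, which gives '-a-b-c-'.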
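
To see the empty matches directly, here is a minimal sketch. It does not call re.sub on the lazy pattern; instead it asks the pattern what it matches at each position and then rebuilds the string by hand the way Python 2.7 is described as doing it. The loop is an emulation for illustration, not the library's actual code, and the names lazy, spans and emulated are made up. Python 3.7 changed how re.sub treats empty matches, so the real call may give a different result on newer versions.

```python
import re

lazy = re.compile(".*?")
text = "abc"

# At every position the lazy pattern is already satisfied by the empty string.
spans = [lazy.match(text, pos).span() for pos in range(len(text) + 1)]
# spans == [(0, 0), (1, 1), (2, 2), (3, 3)]

# Emulation of the Python 2.7 behaviour: emit the replacement for the
# empty match, then copy the next character unchanged and move on.
pieces = []
for pos in range(len(text) + 1):
    pieces.append("-")                 # replacement for the empty match at pos
    pieces.append(text[pos:pos + 1])   # skipped character ('' at the end)
emulated = "".join(pieces)
# emulated == '-a-b-c-'
```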

Change the expression so that it matches exactly one character. re.sub replaces every match by default, so there is no global flag to set:
re.sub('.','-','abc')
this returns '---'

This also returns '---', but it only replaces letters and digits:
re.sub('[a-zA-Z0-9]','-','abc')
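
A short sketch of those two calls, plus the optional count argument of re.sub for comparison. The variable names and the input 'a b!' are made up for illustration:

```python
import re

# re.sub replaces every match unless count says otherwise.
all_chars = re.sub(".", "-", "abc")               # '---'
first_only = re.sub(".", "-", "abc", count=1)     # '-bc'

# A character class replaces only the characters it lists.
alnum_only = re.sub("[a-zA-Z0-9]", "-", "a b!")   # '- -!'

# A lazy quantifier that needs at least one character takes exactly one.
lazy_plus = re.sub(".+?", "-", "abc")             # '---'
```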

A good primer on regular expressions would help.

I've found this one: https://www.rexegg.com/
It appears to cover the basics and a bit extra too.
Aug 6 '18 #2
