Bytes IT Community

Why does re.sub('.*?', '-', 'abc') return '-a-b-c-' instead of '-------'?

These are the results from Python 2.7:

>>> import re
>>> re.sub('.*?', '-', 'abc')
'-a-b-c-'
I expected the result to be:

>>> re.sub('.*?', '-', 'abc')
'-------'
But it's not. Why?
1 Week Ago #1


zmbd
Expert Mod 5K+
Reference: the Python "re" module documentation (Regular expression operations)

Breaking down the expression:
. - matches any character except a newline
* - matches zero or more repetitions of the preceding RE,
such that "ab*" will match "a", "ab", "abb", "abbbb", etc.
? - matches zero or one repetition of the preceding RE,
such that "ab?" will match "a" or "ab"
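A small Python 3 check of these three definitions, testing whole strings against each pattern. It uses re.fullmatch (Python 3.4 or later), and matches() is just a hypothetical helper written for this post:

```python
import re


def matches(pattern, candidates):
    """Hypothetical helper: return the candidates the pattern matches in full."""
    return [s for s in candidates if re.fullmatch(pattern, s)]


print(matches('.', ['x', '7', '\n']))                   # ['x', '7']  (dot skips the newline)
print(matches('ab*', ['a', 'ab', 'abb', 'abbbb', 'b']))  # ['a', 'ab', 'abb', 'abbbb']
print(matches('ab?', ['a', 'ab', 'abb']))               # ['a', 'ab']
```

Note that "ab*" accepts a bare "a": zero repetitions count, which is exactly what makes ".*" able to match nothing at all.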

So we now have the construct *?
Adding the ? after the * does not mean "zero or one" here; it turns the * into a lazy (non-greedy) quantifier.
That is to say, * by itself will match as many characters as possible: it's greedy.
*? is not greedy and matches as few characters as possible.
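A minimal Python 3 sketch of the difference, using re.match and re.search so that only a single match is shown each time:

```python
import re

text = 'abc'
# Greedy: .* takes as many characters as it can.
print(repr(re.match('.*', text).group()))    # 'abc'
# Lazy: .*? stops as soon as the rest of the pattern is satisfied.
# Nothing follows it here, so zero characters are enough.
print(repr(re.match('.*?', text).group()))   # ''

# Laziness is usually useful only when something follows the quantifier:
html = '<a><b>'
print(re.search('<.*>', html).group())       # <a><b>
print(re.search('<.*?>', html).group())      # <a>
```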
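So for string="abc" and rex=".*?", the fewest characters .*? can match is zero: it matches the empty string, not the "a".
In Python 2.7, re.sub() replaces that empty match at every position where it occurs: before "a", between the letters, and at the end. An empty match consumes nothing, so the engine steps forward one character before searching again, and the letters it steps over are copied to the output unchanged. That is where '-a-b-c-' comes from.
(Python 3.7 changed how re.sub() handles empty matches, and on 3.7 and later the same call does return '-------'.)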
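To make the stepping visible, here is a rough sketch of the pre-3.7 rule. sub_pre37() is a hypothetical helper written for this post, not part of the re module; it supports only plain-string replacements (no backreferences) and assumes the documented 2.7 behaviour that an empty match is replaced only when it is not adjacent to the previous match:

```python
import re


def sub_pre37(pattern, repl, string):
    """Hypothetical helper: mimic how Python 2.7's re.sub() treats empty
    matches. Only plain-string replacements are supported."""
    regex = re.compile(pattern)
    out = []
    pos = 0          # where the next search starts
    last_end = -1    # end of the previous match (-1 = no match yet)
    while pos <= len(string):
        m = regex.search(string, pos)
        if m is None:
            break
        start, end = m.span()
        out.append(string[pos:start])      # copy text before the match
        if start == end:                   # empty match
            if start != last_end:          # skip it if it touches the last match
                out.append(repl)
            last_end = end
            # Nothing was consumed: copy one character and step past it.
            out.append(string[start:start + 1])
            pos = start + 1
        else:                              # non-empty match
            out.append(repl)
            last_end = end
            pos = end
    out.append(string[pos:])               # copy whatever is left
    return ''.join(out)


print(sub_pre37('.*?', '-', 'abc'))   # -a-b-c-
print(sub_pre37('x*', '-', 'abxd'))   # -a-b-d-
```

The helper is only meant to show why the letters survive; for real work use re.sub() itself.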

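If the goal is to replace every character, change the expression to a plain dot. Python's re.sub() already replaces every non-overlapping match by default (count=0), so no "global" flag is needed:
re.sub('.','-','abc')
this returns '---'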
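In Python 3 you can confirm that no extra flag is involved; the count argument (0, meaning "all", by default) is what limits the number of replacements:

```python
import re

print(re.findall('.', 'abc'))            # ['a', 'b', 'c']: three one-character matches
print(re.sub('.', '-', 'abc'))           # ---  (every match replaced)
print(re.sub('.', '-', 'abc', count=1))  # -bc  (only the first match replaced)
print(re.subn('.', '-', 'abc'))          # ('---', 3): result plus number of replacements
```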

A character class also returns '---' here, because every character in "abc" is a letter:
re.sub('[a-zA-Z0-9]','-','abc')
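The two expressions agree only because "abc" contains nothing but letters. A quick Python 3 comparison on a mixed string (the sample text is made up for illustration):

```python
import re

text = 'a b!\n9'
# '.' replaces everything except the newline.
print(repr(re.sub('.', '-', text)))                   # '----\n-'
# With re.DOTALL the dot matches the newline too.
print(repr(re.sub('.', '-', text, flags=re.DOTALL)))  # '------'
# The character class only touches ASCII letters and digits.
print(repr(re.sub('[a-zA-Z0-9]', '-', text)))         # '- -!\n-'
```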

You need a good primer on regular expressions.

I've found this one: https://www.rexegg.com/
It appears to cover the basics and a bit extra too.
1 Week Ago #2
