<ei***********@yahoo.com> wrote in message
news:11*********************@l53g2000cwa.googlegroups.com...
Hi,
suppose I have a string like
test1?test2t-test3*test4*test5$test6#test7*test8
How can I construct the regexp to get test3*test4*test5 and
test7*test8, i.e., I want to match * and the words before and after it?
Thanks
I suppose this is just an example and you mean "any word" instead of test1,
test2, etc.
So your pattern would be: word*word*word*word, that is, "word*" repeated one
or more times, followed by another word.
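To match a word we'll use "\w+" (one or more letters, digits or underscores).
To match a literal * we have to use "\*", because an unescaped * is a special
character in a regexp (it means "repeat the previous item").
So the regexp would be: "(\w+\*)+\w+"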
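Here is a minimal sketch, assuming Python 3, that builds the same pattern from its pieces and checks a few strings against it. The names WORD and STAR are only illustrative. The sketch also shows that "\w" counts the underscore as a word character, so test7_test8 is a single word.

```python
import re

# A "word" is a run of \w characters (letters, digits and underscore).
WORD = r"\w+"
# The literal "*" must be escaped; re.escape produces the same "\*".
STAR = re.escape("*")

# "word*" repeated one or more times, followed by a final word.
pattern = "(" + WORD + STAR + ")+" + WORD

print(STAR)
print(pattern)
print(bool(re.fullmatch(pattern, "test3*test4*test5")))  # whole string fits
print(bool(re.fullmatch(pattern, "test3*")))             # no trailing word
print(bool(re.fullmatch(WORD, "test7_test8")))           # "_" is a word char
```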
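Since we are not interested in the () as a group by itself (it was only there to
describe the repeating pattern), we change it into a non-grouping
parenthesis. This matters with findall: when the pattern contains a capturing
group, findall returns only what that group captured (for a repeated group,
just its last repetition) instead of the whole match.
Final version: "(?:\w+\*)+\w+"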
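One quick way to see why the non-grouping parenthesis matters is to run findall with both versions on the sample string from the question. This assumes Python 3.

```python
import re

text = "test1?test2t-test3*test4*test5$test6#test7*test8"

capturing = re.compile(r"(\w+\*)+\w+")
non_capturing = re.compile(r"(?:\w+\*)+\w+")

# With a capturing group, findall returns what the group captured;
# a repeated group only keeps its last repetition.
print(capturing.findall(text))      # -> ['test4*', 'test7*']

# With no capturing groups, findall returns each whole match.
print(non_capturing.findall(text))  # -> ['test3*test4*test5', 'test7*test8']
```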
import re

# One or more "word*" pieces followed by a final word; (?:...) does not capture
rexp = re.compile(r"(?:\w+\*)+\w+")

lines = [
    'test1?test2t-test3*test4*test5$test6#test7*test8',
    'test1?test2t-test3*test4$test6#test7_test8',
    'test1?nada-que-ver$esto.no.matchea',
    'test1?test2t-test3*test4*',
    'test1?test2t-test3*test4',
    'test1?test2t-test3*',
]

for line in lines:
    print(line)
    # findall returns every non-overlapping whole match, left to right
    for txt in rexp.findall(line):
        print('->', txt)
Test it with some corner cases and see if it does what you expect: no "*",
starting with "*", ending with "*", embedded whitespace before and after the
"*", whitespace inside a word, the very definition of "word"...
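Here is a sketch of such a check, assuming Python 3. The whitespace-tolerant pattern rexp_ws is a hypothetical variant and is not part of the answer above. Whether you want it depends on what you consider a match.

```python
import re

rexp = re.compile(r"(?:\w+\*)+\w+")

# Hypothetical variant (not from the original answer): allow optional
# whitespace around each "*".
rexp_ws = re.compile(r"(?:\w+\s*\*\s*)+\w+")

corner_cases = [
    "no star here",    # no "*"
    "*test1*test2",    # starting with "*"
    "test1*test2*",    # ending with "*"
    "test1 * test2",   # whitespace around "*"
    "te st1*test2",    # whitespace inside a word
    "tést1*test2",     # non-ASCII letter: \w matches it in Python 3 str patterns
]

for case in corner_cases:
    print(repr(case), rexp.findall(case), rexp_ws.findall(case))
```

Note that the trailing "*" in "test1*test2*" is simply left out of the match, and a space inside a word splits it, so only "st1*test2" is found.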
--
Gabriel Genellina