Need a Regular expression to remove a char for Unicode text

Hi friends,
Can anyone tell me how I can remove a character from a Unicode text?
కల్<200c>&హార is a Telugu word in Unicode. Here I want to
remove '&' but not replace it with a zero-width char. And one more thing:
if there is any whitespace before and after the '&' char, the text should
be kept as it is. Please tell me how I can work this out with regular
expressions.

Thanks and regards
Srinivasa Raju Datla

Oct 13 '06 #1
4 Replies



శ్రీనివాస wrote:
> Hi friends,
> Can anyone tell me how I can remove a character from a Unicode text?
> కల్<200c>&హార is a Telugu word in Unicode. Here I want to
> remove '&' but not replace it with a zero-width char. And one more thing:
> if there is any whitespace before and after the '&' char, the text should
> be kept as it is. Please tell me how I can work this out with regular
> expressions.
>
> Thanks and regards
> Srinivasa Raju Datla

Don't know anything about Telugu, but is this the approach you want?
>>> import re
>>> x = u'\xfe\xff & \xfe\xff \xfe\xff&\xfe\xff'
>>> noampre = re.compile(r'(?<!\s)&(?!\s)', re.UNICODE).sub
>>> noampre('', x)
u'\xfe\xff & \xfe\xff \xfe\xff\xfe\xff'

The regular expression uses negative lookbehind and lookahead
assertions to check that there is no whitespace on either side of the '&'
character. Each match found is then replaced with the empty string.
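
For readers who want to try this on the Telugu word from the question, here is a minimal sketch of the same lookbehind/lookahead idea. It assumes Python 3, where str patterns are Unicode-aware by default, and the names strip_inline_ampersand and AMP_BETWEEN_NON_SPACES are made up for illustration only.

```python
import re

# '&' is matched only when the character before it and the character
# after it are both non-whitespace (or absent, at the string edges).
AMP_BETWEEN_NON_SPACES = re.compile(r"(?<!\s)&(?!\s)")


def strip_inline_ampersand(text):
    """Delete every '&' that has no whitespace directly before or after it."""
    return AMP_BETWEEN_NON_SPACES.sub("", text)


ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER; not treated as whitespace by \s

word = "కల్" + ZWNJ + "&" + "హార"
cleaned = strip_inline_ampersand(word)         # '&' removed, ZWNJ kept
spaced = strip_inline_ampersand("కల్ & హార")   # left exactly as it was
```

Because U+200C is not whitespace, the '&' that follows it is still removed, and the zero-width character itself stays in place.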

Oct 13 '06 #2

శ్రీనివాస enlightened us with:
> Can anyone tell me how I can remove a character from a Unicode
> text? కల్<200c>&హార is a Telugu word in Unicode. Here I want to
> remove '&' but not replace it with a zero-width char. And one more
> thing: if there is any whitespace before and after the '&' char, the
> text should be kept as it is.
So basically, you want to match <200c>& and replace it with <200c>,
but only if it's not surrounded by whitespace, right?

u"(?<!\\s)\u200c&(?!\\s)" should match. I'm sure you'll be able to take
it from there.
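
A rough sketch of how that could be finished, assuming Python 3.3 or later (where the re module itself understands \uXXXX escapes). One detail worth keeping in mind: in a pattern, \x takes exactly two hex digits, so the zero width non-joiner has to be spelled \u200c. The names ZWNJ_AMP and drop_amp_after_zwnj are hypothetical.

```python
import re

ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER

# "\x200c" in a pattern would mean "\x20" (a space) followed by the text "0c".
# "\u200c" names the ZWNJ itself. The group captures it so it can be kept.
ZWNJ_AMP = re.compile(r"(?<!\s)(\u200c)&(?!\s)")


def drop_amp_after_zwnj(text):
    """Replace ZWNJ + '&' with the ZWNJ alone, unless whitespace is adjacent."""
    return ZWNJ_AMP.sub(r"\1", text)


result = drop_amp_after_zwnj("కల్" + ZWNJ + "&" + "హార")
untouched = drop_amp_after_zwnj("కల్" + ZWNJ + "& హార")
```

Unlike the plain (?<!\s)&(?!\s) pattern, this variant leaves an '&' alone when it is not directly preceded by the zero-width character.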

Sybren
--
Sybren Stüvel
Stüvel IT - http://www.stuvel.eu/
Oct 13 '06 #3

On Oct 13, 4:44 am, harvey.tho...@informa.com wrote:
> శ్రీనివాస wrote:
> > Hi friends,
> > Can anyone tell me how I can remove a character from a Unicode text?
> > కల్<200c>&హార is a Telugu word in Unicode. Here I want to
> > remove '&' but not replace it with a zero-width char. And one more thing:
> > if there is any whitespace before and after the '&' char, the text should
> > be kept as it is. Please tell me how I can work this out with regular
> > expressions.
> > Thanks and regards
> > Srinivasa Raju Datla
>
> Don't know anything about Telugu, but is this the approach you want?
> >>> x = u'\xfe\xff & \xfe\xff \xfe\xff&\xfe\xff'
> >>> noampre = re.compile(r'(?<!\s)&(?!\s)', re.UNICODE).sub
> >>> noampre('', x)

He wants to replace & with a zero-width joiner, so the last call should be
noampre(u"\u200D", x)

Oct 13 '06 #4

On Oct 13, 4:55 am, "Leo Kislov" <Leo.Kis...@gmail.com> wrote:
> On Oct 13, 4:44 am, harvey.tho...@informa.com wrote:
> > శ్రీనివాస wrote:
> > > Hi friends,
> > > Can anyone tell me how I can remove a character from a Unicode text?
> > > కల్<200c>&హార is a Telugu word in Unicode. Here I want to
> > > remove '&' but not replace it with a zero-width char. And one more thing:
> > > if there is any whitespace before and after the '&' char, the text should
> > > be kept as it is. Please tell me how I can work this out with regular
> > > expressions.
> > > Thanks and regards
> > > Srinivasa Raju Datla
> >
> > Don't know anything about Telugu, but is this the approach you want?
> > >>> x = u'\xfe\xff & \xfe\xff \xfe\xff&\xfe\xff'
> > >>> noampre = re.compile(r'(?<!\s)&(?!\s)', re.UNICODE).sub
> > >>> noampre('', x)
>
> He wants to replace & with a zero-width joiner, so the last call should be
> noampre(u"\u200D", x)

Pardon my poor reading comprehension; the OP doesn't want a zero-width
joiner. Though I'm confused why he mentioned it at all.

Oct 13 '06 #5
