regular expression unicode character class trouble

Hi,

In a Unicode environment, I need the character class

set("\w") - set("[0-9]")

or alpha without digits. Any ideas how to create that? And what performance
implications should I expect? I guess that the character classes
aren't implemented as sets, but as comparison functions that check a
value against certain well-defined ranges.

Regards,

Diez
Sep 4 '05 #1


Diez B. Roggisch wrote:
Hi,

In a Unicode environment, I need the character class

set("\w") - set("[0-9]")

or alpha without digits. Any ideas how to create that?


I'd use something like r"[^_\d\W]", that is, everything that is neither an
underscore, nor a digit, nor a non-word character. In action:

py> import re
py> re.findall(r'[^_\d\W]+', '42badger100x__xxA1BC')
['badger', 'x', 'xxA', 'BC']
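A minimal sketch of the same trick as a reusable function, assuming Python 3, where str patterns are Unicode-aware by default (in the Python 2 of this thread's era, you needed unicode strings plus the re.UNICODE flag for the same behavior). The names `alpha_runs`, `matches_class` and `mismatches` are hypothetical helpers for illustration. It also spells out the literal set difference from the question: for str patterns \w is str.isalnum() plus "_" and \d is str.isdecimal(), so the class reduces to isalnum() and not isdecimal(). One consequence worth knowing: numeric characters that are not decimal digits, such as U+00B2 SUPERSCRIPT TWO, are still kept, so this is "word characters minus digits" rather than strictly letters.

```python
import re

# Double negation: [^_\d\W] matches a character that is not "_", not a
# decimal digit (\d) and not a non-word character (\W), i.e. the word
# characters left over after removing digits and the underscore.
ALPHA_NO_DIGIT = re.compile(r"[^_\d\W]+")
ALPHA_NO_DIGIT_ASCII = re.compile(r"[^_\d\W]+", re.ASCII)


def alpha_runs(text, ascii_only=False):
    """Hypothetical helper: maximal runs of non-digit, non-underscore word chars."""
    pattern = ALPHA_NO_DIGIT_ASCII if ascii_only else ALPHA_NO_DIGIT
    return pattern.findall(text)


def matches_class(ch):
    """Hypothetical helper: the explicit set difference \\w - \\d - {'_'}."""
    # For str patterns: \w == isalnum() or "_", \d == isdecimal().
    return ch.isalnum() and not ch.isdecimal()


def mismatches(limit=0x10000):
    """Code points below `limit` where the regex and matches_class disagree."""
    single = re.compile(r"[^_\d\W]")
    return [cp for cp in range(limit)
            if bool(single.fullmatch(chr(cp))) != matches_class(chr(cp))]


if __name__ == "__main__":
    print(alpha_runs("42badger100x__xxA1BC"))     # ['badger', 'x', 'xxA', 'BC']
    print(alpha_runs("café42"))                   # ['café']
    print(alpha_runs("café42", ascii_only=True))  # ['caf']
    # U+0663 ARABIC-INDIC DIGIT THREE is a decimal digit, so it splits the run;
    # U+00B2 SUPERSCRIPT TWO is numeric but not decimal, so it is kept.
    print(alpha_runs("abc\u0663def\u00b2"))       # ['abc', 'def²']
    print(mismatches())                           # []
```

If you really want letters only, one option is to post-filter the runs with str.isalpha(); the regex itself has no "letters only" shorthand in the standard re module.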

HTH,

STeVe
Sep 4 '05 #2

Steven Bethard wrote:
I'd use something like r"[^_\d\W]", that is, everything that is neither an
underscore, nor a digit, nor a non-word character. In action:

py> re.findall(r'[^_\d\W]+', '42badger100x__xxA1BC')
['badger', 'x', 'xxA', 'BC']

HTH,


Seems so, great!

Diez
Sep 5 '05 #3
