Re: Regular expressions and Unicode


Jeffrey> However, when I apply it to this Unicode string, I get only the
Jeffrey> first 3 letters of the surname:

Jeffrey> name = 'Anton\xc3\xadn Dvo\xc5\x99\xc3\xa1k'

That is a UTF-8 encoded byte string, not a unicode object. Maybe decode it first?

name = unicode('Anton\xc3\xadn Dvo\xc5\x99\xc3\xa1k', "utf-8")

Yup, that works:
>>> name = unicode('Anton\xc3\xadn Dvo\xc5\x99\xc3\xa1k', "utf-8")
>>> name
u'Anton\xedn Dvo\u0159\xe1k'
>>> surname = r'(?u).+ (\w+)'
>>> import re
>>> surname_re = re.compile(surname)
>>> m = surname_re.search(name)
>>> m.groups()
(u'Dvo\u0159\xe1k',)
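
For anyone reading this on Python 3, here is a rough sketch of the same fix. It assumes the name arrives as UTF-8 bytes; the variable names raw, bytes_re and truncated are only for illustration. Python 3 has no unicode() builtin, so the bytes are decoded with .decode() instead, and str patterns are Unicode-aware without the (?u) flag.

```python
import re

# The UTF-8 encoded bytes from the original post (assumed input form).
raw = b'Anton\xc3\xadn Dvo\xc5\x99\xc3\xa1k'

# Matching the undecoded bytes reproduces the problem: in a bytes pattern,
# \w only covers ASCII word characters, so the match stops at the first
# non-ASCII byte of the surname.
bytes_re = re.compile(rb'.+ (\w+)')
truncated = bytes_re.search(raw).group(1)

# Decoding to str first is the fix; \w then matches Unicode letters.
name = raw.decode('utf-8')
surname_re = re.compile(r'.+ (\w+)')
surname = surname_re.search(name).group(1)

# ascii() keeps the output printable on terminals that cannot show the name.
print(truncated)
print(ascii(surname))
```

The first print shows the truncated three-letter match on the raw bytes, the second the full surname after decoding.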
Oct 2 '08 #1