Re: python regex character group matches

christopher taylor wrote:
my issue, is that the pattern i used was returning:

[ '\\uAD0X', '\\u1BF3', ... ]

when i expected:

[ '\\uAD0X\\u1BF3', ]

the code looks something like this:

pat = re.compile("(\\\u[0-9A-F]{4})+", re.UNICODE|re.LOCALE)
#print pat.findall(txt_line)
results = pat.finditer(txt_line)

i ran the pattern through a couple of my colleagues and they were all
in agreement that my pattern should have matched correctly.

First, [0-9A-F] cannot match an "X". Assuming that's a typo, your next
problem is a precedence issue: (X)+ means "one or more (X)", not "one or
more X inside parens". In other words, that pattern matches one or more
X's and captures the last one.
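
To make the "captures the last one" point concrete, here is a minimal sketch for current Python 3. The sample text is made up for illustration, and the pattern is written as a raw string so that it can be pasted as-is; it is not the original poster's exact code.

```python
import re

# Hypothetical sample text containing one run of three \uXXXX escapes.
text = r"abcd\u1234\uAA99\u0BC4efg"

# A capturing group repeated with "+": the overall match covers the whole
# run, but the group itself only remembers its last repetition.
capturing = re.compile(r"(\\u[0-9A-F]{4})+")

match = capturing.search(text)
whole_run = match.group(0)    # everything the pattern matched
last_escape = match.group(1)  # only the final repetition of the group

# findall() returns group contents when the pattern contains a group,
# so it reports just the last escape of each run.
found = capturing.findall(text)

print(whole_run)
print(last_escape)
print(found)
```

Reading group(0) rather than group(1) is what recovers the full run here.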

Assuming that you want to find runs of \uXXXX escapes, simply use
non-capturing parentheses:

pat = re.compile(u"(?:\\\u[0-9A-F]{4})")

and use group(0) instead of group(1) to get the match.

</F>

Sep 17 '08 #1
On Wed, 17 Sep 2008 15:56:31 +0200, Fredrik Lundh wrote:
Assuming that you want to find runs of \uXXXX escapes, simply use
non-capturing parentheses:

pat = re.compile(u"(?:\\\u[0-9A-F]{4})")

Doesn't work for me:

>>> pat = re.compile(u"(?:\\\u[0-9A-F]{4})")
UnicodeDecodeError: 'unicodeescape' codec can't decode bytes in position
5-7: truncated \uXXXX escape

Assuming that the OP is searching byte strings, I came up with this:

>>> pat = re.compile('(\\\u[0-9A-F]{4})+')
>>> pat.search('abcd\\u1234\\uAA99\\u0BC4efg').group(0)
'\\u1234\\uAA99\\u0BC4'
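
One plausible reading of the traceback above, sketched for Python 3: the body of the unicode literal contains three backslashes, so the first two collapse into one escaped backslash and the third begins a \uXXXX escape that is not followed by four hex digits. The helper `literal_body_decodes` is hypothetical, and the sketch assumes that the Python 2 literal parser behaved like the `unicode_escape` codec, which the error message suggests but the thread does not confirm.

```python
def literal_body_decodes(body: bytes) -> bool:
    """Return True if the unicode_escape codec accepts the literal's body."""
    try:
        body.decode("unicode_escape")
    except UnicodeDecodeError:
        return False
    return True

# Three backslashes: "\\" is an escaped backslash, then "\u" starts an
# escape that is followed by "[0-9" instead of four hex digits.
three_backslashes = rb"(?:\\\u[0-9A-F]{4})"

# Four backslashes: two escaped backslashes, then a plain "u".
four_backslashes = rb"(?:\\\\u[0-9A-F]{4})"

three_ok = literal_body_decodes(three_backslashes)
four_ok = literal_body_decodes(four_backslashes)

# The decoded text is the regex source: an escaped backslash followed by "u".
decoded = four_backslashes.decode("unicode_escape")

print(three_ok, four_ok, decoded)
```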

--
Steven
Sep 17 '08 #2
Steven D'Aprano wrote:
Assuming that you want to find runs of \uXXXX escapes, simply use
non-capturing parentheses:

pat = re.compile(u"(?:\\\u[0-9A-F]{4})")

Doesn't work for me:
>>> pat = re.compile(u"(?:\\\u[0-9A-F]{4})")

it helps if you cut and paste the right line... here's a better version:

pat = re.compile(r"(?:\\u[0-9A-F]{4})+")
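
As a quick check of that version, the following sketch runs it with finditer() on Python 3. The variable name txt_line follows the original post, but its contents are invented for illustration.

```python
import re

# The corrected pattern: a raw string, a non-capturing group, and "+"
# applied to the whole group.
pat = re.compile(r"(?:\\u[0-9A-F]{4})+")

# Hypothetical input line with two separate runs of escapes.
txt_line = r"abcd\u1234\uAA99\u0BC4efg\uAD01\u1BF3"

# group(0) is the whole match, i.e. one complete run per match object.
runs = [m.group(0) for m in pat.finditer(txt_line)]

# With no capturing groups in the pattern, findall() returns whole matches too.
runs_via_findall = pat.findall(txt_line)

print(runs)
```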

</F>

Sep 17 '08 #3
