
Regex with ASCII and non-ASCII chars

Hello everybody.
How can I do a regex match on a string with ASCII and non-ASCII chars?
For example:

regex = re.compile(r"(ÿÿ‹ð…öÂty)", re.IGNORECASE)
match = regex.search("ÿÿ‹ð…öÂty")
if match:
    result = match.group()
    print result
else:
    result = "No match found"
    print result

It returns "No match found" even though the two strings are equal.
Please help!
Thanks in advance :)

Jan 31 '07 #1
5 Replies


TOXiC wrote:
How can I do a regex match on a string with ASCII and non-ASCII chars?
For example:

regex = re.compile(r"(ÿÿ‹ð…öÂty)", re.IGNORECASE)
match = regex.search("ÿÿ‹ð…öÂty")
if match:
    result = match.group()
    print result
else:
    result = "No match found"
    print result

It returns "No match found" even though the two strings are equal.
For equal strings you should get a match:
>>> re.compile("Z", re.IGNORECASE).search("yadda z yadda")
<_sre.SRE_Match object at 0x401e0a68>
>>> print _.group()
z

For case-insensitive matching of non-ASCII characters your best bet is unicode:
>>> re.compile(u"", re.IGNORECASE|re.UNICODE).search(u"")
<_sre.SRE_Match object at 0x401e09f8>

Peter

Jan 31 '07 #2
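
To illustrate the point about unicode and case-insensitive matching, here is a minimal sketch. It assumes Python 3 (the thread itself uses Python 2 syntax), where `str` is already unicode, and it assumes Latin-1 as the byte encoding for the comparison. The sample strings are made up for illustration.

```python
import re

# Text (unicode) pattern: IGNORECASE also folds non-ASCII letters
# such as "ö"/"Ö" and "Â"/"â".
text_pattern = re.compile("öÂty", re.IGNORECASE)
text_match = text_pattern.search("xx ÖâTY xx")

# Bytes pattern: IGNORECASE only folds the ASCII letters (here "t" and "y").
# Latin-1 is an assumption; any single-byte encoding shows the same effect.
raw_pattern = re.compile(re.escape("öÂty".encode("latin-1")), re.IGNORECASE)
same_high_bytes = raw_pattern.search("öÂTY".encode("latin-1"))
other_high_bytes = raw_pattern.search("ÖâTY".encode("latin-1"))

print(text_match is not None)        # unicode pattern matches
print(same_high_bytes is not None)   # bytes match when only ASCII case differs
print(other_high_bytes is not None)  # no match when non-ASCII case differs
```

The byte-string pattern only fails once the non-ASCII characters differ in case, which is why working in unicode is the safer choice for text.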

Thanks, it works perfectly.
What if I want to search the contents of a file?

file = open(fileName, "r")
text = file.read()
file.close()

regex = re.compile(u"(ÿÿ‹ð…öÂ)", re.IGNORECASE)
match = regex.search(text)
if (match):
    result = match.group()
    print result
    WritePatch(fileName,text,result)
else:
    result = "No match found"
    print result

It returns "No match found", although the file does contain the string
"ÿÿ‹ð…öÂ".
Thanks in advance for the help!

Jan 31 '07 #3

TOXiC wrote:
Thanks, it works perfectly.
What if I want to search the contents of a file?

file = open(fileName, "r")
text = file.read()
file.close()
Convert the bytes read from the file to unicode. For that you have to know
the file's encoding, e.g.:

file_encoding = "utf-8" # replace with the actual encoding
text = text.decode(file_encoding)
regex = re.compile(u"(ÿÿ‹ð…öÂ)", re.IGNORECASE)
match = regex.search(text)
if (match):
    result = match.group()
    print result
    WritePatch(fileName,text,result)
else:
    result = "No match found"
    print result

It returns "No match found", although the file does contain the string
"ÿÿ‹ð…öÂ".
Thanks in advance for the help!
Peter
Jan 31 '07 #4
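
The decode-then-search advice above could look roughly like this. It is only a sketch, assuming Python 3 and a file that really is UTF-8 text; the temporary sample file and its contents are made up so the example is self-contained.

```python
import os
import re
import tempfile

file_encoding = "utf-8"  # assumption: replace with the file's actual encoding

# Write a small sample file (hypothetical contents) to search in.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write("header ÿÿöÂ trailer".encode(file_encoding))
    file_name = tmp.name

try:
    # Read raw bytes, as the original code effectively did.
    with open(file_name, "rb") as f:
        raw = f.read()
finally:
    os.remove(file_name)

# Decode to unicode text before matching against a unicode pattern.
text = raw.decode(file_encoding)
match = re.search("(ÿÿöÂ)", text, re.IGNORECASE)
result = match.group() if match else "No match found"
print(result)
```

If the file is not text at all (for example an executable to be patched), no encoding will fit and decoding is the wrong tool; searching the raw bytes is then the more likely fix.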

It won't work with utf-8, iso or ascii...

Jan 31 '07 #5

On 31 Jan, 17:30, "TOXiC" <Gatling...@gmail.com> wrote:
It won't work with utf-8, iso or ascii...
I think the best way is to search for the hex values in the file stream, but I
tried \hxx in the regex and it doesn't work...

Jan 31 '07 #6
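
On searching for hex values: the regex escape for a byte value is `\xhh` (two hex digits), not `\hxx`. A minimal sketch, assuming Python 3 and assuming the characters shown in the thread are the Windows-1252 rendering of the bytes FF FF 8B F0 85 F6 C2 (that mapping is a guess, and the sample data is made up):

```python
import re

# Hypothetical file contents; in practice read the file in binary mode:
#     with open(file_name, "rb") as f: data = f.read()
data = b"\x00\x01\xff\xff\x8b\xf0\x85\xf6\xc2\x74\x79\x90"

# Bytes pattern using \xhh escapes. No IGNORECASE here, since these are
# raw byte values rather than letters.
pattern = re.compile(rb"\xff\xff\x8b\xf0\x85\xf6\xc2")
match = pattern.search(data)
offset = match.start() if match else -1
print(offset)

# For a fixed byte sequence, bytes.find gives the same offset without a regex.
plain_offset = data.find(b"\xff\xff\x8b\xf0\x85\xf6\xc2")
print(plain_offset)
```

Opening the file in binary mode matters here: in text mode the bytes may be decoded or have line endings translated before the search runs.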
