Regex with ASCII and non-ASCII chars

Hello everybody.
How can I do a regex match on a string that contains both ASCII and non-ASCII characters?
For example:

import re

regex = re.compile(r"(ÿÿ‹ð…öÂty)", re.IGNORECASE)
match = regex.search("ÿÿ‹ð…öÂty")
if match:
    result = match.group()
    print result
else:
    result = "No match found"
    print result

It returns "No match found" even though the two strings are equal.
Help me please!
Thx in advance :)

Jan 31 '07 #1
TOXiC wrote:
How can I do a regex match on a string that contains both ASCII and non-ASCII characters?
For example:

regex = re.compile(r"(ÿÿ‹ð…öÂty)", re.IGNORECASE)
match = regex.search("ÿÿ‹ð…öÂty")
if match:
    result = match.group()
    print result
else:
    result = "No match found"
    print result

It returns "No match found" even though the two strings are equal.

For equal strings you should get a match:
>>> re.compile("Zäöü", re.IGNORECASE).search("yadda zäöü yadda")
<_sre.SRE_Match object at 0x401e0a68>
>>> print _.group()
zäöü

For case-insensitive matching your best bet is unicode:
>>> re.compile(u"äöü", re.IGNORECASE|re.UNICODE).search(u"ÄÖÜ")
<_sre.SRE_Match object at 0x401e09f8>

Peter

Jan 31 '07 #2
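
As a rough illustration of the point about unicode, here is a minimal sketch written for Python 3 (the thread itself uses Python 2, so this is an adaptation, not the code from the posts). In Python 3, str patterns are Unicode by default, while bytes patterns only fold case for ASCII letters unless re.LOCALE is used. The latin-1 encoding below is just an assumption chosen for the demo.

```python
import re

# Unicode (str) pattern: IGNORECASE also folds non-ASCII letters such as ä/Ä.
text_pattern = re.compile("äöü", re.IGNORECASE)
text_match = text_pattern.search("yadda ÄÖÜ yadda")
text_found = text_match.group() if text_match else None

# Bytes pattern: without re.LOCALE, IGNORECASE only folds ASCII letters,
# so the upper-case bytes are not matched by the lower-case pattern.
# "latin-1" is an assumed encoding, used here only to produce raw bytes.
bytes_pattern = re.compile("äöü".encode("latin-1"), re.IGNORECASE)
bytes_match = bytes_pattern.search("yadda ÄÖÜ yadda".encode("latin-1"))

print(text_found)
print(bytes_match)
```

The second search coming back empty is the kind of surprise that working with decoded text avoids.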
Thanks, it works perfectly.
What if I want to search the contents of a file?

file = open(fileName, "r")
text = file.read()
file.close()

regex = re.compile(u"(ÿÿ‹ð…öÂ)", re.IGNORECASE)
match = regex.search(text)
if match:
    result = match.group()
    print result
    WritePatch(fileName, text, result)
else:
    result = "No match found"
    print result

It returns "No match found" (the file contains the string "ÿÿ‹ð…öÂ",
but...).
Thanks in advance for the help!

Jan 31 '07 #3
TOXiC wrote:
Thanks, it works perfectly.
What if I want to search the contents of a file?

file = open(fileName, "r")
text = file.read()
file.close()

Convert the bytes read from the file to unicode. For that you have to know
the encoding, e.g.

file_encoding = "utf-8" # replace with the actual encoding
text = text.decode(file_encoding)

regex = re.compile(u"(ÿÿ‹ð…öÂ)", re.IGNORECASE)
match = regex.search(text)
if match:
    result = match.group()
    print result
    WritePatch(fileName, text, result)
else:
    result = "No match found"
    print result

Peter
Jan 31 '07 #4
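
The decode-then-search advice could look roughly like this in Python 3, where open() can do the decoding itself. This is only a sketch under stated assumptions: search_text_file is a hypothetical helper name, and the demo file and its latin-1 encoding are made up for the example. The real file's encoding has to be known, as the reply says.

```python
import os
import re
import tempfile

def search_text_file(path, pattern, encoding):
    """Hypothetical helper: decode the file, then run a Unicode regex on it."""
    with open(path, "r", encoding=encoding) as handle:
        text = handle.read()
    match = re.search(pattern, text, re.IGNORECASE)
    return match.group() if match else None

# Demo data only: write a small latin-1 encoded file to search in.
with tempfile.NamedTemporaryFile("w", encoding="latin-1",
                                 suffix=".txt", delete=False) as tmp:
    tmp.write("yadda ZÄÖÜ yadda")
    demo_path = tmp.name

# The encoding passed here must be the one the file was written with.
found = search_text_file(demo_path, "zäöü", "latin-1")
os.remove(demo_path)
print(found)
```

This only helps when the file really is text. For binary data there is no meaningful text encoding to decode with.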
It won't work with utf-8, iso or ascii...

Jan 31 '07 #5
On 31 Jan, 17:30, "TOXiC" <Gatling...@gmail.com> wrote:
It won't work with utf-8, iso or ascii...

I think the best way is to search for the hex values in the file stream, but I
tried \hxx in the regex and it doesn't work...

Jan 31 '07 #6
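
On the hex idea: Python's escape for a byte value is \xhh (backslash, x, two hex digits), not \hxx. A minimal Python 3 sketch of a byte-level search follows. It assumes that the characters shown in the thread are the cp1252 rendering of the raw bytes FF FF 8B F0 85 F6 C2, which the thread does not confirm, and find_bytes is a hypothetical helper name.

```python
import re

# Assumption: these are the raw bytes behind the characters in the posts.
needle = b"\xff\xff\x8b\xf0\x85\xf6\xc2"

def find_bytes(data, needle):
    """Hypothetical helper: return the offset of needle in data, or -1."""
    # re.escape keeps any byte that is a regex metacharacter literal.
    match = re.search(re.escape(needle), data)
    return match.start() if match else -1

# In-memory stand-in for file contents; a real file would be read with
# open(path, "rb") so that no text decoding is applied.
sample = b"\x00\x01" + needle + b"\x74\x79"
offset = find_bytes(sample, needle)
print(offset)
```

For a fixed byte sequence like this one, data.find(needle) gives the same offset without a regex. The regex form is mainly useful when the pattern needs wildcards or alternatives.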
