Bytes | Software Development & Data Engineering Community

compare unicode to non-unicode strings

how could I test that those 2 strings are the same:

'séd' (repr is 's\\xc3\\xa9d')

u'séd' (repr is u's\\xe9d')
Aug 31 '08 #1
On Aug 31, 11:04 pm, Asterix <aste...@lagaule.org> wrote:
> how could I test that those 2 strings are the same:
>
> 'séd' (repr is 's\\xc3\\xa9d')

No, the repr is 's\xc3\xa9d'.

> u'séd' (repr is u's\\xe9d')

No, the repr is u's\xe9d'.

To answer your question:

Aug 31 '08 #2
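The repr correction above is easy to check. Here is a small sketch in Python 3, under the assumption that the reader is on Python 3: the thread's Python 2 byte string `str` corresponds to Python 3 `bytes`, and Python 2 `unicode` corresponds to Python 3 `str`. `ascii()` is used for the text string because Python 3's `repr()` prints printable non-ASCII characters such as é unescaped.

```python
# Python 3 sketch: Python 2 `str` ~ Python 3 `bytes`, Python 2 `unicode` ~ Python 3 `str`.
raw = "s\u00e9d".encode("utf-8")   # 'séd' encoded as UTF-8 bytes
text = "s\u00e9d"                  # 'séd' as a Unicode string

print(repr(raw))    # b's\xc3\xa9d'  (one backslash per escape)
print(len(raw))     # 4 bytes: 's', 0xC3, 0xA9, 'd'
print(ascii(text))  # 's\xe9d'
print(len(text))    # 3 characters

# A doubled backslash would mean literal backslash characters in the data:
literal = "s\\xc3\\xa9d"
print(len(literal))  # 10 characters, which is not the same data at all
```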
On Aug 31, 11:04 pm, Asterix <aste...@lagaule.org> wrote:
> how could I test that those 2 strings are the same:
>
> 'séd' (repr is 's\\xc3\\xa9d')
>
> u'séd' (repr is u's\\xe9d')
[note: your reprs are wrong; change the \\ to \]

You need to decode the non-unicode string and compare the result with
the unicode string. You need to know the encoding used for the non-
unicode string. In the example that you gave, it's about 99.99% likely
that it's UTF-8.
>>> 's\xc3\xa9d'.decode('utf8')
u's\xe9d'
>>> u's\xe9d'.encode('utf8')
's\xc3\xa9d'
HTH,
John
Aug 31 '08 #3
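For readers on Python 3, where the byte string type is `bytes` and `unicode` became `str`, the same decode-then-compare idea might look like the following sketch. Like John's answer, it assumes the bytes are UTF-8.

```python
# Python 3 sketch of decode-then-compare, assuming UTF-8 input.
raw = b"s\xc3\xa9d"      # UTF-8 encoded bytes for 'séd'
text = "s\u00e9d"        # the Unicode string 'séd'

# Decode the bytes and compare text with text.
decoded = raw.decode("utf-8")
print(decoded == text)   # True

# The reverse also works: encode the text and compare bytes with bytes.
print(text.encode("utf-8") == raw)  # True

# Comparing bytes with str directly never matches in Python 3.
print(raw == text)       # False
```

Whichever direction you pick, both sides must be the same type before `==` means anything.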
Asterix wrote:
> how could I test that those 2 strings are the same:
>
> 'séd' (repr is 's\\xc3\\xa9d')
>
> u'séd' (repr is u's\\xe9d')
Determine what encoding the former string is using (it looks like UTF-8),
and convert it to Unicode before doing the comparison.
>>> b = 's\xc3\xa9d'
>>> u = u's\xe9d'
>>> b
's\xc3\xa9d'
>>> u
u's\xe9d'
>>> unicode(b, "utf-8")
u's\xe9d'
>>> unicode(b, "utf-8") == u
True

</F>

Aug 31 '08 #4
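In Python 3, `unicode(b, "utf-8")` is spelled `str(b, "utf-8")`. The sketch below wraps it in a hypothetical helper, `same_text`, that treats bytes which cannot be decoded as "not equal". Returning `False` instead of letting `UnicodeDecodeError` propagate is a design choice made here for illustration, not something the thread specifies.

```python
def same_text(raw: bytes, text: str, encoding: str = "utf-8") -> bool:
    """Hypothetical helper: True if `raw` decoded with `encoding` equals `text`.

    Bytes that are not valid in `encoding` are reported as not equal.
    """
    try:
        # str(bytes, encoding) is the Python 3 equivalent of unicode(b, encoding)
        return str(raw, encoding) == text
    except UnicodeDecodeError:
        return False


b = b"s\xc3\xa9d"
u = "s\u00e9d"
print(same_text(b, u))                     # True
print(same_text(b, u, "latin-1"))          # False: decodes to 'sÃ©d'
print(same_text(b"s\xe9d", u))             # False: not valid UTF-8
print(same_text(b"s\xe9d", u, "latin-1"))  # True
```

The `latin-1` case shows why the encoding matters: the same two bytes decode to "Ã©" instead of "é".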
By Toutatis!
If you had asked Ordralphabétix, or one of the French newsgroups
devoted to Python, instead of re-enacting "La grande Traversée", you
might have gotten an answer sooner.

@-salutations
--
Michel Claveau
Aug 31 '08 #5
Asterix wrote:
> how could I test that those 2 strings are the same:
>
> 'séd' (repr is 's\\xc3\\xa9d')
>
> u'séd' (repr is u's\\xe9d')
You may also want to look at unicodedata.normalize(). For example, é can
be represented multiple ways:
>>> import unicodedata
>>> unicodedata.normalize('NFC', u'é')
u'\xe9'
>>> unicodedata.normalize('NFD', u'é')
u'e\u0301'
>>> u'\xe9' == u'e\u0301'
False

The first form is "composed", just being U+00E9 (LATIN SMALL LETTER E
WITH ACUTE). The second form is "decomposed", being made up of U+0065
(LATIN SMALL LETTER E) and U+0301 (COMBINING ACUTE ACCENT).

Even though they represent the same thing to a human, they don't compare
as equal. But if you normalize them to the same form, they will.

For more information, look at the unicodedata module's documentation:
<http://docs.python.org/lib/module-unicodedata.html>
--
Aug 31 '08 #6
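Combining the suggestions in this thread, a comparison could decode first and then normalize both sides to the same form. Below is a Python 3 sketch. `same_text_normalized` is a hypothetical name, and NFC as the default is an assumption; NFD works just as well, as long as both sides use the same form.

```python
import unicodedata

composed = "\u00e9"      # 'é' as a single code point (U+00E9)
decomposed = "e\u0301"   # 'e' followed by U+0301 COMBINING ACUTE ACCENT

print(composed == decomposed)                                # False
print(unicodedata.normalize("NFC", decomposed) == composed)  # True
print(unicodedata.normalize("NFD", composed) == decomposed)  # True


def same_text_normalized(raw: bytes, text: str,
                         encoding: str = "utf-8", form: str = "NFC") -> bool:
    """Hypothetical helper: decode `raw`, then normalize both sides before comparing."""
    decoded = raw.decode(encoding)
    return unicodedata.normalize(form, decoded) == unicodedata.normalize(form, text)


# b"se\xcc\x81d" is 'se' + U+0301 + 'd' in UTF-8, i.e. the decomposed spelling of 'séd'.
print(same_text_normalized(b"se\xcc\x81d", "s\u00e9d"))  # True
```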
