compare unicode to non-unicode strings

Asterix

how could I test that those 2 strings are the same:

'séd' (repr is 's\\xc3\\xa9d')

u'séd' (repr is u's\\xe9d')

Aug 31 '08 #1


John Machin

On Aug 31, 11:04 pm, Asterix <aste...@lagaule.org> wrote:

> how could I test that those 2 strings are the same:
>
> 'séd' (repr is 's\\xc3\\xa9d')

No, the repr is 's\xc3\xa9d'.

> u'séd' (repr is u's\\xe9d')

No, the repr is u's\xe9d'.

To answer your question:

Aug 31 '08 #2

John Machin

On Aug 31, 11:04 pm, Asterix <aste...@lagaule.org> wrote:

> how could I test that those 2 strings are the same:
>
> 'séd' (repr is 's\\xc3\\xa9d')
>
> u'séd' (repr is u's\\xe9d')

[note: your reprs are wrong; change the \\ to \]

You need to decode the non-unicode string and compare the result with
the unicode string. You need to know the encoding used for the non-
unicode string. In the example that you gave, it's about 99.99% likely
that it's UTF-8.

>>> 's\xc3\xa9d'.decode('utf8')
u's\xe9d'
>>> u's\xe9d'.encode('utf8')
's\xc3\xa9d'
>>>

HTH,
John

Aug 31 '08 #3
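On Python 3 the 8-bit string type is `bytes`, plain string literals are already Unicode `str`, and the `unicode()` builtin no longer exists, so the same decode-then-compare approach would be written roughly as follows. This is a minimal sketch that assumes, as John does, that the bytes are UTF-8.

```python
# Python 3 sketch: the question's 8-bit string is a bytes object,
# its unicode string is a str.
raw = b's\xc3\xa9d'   # UTF-8 encoded bytes (assumed encoding)
text = 's\xe9d'       # the same word as a str

# Decode the bytes with the assumed encoding, then compare str to str.
decoded = raw.decode('utf-8')
print(decoded == text)              # True

# Or go the other way: encode the str and compare bytes to bytes.
print(text.encode('utf-8') == raw)  # True

# Comparing bytes to str directly does no implicit decoding: always False.
print(raw == text)                  # False
```

Either direction works once both sides have the same type. Decoding is arguably the safer direction, since bytes that are invalid in the assumed encoding raise `UnicodeDecodeError` instead of quietly comparing unequal.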

Fredrik Lundh

Asterix wrote:

> how could I test that those 2 strings are the same:
>
> 'séd' (repr is 's\\xc3\\xa9d')
>
> u'séd' (repr is u's\\xe9d')

determine what encoding the former string is using (looks like UTF-8),
and convert it to Unicode before doing the comparison.

>>> b = 's\xc3\xa9d'
>>> u = u's\xe9d'
>>> b
's\xc3\xa9d'
>>> u
u's\xe9d'
>>> unicode(b, "utf-8")
u's\xe9d'
>>> unicode(b, "utf-8") == u
True

</F>

Aug 31 '08 #4
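Fredrik's first step, determining the encoding, is where this usually goes wrong: decoding with the wrong codec can quietly produce different text rather than an error (Latin-1, for instance, maps every byte to some character). A minimal Python 3 sketch with a hypothetical helper, `same_text` (not a standard library function), that takes the encoding as an explicit parameter and treats undecodable bytes as a mismatch:

```python
def same_text(raw: bytes, text: str, encoding: str = 'utf-8') -> bool:
    """Hypothetical helper: True if `raw` decoded with `encoding` equals `text`.

    Bytes that are not valid in `encoding` count as "not the same"
    instead of propagating UnicodeDecodeError.
    """
    try:
        return raw.decode(encoding) == text
    except UnicodeDecodeError:
        return False


raw = b's\xc3\xa9d'

print(same_text(raw, 's\xe9d'))             # True: UTF-8 is the right guess
print(same_text(raw, 's\xe9d', 'latin-1'))  # False: Latin-1 yields 's\xc3\xa9d' (4 chars)
print(same_text(b's\xe9d', 's\xe9d'))       # False: b's\xe9d' is not valid UTF-8
```

Passing `'latin-1'` for bytes that really are Latin-1 (such as `b's\xe9d'`) would return True, which is why the encoding should come from what you know about where the data came from, not from trying codecs until one matches.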

Méta-MCI (MVP)

By Toutatis!
If you had asked Ordralphabétix, or posted on one of the French newsgroups
devoted to Python, instead of re-enacting "La Grande Traversée" (The Great
Crossing), you might have got an answer sooner.

@-salutations
--
Michel Claveau

Aug 31 '08 #5

Matt Nordhoff

Asterix wrote:

> how could I test that those 2 strings are the same:
>
> 'séd' (repr is 's\\xc3\\xa9d')
>
> u'séd' (repr is u's\\xe9d')

You may also want to look at unicodedata.normalize(). For example, é can
be represented multiple ways:

>>> import unicodedata
>>> unicodedata.normalize('NFC', u'é')
u'\xe9'
>>> unicodedata.normalize('NFD', u'é')
u'e\u0301'
>>> u'\xe9' == u'e\u0301'
False

The first form is "composed", just being U+00E9 (LATIN SMALL LETTER E
WITH ACUTE). The second form is "decomposed", being made up of U+0065
(LATIN SMALL LETTER E) and U+0301 (COMBINING ACUTE ACCENT).

Even though they represent the same thing to a human, they don't compare
as equal. But if you normalize them to the same form, they will.

For more information, look at the unicodedata module's documentation:
<http://docs.python.org/lib/module-unicodedata.html>

Aug 31 '08 #6
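Matt's point combines with the decoding step: after converting the bytes to text, normalize both sides to the same form before comparing. A minimal Python 3 sketch using the standard `unicodedata.normalize`; the helper name `same_text_normalized` and its defaults (UTF-8, NFC) are illustrative assumptions, not an established API:

```python
import unicodedata


def same_text_normalized(raw: bytes, text: str, encoding: str = 'utf-8',
                         form: str = 'NFC') -> bool:
    """Hypothetical helper: decode `raw`, then compare after normalizing
    both sides to the same Unicode normalization form."""
    decoded = raw.decode(encoding)
    return (unicodedata.normalize(form, decoded)
            == unicodedata.normalize(form, text))


composed = 's\xe9d'        # 'é' as U+00E9 LATIN SMALL LETTER E WITH ACUTE
decomposed = 'se\u0301d'   # 'e' followed by U+0301 COMBINING ACUTE ACCENT

raw = composed.encode('utf-8')                            # b's\xc3\xa9d'
print(raw.decode('utf-8') == decomposed)                  # False: different code points
print(same_text_normalized(raw, decomposed))              # True after NFC on both sides
print(same_text_normalized(raw, decomposed, form='NFD'))  # True with NFD as well
```

Which form you pick matters less than applying the same one to both sides; NFC is a common default because it tends to be the shorter representation.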
