unicode wrap unicode object?

>>> import sys

>>> sys.setdefaultencoding("utf-8")
>>> s='\xe9\xab\x98' # this is a utf-8 string
>>> ss=U'\xe9\xab\x98'
>>> s
'\xe9\xab\x98'
>>> ss
u'\xe9\xab\x98'

How do I get ss from s?
Is there a way to do this?
Thanks!

Apr 7 '06 #1

Fredrik Lundh

"ygao" <yg******@gmail.com> wrote:

> >>> import sys
> >>> sys.setdefaultencoding("utf-8")

hmm. what kind of bootleg python is that ?

>>> import sys
>>> sys.setdefaultencoding("utf-8")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
AttributeError: 'module' object has no attribute 'setdefaultencoding'

(you're not supposed to change the default encoding. don't
do that; it'll only cause problems in the long run).
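
For readers on Python 3 (an assumption; this thread is about Python 2), a rough sketch of the usual alternative: `sys.setdefaultencoding` does not exist there, and the encoding is named at each point where bytes become text or text becomes bytes. In Python 2 the function is removed at startup by the `site` module, which is why the plain call above fails. The names `raw` and `text` are illustrative only.

```python
import sys

# Python 3 offers no way to change the default encoding at runtime.
print(hasattr(sys, "setdefaultencoding"))
print(sys.getdefaultencoding())

# Instead of relying on a default, name the encoding at each boundary.
raw = b"\xe9\xab\x98"        # the UTF-8 bytes from the original post
text = raw.decode("utf-8")   # bytes -> text
back = text.encode("utf-8")  # text -> bytes
print(repr(text), back)
```

Keeping the encoding explicit means the conversion behaves the same on every machine, whatever the interpreter's defaults are.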

> >>> s='\xe9\xab\x98' # this is a utf-8 string
> >>> ss=U'\xe9\xab\x98'
> >>> s
> '\xe9\xab\x98'
> >>> ss
> u'\xe9\xab\x98'
>
> how do I get ss from s?
> Can there be a way do this?

you have UTF-8 *bytes* in a Unicode text string? sounds like
someone's made a mistake earlier on...

anyway, iso-8859-1 is, in practice, a null transform, that simply
converts unicode characters to bytes:

>>> s = ss.encode("iso-8859-1")
>>> s
'\xe9\xab\x98'
>>> s.decode("utf-8")
u'\u9ad8'
>>> import unicodedata
>>> unicodedata.name(s.decode("utf-8"))
'CJK UNIFIED IDEOGRAPH-9AD8'

but it's probably better to fix the code that puts UTF-8 data in your
Unicode strings (look for bogus iso-8859-1 conversions)
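
The same repair can be sketched in Python 3 (an assumption; the session above is Python 2), where `str` plays the role of the Unicode string and `bytes` the 8-bit string. The names `mis_decoded`, `raw` and `fixed` are illustrative, not from the thread.

```python
import unicodedata

# Text that was wrongly produced by treating UTF-8 bytes as ISO-8859-1.
mis_decoded = "\xe9\xab\x98"

# ISO-8859-1 maps code points 0-255 straight back to single bytes...
raw = mis_decoded.encode("iso-8859-1")

# ...which can then be decoded with the encoding they were really in.
fixed = raw.decode("utf-8")

print(raw, repr(fixed), unicodedata.name(fixed))
```

This only undoes the damage after the fact; as the post says, the better fix is at the place where the bytes were decoded with the wrong encoding.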

</F>

Apr 8 '06 #2

ygao

Sorry for my poor English.
I got a solution from others.
I must use UTF-8 for Chinese.

>>> import sys
>>> reload(sys)
>>> sys.setdefaultencoding("utf-8")
>>> s='\xe9\xab\x98' # this is a utf-8 string
>>> ss=U'\xe9\xab\x98'
>>> ss1=ss.encode('unicode_escape').decode('string_escape')
>>> s1=s.decode('unicode_escape')
>>> s1==ss
True
>>> ss1==s
True

Apr 8 '06 #3

Fredrik Lundh

"ygao" wrote:

> I must use utf-8 for chinese.

yeah, but you shouldn't store it in a *Unicode* string. Unicode strings
are designed to hold things that you've already decoded (that is, your
chinese text), not the raw UTF-8 bytes.

if you store the UTF-8 in an ordinary 8-bit string instead, you can use
the unicode constructor to convert things properly:

b = "... some utf-8 data ..."

# turn it into a unicode string
u = unicode(b, "utf-8")

# ... do something with it ...

# turn it back into a utf-8 string
s = u.encode("utf-8")

# or use some other encoding
s = u.encode("big5")

e.g.

>>> b = '\xe9\xab\x98'
>>> u = unicode(b, "utf-8")
>>> u.encode("utf-8")
'\xe9\xab\x98'
>>> u.encode("big5")
'\xb0\xaa'
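
A Python 3 version of this round trip might look like the sketch below. Python 3 is an assumption here, since the thread uses Python 2; there, `bytes.decode` replaces the `unicode(b, "utf-8")` constructor call. The names `utf8_again` and `big5` are illustrative.

```python
# UTF-8 bytes kept in a bytes object, not in a text string.
b = b"\xe9\xab\x98"

# Turn the bytes into text (Python 3 spelling of unicode(b, "utf-8")).
u = b.decode("utf-8")

# Turn the text back into bytes, in UTF-8 or in some other encoding.
utf8_again = u.encode("utf-8")
big5 = u.encode("big5")

print(repr(u), utf8_again, big5)
```

The text object `u` is the same whichever encoding the bytes came from or go to; only the byte sequences differ.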

</F>

Apr 8 '06 #5

ygao

thanks for your advice.

Apr 8 '06 #6

Martin v. Löwis

ygao wrote:

> I must use utf-8 for chinese.

Sure. But please don't do that:

> >>> import sys
> >>> reload(sys)
> >>> sys.setdefaultencoding("utf-8")

As Fredrik says, you should really avoid changing the
default encoding.

> >>> s='\xe9\xab\x98' # this is a utf-8 string
> >>> ss=U'\xe9\xab\x98'
> >>> ss1=ss.encode('unicode_escape').decode('string_escape')
> >>> s1=s.decode('unicode_escape')
> >>> s1==ss
> True
> >>> ss1==s
> True

Ok. But how about that:

py> s='\xe9\xab\x98'
py> ss=u'\u9ad8'
py> s1=s.decode('utf-8')
py> s1==ss
True

Here, ss is a single character, which uses 3 bytes in UTF-8.
In your example, ss has three characters, which are not Chinese,
but European.
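
The difference can be checked by counting characters and looking up their names. This is a small sketch in Python 3 (an assumption; the session above is Python 2), and the names `three_chars` and `one_char` are illustrative.

```python
import unicodedata

three_chars = "\xe9\xab\x98"  # what U'\xe9\xab\x98' actually holds
one_char = "\u9ad8"           # the decoded Chinese character

# Lengths of both strings, then the UTF-8 size of the single character.
print(len(three_chars), len(one_char), len(one_char.encode("utf-8")))

for ch in three_chars:
    # The last code point is a C1 control character with no name,
    # so a default is supplied to avoid a ValueError.
    print(hex(ord(ch)), unicodedata.name(ch, "<unnamed control>"))

print(hex(ord(one_char)), unicodedata.name(one_char))
```

The first two code points come out as Latin-1 characters (an accented e and an angle quotation mark), which is the "European" text the post refers to.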

Regards,
Martin

Apr 8 '06 #7
