unicode wrap unicode object?

>>> import sys
>>> sys.setdefaultencoding("utf-8")
>>> s = '\xe9\xab\x98'   # this is a UTF-8 string
>>> ss = u'\xe9\xab\x98'
>>> s
'\xe9\xab\x98'
>>> ss
u'\xe9\xab\x98'

How do I get ss from s?
Is there a way to do this?
Thanks!

Apr 7 '06 #1
"ygao" <yg******@gmail.com> wrote:
> import sys
> sys.setdefaultencoding("utf-8")

hmm. what kind of bootleg python is that ?

>>> import sys
>>> sys.setdefaultencoding("utf-8")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
AttributeError: 'module' object has no attribute 'setdefaultencoding'

(you're not supposed to change the default encoding. don't
do that; it'll only cause problems in the long run).
> >>> s = '\xe9\xab\x98'   # this is a UTF-8 string
> >>> ss = u'\xe9\xab\x98'
> >>> s
> '\xe9\xab\x98'
> >>> ss
> u'\xe9\xab\x98'
>
> how do I get ss from s?
> Can there be a way do this?


you have UTF-8 *bytes* in a Unicode text string? sounds like
someone's made a mistake earlier on...

anyway, iso-8859-1 is, in practice, a null transform, that simply
converts unicode characters to bytes:
>>> s = ss.encode("iso-8859-1")
>>> s
'\xe9\xab\x98'
>>> s.decode("utf-8")
u'\u9ad8'
>>> import unicodedata
>>> unicodedata.name(s.decode("utf-8"))
'CJK UNIFIED IDEOGRAPH-9AD8'

but it's probably better to fix the code that puts UTF-8 data in your
Unicode strings (look for bogus iso-8859-1 conversions)
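
For readers on Python 3, the same repair can be sketched as below. This is only an illustration under the assumption that Python 3 is in use, where `str` is the Unicode type and `bytes` is the 8-bit type; the names `mojibake`, `raw` and `repaired` are made up for the example and do not appear in the thread.

```python
import unicodedata

# Assumption: Python 3 (str is Unicode text, bytes is raw 8-bit data).
# mojibake, raw and repaired are illustrative names, not from the thread.
mojibake = "\xe9\xab\x98"   # three code points that are really UTF-8 bytes

# iso-8859-1 maps code points 0-255 to the same byte values,
# so encoding with it recovers the original bytes unchanged.
raw = mojibake.encode("iso-8859-1")

# Decoding those bytes as UTF-8 yields the single intended character.
repaired = raw.decode("utf-8")

print(raw)                          # the recovered bytes
print(ascii(repaired))              # the character, shown as an escape
print(unicodedata.name(repaired))   # its Unicode name
```

The round trip only works when every code point in the damaged string is below 256; anything else makes the `iso-8859-1` encode step raise `UnicodeEncodeError`.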

</F>

Apr 8 '06 #2
Sorry for my poor English.
I got a solution from others.
I must use UTF-8 for Chinese.

>>> import sys
>>> reload(sys)
>>> sys.setdefaultencoding("utf-8")
>>> s = '\xe9\xab\x98'   # this is a UTF-8 string
>>> ss = u'\xe9\xab\x98'
>>> ss1 = ss.encode('unicode_escape').decode('string_escape')
>>> s1 = s.decode('unicode_escape')
>>> s1 == ss
True
>>> ss1 == s
True


Apr 8 '06 #4
"ygao" wrote:
> I must use utf-8 for chinese.


yeah, but you shouldn't store it in a *Unicode* string. Unicode strings
are designed to hold things that you've already decoded (that is, your
chinese text), not the raw UTF-8 bytes.

if you store the UTF-8 in an ordinary 8-bit string instead, you can use
the unicode constructor to convert things properly:

b = "... some utf-8 data ..."

# turn it into a unicode string
u = unicode(b, "utf-8")

# ... do something with it ...

# turn it back into a utf-8 string
s = u.encode("utf-8")

# or use some other encoding
s = u.encode("big5")

e.g.

>>> b = '\xe9\xab\x98'
>>> u = unicode(b, "utf-8")
>>> u.encode("utf-8")
'\xe9\xab\x98'
>>> u.encode("big5")
'\xb0\xaa'
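
The same decode-then-encode pattern might look like this on Python 3. It is a sketch that assumes Python 3, where `bytes.decode()` takes the place of the `unicode(b, "utf-8")` call; the variable names are illustrative only.

```python
# Assumption: Python 3, where bytes.decode() replaces unicode(b, "utf-8").
# data, text, utf8_again and big5 are illustrative names.
data = b"\xe9\xab\x98"            # UTF-8 bytes as received from outside

# Decode once, at the input boundary, and work with text from then on.
text = data.decode("utf-8")

# Encode again only when the text leaves the program.
utf8_again = text.encode("utf-8")
big5 = text.encode("big5")

print(len(text))       # number of characters after decoding
print(utf8_again)      # same bytes as the input
print(big5)            # the same character in Big5
```

Keeping bytes and text in separate variables like this makes it harder to end up with raw UTF-8 bytes inside a text string.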

</F>

Apr 8 '06 #5
thanks for your advice.

Apr 8 '06 #6
ygao wrote:
> I must use utf-8 for chinese.

Sure. But please don't do that:

> import sys
> reload(sys)
> sys.setdefaultencoding("utf-8")

As Fredrik says, you should really avoid changing the
default encoding.

> s='\xe9\xab\x98' #this utf-8 string
> ss=U'\xe9\xab\x98'
> ss1=ss.encode('unicode_escape').decode('string_escape')
> s1=s.decode('unicode_escape')
> s1==ss
> True
> ss1==s
> True

Ok. But how about this:

py> s='\xe9\xab\x98'
py> ss=u'\u9ad8'
py> s1=s.decode('utf-8')
py> s1==ss
True

Here, ss is a single character, which uses 3 bytes in UTF-8.
In your example, ss has three characters, which are not Chinese,
but European.
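
The difference can be inspected directly. The following is a small sketch assuming Python 3; `describe` is a hypothetical helper written for this illustration, and `wrong` and `right` are illustrative names for the two strings discussed above.

```python
import unicodedata

# Assumption: Python 3. wrong/right and describe() are illustrative only.
wrong = "\xe9\xab\x98"   # what ss holds in the original post
right = "\u9ad8"         # the single Chinese character

def describe(text):
    """Return (code point, Unicode name) pairs for each character."""
    return [(hex(ord(ch)), unicodedata.name(ch, "<no name>")) for ch in text]

print(len(wrong), describe(wrong))    # three separate characters
print(len(right), describe(right))    # one CJK ideograph
print(len(right.encode("utf-8")))     # bytes needed for it in UTF-8
```

The second argument to `unicodedata.name` is a fallback for characters that have no name in the database, which avoids a `ValueError`.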

Regards,
Martin
Apr 8 '06 #7
