a unicode question?

Hello,
I have a unicode string that I want to convert to an ANSI string, but
doing so raises an exception.
Could you help me?

## I want to change s1 to s2.

s1 = u'\xd6\xd0\xb9\xfa\xca\xaf\xbb\xaf(600028) '

s2 = '\xd6\xd0\xb9\xfa\xca\xaf\xbb\xaf(600028) '

Apr 10 '06 #1
What do you mean by "ansi string"?

Here is a superficially not-unreasonable answer to your more specific
question:

# >>> s1 = u'\xd6\xd0\xb9\xfa\xca\xaf\xbb\xaf(600028) '
# >>> s2 = '\xd6\xd0\xb9\xfa\xca\xaf\xbb\xaf(600028) '
# >>> s3 = s1.encode('latin1')
# >>> s2 == s3
# True

But what are you really trying to achieve? Where does your Unicode data
come from? What ranges of characters do you expect it to contain? You
need to crunch it into an 8-bit representation because ... what?
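
For what it's worth, the `latin1` trick works because Latin-1 maps code points 0 through 255 onto the byte with the same value. A minimal sketch, written in Python 3 syntax (where 8-bit strings are `bytes` and the `u''` prefix is optional), and assuming the string holds only code points below 256:

```python
# Python 3 sketch: Latin-1 maps code points 0..255 one-to-one onto byte values.
s1 = '\xd6\xd0\xb9\xfa\xca\xaf\xbb\xaf(600028) '
s2 = b'\xd6\xd0\xb9\xfa\xca\xaf\xbb\xaf(600028) '

s3 = s1.encode('latin-1')

# Each byte equals the code point of the corresponding character.
same_values = [ord(ch) for ch in s1] == list(s3)

# The mapping is reversible, so decoding restores the original string.
round_trip = s3.decode('latin-1') == s1
```

This only shows that the bytes match; it says nothing about whether Latin-1 is the right interpretation of the data.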

Apr 10 '06 #2
Mr. John Machin, Thank you very much!

Apr 10 '06 #3
Mr. John Machin

This question comes from the following code. I use PyXML to build a DOM
tree.

from xml.dom.ext.reader import HtmlLib
doc = HtmlLib.FromHtmlUrl('http://stock.business.sohu.com/q/nbcg.php?code=600028')
title_elem = doc.documentElement.getElementsByTagName("TITLE")[0]
title_string = title_elem.firstChild.data
print title_string

# title_string is unicode, but it is not "latin1" text, so I want to
# change it.

Apr 10 '06 #4

Errr, but the title of the page is written in Chinese and it is not
supposed to be crammed into latin1 encoding. What are you trying to do
with the string after you squeezed Chinese into latin1?
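
To illustrate the point: Latin-1 has no slot for any Chinese character, so encoding genuinely Chinese text with it fails, which may well be the kind of exception the original poster ran into. A small sketch in Python 3 syntax; `encode_or_none` is a hypothetical helper made up for this example, not part of any library:

```python
def encode_or_none(text, encoding):
    """Return text encoded with `encoding`, or None if it cannot be represented."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        return None

# The page title as real Chinese text (code points above U+00FF).
title = '\u4e2d\u56fd\u77f3\u5316(600028) '

as_latin1 = encode_or_none(title, 'latin-1')  # fails: Latin-1 stops at U+00FF
as_gb2312 = encode_or_none(title, 'gb2312')   # succeeds: GB2312 covers these characters
```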

Apr 10 '06 #5
Errrrrrrr, it gets worse: not only is the title written in Chinese, it
is encoded as gb2312 -- here is the repr() of the first few chunks:

"<html>\n<head>\n <title>\xd6\xd0\xb9\xfa\xca\xaf\xbb\xaf(600028) : \xc4\xda\xb2\xbf\xc8\xcb\xd4\xb1\xb3\xd6\xb9\xc9 - \xcb\xd1\xba\xfc\xb9\xc9\xc6\xb1</title>\n<meta http-equiv='Content-Type' content='text/html; charset=gb2312'>\n"

and here is what you get after that_guff.decode('gb2312'):

u"<html>\n<head>\n <title>\u4e2d\u56fd\u77f3\u5316(600028) : \u5185\u90e8\u4eba\u5458\u6301\u80a1 - \u641c\u72d0\u80a1\u7968</title>\n<meta http-equiv='Content-Type' content='text/html; charset=gb2312'>\n"

The first 2 characters of the title are recognisable both visually on
the browser title and in the unicode as "zhong guo" i.e. China.
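
A rough sketch of that decoding step, in Python 3 syntax: read the charset out of the `<meta>` tag and use it to decode the raw bytes. The `sniff_charset` helper and its regular expression are assumptions made for illustration, and the sample bytes are a shortened version of the page quoted above; a real HTML parser or the HTTP `Content-Type` header would be more robust.

```python
import re

def sniff_charset(raw, default='latin-1'):
    """Hypothetical helper: pull the charset=... value out of raw HTML bytes."""
    match = re.search(rb'charset=([A-Za-z0-9_-]+)', raw)
    return match.group(1).decode('ascii') if match else default

# Shortened sample of the page bytes (title truncated after the stock code).
raw = (b"<html>\n<head>\n <title>\xd6\xd0\xb9\xfa\xca\xaf\xbb\xaf(600028)</title>\n"
       b"<meta http-equiv='Content-Type' content='text/html; charset=gb2312'>\n")

charset = sniff_charset(raw)      # charset declared by the page itself
text = raw.decode(charset)        # bytes -> Unicode text
title = re.search(r'<title>(.*?)</title>', text).group(1)
```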

BUT the OP's first message is interpreting that gb2312-encoded stuff as
Unicode:
s1 = u'\xd6\xd0\xb9\xfa\xca\xaf\xbb\xaf(600028) '

*SOMEBODY* is seriously deluded, and it ain't me, and it ain't Serge
:-)
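
If that is what happened, i.e. the gb2312 bytes were turned into a Unicode string one byte per character, the damage is reversible. A hedged sketch in Python 3 syntax, assuming every character in `s1` is below U+0100 so that Latin-1 can recover the original bytes:

```python
# Mis-decoded title from the first post: each gb2312 byte became one character.
s1 = '\xd6\xd0\xb9\xfa\xca\xaf\xbb\xaf(600028) '

raw = s1.encode('latin-1')      # step 1: recover the original bytes
fixed = raw.decode('gb2312')    # step 2: decode them with the page's declared charset

# Code points of the first four repaired characters, for comparison with the repr above.
code_points = ['U+%04X' % ord(ch) for ch in fixed[:4]]
```

The better fix is upstream: decode the page bytes as gb2312 before they ever reach the DOM, so the repair is never needed.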

... and yes Peter, info also travels faster from China than it does
from Armenia :-())

Apr 10 '06 #6
John Machin wrote:
... and yes Peter, info also travels faster from China than it does
from Armenia :-())


Q: Can info travel faster from Armenia than from China?
Radio Yerevan: In principle, yes. Just make sure that it doesn't go the
other way round the globe or meet some friends on the way...
Apr 11 '06 #7
