Hello,
I have a Unicode string that I want to change to an ansi string, but
it raises an exception.
Could you help me?
## I want to change s1 to s2.
s1 = u'\xd6\xd0\xb9\xfa\xca\xaf\xbb\xaf(600028) '
s2 = '\xd6\xd0\xb9\xfa\xca\xaf\xbb\xaf(600028) '
What do you mean by "ansi string"?
Here is a superficially not-unreasonable answer to your more specific
question:
# >>> s1 = u'\xd6\xd0\xb9\xfa\xca\xaf\xbb\xaf(600028) '
# >>> s2 = '\xd6\xd0\xb9\xfa\xca\xaf\xbb\xaf(600028) '
# >>> s3 = s1.encode('latin1')
# >>> s2 == s3
# True
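The OP didn't post the traceback, so this is only a guess at the exception: a likely cause is that the conversion went through the default ASCII codec, which cannot represent code points above 127. Latin-1 succeeds because it maps code points 0-255 one-to-one onto byte values. Here is a Python 3 sketch, where `str` plays the role of the thread's `unicode` and `bytes` plays the role of its 8-bit `str`:

```python
s1 = '\xd6\xd0\xb9\xfa\xca\xaf\xbb\xaf(600028) '

# ASCII only covers code points 0-127, so this raises UnicodeEncodeError.
try:
    s1.encode('ascii')
except UnicodeEncodeError as exc:
    print('ascii failed:', exc.reason)

# Latin-1 maps every code point 0-255 to the byte with the same value,
# so each character of s1 becomes exactly one byte.
s3 = s1.encode('latin-1')
print(s3 == b'\xd6\xd0\xb9\xfa\xca\xaf\xbb\xaf(600028) ')
```

This only shows why one codec fails and the other doesn't. It says nothing about whether Latin-1 is the *right* encoding for this data, which is what the questions that follow are getting at.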
But what are you really trying to achieve? Where does your Unicode data
come from? What ranges of characters do you expect it to contain? You
need to crunch it into an 8-bit representation because ... what?
Mr. John Machin, thank you very much!
This question comes from the following code. I use PyXML to build a DOM
tree.
from xml.dom.ext.reader import HtmlLib
doc = HtmlLib.FromHtmlUrl('http://stock.business.sohu.com/q/nbcg.php?code=600028')
title_elem = doc.documentElement.getElementsByTagName("TITLE")[0]
title_string = title_elem.firstChild.data
print title_string
# the title_string is unicode, but it is not "latin1" code, so I want to change it.
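For comparison, here is a Python 3 sketch of the same title extraction. It uses the standard library's `html.parser` in place of PyXML, which only exists for Python 2. `TitleParser` and `get_title` are hypothetical names. The caller must supply the page's real encoding, because the bytes have to be decoded correctly *before* parsing.

```python
from html.parser import HTMLParser
from typing import Optional


class TitleParser(HTMLParser):
    """Collect the text inside the first <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # html.parser lower-cases tag names, so <TITLE> and <title> both match.
        if tag == 'title' and self.title is None:
            self._in_title = True
            self.title = ''

    def handle_endtag(self, tag):
        if tag == 'title':
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def get_title(raw: bytes, encoding: str) -> Optional[str]:
    # Decode with the page's actual charset first, so the parser sees
    # real characters instead of raw byte values.
    parser = TitleParser()
    parser.feed(raw.decode(encoding))
    parser.close()
    return parser.title
```

The design choice here is that decoding is the caller's responsibility. Nothing in the parser guesses the charset, so passing the wrong one produces exactly the kind of garbled string discussed in this thread.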
Errr, but the title of the page is written in Chinese and it is not
supposed to be crammed into latin1 encoding. What are you trying to do
with the string after you squeezed Chinese into latin1?
Errrrrrrr, it gets worse: not only is the title written in Chinese, it
is encoded as gb2312 -- here is the repr() of the first few chunks:
"<html>\n<head>\n <title>\xd6\xd0\xb9\xfa\xca\xaf\xbb\xaf(600028) : \xc4\xda\xb2\xbf\xc8\xcb\xd4\xb1\xb3\xd6\xb9\xc9 - \xcb\xd1\xba\xfc\xb9\xc9\xc6\xb1</title>\n<meta http-equiv='Content-Type' content='text/html; charset=gb2312'>\n"
and here is what you get after that_guff.decode('gb2312'):
u"<html>\n<head>\n <title>\u4e2d\u56fd\u77f3\u5316(600028) : \u5185\u90e8\u4eba\u5458\u6301\u80a1 - \u641c\u72d0\u80a1\u7968</title>\n<meta http-equiv='Content-Type' content='text/html; charset=gb2312'>\n"
The first 2 characters of the title are recognisable both visually on
the browser title and in the unicode as "zhong guo" i.e. China.
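The page announces its own charset in the `<meta>` tag, so one way to avoid guessing is to read that declaration and decode with it. Below is a rough Python 3 sketch, using the bytes quoted above as input. `CHARSET_RE` and `sniff_charset` are hypothetical names. The regex is a crude heuristic that takes the first `charset=` it finds anywhere in the data. The `utf-8` fallback is an arbitrary assumption and does not come from the thread.

```python
import re

# First bytes of the page as quoted above; the <title> text is gb2312-encoded.
raw = (
    b"<html>\n<head>\n <title>\xd6\xd0\xb9\xfa\xca\xaf\xbb\xaf(600028) : "
    b"\xc4\xda\xb2\xbf\xc8\xcb\xd4\xb1\xb3\xd6\xb9\xc9 - "
    b"\xcb\xd1\xba\xfc\xb9\xc9\xc6\xb1</title>\n"
    b"<meta http-equiv='Content-Type' content='text/html; charset=gb2312'>\n"
)

# Crude pattern for a declared charset such as "charset=gb2312".
CHARSET_RE = re.compile(rb"charset=[\"']?([A-Za-z0-9_.:-]+)", re.IGNORECASE)


def sniff_charset(data: bytes, default: str = "utf-8") -> str:
    """Return the first declared charset in data, or default if none is found."""
    match = CHARSET_RE.search(data)
    return match.group(1).decode("ascii") if match else default


charset = sniff_charset(raw)
text = raw.decode(charset)
start = text.index("<title>") + len("<title>")
# ascii() escapes non-ASCII characters, giving the same \u notation as above.
print(charset, ascii(text[start:start + 2]))
```

The first two decoded characters come out as U+4E2D U+56FD, the "zhong guo" mentioned above.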
BUT the OP's first message is interpreting that gb2312-encoded stuff as
Unicode:
s1 = u'\xd6\xd0\xb9\xfa\xca\xaf\xbb\xaf(600028) '
*SOMEBODY* is seriously deluded, and it ain't me, and it ain't Serge
:-)
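Suppose the diagnosis above is right, and something mapped each gb2312 byte onto the Unicode code point with the same value (which is exactly what decoding as Latin-1 does). Then the damage can be undone in two steps. First, re-encode with Latin-1 to get the original bytes back. Second, decode those bytes with the charset the page actually declares. Here is a Python 3 sketch under that assumption:

```python
# s1 as posted: gb2312 bytes mistakenly treated as one code point per byte.
s1 = '\xd6\xd0\xb9\xfa\xca\xaf\xbb\xaf(600028) '

# Step 1: Latin-1 turns code points 0-255 back into the identical bytes.
original_bytes = s1.encode('latin-1')

# Step 2: decode those bytes with the page's declared charset.
fixed = original_bytes.decode('gb2312')
print(ascii(fixed))
```

In Python 2 the same pair of calls works on a `unicode` object: `s1.encode('latin1').decode('gb2312')`. The real cure is still to decode correctly when the page is read, rather than repairing the string afterwards.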
... and yes Peter, info travels faster also from China than it does
from Armenia :-())
Q: Can info travel faster from Armenia than from China?
Radio Yerevan: In principle, yes. Just make sure that it doesn't go the
other way round the globe or meet some friends on the way...