1. I download a page in Python using urllib and now want to convert it to
UTF-8 and keep it that way. I already know the original encoding of the page.
What calls should I make to convert the page's encoding to UTF-8?
For example, say the page is encoded in GB2312 (Simplified Chinese)
and I want to store it as UTF-8.
2. Is this a good approach? Can I store pages in any language this way
and serve them as UTF-8 when requested?
3. Does Python 2.4 support all encodings?
By the way, I have set my default encoding in Python to UTF-8.
I appreciate any help.
-JF

ja***@feghhi.com wrote:
| 1. I download a page in Python using urllib and now want to convert it to
| UTF-8 and keep it that way. I already know the original encoding of the page.
| What calls should I make to convert the page's encoding to UTF-8?
| For example, say the page is encoded in GB2312 (Simplified Chinese)
| and I want to store it as UTF-8.
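Something like:
import urllib
data = urllib.urlopen(...).read()        # raw byte string as sent by the server
unicodeData = data.decode('gb2312')      # byte string -> unicode object
utf8Data = unicodeData.encode('utf-8')   # unicode object -> UTF-8 byte string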
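You may want to supply the errors parameter to decode() or encode(); the default, 'strict', raises an exception on input that is invalid in the given encoding, while 'replace' and 'ignore' are the usual alternatives. See the docs for details: http://docs.python.org/lib/string-methods.html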
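For readers on Python 3, where `urllib.urlopen` became `urllib.request.urlopen`, `read()` returns `bytes`, and the `unicode` type is simply `str`, the same decode-then-encode round trip might look like the sketch below. The helper name `to_utf8` is hypothetical; its `errors` argument is passed straight through to `bytes.decode()`.

```python
def to_utf8(data: bytes, source_encoding: str, errors: str = "strict") -> bytes:
    """Re-encode raw page bytes from source_encoding to UTF-8 (hypothetical helper)."""
    # Step 1: decode the raw bytes into a text (Unicode) string.
    # errors="strict" raises UnicodeDecodeError on bad input;
    # errors="replace" substitutes U+FFFD for undecodable bytes.
    text = data.decode(source_encoding, errors)
    # Step 2: encode that text as UTF-8 bytes for storage.
    return text.encode("utf-8")


if __name__ == "__main__":
    gb_bytes = "中文".encode("gb2312")  # simulate a page body encoded in GB2312
    utf8_bytes = to_utf8(gb_bytes, "gb2312")
    print(utf8_bytes.decode("utf-8"))
```

With a real page you would pass in the result of `urllib.request.urlopen(url).read()` instead of the simulated `gb_bytes`.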
| 2. Is this a good approach? Can I store pages in any language this way
| and serve them as UTF-8 when requested?
Yes, as long as you reliably know the encoding of each source page.
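One common source of that knowledge is the charset parameter of the server's Content-Type header. The following Python 3 sketch (the helper name `declared_charset` is hypothetical) reads it with `email.message.Message.get_content_charset()`; in Python 3 the `headers` attribute of a `urllib.request.urlopen()` response is an `http.client.HTTPMessage`, which is a `Message` subclass. Servers sometimes omit or misstate the charset, so treat the result as a hint rather than a guarantee.

```python
from email.message import Message
from typing import Optional


def declared_charset(headers: Message, fallback: Optional[str] = None) -> Optional[str]:
    """Return the charset named in the Content-Type header, lowercased, or fallback."""
    # get_content_charset() parses e.g. "text/html; charset=GB2312" and returns
    # "gb2312"; it returns `fallback` when no charset parameter is present.
    return headers.get_content_charset(fallback)


if __name__ == "__main__":
    demo = Message()
    demo["Content-Type"] = "text/html; charset=GB2312"
    print(declared_charset(demo))
```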
| 3. Does Python 2.4 support all encodings?
I doubt it :-) but it supports many encodings. The list is at http://docs.python.org/lib/standard-encodings.html
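To check whether a particular encoding name is supported by the running interpreter, rather than scanning that table, one option is to ask the codec registry directly. This is a Python 3 sketch with hypothetical helper names; `codecs.lookup()` itself is much older than Python 3 and raises `LookupError` for unknown names, but the `CodecInfo.name` attribute used in the second helper is a later addition.

```python
import codecs


def is_supported_encoding(name: str) -> bool:
    """Return True if this Python build has a codec registered under `name`."""
    try:
        codecs.lookup(name)  # raises LookupError for unknown encoding names
    except LookupError:
        return False
    return True


def canonical_name(name: str) -> str:
    """Return the codec's normalized name, e.g. "utf-8" for "utf8"."""
    return codecs.lookup(name).name


if __name__ == "__main__":
    for enc in ("gb2312", "UTF-8", "no-such-encoding"):
        print(enc, is_supported_encoding(enc))
```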
Kent
<ja***@feghhi.com> wrote in message
news:11**********************@o13g2000cwo.googlegroups.com...
| 1. I download a page in Python using urllib and now want to convert it to
| UTF-8 and keep it that way. I already know the original encoding of the page.
| What calls should I make to convert the page's encoding to UTF-8?
| For example, say the page is encoded in GB2312 (Simplified Chinese)
| and I want to store it as UTF-8.
Something like:
utf8_s = s.decode('gb2312').encode('utf-8')
(with s being the GB2312-encoded byte string) should work.
|
| 2. Is this a good approach? Can I store pages in any language this way
| and serve them as UTF-8 when requested?
|
| 3. Does Python 2.4 support all encodings?
See http://docs.python.org/lib/standard-encodings.html for an overview.
|
| By the way, I have set my default encoding in Python to UTF-8.
|
Why would you want to do that?
--
Vincent Wehren
|
| I appreciate any help.
|
| -JF
|
Kent Johnson wrote:
| Something like
| data = urllib.urlopen(...).read()
| unicodeData = data.decode('gb2312')
| utf8Data = unicodeData.encode('utf-8')
| You may want to supply the errors parameter to decode() or encode(); see
| the docs for details. http://docs.python.org/lib/string-methods.html
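In addition, for an HTML page, you might need to update the META element
that stands in for the Content-Type HTTP header
(<meta http-equiv="Content-Type" content="text/html; charset=...">) if it
names the original charset. For an XHTML page, you might need to update or
remove the encoding in the XML declaration (<?xml ... encoding="..."?>).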
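A rough Python 3 sketch of relabeling those in-document declarations once the page has been decoded. It uses regular expressions rather than an HTML parser, so it only handles the common `<meta ... charset=...>` and leading `<?xml ... encoding="..."?>` forms; `relabel_as_utf8` is a hypothetical name. It changes only the labels, so you still call `.encode('utf-8')` on the result before saving it.

```python
import re

# Matches the charset value inside a <meta> tag, e.g.
#   <meta http-equiv="Content-Type" content="text/html; charset=gb2312">
#   <meta charset="gb2312">
_META_CHARSET = re.compile(
    r'(<meta\b[^>]*?\bcharset\s*=\s*["\']?)([A-Za-z0-9_.:-]+)',
    re.IGNORECASE,
)
# Matches the encoding value in an XML declaration at the start of the text.
_XML_DECL_ENCODING = re.compile(
    r'^(\s*<\?xml\b[^>]*?\bencoding\s*=\s*["\'])([A-Za-z0-9_.:-]+)',
    re.IGNORECASE,
)


def relabel_as_utf8(page: str) -> str:
    """Rewrite the first meta charset and XML-declaration encoding to utf-8.

    Naive regex sketch: operates on already-decoded text and does not parse HTML.
    """
    page = _META_CHARSET.sub(r"\1utf-8", page, count=1)
    page = _XML_DECL_ENCODING.sub(r"\1utf-8", page, count=1)
    return page


if __name__ == "__main__":
    print(relabel_as_utf8('<meta charset="gb2312">'))
```

Because XML parsers assume UTF-8 when there is neither a byte order mark nor an encoding declaration, removing the declaration outright is an alternative for XHTML.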
Regards,
Martin
Thanks for the replies. As for why I set my default encoding to UTF-8
in Python: I did it a while ago, and I think it was because when I was
reading some UTF-8 strings from a database, it raised errors because
there were some characters it could not recognize in the standard
encoding. When I made the change, the error didn't happen anymore.
Does that make sense?
-JF

ja***@feghhi.com wrote:
| Thanks for the replies. As for why I set my default encoding to UTF-8
| in Python: I did it a while ago, and I think it was because when I was
| reading some UTF-8 strings from a database, it raised errors because
| there were some characters it could not recognize in the standard
| encoding. When I made the change, the error didn't happen anymore.
| Does that make sense?
No. If merely reading the strings from the database raises an exception
(i.e. before any processing of these strings), that is a bug in the
database. It is also unlikely that this is what actually happened.
More likely, you are reading the strings from the database and then
explicitly combining them with Unicode strings, which makes Python
implicitly decode the byte strings using the default encoding (ASCII
unless changed); that decode fails on non-ASCII bytes. Instead of changing
the default encoding, you should tell your database adapter to return
the strings as Unicode objects; if this is not supported, you should
convert them to Unicode objects in the process of reading them.
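Martin's second suggestion, decoding at the point where rows are read, could look like this Python 3 sketch. It uses the standard `sqlite3` module purely as a stand-in database: setting `text_factory = bytes` makes it hand back TEXT columns as raw bytes, mimicking an adapter that cannot return Unicode (leaving `text_factory` at its default, `str`, corresponds to Martin's first suggestion). `decode_row` and `fetch_text_rows` are hypothetical helpers, and UTF-8 is assumed to be the storage encoding.

```python
import sqlite3
from typing import Any, Iterable, List, Tuple


def decode_row(row: Iterable[Any], encoding: str = "utf-8") -> Tuple[Any, ...]:
    """Convert any bytes fields of a result row to str; leave other values as-is."""
    return tuple(v.decode(encoding) if isinstance(v, bytes) else v for v in row)


def fetch_text_rows(conn: sqlite3.Connection, sql: str,
                    encoding: str = "utf-8") -> List[Tuple[Any, ...]]:
    """Run a query and decode text at the boundary, so callers only see str."""
    return [decode_row(row, encoding) for row in conn.execute(sql)]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.text_factory = bytes  # simulate an adapter that returns byte strings
    conn.execute("CREATE TABLE pages (id INTEGER, title TEXT)")
    conn.execute("INSERT INTO pages VALUES (?, ?)", (1, "中文"))
    print(fetch_text_rows(conn, "SELECT id, title FROM pages"))
```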
Regards,
Martin