Unicode question

>>> u"äöü"
u'\x84\x94\x81'

(Python 2.2.3/2.3b2; sys.getdefaultencoding() == "ascii")

Why does this work?

Does Python guess which encoding I mean? I thought Python should refuse
to guess :-)
-- Gerhard

Jul 18 '05 #1
Gerhard Häring <gh@ghaering.de> writes:
> >>> u"äöü"
> u'\x84\x94\x81'
>
> (Python 2.2.3/2.3b2; sys.getdefaultencoding() == "ascii")
>
> Why does this work?
>
> Does Python guess which encoding I mean? I thought Python should
> refuse to guess :-)


I stumbled over this yesterday, and it seems it is (at least) partially
answered by PEP 263:

    In Python 2.1, Unicode literals can only be written using the
    Latin-1 based encoding "unicode-escape". This makes the programming
    environment rather unfriendly to Python users who live and work in
    non-Latin-1 locales such as many of the Asian countries. Programmers
    can write their 8-bit strings using the favorite encoding, but are
    bound to the "unicode-escape" encoding for Unicode literals.

I have the impression that this is undocumented on purpose, because you
should not write unescaped non-ansi characters into the source file
(with 'unknown' encoding).
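
A minimal sketch of where the three odd code points probably come from. It is written for Python 3 rather than the 2.x interpreters discussed in this thread, and it assumes the console was sending code page 850 bytes (a guess based on the byte values, not something stated in the post). The bytes for "äöü" in that code page, read as Latin-1, give exactly the result shown above:

```python
# Assumption: the console encoded keyboard input as code page 850.
raw = "\u00e4\u00f6\u00fc".encode("cp850")  # the bytes the console would send for äöü
print(raw)  # b'\x84\x94\x81'

# Reading each byte as a Latin-1 code point, which is what the PEP 263
# passage describes for Unicode literals, maps byte N to code point U+00NN.
as_latin1 = raw.decode("latin-1")
print(ascii(as_latin1))  # '\x84\x94\x81'
```

So nothing is really being guessed: each byte is taken at face value as a code point, which only yields the intended characters when the source happens to be Latin-1.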

Thomas
Jul 18 '05 #2
Thomas Heller wrote:
> Gerhard Häring <gh@ghaering.de> writes:
>
>> >>> u"äöü"
>> u'\x84\x94\x81'
>>
>> (Python 2.2.3/2.3b2; sys.getdefaultencoding() == "ascii")
>>
>> Why does this work?
>>
>> Does Python guess which encoding I mean? I thought Python should
>> refuse to guess :-)
>
> I stumbled over this yesterday, and it seems it is (at least) partially
> answered by PEP 263:
>
>     In Python 2.1, Unicode literals can only be written using the
>     Latin-1 based encoding "unicode-escape". This makes the programming
>     environment rather unfriendly to Python users who live and work in
>     non-Latin-1 locales such as many of the Asian countries. Programmers
>     can write their 8-bit strings using the favorite encoding, but are
>     bound to the "unicode-escape" encoding for Unicode literals.
>
> I have the impression that this is undocumented on purpose, because you
> should not write unescaped non-ansi characters into the source file
> (with 'unknown' encoding).


I agree that using latin1 as default is bad. If there's an encoding
cookie in the 2.3+ source file then this encoding could be used.
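
A small sketch of the encoding cookie in action, using Python 3's built-in compile(), which honours a PEP 263 declaration when it is given source as bytes. The two-line source file and the choice of cp850 are made up for illustration:

```python
# Hypothetical source file, stored as cp850 bytes, with a PEP 263 cookie
# on its first line.
source = b'# -*- coding: cp850 -*-\ntext = u"\x84\x94\x81"\n'

namespace = {}
exec(compile(source, "<cookie-demo>", "exec"), namespace)

# With the cookie, the bytes are decoded as cp850 and the literal holds
# the intended characters (U+00E4, U+00F6, U+00FC).
print(ascii(namespace["text"]))  # '\xe4\xf6\xfc'
```

With the declared encoding the literal comes out as the three umlauts rather than the three control-range code points seen at the top of the thread.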

I stumbled on this when giving another Python user on this list a
pointer to the relevant section in the Python tutorial
(http://www.python.org/doc/current/tu...00000000000000)
where Guido uses u"äöü" in an example.

As this is BAD the tutorial should probably be changed. I'll file a bug
report.

-- Gerhard

Jul 18 '05 #3
Gerhard Häring wrote:
Ricardo Bugalho wrote:
On Fri, 18 Jul 2003 02:07:13 +0200, Gerhard Häring wrote:
Gerhard Häring <gh@ghaering.de> writes:

> >>> u"äöü"
>
> u'\x84\x94\x81'
> [this works, but IMO shouldn't]
[...]
You'll get warnings if you don't define an encoding (either an encoding
cookie or a BOM) and use 8-bit characters in your source files. These
warnings will become errors in later Python versions.

It's all in the PEP :)
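
For comparison, a rough sketch of how this behaves in Python 3, where the default source encoding is UTF-8 and undeclared non-UTF-8 bytes are rejected outright. The helper name compiles() and the sample sources are invented for this example:

```python
def compiles(source_bytes):
    """Return True if Python accepts source_bytes as a module, else False."""
    try:
        compile(source_bytes, "<demo>", "exec")
        return True
    except (SyntaxError, ValueError):
        # CPython reports undecodable source as a SyntaxError; ValueError
        # is caught as well purely as a precaution.
        return False


# The same cp850 bytes, once without and once with an encoding cookie.
no_cookie = b'text = u"\x84\x94\x81"\n'
with_cookie = b'# -*- coding: cp850 -*-\n' + no_cookie

print(compiles(no_cookie))    # False
print(compiles(with_cookie))  # True
```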


I feel like an idiot now :-( I do get the warnings when I run a Python
script, but I do not get the warnings when I'm using the interactive
prompt. So it's all good (almost). Why not also produce warnings at the
interactive prompt?

-- Gerhard

Jul 18 '05 #4
