Unicode question

Gerhard Häring

>>> u"äöü"
u'\x84\x94\x81'

(Python 2.2.3/2.3b2; sys.getdefaultencoding() == "ascii")

Why does this work?

Does Python guess which encoding I mean? I thought Python should refuse
to guess :-)
-- Gerhard

Jul 18 '05 #1


Thomas Heller

Gerhard Häring <gh@ghaering.de> writes:

> >>> u"äöü"
> u'\x84\x94\x81'
>
> (Python 2.2.3/2.3b2; sys.getdefaultencoding() == "ascii")
>
> Why does this work?
>
> Does Python guess which encoding I mean? I thought Python should
> refuse to guess :-)

I stumbled over this yesterday, and it seems it is (at least) partially
answered by PEP 263:

In Python 2.1, Unicode literals can only be written using the
Latin-1 based encoding "unicode-escape". This makes the programming
environment rather unfriendly to Python users who live and work in
non-Latin-1 locales such as many of the Asian countries. Programmers
can write their 8-bit strings using the favorite encoding, but are
bound to the "unicode-escape" encoding for Unicode literals.

I have the impression that this is undocumented on purpose, because you
should not write unescaped non-ASCII characters into the source file
(with 'unknown' encoding).
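The effect described above can be reproduced with a small sketch. It assumes a modern Python 3 interpreter (which this thread predates) and assumes that the original console used code page 850 or 437; neither detail is stated in the thread, but both code pages map ä, ö, ü to the bytes 0x84, 0x94, 0x81 seen in the output. The variable names are illustrative only.

```python
# The three characters from the example, written with escapes so this
# snippet does not itself depend on the source file's encoding.
text = "\u00e4\u00f6\u00fc"  # the characters a-umlaut, o-umlaut, u-umlaut

# Assumption: the console sent the characters as cp850 bytes.
console_bytes = text.encode("cp850")

# What the old interpreter effectively did with a Unicode literal:
# treat every byte as the Latin-1 code point with the same number.
misread = console_bytes.decode("latin-1")

# What the user presumably meant: decode with the console's own encoding.
intended = console_bytes.decode("cp850")

print(repr(console_bytes))  # b'\x84\x94\x81'
print(ascii(misread))       # '\x84\x94\x81'
print(ascii(intended))      # '\xe4\xf6\xfc'
```

So nothing is really guessed: the bytes are passed through one-to-one as Latin-1 code points, which only happens to be right when the input actually is Latin-1.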

Thomas

Jul 18 '05 #2

Gerhard Häring

Thomas Heller wrote:

> Gerhard Häring <gh@ghaering.de> writes:
>
> > >>> u"äöü"
> > u'\x84\x94\x81'
> >
> > (Python 2.2.3/2.3b2; sys.getdefaultencoding() == "ascii")
> >
> > Why does this work?
> >
> > Does Python guess which encoding I mean? I thought Python should
> > refuse to guess :-)
>
> I stumbled over this yesterday, and it seems it is (at least) partially
> answered by PEP 263:
>
>     In Python 2.1, Unicode literals can only be written using the
>     Latin-1 based encoding "unicode-escape". This makes the programming
>     environment rather unfriendly to Python users who live and work in
>     non-Latin-1 locales such as many of the Asian countries. Programmers
>     can write their 8-bit strings using the favorite encoding, but are
>     bound to the "unicode-escape" encoding for Unicode literals.
>
> I have the impression that this is undocumented on purpose, because you
> should not write unescaped non-ASCII characters into the source file
> (with 'unknown' encoding).

I agree that using latin1 as default is bad. If there's an encoding
cookie in the 2.3+ source file then this encoding could be used.
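As a rough illustration of the encoding cookie mentioned here, the sketch below compiles a small source file held in memory as bytes. It assumes a modern Python 3 interpreter rather than the 2.3 discussed in the thread, and the file contents and the names `source` and `namespace` are made up for the example.

```python
# Source bytes as they would sit on disk in a Latin-1 encoded file.
# The first line is the PEP 263 encoding cookie; the string literal
# holds the raw Latin-1 bytes for a-umlaut, o-umlaut, u-umlaut.
source = b'# -*- coding: latin-1 -*-\ntext = u"\xe4\xf6\xfc"\n'

# compile() accepts bytes and honours the cookie when decoding them.
namespace = {}
exec(compile(source, "<cookie-example>", "exec"), namespace)

text = namespace["text"]
print(ascii(text))  # '\xe4\xf6\xfc'
```

With the cookie in place the interpreter no longer has to assume anything about the bytes inside the literal.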

I stumbled on this when giving another Python user on this list a
pointer to the relevant section in the Python tutorial
(http://www.python.org/doc/current/tu...00000000000000)
where Guido uses u"äöü" in an example.

As this is BAD the tutorial should probably be changed. I'll file a bug
report.

-- Gerhard

Jul 18 '05 #3

Gerhard Häring

Gerhard Häring wrote:

Ricardo Bugalho wrote:
On Fri, 18 Jul 2003 02:07:13 +0200, Gerhard Häring wrote:
Gerhard Häring <gh@ghaering.de> writes:

> >>> u"äöü"
>
> u'\x84\x94\x81'
> [this works, but IMO shouldn't]
[...]
You'll get warnings if you don't define an encoding (either encoding
cookie or BOM) and use 8-Bit characters in your source files. These
warnings will become errors in later Python versions.

It's all in the PEP :)

I feel like an idiot now :-( I do get the warnings when I run a Python
script, but I do not get the warnings when I'm using the interactive
prompt. So it's all good (almost). Why not also produce warnings at the
interactive prompt?
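For comparison, here is a minimal sketch of what the "errors in later Python versions" look like. It assumes a current Python 3 interpreter, where source without a cookie or BOM is read as UTF-8; the byte string and names are hypothetical. As far as I know CPython reports this as a SyntaxError, and UnicodeDecodeError is caught as well only as a precaution.

```python
# The same Latin-1 bytes as a source file, but with no encoding cookie.
source_without_cookie = b'text = u"\xe4\xf6\xfc"\n'

try:
    compile(source_without_cookie, "<no-cookie>", "exec")
    rejected = False
except (SyntaxError, UnicodeDecodeError) as exc:
    # The bytes are not valid UTF-8, so the source is refused outright.
    rejected = True
    print(type(exc).__name__, exc)

print("rejected:", rejected)
```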

-- Gerhard

Jul 18 '05 #4
