
Unicode question

>>> u"äöü"
u'\x84\x94\x81'

(Python 2.2.3/2.3b2; sys.getdefaultencoding() == "ascii")

Why does this work?

Does Python guess which encoding I mean? I thought Python should refuse
to guess :-)
-- Gerhard

Jul 18 '05 #1
Gerhard Häring <gh@ghaering.de> writes:
>>> u"äöü"

u'\x84\x94\x81'

(Python 2.2.3/2.3b2; sys.getdefaultencoding() == "ascii")

Why does this work?

Does Python guess which encoding I mean? I thought Python should
refuse to guess :-)


I stumbled over this yesterday, and it seems it is (at least) partially
answered by PEP 263:

In Python 2.1, Unicode literals can only be written using the
Latin-1 based encoding "unicode-escape". This makes the programming
environment rather unfriendly to Python users who live and work in
non-Latin-1 locales such as many of the Asian countries. Programmers
can write their 8-bit strings using the favorite encoding, but are
bound to the "unicode-escape" encoding for Unicode literals.

I have the impression that this is undocumented on purpose, because you
should not write unescaped non-ASCII characters into the source file
(with 'unknown' encoding).
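
A minimal sketch of what appears to happen here, re-created in Python 3 because the original 2.2/2.3 interpreters are long gone. It assumes the console was using a DOS code page such as cp850 (the thread never says which one; the bytes 0x84, 0x94, 0x81 are consistent with it). The names `typed`, `raw` and `as_latin1` are only for illustration.

```python
# Python 3 re-creation of the Python 2.2/2.3 behaviour discussed above.
# Assumption: the console encoded keystrokes as cp850.

typed = "\xe4\xf6\xfc"              # the characters ä, ö, ü

# What the console hands to the interpreter: one byte per character.
raw = typed.encode("cp850")

# A Latin-1 based reading maps every byte to the code point with the
# same numeric value, so no decoding error can ever occur.
as_latin1 = raw.decode("latin-1")

print(repr(raw))                    # b'\x84\x94\x81'
print(ascii(as_latin1))             # '\x84\x94\x81'
print(as_latin1 == typed)           # False: the umlauts were not preserved
```

So nothing is really guessed: the bytes are taken at face value as Latin-1 code points, which only yields the intended characters if the source happens to be Latin-1.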

Thomas
Jul 18 '05 #2
Thomas Heller wrote:
Gerhard Häring <gh@ghaering.de> writes:

>>> u"äöü"

u'\x84\x94\x81'

(Python 2.2.3/2.3b2; sys.getdefaultencoding() == "ascii")

Why does this work?

Does Python guess which encoding I mean? I thought Python should
refuse to guess :-)



I stumbled over this yesterday, and it seems it is (at least) partially
answered by PEP 263:

In Python 2.1, Unicode literals can only be written using the
Latin-1 based encoding "unicode-escape". This makes the programming
environment rather unfriendly to Python users who live and work in
non-Latin-1 locales such as many of the Asian countries. Programmers
can write their 8-bit strings using the favorite encoding, but are
bound to the "unicode-escape" encoding for Unicode literals.

I have the impression that this is undocumented on purpose, because you
should not write unescaped non-ASCII characters into the source file
(with 'unknown' encoding).


I agree that using Latin-1 as the default is bad. If there's an encoding
cookie in the 2.3+ source file, then that encoding could be used instead.

I stumbled on this when giving another Python user on this list a
pointer to the relevant section in the Python tutorial
(http://www.python.org/doc/current/tu...00000000000000)
where Guido uses u"äöü" in an example.

As this is BAD the tutorial should probably be changed. I'll file a bug
report.

-- Gerhard

Jul 18 '05 #3
Gerhard Häring wrote:
Ricardo Bugalho wrote:
On Fri, 18 Jul 2003 02:07:13 +0200, Gerhard Häring wrote:
Gerhard Häring <gh@ghaering.de> writes:

>>>> u"äöü"
>
> u'\x84\x94\x81'
> [this works, but IMO shouldn't]
[...]
You'll get warnings if you don't define an encoding (either encoding
cookie or BOM) and use 8-bit characters in your source files. These
warnings will become errors in later Python versions.

It's all in the PEP :)
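
For illustration, a small sketch of the encoding cookie in action, written for Python 3, where the "later versions" behaviour is already in place. The cp850 encoding and the names `body`, `with_cookie`, `decoded` and `rejected` are assumptions made for the example, not something taken from the thread.

```python
# Source bytes as an editor using cp850 would save them: the string
# literal contains the raw bytes 0x84 0x94 0x81.
body = b"s = '\x84\x94\x81'\n"

# With a PEP 263 encoding cookie on the first line, the compiler decodes
# the source as cp850 before parsing it.
with_cookie = b"# -*- coding: cp850 -*-\n" + body
namespace = {}
exec(compile(with_cookie, "<with_cookie>", "exec"), namespace)
decoded = namespace["s"]

# Without a cookie, Python 3 assumes UTF-8 and refuses the stray bytes
# instead of silently reading them as Latin-1.
try:
    compile(body, "<no_cookie>", "exec")
    rejected = False
except (SyntaxError, UnicodeDecodeError):
    rejected = True

print(ascii(decoded))   # '\xe4\xf6\xfc'
print(rejected)         # True
```

With the cookie the literal comes out as the intended ä, ö, ü; without it the compiler refuses to guess.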


I feel like an idiot now :-( I do get the warnings when I run a Python
script, but I do not get the warnings when I'm using the interactive
prompt. So it's all good (almost). Why not also produce warnings at the
interactive prompt?

-- Gerhard

Jul 18 '05 #4
