>>> u"äöü"
u'\x84\x94\x81'
(Python 2.2.3/2.3b2; sys.getdefaultencoding() == "ascii")
Why does this work?
Does Python guess which encoding I mean? I thought Python should refuse
to guess :-)
-- Gerhard
Gerhard Häring <gh@ghaering.de> writes:
> >>> u"äöü"
> u'\x84\x94\x81'
> (Python 2.2.3/2.3b2; sys.getdefaultencoding() == "ascii")
> Why does this work?
> Does Python guess which encoding I mean? I thought Python should refuse to guess :-)
I stumbled over this yesterday, and it seems it is (at least) partially
answered by PEP 263:
In Python 2.1, Unicode literals can only be written using the
Latin-1 based encoding "unicode-escape". This makes the programming
environment rather unfriendly to Python users who live and work in
non-Latin-1 locales such as many of the Asian countries. Programmers
can write their 8-bit strings using the favorite encoding, but are
bound to the "unicode-escape" encoding for Unicode literals.
I have the impression that this is undocumented on purpose, because you
should not write unescaped non-ansi characters into the source file
(with 'unknown' encoding).
Thomas
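A minimal sketch of what seems to be going on in the session above. It is written for Python 3, not the Python 2.2/2.3 discussed in the thread, and it assumes the console delivered the typed characters as code page 437 bytes (the thread does not say which code page was in use). Under that assumption, reading each byte as one Latin-1 code point reproduces the result from the original post.

```python
# Python 3 sketch; the thread itself concerns Python 2.2/2.3.
# Assumption: the console encoded the typed text as code page 437.
typed = "\u00e4\u00f6\u00fc"  # the characters ä, ö, ü

# The bytes the interpreter would receive from such a console.
console_bytes = typed.encode("cp437")

# Treating every byte as one Latin-1 code point gives the value in the post.
guessed = console_bytes.decode("latin-1")

# Decoding with the encoding that was actually used recovers the text.
intended = console_bytes.decode("cp437")

print(console_bytes)      # b'\x84\x94\x81'
print(ascii(guessed))     # '\x84\x94\x81'
print(intended == typed)  # True
```

So the result is a byte-for-byte Latin-1 mapping rather than a guess at the console's code page, which would explain why the umlauts do not survive.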
Thomas Heller wrote:
> Gerhard Häring <gh@ghaering.de> writes:
>> >>> u"äöü"
>> u'\x84\x94\x81'
>> (Python 2.2.3/2.3b2; sys.getdefaultencoding() == "ascii")
>> Why does this work?
>> Does Python guess which encoding I mean? I thought Python should refuse to guess :-)
> I stumbled over this yesterday, and it seems it is (at least) partially
> answered by PEP 263:
>   In Python 2.1, Unicode literals can only be written using the
>   Latin-1 based encoding "unicode-escape". This makes the programming
>   environment rather unfriendly to Python users who live and work in
>   non-Latin-1 locales such as many of the Asian countries. Programmers
>   can write their 8-bit strings using the favorite encoding, but are
>   bound to the "unicode-escape" encoding for Unicode literals.
> I have the impression that this is undocumented on purpose, because you
> should not write unescaped non-ansi characters into the source file
> (with 'unknown' encoding).
I agree that using latin1 as default is bad. If there's an encoding
cookie in the 2.3+ source file then this encoding could be used.
I stumbled on this when giving another Python user on this list a
pointer to the relevant section in the Python tutorial
( http://www.python.org/doc/current/tu...00000000000000)
where Guido uses u"äöü" in an example.
As this is BAD the tutorial should probably be changed. I'll file a bug
report.
-- Gerhard
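For the encoding cookie mentioned above, here is a rough sketch in Python 3 (an assumption; the thread concerns Python 2.2/2.3, where a missing declaration was handled differently). The helper `run` and the choice of cp437 are hypothetical, for illustration only. The same source bytes are compiled under two different PEP 263 declarations to show that the declaration decides what the literal means.

```python
# Python 3 sketch; the thread itself concerns Python 2.2/2.3.
# The source line as an editor using code page 437 might save it (assumption).
raw_line = b"s = '\x84\x94\x81'\n"


def run(source_bytes):
    """Compile and execute source bytes, then return the value bound to s."""
    namespace = {}
    exec(compile(source_bytes, "<demo>", "exec"), namespace)
    return namespace["s"]


# Same bytes, two different PEP 263 encoding declarations.
with_cp437_cookie = run(b"# -*- coding: cp437 -*-\n" + raw_line)
with_latin1_cookie = run(b"# -*- coding: latin-1 -*-\n" + raw_line)

print(with_cp437_cookie == "\u00e4\u00f6\u00fc")  # True
print(with_latin1_cookie == "\x84\x94\x81")       # True
```

With the declaration matching the editor's encoding, the literal comes out as the intended umlauts; with a Latin-1 declaration it comes out as the three code points shown in the original post.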
Gerhard Häring wrote: Ricardo Bugalho wrote: On Fri, 18 Jul 2003 02:07:13 +0200, Gerhard Häring wrote:
Gerhard Häring <gh@ghaering.de> writes:
> >>> u"äöü"
> u'\x84\x94\x81'
> [this works, but IMO shouldn't]
[...] You'll get warnings if you don't define an encoding (either encoding cookie or BOM) and use 8-bit characters in your source files. These warnings will become errors in later Python versions. It's all in the PEP :)
I feel like an idiot now :-( I do get the warnings when I run a Python
script, but I do not get the warnings when I'm using the interactive
prompt. So it's all good (almost). Why not also produce warnings at the
interactive prompt?
-- Gerhard