Python & unicode

Michel Claveau - abstraction méta-galactique non

Hi !

If Python is OK with Unicode, why does the following script not run?
# -*- coding: utf-8 -*-

def режим(toto):
    return(toto*3)

aret = режим(4)

@-salutations
--
Michel Claveau

Jul 18 '05 #1

John Roth

It doesn't work because Python scripts
must be in ASCII except for the
contents of string literals. Having a function
name in anything but ASCII isn't
supported.

John Roth

"Michel Claveau - abstraction méta-galactique non triviale en fuite
perpétuelle." <un************@msupprimerlepoint.claveauPOINTco m> wrote in
message news:41**********************@news.wanadoo.fr...
Hi !

If Python is OK with Unicode, why does the following script not run?
# -*- coding: utf-8 -*-

def режим(toto):
    return(toto*3)

aret = режим(4)

@-salutations
--
Michel Claveau

Jul 18 '05 #2

Leif K-Brooks

John Roth wrote:

It doesn't work because Python scripts must be in ASCII except for
the contents of string literals. Having a function name in anything
but ASCII isn't supported.

To nit-pick a bit, identifiers can be in Unicode; they're simply
confined to digits and plain Latin letters.

Jul 18 '05 #3
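
The rule described above (identifiers confined to digits and plain Latin letters) can be sketched as a pattern match. This is only an illustration, assuming the classic ASCII-only rule: a letter or underscore first, then letters, digits or underscores. The helper name is_ascii_identifier is hypothetical and not part of Python, and the sketch does not exclude reserved words.

```python
import re

# Hypothetical helper: the ASCII-only identifier rule discussed in this thread.
_ASCII_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_ascii_identifier(name):
    """Return True if name uses only ASCII letters, digits and underscores
    and does not start with a digit."""
    return _ASCII_IDENTIFIER.fullmatch(name) is not None


# Names from the original script, plus a few that break the rule.
for candidate in ["toto", "aret", "_x1", "1abc", "режим", "é"]:
    print(candidate, is_ascii_identifier(candidate))
```

Under this rule the function name from the first post is rejected, while toto and aret pass.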

Michel Claveau - abstraction méta-galactique non t

Hi !

and plain Latin letters

But not all letters (no : é à ç à ê ö ñ etc.)

Therefore, Python's support of Unicode is... limited.

Good night
--
Michel Claveau

Jul 18 '05 #4

Radovan Garabik

"Michel Claveau - abstraction méta-galactique non triviale en fuite perpétuelle." <un************@msupprimerlepoint.claveaupointco m> wrote:

Hi !
and plain Latin letters

But not all letters (no : é à ç à ê ö ñ etc.)

.... and some more letters that are not latin (j,w,u,z)
ok, I'd better shut up :-)
--
-----------------------------------------------------------
| Radovan Garabík http://melkor.dnp.fmph.uniba.sk/~garabik/ |
| __..--^^^--..__ garabik @ kassiopeia.juls.savba.sk |
-----------------------------------------------------------
Antivirus alert: file .signature infected by signature virus.
Hi! I'm a signature virus! Copy me into your signature file to help me spread!

Jul 18 '05 #5

michele.simionato

Uhm ...

>>> class C(object):
...     pass
...
>>> setattr(C, "è", "The letter è")
>>> getattr(C, "è")
'The letter \xe8'

;-)

Michele Simionato

Jul 18 '05 #6

michele.simionato

I forgot to add the following:

>>> setattr(C, "è", u"The letter è")
>>> getattr(C, "è")
u'The letter \xe8'
>>> print getattr(C, "è")
The letter è

Python identifiers can be generic strings, including Latin-1
characters;
they cannot be unicode strings, however:

>>> setattr(C, u"è", "The letter è")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe8' in
position 0: ordinal not in range(128)

So you are right after all, but I thought most people didn't know that
you can have valid identifiers with accented letters, spaces, and
non-printable chars.

>>> setattr(C, " ", "this works")
>>> getattr(C, " ")

Michele Simionato

Jul 18 '05 #7
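
The setattr/getattr trick above can be reproduced on a current interpreter. This is a minimal sketch, assuming Python 3 rather than the Python 2 sessions shown in the posts. Since "è" is itself a legal identifier in Python 3, the sketch uses names containing whitespace; the list odd_names is purely illustrative.

```python
class C(object):
    pass


# Illustrative names that the parser would never accept as identifiers.
odd_names = [" ", "two words", "tab\tinside"]

for name in odd_names:
    setattr(C, name, "this works")

# Each attribute is stored under the string key exactly as given ...
values = [getattr(C, name) for name in odd_names]

# ... even though none of these strings is a valid identifier.
valid = [name.isidentifier() for name in odd_names]

print(values)
print(valid)
```

The attribute names end up as plain string keys in the class namespace, which is why no syntax check is applied to them.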

Michel Claveau - abstraction méta-galactique non triviale en fuite
perpétuelle. wrote:

Hi !

If Python is OK with Unicode, why does the following script not run?

# -*- coding: utf-8 -*-

def режим(toto):
    return(toto*3)

Because the coding is only supported in string literals.
But I'm not sure exactly why. It would be nice to do:

import math
π = math.pi

--
Pádraig Brady - http://www.pixelbeat.org
--

Jul 18 '05 #8
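
For comparison, the wished-for spelling does work on later interpreters. This is a sketch, assuming Python 3, where PEP 3131 allows non-ASCII identifiers. The posts in this thread were written against Python 2, where the same source is rejected with a syntax error.

```python
import math

# Non-ASCII identifiers: accepted by Python 3 (PEP 3131), not by Python 2.
π = math.pi


def режим(toto):
    return toto * 3


aret = режим(4)
print(aret, round(π, 5))
```

Python 3 reads source files as UTF-8 by default, so the coding line from the original script is not needed for this sketch.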

Kent Johnson

mi***************@gmail.com wrote:

I forgot to add the following:

>>> setattr(C, "è", u"The letter è")
>>> getattr(C, "è")
u'The letter \xe8'
>>> print getattr(C, "è")
The letter è

But try this:

>>> C.è
  File "<stdin>", line 1
    C.├¿
      ^
SyntaxError: invalid syntax

Python identifiers can be generic strings, including Latin-1
characters;

I don't think so. You have hacked an attribute with latin-1 characters in it, but you haven't
actually created an identifier.

According to the language reference, identifiers can only contain letters a-z and A-Z, digits 0-9
and underscore.
http://docs.python.org/ref/identifiers.html

Kent

Jul 18 '05 #9

Serge.Orlov

Michel Claveau - abstraction méta-galactique non triviale en fuite
perpétuelle. wrote:

Hi !
and plain Latin letters

But not all letters (no : é à ç à ê ö ñ etc.)

Therefore, Python's support of Unicode is... limited.

So is the support of Unicode in virtually every computer language
because they don't support ... digits except 0..9. Does anyone know a
language that supports them?

Serge.

Jul 18 '05 #10

michele.simionato

Kent:

I don't think so. You have hacked an attribute with latin-1 characters in it, but you haven't actually created an identifier.

No, I really created an identifier. For instance
I can create a global name in this way:

>>> globals()["è"]=1
>>> globals()["è"]
1

According to the language reference, identifiers can only contain letters a-z and A-Z, digits 0-9 and underscore.
http://docs.python.org/ref/identifiers.html

The parser has this restriction, so it gets confused if it finds "è".
But the underlying
implementation just works for generic identifiers.

Michele Simionato

Jul 18 '05 #11

Kent Johnson

mi***************@gmail.com wrote:

Kent:
I don't think so. You have hacked an attribute with latin-1
characters in it, but you
haven't actually created an identifier.

No, I really created an identifier. For instance
I can create a global name in this way:

>>> globals()["è"]=1
>>> globals()["è"]
1

Maybe I'm splitting hairs but to me an identifier is a syntactical element that can be used in
specific ways. For example the syntax defines
    attributeref ::= primary "." identifier
so if identifiers can contain latin-1 characters you should be able to say
    C.è=1

Kent

According to the language reference, identifiers can only contain
letters a-z and A-Z, digits 0-9 and underscore.
http://docs.python.org/ref/identifiers.html

The parser has this restriction, so it gets confused if it finds "è".
But the underlying
implementation just works for generic identifiers.
Michele Simionato

Jul 18 '05 #12
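
The distinction argued over above, between a name the parser accepts and a string that merely works as a namespace key, can be checked directly. This is a minimal sketch, assuming Python 3. The helper parses is hypothetical, and a name containing a space stands in for "è", which Python 3 would accept as an identifier.

```python
def parses(source):
    """Return True if the compiler accepts source as a statement."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError:
        return False
    return True


class C(object):
    pass


# attributeref ::= primary "." identifier
print(parses("C.name = 1"))       # ordinary identifier after the dot
print(parses("C.two words = 1"))  # not an identifier, so the parser rejects it

# The same string is still usable as an attribute key through setattr.
setattr(C, "two words", 1)
print(getattr(C, "two words"))
```

Both positions in the exchange are visible here: the grammar restricts what can follow the dot, while the underlying namespace accepts any string.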

Scott David Daniels

P@draigBrady.com wrote:

Because the coding is only supported in string literals.
But I'm not sure exactly why.

The why is the same as why we write in English on this newsgroup.
Not because English is better, but because that leaves a single
language for everyone to use to communicate in. If you allow
non-ASCII characters in symbol names, your source code will be
unviewable (and uneditable) for people with ASCII-only terminals,
never mind how comprehensible it might otherwise be. It is a
least-common-denominator argument, not a "this is better"
argument.

-Scott David Daniels
Sc***********@Acm.Org

Jul 18 '05 #13

Leif K-Brooks

Se*********@gmail.com wrote:

So is the support of Unicode in virtually every computer language
because they don't support ... digits except 0..9.

Hex digits aren't 0..9.

Python 2.4 (#2, Dec 3 2004, 17:59:05)
[GCC 3.3.5 (Debian 1:3.3.5-2)] on linux2
Type "help", "copyright", "credits" or "license" for more information.

>>> 0xCF
207
>>> hex(123)
'0x7b'

Jul 18 '05 #14

Michel Claveau - abstraction méta-galactique non t

Hi !

It is a least-common-denominator argument, not a "this is better"
argument.

I understand, but it feels to me like an attempt at hegemony. Is the English
language really the least common denominator for a Russian who writes in
Cyrillic, or for a Chinese speaker who is not anglophone?

And, did you think of klingons?

;-)
Michel Claveau

Jul 18 '05 #15

Michel Claveau - abstraction méta-galactique non

Hi !

import math
π = math.pi

Good example! I also like: e=mc²

Jul 18 '05 #16

Michel Claveau - abstraction méta-galactique non t

;o)
Python ==. limits ,===>
\________/

Jul 18 '05 #17

Serge Orlov

Leif K-Brooks wrote:

Se*********@gmail.com wrote:
So is the support of Unicode in virtually every computer language
because they don't support ... digits except 0..9.

Hex digits aren't 0..9.

You're right, I forgot about hex. But that's boring :) How about Hebrew
numerals which are present in Unicode?

Serge.

Jul 18 '05 #18
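
On the question of digits other than 0..9, here is a small sketch of what the standard library does with them at run time. It assumes Python 3 and uses Arabic-Indic digits as the example, because they carry the Unicode decimal-digit property. It says nothing about numeric literals in source code, and Hebrew numerals are not covered.

```python
import unicodedata

arabic_indic = "\u0661\u0662\u0663"  # ARABIC-INDIC DIGITS ONE, TWO, THREE

# int() accepts characters that have the Unicode decimal-digit property.
value = int(arabic_indic)

# unicodedata.decimal() reports the digit value of a single character.
digits = [unicodedata.decimal(ch) for ch in arabic_indic]

print(value, digits)
```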

Serge Orlov

Michel Claveau - abstraction méta-galactique non triviale en fuite
perpétuelle. wrote:

Hi !
It is a least-common-denominator argument, not a "this is better"
argument.

I understand, but it feels to me like an attempt at hegemony. Is the English
language really the least common denominator for a Russian who writes in
Cyrillic, or for a Chinese speaker who is not anglophone?

I don't know about Chinese, but English *is* the least common
denominator for native Russian software developers. There are a lot of
reasons for that:

- To switch between the Russian and English keyboard layouts you
need to press a switch key, or usually even two keys at the same time.
Since language syntax and library calls are in English, you have to
switch often. Very often you forget which keyboard layout is current,
start typing in the wrong one, and have to delete the garbage, hit
the switch key and type it again. If it happens ten times every ten
minutes it will drive you crazy.

- Most native Russian developers graduated from universities or
institutes. They attended hundreds of hours of math and physics
classes. All these classes use Latin notation.

- Any serious local software development job posting mentions "Technical
English" as a requirement. It means you're expected to read technical
documents in English.

- At the same time, the majority of native Russian developers do not speak
English very well and feel they need more English practice. Using
English identifiers is a chance to practice while you work.

- The amount of useful information in English is much greater than in
Russian, thanks to the Internet.

Surprised? :)

Serge.

Jul 18 '05 #19

Scott David Daniels wrote:

P@draigBrady.com wrote:
Because the coding is only supported in string literals.
But I'm not sure exactly why.

The why is the same as why we write in English on this newsgroup.
Not because English is better, but because that leaves a single
language for everyone to use to communicate in.

Fair enough. Though people can communicate in other languages
if they want, or have specific newsgroups for other languages.

If you allow
non-ASCII characters in symbol names, your source code will be
unviewable (and uneditable) for people with ASCII-only terminals,
never mind how comprehensible it might otherwise be.

So how does one edit non ascii string literals at the moment?

It is a
least-common-denominator argument, not a "this is better"
argument.

If one edited the whole file in the specified coding
then one wouldn't have to switch editing modes when
editing strings which is a real pain.

--
Pádraig Brady - http://www.pixelbeat.org
--

Jul 18 '05 #20

Michel Claveau - abstraction méta-galactique non t

Hi !

Sorry, but I think that, for Russians, English is an *add-on*, and not a
common denominator.
English is the most known language, but it is not common. It is the same
difference as between co-operation and colonization.

Have a good day
--
Michel Claveau

Jul 18 '05 #21

Scott David Daniels

P@draigBrady.com wrote:

Scott David Daniels wrote:
If you allow
non-ASCII characters in symbol names, your source code will be
unviewable (and uneditable) for people with ASCII-only terminals,
never mind how comprehensible it might otherwise be.

So how does one edit non ascii string literals at the moment?

Generally by using editors that leave bytes alone if they cannot
be understood. For many applications where I'll work on a program,
I don't need to read the strings, but rather the code that uses
those strings. I am at a disadvantage if I cannot understand the
derivation of the names, no doubt, but at least I know when two
letters are different, and what tokens are distinct.

If one edited the whole file in the specified coding
then one wouldn't have to switch editing modes when
editing strings which is a real pain.

No question, but ASCII is available as a subset for many encodings.
As you might note, my conception is that I might be helping on a
program with many programmers. Python spent a lot of effort to
avoid favoring a character set as much as possible, while still
being a medium for sharing code.

--Scott David Daniels
Sc***********@Acm.Org

Jul 18 '05 #22
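
The remark that ASCII is available as a subset of many encodings can be checked with a few lines. This is a sketch, assuming Python 3; the encodings listed were picked for illustration and are not taken from the thread.

```python
source_line = "def f(x): return x * 3"

# Encodings chosen for illustration; all of them are ASCII-compatible.
encodings = ["ascii", "utf-8", "latin-1", "koi8_r", "cp1251"]

encoded = {enc: source_line.encode(enc) for enc in encodings}

# Pure-ASCII text yields identical bytes under every encoding above.
all_same = len(set(encoded.values())) == 1

# Non-ASCII text does not: the same word maps to different byte sequences.
word = "режим"
differs = word.encode("utf-8") != word.encode("koi8_r")

print(all_same, differs)
```

This is the practical side of the least-common-denominator argument: code restricted to ASCII reads the same whichever of these encodings an editor assumes.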

Marc 'BlackJack' Rintsch

In <41**********************@news.wanadoo.fr>, Michel Claveau -
abstraction méta-galactique non triviale en fuite perpétuelle. wrote:

I understand, but it feels to me like an attempt at hegemony. Is the English
language really the least common denominator for a Russian who writes in
Cyrillic, or for a Chinese speaker who is not anglophone?

And, did you think of klingons?

Klingons don't do Python, they hack ('n slash) in var'aq:

http://www.geocities.com/connorbd/varaq/

SCNR,
Marc 'BlackJack' Rintsch

Jul 18 '05 #23

Serge Orlov

Michel Claveau - abstraction méta-galactique non triviale en fuite
perpétuelle. wrote:

Hi !

Sorry, but I think that, for Russians, English is an *add-on*,
and not a common denominator.

You miss the point: programs are not English writings, they are written
in computer languages using libraries with English identifiers. On the
other hand comments and documentation are text. And Russian programmers
do write Russian comments in programs.
I've seen that a lot of times. On the other hand I've never seen any
serious program written with Russian identifiers. Sure such programs
may exist but my point is that they are very rare. That makes English
the language of choice for Russian programmers. I'm not against the
ability to write identifiers in my native Russian language, I don't
mind it. I'm just trying to get the message across that Russian
programmers are not dying for such a feature and almost none of them
use it in languages that permit Unicode identifiers.

English is the most known language, but it is not common. It is
the same difference as between co-operation and colonization.

When I hear "It's the same difference as ..." it raises a red flag in
my mind. Often, fine words have no connection to the subject. I can say
that it's the same difference as between ability to drive a car and
ability to walk. The car doesn't own you ;)

Serge.

Jul 18 '05 #24
