
Python & unicode

P: n/a
Hi !

If Python is OK with Unicode, why doesn't the following script run?
# -*- coding: utf-8 -*-

def режим(toto):
    return(toto*3)

aret = режим(4)



@-salutations
--
Michel Claveau


Jul 18 '05 #1
23 Replies


P: n/a
It doesn't work because Python scripts must be in ASCII except for
the contents of string literals. Having a function name in anything
but ASCII isn't supported.

John Roth

"Michel Claveau - abstraction méta-galactique non triviale en fuite
perpétuelle." <un************@msupprimerlepoint.claveauPOINTco m> wrote in
message news:41**********************@news.wanadoo.fr...
Hi !

If Python is OK with Unicode, why doesn't the following script run?
# -*- coding: utf-8 -*-

def режим(toto):
    return(toto*3)

aret = режим(4)



@-salutations
--
Michel Claveau


Jul 18 '05 #2

P: n/a
John Roth wrote:
It doesn't work because Python scripts must be in ASCII except for
the contents of string literals. Having a function name in anything
but ASCII isn't supported.


To nit-pick a bit, identifiers can be in Unicode; they're simply
confined to digits and plain Latin letters.
Jul 18 '05 #3

P: n/a
Hi !
and plain Latin letters


But not all letters (no: é, ç, ê, ö, ñ, etc.)

Therefore, Python's support of Unicode is... limited.

Good night
--
Michel Claveau

Jul 18 '05 #4

P: n/a
"Michel Claveau - abstraction méta-galactique non triviale en fuite perpétuelle." <un************@msupprimerlepoint.claveaupointco m> wrote:
Hi !
and plain Latin letters


But not all letters (no: é, ç, ê, ö, ñ, etc.)


.... and some more letters that are not latin (j,w,u,z)
ok, I'd better shut up :-)
--
-----------------------------------------------------------
| Radovan Garabík http://melkor.dnp.fmph.uniba.sk/~garabik/ |
| __..--^^^--..__ garabik @ kassiopeia.juls.savba.sk |
-----------------------------------------------------------
Antivirus alert: file .signature infected by signature virus.
Hi! I'm a signature virus! Copy me into your signature file to help me spread!
Jul 18 '05 #5

P: n/a
Uhm ...
>>> class C(object):
...     pass
...
>>> setattr(C, "è", "The letter è")
>>> getattr(C, "è")
'The letter \xe8'

;-)

Michele Simionato

Jul 18 '05 #6

P: n/a
I forgot to add the following:
>>> setattr(C, "è", u"The letter è")
>>> getattr(C, "è")
u'The letter \xe8'
>>> print getattr(C, "è")
The letter è

Python identifiers can be generic strings, including Latin-1
characters; they cannot be unicode strings, however:
>>> setattr(C, u"è", "The letter è")
Traceback (most recent call last):
File "<stdin>", line 1, in ?
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe8' in
position 0: ordinal not in range(128)

So you are right after all, but I thought most people didn't know that
you can have valid identifiers with accented letters, spaces, and
non-printable chars.
>>> setattr(C, " ", "this works")
>>> getattr(C, " ")
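
The behaviour described above can be sketched roughly as follows. This is only an illustration, assuming a Python 3 interpreter rather than the Python 2.4 used in this thread; the helper `parses_as_attribute_access` is a hypothetical name introduced for the example.

```python
class C(object):
    pass

# setattr/getattr treat the name as a plain key in the class namespace,
# so a string that is not a syntactic identifier is still accepted.
setattr(C, "two words", "this works")
value = getattr(C, "two words")


def parses_as_attribute_access(name):
    """Hypothetical helper: does the parser accept `C.<name>`?"""
    try:
        # compile() only parses the expression; it never evaluates it.
        compile("C." + name, "<sketch>", "eval")
    except SyntaxError:
        return False
    return True


print(value)
print(parses_as_attribute_access("two words"))
print(parses_as_attribute_access("plain_name"))
```

The attribute exists and can be read back, but it can only be reached through getattr, never through the dotted syntax.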

Michele Simionato

Jul 18 '05 #7

P: n/a
P
Michel Claveau - abstraction méta-galactique non triviale en fuite
perpétuelle. wrote:
Hi !

If Python is OK with Unicode, why doesn't the following script run?

# -*- coding: utf-8 -*-

def режим(toto):
    return(toto*3)


Because the coding is only supported in string literals.
But I'm not sure exactly why. It would be nice to do:

import math
π = math.pi
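
For comparison, here is a minimal sketch of the same idea on Python 3, which accepts non-ASCII identifiers (PEP 3131). It assumes a Python 3 interpreter and a UTF-8 source file; it would still be a syntax error on the Python 2.4 discussed in this thread.

```python
import math

# On Python 3 the Greek letter is an ordinary identifier.
π = math.pi


# The function from the original question, unchanged apart from indentation.
def режим(toto):
    return toto * 3


aret = режим(4)
print(aret)  # 12
print(π == math.pi)
```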

--
Pádraig Brady - http://www.pixelbeat.org
--
Jul 18 '05 #8

P: n/a
mi***************@gmail.com wrote:
I forgot to add the following:

>>> setattr(C, "è", u"The letter è")
>>> getattr(C, "è")
u'The letter \xe8'
>>> print getattr(C, "è")
The letter è

But try this:
>>> C.è
  File "<stdin>", line 1
    C.è
      ^
SyntaxError: invalid syntax

Python identifiers can be generic strings, including Latin-1
characters;


I don't think so. You have hacked an attribute with latin-1 characters in it, but you haven't
actually created an identifier.

According to the language reference, identifiers can only contain letters a-z and A-Z, digits 0-9
and underscore.
http://docs.python.org/ref/identifiers.html
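
The rule from the language reference can be written down as a small check. This is a sketch only: the function name `is_ascii_identifier` is made up for the example, the pattern also encodes the grammar's requirement that the first character is not a digit, and reserved keywords are ignored.

```python
import re

# Letters a-z and A-Z, digits 0-9 and underscore; no leading digit.
ASCII_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_ascii_identifier(name):
    """Hypothetical helper: True if `name` fits the ASCII-only identifier rule."""
    return ASCII_IDENTIFIER.fullmatch(name) is not None


for candidate in ("aret", "_x1", "2x", "è", "режим", "two words"):
    print(candidate, is_ascii_identifier(candidate))
```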

Kent
Jul 18 '05 #9

P: n/a
Michel Claveau - abstraction méta-galactique non triviale en fuite
perpétuelle. wrote:
Hi !
and plain Latin letters


But not all letters (no: é, ç, ê, ö, ñ, etc.)

Therefore, Python's support of Unicode is... limited.


So is the support of Unicode in virtually every computer language,
because they don't support ... digits except 0..9. Does anyone know a
language that does?
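
As a partial answer, here is a rough sketch of how Python 3 treats non-ASCII decimal digits: numeric literals in source must use 0..9, but int() accepts any Unicode decimal digit in a string. It assumes Python 3; the helper `literal_parses` is a hypothetical name, and Arabic-Indic digits are used purely as an example.

```python
import unicodedata

# ARABIC-INDIC DIGIT ONE, TWO, THREE, written as escapes to keep the file ASCII.
arabic_indic = "\u0661\u0662\u0663"

# int() converts strings made of Unicode decimal digits.
as_int = int(arabic_indic)
digit_values = [unicodedata.digit(ch) for ch in arabic_indic]


def literal_parses(source):
    """Hypothetical helper: does the parser accept `source` as an expression?"""
    try:
        compile(source, "<sketch>", "eval")
    except SyntaxError:
        return False
    return True


print(as_int)
print(digit_values)
print(literal_parses("123"), literal_parses(arabic_indic))
```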

Serge.

Jul 18 '05 #10

P: n/a
Kent:
I don't think so. You have hacked an attribute with latin-1 characters in it, but you haven't actually created an identifier.
No, I really created an identifier. For instance
I can create a global name in this way:
>>> globals()["è"]=1
>>> globals()["è"]
1
According to the language reference, identifiers can only contain letters a-z and A-Z, digits 0-9 and underscore.
http://docs.python.org/ref/identifiers.html


The parser has this restriction, so it gets confused if it finds "è".
But the underlying
implementation just works for generic identifiers.
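
A minimal sketch of that point, assuming Python 3: the module namespace is an ordinary dict, so any string can be stored as a key, but only keys that are syntactic identifiers can be reached by name. The key names below are made up for the example.

```python
namespace = globals()

# Stored in the module namespace, but no name the parser accepts refers to it.
namespace["two words"] = 1

# This key is a valid identifier, so it can also be used as a plain name.
namespace["reachable_name"] = 2

print(namespace["two words"])
print(reachable_name)
print("two words".isidentifier(), "reachable_name".isidentifier())
```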
Michele Simionato

Jul 18 '05 #11

P: n/a
mi***************@gmail.com wrote:
Kent:
I don't think so. You have hacked an attribute with latin-1
characters in it, but you
haven't actually created an identifier.

No, I really created an identifier. For instance
I can create a global name in this way:

>>> globals()["è"]=1
>>> globals()["è"]
1


Maybe I'm splitting hairs but to me an identifier is a syntactical element that can be used in
specific ways. For example the syntax defines
attributeref ::=
primary "." identifier
so if identifiers can contain latin-1 characters you should be able to say
C.è=1

Kent

According to the language reference, identifiers can only contain


letters a-z and A-Z,
digits 0-9 and underscore.
http://docs.python.org/ref/identifiers.html

The parser has this restriction, so it gets confused if it finds "è".
But the underlying
implementation just works for generic identifiers.
Michele Simionato

Jul 18 '05 #12

P: n/a
P@draigBrady.com wrote:
Because the coding is only supported in string literals.
But I'm not sure exactly why.

The why is the same as why we write in English on this newsgroup.
Not because English is better, but because that leaves a single
language for everyone to use to communicate in. If you allow
non-ASCII characters in symbol names, your source code will be
unviewable (and uneditable) for people with ASCII-only terminals,
never mind how comprehensible it might otherwise be. It is a
least-common-denominator argument, not a "this is better"
argument.

-Scott David Daniels
Sc***********@Acm.Org


Jul 18 '05 #13

P: n/a
Se*********@gmail.com wrote:
So is the support of Unicode in virtually every computer language
because they don't support ... digits except 0..9.


Hex digits aren't 0..9.

Python 2.4 (#2, Dec 3 2004, 17:59:05)
[GCC 3.3.5 (Debian 1:3.3.5-2)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> 0xCF
207
>>> hex(123)
'0x7b'
Jul 18 '05 #14

P: n/a
Hi !
It is a least-common-denominator argument, not a "this is better"
argument.

I understand, but to me this feels like an attempt at hegemony. Is
English really the least common denominator for a Russian who writes in
Cyrillic, or for a Chinese speaker who knows no English?

And did you think of the Klingons?

;-)
Michel Claveau

Jul 18 '05 #15

P: n/a
Hi !

import math
π = math.pi

Good example! I also like: e=mc²

Jul 18 '05 #16

P: n/a
;o)
Python ==. limits ,===>
\________/


Jul 18 '05 #17

P: n/a
Leif K-Brooks wrote:
Se*********@gmail.com wrote:
So is the support of Unicode in virtually every computer language
because they don't support ... digits except 0..9.


Hex digits aren't 0..9.


You're right, I forgot about hex. But that's boring :) How about Hebrew
numerals which are present in Unicode?

Serge.

Jul 18 '05 #18

P: n/a
Michel Claveau - abstraction méta-galactique non triviale en fuite
perpétuelle. wrote:
Hi !
It is a least-common-denominator argument, not a "this is better"
argument.

I understand, but to me this feels like an attempt at hegemony. Is
English really the least common denominator for a Russian who writes in
Cyrillic, or for a Chinese speaker who knows no English?


I don't know about Chinese, but English *is* the least common
denominator for native Russian software developers. There are a lot of
reasons for that:

- To switch between the Russian and English keyboard layouts you
need to press a switch key, or usually even two keys at the same time.
Since language syntax and library calls are in English, you have to
switch often. Very often you forget which keyboard layout is current,
start typing in the wrong one, and have to delete the garbage, hit
the switch key and type it again. If it happens ten times every ten
minutes it will drive you crazy.

- Most native Russian developers graduated from universities or
institutes. They attended hundreds of hours of math and physics
classes. All these classes use Latin notation.

- Any serious local software development job posting mentions "Technical
English" as a requirement. It means you're expected to read technical
documents in English.

- At the same time, the majority of native Russian developers do not
speak English very well and feel they need more English practice. Using
English identifiers is a chance to practice while you work.

- The amount of useful information in English is much greater than in
Russian, thanks to the Internet.

Surprised? :)

Serge.

Jul 18 '05 #19

P: n/a
P
Scott David Daniels wrote:
P@draigBrady.com wrote:
Because the coding is only supported in string literals.
But I'm not sure exactly why.
The why is the same as why we write in English on this newsgroup.
Not because English is better, but because that leaves a single
language for everyone to use to communicate in.


Fair enough. Though people can communicate in other languages
if they want, or have specific newsgroups for other languages.
If you allow
non-ASCII characters in symbol names, your source code will be
unviewable (and uneditable) for people with ASCII-only terminals,
never mind how comprehensible it might otherwise be.

So how does one edit non-ASCII string literals at the moment?

It is a
least-common-denominator argument, not a "this is better"
argument.


If one edited the whole file in the specified coding
then one wouldn't have to switch editing modes when
editing strings which is a real pain.

--
Pádraig Brady - http://www.pixelbeat.org
--
Jul 18 '05 #20

P: n/a
Hi !

Sorry, but I think that, for Russians, English is an *add-on*, and not a
common denominator.
English is the best-known language, but it is not common. It is the same
difference as between co-operation and colonization.

Have a good day
--
Michel Claveau


Jul 18 '05 #21

P: n/a
P@draigBrady.com wrote:
Scott David Daniels wrote:
If you allow
non-ASCII characters in symbol names, your source code will be
unviewable (and uneditable) for people with ASCII-only terminals,
never mind how comprehensible it might otherwise be.
So how does one edit non ascii string literals at the moment?


Generally by using editors that leave bytes alone if they cannot
be understood. For many applications where I'll work on a program,
I don't need to read the strings, but rather the code that uses
those strings. I am at a disadvantage if I cannot understand the
derivation of the names, no doubt, but at least I know when two
letters are different, and what tokens are distinct.
If one edited the whole file in the specified coding
then one wouldn't have to switch editing modes when
editing strings which is a real pain.

No question, but ASCII is available as a subset for many encodings.
As you might note, my conception is that I might be helping on a
program with many programmers. Python spent a lot of effort to
avoid favoring a character set as much as possible, while still
being a medium for sharing code.
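
Both points can be sketched in a few lines, assuming Python 3 and its standard codecs. The encodings listed are just examples, and the source snippets are made up for the illustration.

```python
# ASCII-only source text encodes to the same bytes in many encodings.
ENCODINGS = ("ascii", "utf-8", "latin-1", "koi8_r", "cp1251")
source_line = "def mode(toto): return toto * 3"
encoded = {name: source_line.encode(name) for name in ENCODINGS}
same_bytes = len(set(encoded.values())) == 1

# A byte-level edit of an ASCII token leaves the non-ASCII bytes of a
# string literal untouched, even if the editor cannot display them.
original = 'label = "режим"\nresult = toto * 3\n'.encode("utf-8")
edited = original.replace(b"toto", b"value")

print(same_bytes)
print(edited.decode("utf-8"))
```

This is roughly what an editor that "leaves bytes alone" does: only the ASCII parts it understands are changed.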

--Scott David Daniels
Sc***********@Acm.Org
Jul 18 '05 #22

P: n/a
In <41**********************@news.wanadoo.fr>, Michel Claveau -
abstraction méta-galactique non triviale en fuite perpétuelle. wrote:
I understand, but to me this feels like an attempt at hegemony. Is
English really the least common denominator for a Russian who writes in
Cyrillic, or for a Chinese speaker who knows no English?

And did you think of the Klingons?


Klingons don't do Python, they hack ('n slash) in var'aq:

http://www.geocities.com/connorbd/varaq/

SCNR,
Marc 'BlackJack' Rintsch
Jul 18 '05 #23

P: n/a
Michel Claveau - abstraction méta-galactique non triviale en fuite
perpétuelle. wrote:
Hi !

Sorry, but I think that, for Russians, English is an *add-on*,
and not a common denominator.

You miss the point: programs are not English prose; they are written
in computer languages using libraries with English identifiers. On the
other hand, comments and documentation are text, and Russian programmers
do write Russian comments in programs.
I've seen that many times. On the other hand, I've never seen any
serious program written with Russian identifiers. Sure, such programs
may exist, but my point is that they are very rare. That makes English
the language of choice for Russian programmers. I'm not against the
ability to write identifiers in my native Russian language; I don't
mind it. I'm just trying to get the message across that Russian
programmers are not dying for such a feature, and almost none of them
use it in languages that permit Unicode identifiers.

English is the best-known language, but it is not common. It is
the same difference as between co-operation and colonization.


When I hear "It's the same difference as ..." it raises a red flag in
my mind. Often, fine words have no connection to the subject. I can say
that it's the same difference as between ability to drive a car and
ability to walk. The car doesn't own you ;)

Serge.

Jul 18 '05 #24

This discussion thread is closed

Replies have been disabled for this discussion.