Bytes IT Community

PEP 3131: Supporting Non-ASCII Identifiers

PEP 1 specifies that PEP authors need to collect feedback from the
community. As the author of PEP 3131, I'd like to encourage comments
to the PEP included below, either here (comp.lang.python), or to
py*********@python.org

In summary, this PEP proposes to allow non-ASCII letters as
identifiers in Python. If the PEP is accepted, the following
identifiers would also become valid as class, function, or
variable names: Löffelstiel, changé, ошибка, or 売り*
(hoping that the latter one means "counter").
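
If the PEP were accepted, code like the following would parse (a
hypothetical sketch using the PEP's own example identifiers; today's
Python rejects these names with a SyntaxError):

```python
# Hypothetical sketch: class, function and variable names drawn from
# the examples in the PEP text.
class Löffelstiel:
    """A class named with a German identifier."""

def changé():
    ошибка = 0   # a Russian variable name ("error")
    return ошибка

print(changé())  # -> 0
```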

I believe this PEP differs from other Py3k PEPs in that it really
requires feedback from people with different cultural background
to evaluate it fully - most other PEPs are culture-neutral.

So, please provide feedback, e.g. perhaps by answering these
questions:
- should non-ASCII identifiers be supported? why?
- would you use them if it was possible to do so? in what cases?

Regards,
Martin
PEP: 3131
Title: Supporting Non-ASCII Identifiers
Version: $Revision: 55059 $
Last-Modified: $Date: 2007-05-01 22:34:25 +0200 (Di, 01 Mai 2007) $
Author: Martin v. Löwis <ma****@v.loewis.de>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 1-May-2007
Python-Version: 3.0
Post-History:

Abstract
========

This PEP proposes support for non-ASCII letters (such as accented
characters, Cyrillic, Greek, Kanji, etc.) in Python identifiers.

Rationale
=========

Python code is written by many people in the world who are not familiar
with the English language, or even well-acquainted with the Latin
writing system. Such developers often desire to define classes and
functions with names in their native languages, rather than having to
come up with an (often incorrect) English translation of the concept
they want to name.

For some languages, common transliteration systems exist (in particular
for the Latin-based writing systems). For other languages, users have
greater difficulty using Latin to write their native words.

Common Objections
=================

Some objections are often raised against proposals similar to this one.

People claim that they would be unable to use a library if using it
required characters they cannot type on their keyboards.
However, it is the choice of the designer of the library to decide on
various constraints for using the library: people may not be able to use
the library because they cannot get physical access to the source code
(because it is not published), or because licensing prohibits usage, or
because the documentation is in a language they cannot understand. A
developer wishing to make a library widely available needs to make a
number of explicit choices (such as publication, licensing, language
of documentation, and language of identifiers). It should always be the
choice of the author to make these decisions - not the choice of the
language designers.

In particular, projects wishing to have wide usage probably want to
establish a policy that all identifiers, comments, and documentation
are written in English (see the GNU coding style guide for an example of
such a policy). Restricting the language to ASCII-only identifiers does
not force comments and documentation to be in English, or the identifiers
actually to be English words, so an additional policy is necessary
anyway.

Specification of Language Changes
=================================

The syntax of identifiers in Python will be based on the Unicode
standard annex UAX-31 [1]_, with elaboration and changes as defined
below.

Within the ASCII range (U+0001..U+007F), the valid characters for
identifiers are the same as in Python 2.5. This specification only
introduces additional characters from outside the ASCII range. For
other characters, the classification uses the version of the Unicode
Character Database as included in the ``unicodedata`` module.

The identifier syntax is ``<ID_Start> <ID_Continue>*``.

``ID_Start`` is defined as all characters having one of the general
categories uppercase letters (Lu), lowercase letters (Ll), titlecase
letters (Lt), modifier letters (Lm), other letters (Lo), letter numbers
(Nl), plus the underscore (XXX what are "stability extensions" listed in
UAX 31).

``ID_Continue`` is defined as all characters in ``ID_Start``, plus
nonspacing marks (Mn), spacing combining marks (Mc), decimal number
(Nd), and connector punctuations (Pc).
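
A rough sketch of this category check using the ``unicodedata`` module
mentioned above (the "stability extensions" left as an open question in
the PEP are not modeled here):

```python
import unicodedata

# General categories from the PEP's definitions of ID_Start and
# ID_Continue; the underscore is special-cased.
ID_START_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}
ID_CONTINUE_CATEGORIES = ID_START_CATEGORIES | {"Mn", "Mc", "Nd", "Pc"}

def is_identifier(s):
    """Check a candidate identifier against the PEP's category rules."""
    if not s:
        return False
    first = s[0]
    if first != "_" and unicodedata.category(first) not in ID_START_CATEGORIES:
        return False
    return all(c == "_" or unicodedata.category(c) in ID_CONTINUE_CATEGORIES
               for c in s[1:])

print(is_identifier("Löffelstiel"))  # -> True
print(is_identifier("2fast"))        # -> False: digits are ID_Continue only
```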

All identifiers are converted into the normal form NFC while parsing;
comparison of identifiers is based on NFC.
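
The effect of NFC can be seen with ``unicodedata.normalize``: the two
spellings below are distinct code point sequences, but would denote the
same identifier under the PEP's rule:

```python
import unicodedata

# The same word typed two ways: precomposed é (U+00E9) versus
# e followed by a combining acute accent (U+0065 U+0301).
composed = "caf\u00e9"
decomposed = "cafe\u0301"

print(composed == decomposed)  # -> False: different code point sequences
print(unicodedata.normalize("NFC", composed) ==
      unicodedata.normalize("NFC", decomposed))  # -> True after NFC
```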

Policy Specification
====================

As an addition to the Python Coding style, the following policy is
prescribed: All identifiers in the Python standard library MUST use
ASCII-only identifiers, and SHOULD use English words wherever feasible.

As an option, this specification can be applied to Python 2.x. In that
case, ASCII-only identifiers would continue to be represented as byte
string objects in namespace dictionaries; identifiers with non-ASCII
characters would be represented as Unicode strings.

Implementation
==============

The following changes will need to be made to the parser:

1. If a non-ASCII character is found in the UTF-8 representation of the
source code, a forward scan is made to find the first ASCII
non-identifier character (e.g. a space or punctuation character)

2. The entire UTF-8 string is passed to a function to normalize the
string to NFC, and then verify that it follows the identifier syntax.
No such callout is made for pure-ASCII identifiers, which continue to
be parsed the way they are today.

3. If this specification is implemented for 2.x, reflective libraries
(such as pydoc) must be verified to continue to work when Unicode
strings appear in ``__dict__`` slots as keys.
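
Step 2 might be sketched as follows (in Python itself rather than the C
tokenizer, purely as an illustration; ``str.isidentifier`` is a later
Python 3 convenience used here for brevity):

```python
import unicodedata

def check_identifier_token(token):
    # Pure-ASCII identifiers take the existing fast path, unchanged.
    if all(ord(c) < 128 for c in token):
        return token
    # Otherwise: normalize to NFC, then verify the identifier syntax.
    normalized = unicodedata.normalize("NFC", token)
    if not normalized.isidentifier():
        raise SyntaxError("invalid identifier: %r" % token)
    return normalized

print(check_identifier_token("cafe\u0301"))  # -> 'café' (NFC-composed)
```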

References
==========

.. [1] http://www.unicode.org/reports/tr31/

Copyright
=========

This document has been placed in the public domain.
May 13 '07
399 Replies


Aldo Cortesi <al**@nullcube.com> wrote:
Thus spake Steven D'Aprano (st****@REMOVE.THIS.cybersource.com.au):
If you're relying on cursory visual inspection to recognize harmful code,
you're already vulnerable to trojans.

What a daft thing to say. How do YOU recognize harmful code in a patch
submission? Perhaps you blindly apply patches, and then run your test suite on
a quarantined system, with an instrumented operating system to allow you to
trace process execution, and then perform a few weeks worth of analysis on the
data?

Me, I try to understand a patch by reading it. Call me old-fashioned.
I concur, Aldo. Indeed, if I _can't_ be sure I understand a patch, I
don't accept it -- I ask the submitter to make it clearer.

Homoglyphs would ensure I could _never_ be sure I understand a patch,
without at least running it through some transliteration tool. I don't
think the world of open source needs this extra hurdle in its path.
Alex
May 14 '07 #51


"Bruno Desthuilliers" <bd....q...ho**@free.que..rt.fr> wrote:
>Martin v. Löwis wrote:
>So, please provide feedback, e.g. perhaps by answering these
questions:
- should non-ASCII identifiers be supported?

No.
Agreed - I also do not think it is a good idea
>
>why?

Because it will definitively make code-sharing impossible. Live with it
or else, but CS is English-speaking, period. I just can't understand
code with Spanish or German (two languages I have notions of)
identifiers, so let's not talk about other alphabets...
The understanding aside, it seems to me that the maintenance nightmare is
more irritating, as you are faced with stuff you can't type on your
keyboard without resorting to look-up tables and <alt... sequences.
And then you could still be wrong, as has been pointed out for capital
A and Greek alpha.
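
The confusion described here is easy to demonstrate (a quick check,
assuming a Python 3 interpreter):

```python
import unicodedata

latin_a = "A"           # U+0041 LATIN CAPITAL LETTER A
greek_alpha = "\u0391"  # U+0391 GREEK CAPITAL LETTER ALPHA

# The two render identically in most fonts, yet are distinct characters,
# so identifiers spelled with them would be distinct too.
print(latin_a == greek_alpha)          # -> False
print(unicodedata.name(greek_alpha))   # -> GREEK CAPITAL LETTER ALPHA
```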

Then one should consider the effects of this on the whole issue of shared
open source python programs, as Bruno points out, before we argue that
I should not be "allowed" access to Greek, or French and German code
with umlauts and other diacritic marks, as someone else has done.

I think it is best to say nothing of Saint Cyril's script.

I think that to allow identifiers to be "native", while the rest of the
reserved words in the language remains ASCII English kind of
defeats the object of making the python language "language friendly".
It would need something like macros to enable the definition of
native language terms for things like "while", "for", "in", etc...

And we have been through the Macro thingy here, and the consensus
seemed to be that we don't want people to write their own dialects.

I think that the same arguments apply here.
>NB : I'm *not* a native english speaker, I do *not* live in an english
speaking country, and my mother's language requires non-ascii encoding.
And I don't have special sympathy for the USA. And yes, I do write my
code - including comments - in english.
My case is similar, except that we are supposed to have eleven official
languages. - When my ancestors fought the English at Spion Kop*,
we could not even spell our names - and here I am defending the use of
this disease that masquerades as a language, in the interests of standardisation
of communication and ease of sharing and maintenance.

BTW - Afrikaans also has stuff like umlauts - my keyboard cannot type them
and I rarely miss it, because most of my communication is done in English.

- Hendrik

* Spion Kop is one of the few battles in history that went contrary to the
common usage whereby both sides claim victory. In this case, both sides
claimed defeat. "We have suffered a small reverse..." - Sir Redvers Buller,
who was known afterwards as Sir Reverse Buller, or the Ferryman of the
Tugela. To be fair, it was the first war with trenches in it, and nobody
knew how to handle them.

May 14 '07 #52

Alexander Schmolck wrote:
>>So, please provide feedback, e.g. perhaps by answering these
questions:
- should non-ASCII identifiers be supported? why?
No, because "programs must be written for people to read, and only
incidentally for machines to execute". Using anything other than "lowest
common denominator" (ASCII) will restrict accessibility of code. This is
not a literature, that requires qualified translators to get the text
from Hindi (or Persian, or Chinese, or Georgian, or...) to Polish.

While I can read the code with Hebrew, Russian or Greek names
transliterated to ASCII, I would not be able to read such code in native.

Who or what would force you to? Do you currently have to deal with hebrew,
russian or greek names transliterated into ASCII? I don't and I suspect this
whole panic about everyone suddenly having to deal with code written in kanji,
klingon and hieroglyphs etc. is unfounded -- such code would drastically
reduce its own "fitness" (much more so than the ASCII-transliterated chinese,
hebrew and greek code I never seem to come across), so I think the chances
that it will be thrust upon you (or anyone else in this thread) are minuscule.
I often must read code written by people using some kind of cyrillic
(Russians, Serbs, Bulgarians). "Native" names transliterated to ascii
are usual artifacts and I don't mind it.
BTW, I'm not sure if you don't underestimate your own intellectual faculties
if you think you couldn't cope with Greek or Russian characters. On the other hand
I wonder if you don't overestimate your ability to reasonably deal with code
written in a completely foreign language, as long as its ASCII -- for anything
of nontrivial length, surely doing anything with such code would already be
orders of magnitude harder?
While I don't have problems with some of non-latin character sets, such
as greek and cyrillic (I was attending school in time when learning
Russian was obligatory in Poland and later I learned Greek), there are a
plenty I wouldn't be able to read, such as Hebrew, Arabic or Persian.

--
Jarek Zgoda

"We read Knuth so you don't have to."
May 14 '07 #53

In <7x************@ruckus.brouhaha.com>, Paul Rubin wrote:
Alexander Schmolck <a.********@gmail.com> writes:
>Plenty of programming languages already support unicode identifiers,

Could you name a few? Thanks.
Haskell. AFAIK the Haskell Report says so, but the compilers didn't
support it last time I tried. :-)

Ciao,
Marc 'BlackJack' Rintsch

May 14 '07 #54

Martin v. Löwis:
This PEP suggests to support non-ASCII letters (such as accented
characters, Cyrillic, Greek, Kanji, etc.) in Python identifiers.
I support this to ease integration with other languages and
platforms that allow non-ASCII letters to be used in identifiers. Python
has a strong heritage as a glue language and this has been enabled by
adapting to the features of various environments rather than trying to
assert a Pythonic view of how things should work.

Neil
May 14 '07 #55

On Sun, 13 May 2007 21:10:46 +0200, Stefan Behnel
<st******************@web.de> wrote:
[snip]
Now, I am not a strong supporter (most public code will use English
identifiers anyway)
How will you guarantee that? I'm quite convinced that most of the public
code today started its life as private code earlier...
So, introducing non-ASCII identifiers is just a
small step further. Disallowing this does *not* guarantee in any way that
identifiers are understandable for English native speakers. It only
guarantees that identifiers are always *typable* by people who have
access to latin characters on their keyboard. A rather small advantage,
I'd say.
I would certainly not qualify that as "rather small". There have been
quite a few times where I had to change some public code. If this code had
been written in a character set that did not exist on my keyboard, the
only possibility would have been to copy/paste every identifier I had to
type. Have you ever tried to do that? It's actually quite simple to test
it: just remove on your keyboard a quite frequent letter ('E' is a good
candidate), and try to update some code you have at hand. You'll see that
it takes 4 to 5 times longer than writing the code directly, because you
always have to switch between keyboard and mouse far too often. In
addition to the unnecessary movements, it also completely breaks your
concentration. Typing foreign words transliterated to english actually
does take longer than typing "proper" english words, but at least, it can
be done, and it's still faster than having to copy/paste everything.

So I'd say that it would be a major drawback for code sharing, which - if
I'm not mistaken - is the basis for the whole open-source philosophy.
--
python -c "print ''.join([chr(154 - ord(c)) for c in
'U(17zX(%,5.zmz5(17l8(%,5.Z*(93-965$l7+-'])"
May 14 '07 #56

On Sun, 13 May 2007 23:55:11 +0200, Bruno Desthuilliers
<bd*****************@free.quelquepart.fr> wrote:
Martin v. Löwis wrote:
>PEP 1 specifies that PEP authors need to collect feedback from the
community. As the author of PEP 3131, I'd like to encourage comments
to the PEP included below, either here (comp.lang.python), or to
py*********@python.org
In summary, this PEP proposes to allow non-ASCII letters as
identifiers in Python. If the PEP is accepted, the following
identifiers would also become valid as class, function, or
variable names: Löffelstiel, changé, ошибка, or 売り*
(hoping that the latter one means "counter").
I believe this PEP differs from other Py3k PEPs in that it really
requires feedback from people with different cultural background
to evaluate it fully - most other PEPs are culture-neutral.
So, please provide feedback, e.g. perhaps by answering these
questions:
- should non-ASCII identifiers be supported?

No.
>why?

Because it will definitively make code-sharing impossible. Live with it
or else, but CS is English-speaking, period. I just can't understand
code with Spanish or German (two languages I have notions of)
identifiers, so let's not talk about other alphabets...
+1 on everything.
NB : I'm *not* a native english speaker, I do *not* live in an english
speaking country,
... and so am I (and this happens to be the same country as Bruno's...)
and my mother's language requires non-ascii encoding.
... and so does my wife's (she's Japanese).
And I don't have special sympathy for the USA. And yes, I do write my
code - including comments - in english.
Again, +1. Even when writing code that appears to be "private" at some
time, one *never* knows what will become of it in the future. If it ever
goes public, its chances to evolve - or just to be maintained - are far
bigger if it's written all in english.
--
python -c "print ''.join([chr(154 - ord(c)) for c in
'U(17zX(%,5.zmz5(17l8(%,5.Z*(93-965$l7+-'])"
May 14 '07 #57

Eric Brunel wrote:
Even when writing code that appears to be "private" at some
time, one *never* knows what will become of it in the future. If it ever
goes public, its chances to evolve - or just to be maintained - are far
bigger if it's written all in english.

--python -c "print ''.join([chr(154 - ord(c)) for c in
'U(17zX(%,5.zmz5(17l8(%,5.Z*(93-965$l7+-'])"
Oh well, why did *that* code ever go public?

Stefan
May 14 '07 #58

Eric Brunel wrote:
On Sun, 13 May 2007 21:10:46 +0200, Stefan Behnel
<st******************@web.de> wrote:
[snip]
>Now, I am not a strong supporter (most public code will use English
identifiers anyway)

How will you guarantee that? I'm quite convinced that most of the public
code today started its life as private code earlier...
Ok, so we're back to my original example: the problem here is not the
non-ASCII encoding but the non-english identifiers.

If we move the problem to a pure unicode naming problem:

How likely is it that it's *you* (lacking a native, say, kanji keyboard) who
ends up with code that uses identifiers written in kanji? And that you are the
only person who is now left to do the switch to an ASCII transliteration?

Any chance there are still kanji-enabled programmes around that were not hit
by the bomb in this scenario? They might still be able to help you get the
code "public".

Stefan
May 14 '07 #59

Alex Martelli schrieb:
Aldo Cortesi <al**@nullcube.com> wrote:
>Thus spake Steven D'Aprano (st****@REMOVE.THIS.cybersource.com.au):
>>If you're relying on cursory visual inspection to recognize harmful code,
you're already vulnerable to trojans.
What a daft thing to say. How do YOU recognize harmful code in a patch
submission? Perhaps you blindly apply patches, and then run your test suite on
a quarantined system, with an instrumented operating system to allow you to
trace process execution, and then perform a few weeks worth of analysis on the
data?

Me, I try to understand a patch by reading it. Call me old-fashioned.

I concur, Aldo. Indeed, if I _can't_ be sure I understand a patch, I
don't accept it -- I ask the submitter to make it clearer.

Homoglyphs would ensure I could _never_ be sure I understand a patch,
without at least running it through some transliteration tool. I don't
think the world of open source needs this extra hurdle in its path.
But then, where's the problem? Just stick to accepting only patches that are
plain ASCII *for your particular project*. And if you want to be sure, put an
ASCII encoding header in all source files (which you want to do anyway, to
prevent the same problem with string constants).

The PEP is only arguing to support this decision at a per-project level rather
than forbidding it at the language level. This makes sense as it moves the
power into the hands of those people who actually use it, not those who
designed the language.

Stefan
May 14 '07 #60

Bruno Desthuilliers wrote:
but CS is english-speaking, period.
That's a wrong assumption. I understand that people can have this impression
when they deal a lot with Open Source code, but I've seen a lot of places
where code was produced that was not written to become publicly available (and
believe me, it *never* will become Open Source). And the projects made strong
use of identifiers with domain specific names. And believe me, those are best
expressed in a language your client knows and expresses concepts in. And this
is definitely not the language you claim to be the only language in CS.

Stefan
May 14 '07 #61

In article <vJ****************@news-server.bigpond.net.au>,
ny*****************@gmail.com says...
Martin v. Löwis:
This PEP suggests to support non-ASCII letters (such as accented
characters, Cyrillic, Greek, Kanji, etc.) in Python identifiers.
I support this to ease integration with other languages and
platforms that allow non-ASCII letters to be used in identifiers. Python
has a strong heritage as a glue language and this has been enabled by
adapting to the features of various environments rather than trying to
assert a Pythonic view of how things should work.

Neil
Ouch! Now I seem to be disagreeing with the one who writes my editor.
What will become of me now?

A.
May 14 '07 #62

Martin v. Löwis <ma****@v.loewis.de> wrote:
So, please provide feedback, e.g. perhaps by answering these
questions:
Firstly on the PEP itself:

It defines characters that would be allowed. However not being up to
speed on unicode jargon I don't have a clear idea about which
characters those are. A page with some examples or even all possible
allowed characters would be great, plus some examples of disallowed
characters.
- should non-ASCII identifiers be supported? why?
Only if PEP 8 were amended to state that only ASCII characters should
be used for publicly released / library code. I'm quite happy with
Unicode in comments / docstrings (but that is supported already).
- would you use them if it was possible to do so? in what cases?
My initial reaction is that it would be cool to use all those great
symbols. A variable called OHM etc! However on reflection I think it
would be a step back for the easy to read nature of python.

My worries are :-

a) English speaking people would invent their own dialects of python
which looked like APL with all those nice Unicode mathematical
operators / Greek letters you could use as variable/function names. I
like the symbol free nature of python which makes for easy
comprehension of code and don't want to see it degenerate.

b) Unicode characters would creep into the public interface of public
libraries. I think this would be a step back for the homogeneous
nature of the python community.

c) the python keywords are in ASCII/English. I hope you weren't
thinking of changing them?

...

In summary, I'm not particularly keen on the idea; though it might be
all right in private. Unicode identifiers are allowed in java though,
so maybe I'm worrying too much ;-)

--
Nick Craig-Wood <ni**@craig-wood.com> -- http://www.craig-wood.com/nick
May 14 '07 #63

On Mon, 14 May 2007 11:00:29 +0200, Stefan Behnel
<st******************@web.de> wrote:
Eric Brunel wrote:
>On Sun, 13 May 2007 21:10:46 +0200, Stefan Behnel
<st******************@web.de> wrote:
[snip]
>>Now, I am not a strong supporter (most public code will use English
identifiers anyway)

How will you guarantee that? I'm quite convinced that most of the public
code today started its life as private code earlier...

Ok, so we're back to my original example: the problem here is not the
non-ASCII encoding but the non-english identifiers.
As I said in the rest of my post, I do recognize that there is a problem
with non-english identifiers. I only think that allowing these identifiers
to use a non-ASCII encoding will make things worse, and so should be
avoided.
If we move the problem to a pure unicode naming problem:

How likely is it that it's *you* (lacking a native, say, kanji keyboard) who
ends up with code that uses identifiers written in kanji? And that you are
the only person who is now left to do the switch to an ASCII transliteration?

Any chance there are still kanji-enabled programmes around that were not hit
by the bomb in this scenario? They might still be able to help you get the
code "public".
Contrary to what one might think seeing the great achievements of
open-source software, people willing to maintain public code and/or make
it evolve seem to be quite rare. If you add burdens on such people - such
as being able to read and write the language of the original code writer,
or forcing them to request a translation or transliteration from someone
else - the chances are that they will become even rarer...
--
python -c "print ''.join([chr(154 - ord(c)) for c in
'U(17zX(%,5.zmz5(17l8(%,5.Z*(93-965$l7+-'])"
May 14 '07 #64

Eric Brunel wrote:
On Mon, 14 May 2007 11:00:29 +0200, Stefan Behnel
>Any chance there are still kanji-enabled programmes around that were not hit
by the bomb in this scenario? They might still be able to help you get the
code "public".

Contrary to what one might think seeing the great achievements of
open-source software, people willing to maintain public code and/or make
it evolve seem to be quite rare. If you add burdens on such people -
such as being able to read and write the language of the original code
writer, or forcing them to request a translation or transliteration from
someone else - the chances are that they will become even rarer...
Ok, but then maybe that code just will not become Open Source. There's a
million reasons code cannot be made Open Source, licensing being one, lack of
resources being another, bad implementation and lack of documentation being
important also.

But that won't change by keeping Unicode characters out of source code.

Now that we're at it, badly named english identifiers chosen by non-english
native speakers, for example, are a sure way to keep people from understanding
the code and thus from being able to contribute resources.

I'm far from saying that all code should start using non-ASCII characters.
There are *very* good reasons why a lot of projects are well off with ASCII
and should obey the good advice of sticking to plain ASCII. But those are
mainly projects that are developed in English and use English documentation,
so there is not much of a risk to stumble into problems anyway.

I'm only saying that this shouldn't be a language restriction, as there
definitely *are* projects (I know some for my part) that can benefit from the
clarity of native language identifiers (just like English speaking projects
benefit from the English language). And yes, this includes spelling native
language identifiers in the native way to make them easy to read and fast to
grasp for those who maintain the code.

It should at least be an available option to use this feature.

Stefan
May 14 '07 #65

Anton Vredegoor:
Ouch! Now I seem to be disagreeing with the one who writes my editor.
What will become of me now?
It should be OK. I try to keep my anger under control and not cut
off the pixel supply at the first stirrings of dissent.

It may be an idea to provide some more help for multilingual text
such as allowing ranges of characters to be represented as hex escapes
or character names automatically. Then someone who only normally uses
ASCII can more easily audit patches that could contain non-ASCII characters.
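
Such an auditing aid might look like the minimal sketch below (`asciify`
is a hypothetical helper, not an existing tool): it rewrites every
non-ASCII character as a backslash escape, so a patch can be reviewed in
plain ASCII.

```python
def asciify(text):
    """Rewrite non-ASCII characters as \\uXXXX / \\UXXXXXXXX escapes."""
    out = []
    for ch in text:
        if ord(ch) < 128:
            out.append(ch)
        elif ord(ch) <= 0xFFFF:
            out.append("\\u%04x" % ord(ch))
        else:
            out.append("\\U%08x" % ord(ch))
    return "".join(out)

print(asciify("счёт = 0"))  # -> \u0441\u0447\u0451\u0442 = 0
```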

Neil
May 14 '07 #66

In <sl*****************@irishsea.home.craig-wood.com>, Nick Craig-Wood
wrote:
My initial reaction is that it would be cool to use all those great
symbols. A variable called OHM etc!
This is a nice candidate for homoglyph confusion. There's the Greek
letter omega (U+03A9) Ω and the SI unit symbol (U+2126) Ω, and I think
some omegas in the mathematical symbols area too.
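
This particular pair is actually resolved by the PEP's NFC rule: OHM
SIGN has a canonical decomposition to the Greek letter, so the two would
collapse into a single identifier (a quick check):

```python
import unicodedata

omega = "\u03a9"  # GREEK CAPITAL LETTER OMEGA
ohm = "\u2126"    # OHM SIGN: renders identically, different code point

print(omega == ohm)  # -> False as raw strings
# NFC, which the PEP applies to all identifiers while parsing,
# maps OHM SIGN onto the Greek letter:
print(unicodedata.normalize("NFC", ohm) == omega)  # -> True
```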

Ciao,
Marc 'BlackJack' Rintsch
May 14 '07 #67

I suggest we keep focused on the main issue here, which is "should
non-ascii identifiers be allowed, given that we already allow non-ascii
string literals and comments?"

Most arguments against this proposal really fall into the category
"ascii-only source files". If you want to promote code-sharing, then
you should enforce quite restrictive policies:
- 7-bit only source files, so that everyone is able to correctly
display and _print_ them (somehow I feel that printing foreign glyphs
can be harder than displaying them) ;
- English-only, readable comments _and_ identifiers (if you think of
it, it's really the same issue, readability... I know no Coding Style
that requires good commenting but allows meaningless identifiers).

Now, why in the first place one should be allowed to violate those
policies? One reason is freedom. Let me write my code the way I like
it, and don't force me writing it the way you like it (unless it's
supposed to be part of _your_ project, then have me follow _your_
style).

Another reason is that readability is quite a relative term...
comments that won't make any sense in a real world program, may be
appropriate in a 'getting started with' guide example:

# this is another way to increment variable 'a'
a += 1

we know a comment like that is totally useless (and thus harmful) to
any programmer (makes me think "thanks, but i knew that already"), but
it's perfectly appropriate if you're introducing that += operator for
the first time to a newbie.

You could even say that most string literals are best made English-only:

print "Ciao Mondo!"

it's better written:

print _("Hello World!")

or with any other means to allow the i18n of the output. The Italian
version should be implemented with a .po file or whatever.

Yet, we support non-ascii encodings for source files. That's in order
to give authors more freedom. And freedom comes at a price, of course,
as non-ascii string literals, comments and identifiers are all harmful
to some extents and in some contexts.

What I fail to see is a context in which it makes sense to allow
non-ascii literals and non-ascii comments but _not_ non-ascii identifiers.
Or a context in which it makes sense to rule out non-ascii identifiers
but not string literals and comments. E.g. would you accept a patch
with comments you don't understand (or even that you are not able to
display correctly)? How can you make sure the patch is correct, if you
can't read and understand the string literals it adds?

My point being that most public open source projects already have
plenty of good reasons to enforce an English-only, ascii-only policy
on source files. I don't think that allowing non-ascii indentifiers at
language level would hinder thier ability to enforce such a policy
more than allowing non-ascii comments or literals did.

OTOH, I won't be able to contribute much to a project that already
uses, say, Chinese for comments and strings. Even if I manage to
display the source code correctly here, still I won't understand much
of it. So I'm not losing much by allowing them to use Chinese for
identifiers too.
And whether it was a mistake on their part not to choose an "English
only, ascii only" policy it's their call, not ours, IMHO.

..TM.

May 14 '07 #68

Neil Hodgson <ny*****************@gmail.com> writes:
Paul Rubin wrote:
>>Plenty of programming languages already support unicode identifiers,

Could you name a few? Thanks.

C#, Java, Ecmascript, Visual Basic.
(i.e. everything that isn't a legacy or niche language)

scheme (major implementations such as PLT and the upcoming standard), the most
popular common lisp implementations, haskell[1], fortress[2], perl 6 and I should
imagine (but haven't checked) all new java or .NET based languages (F#,
IronPython, JavaFX, Groovy, etc.) as well -- the same goes for XML-based
languages.

(i.e. everything that's up and coming, too)

So as Neil said, I don't think keeping python ASCII and interoperable is an
option. I don't happen to think the anti-unicode arguments that have been
advanced so far terribly convincing so far[3], but even if they were it
wouldn't matter much -- the ability of functioning as a painless glue language
has always been absolutely vital for python.

cheers

'as

Footnotes:
[1] <http://hackage.haskell.org/trac/haskell-prime/wiki/UnicodeInHaskellSource>

[2] <http://research.sun.com/projects/plrg/fortress.pdf>

[3] Although I do agree that mechanisms to avoid spoofing and similar
problems (what normalization scheme and constraints unicode identifiers
should be subjected to) merit careful discussion.
May 14 '07 #69

P: n/a
Martin v. Löwis wrote:
PEP 1 specifies that PEP authors need to collect feedback from the
community. As the author of PEP 3131, I'd like to encourage comments
to the PEP included below, either here (comp.lang.python), or to
py*********@python.org

In summary, this PEP proposes to allow non-ASCII letters as
identifiers in Python. If the PEP is accepted, the following
identifiers would also become valid as class, function, or
variable names: Löffelstiel, changé, ошибка, or 売り*
(hoping that the latter one means "counter").

I believe this PEP differs from other Py3k PEPs in that it really
requires feedback from people with different cultural background
to evaluate it fully - most other PEPs are culture-neutral.

So, please provide feedback, e.g. perhaps by answering these
questions:
- should non-ASCII identifiers be supported? why?
- would you use them if it was possible to do so? in what cases?
I strongly prefer to stay with current standard limited ascii for
identifiers.

Ideally, it would be agreeable to have variables like greek letters for
some scientific vars, or éèç* in names for french people...

But... (I join the common objections):

* where are they on my keyboard, how can I type them?
(I can see french éèç*, but a us-layout keyboard doesn't know them; imagine
kanji or greek)

* how do I spell this cyrillic/kanji char?

* when there are very similar chars, how can I distinguish them?
(without dealing with same representation chars having different unicode
names)

* are the variables "amédé" and "amede" the same?

* it's an anti-KISS rule.

* I don't only write code, I read it too, and having such variation
in names makes code really less readable.
(unless I learn the other scripts - maybe not a bad thing in
itself, but it's not the objective here).

* I've read "Restricting the language to ASCII-only identifiers does
not enforce comments and documentation to be English, or the identifiers
actually to be English words, so an additional policy is necessary,
anyway."
But even with comments in german or spanish or japanese, I can still
guess what a (well written) piece of code is doing with its data. It
would be very difficult with identifiers spanning all of unicode.
==> I wouldn't use them.
So, keep ascii only.
Basic ascii is the lowest common denominator, known and available
everywhere; it's known by all developers, who can identify these chars
correctly (though 1 vs I or O vs 0 can cause problems with incorrect
fonts).
Maybe make the default file-encoding utf8 and strings unicode
by default (with an s"" prefix for basic strings, for example), but this
is another problem.
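The "very similar chars" objection raised above can be demonstrated concretely. This is a minimal sketch: the two names below render identically in many fonts, yet Python would treat them as distinct identifiers.

```python
# Illustrating the homoglyph problem: Latin 'a' vs Cyrillic 'а' (U+0430).
# Both often render as the same glyph, but they are different characters.
latin_a = "a"
cyrillic_a = "\u0430"
print(latin_a == cyrillic_a)  # False: same appearance, different code points
print(hex(ord(latin_a)), hex(ord(cyrillic_a)))
```

Two variables spelled with these characters would therefore look identical on screen while naming different objects.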
L.Pointal.

May 14 '07 #70

P: n/a
Marco Colombo wrote:
I suggest we keep focused on the main issue here, which is "should non-
ascii identifiers be allowed, given that we already allow non-ascii
strings literals and comments?"

Most arguments against this proposal really fall into the category
"ascii-only source files". If you want to promote code-sharing, then
you should enforce quite restrictive policies:
- 7-bit only source files, so that everyone is able to correctly
display and _print_ them (somehow I feel that printing foreign glyphs
can be harder than displaying them) ;
- English-only, readable comments _and_ identifiers (if you think of
it, it's really the same issue, readability... I know no Coding Style
that requires good commenting but allows meaningless identifiers).

Now, why in the first place one should be allowed to violate those
policies? One reason is freedom. Let me write my code the way I like
it, and don't force me writing it the way you like it (unless it's
supposed to be part of _your_ project, then have me follow _your_
style).

Another reason is that readability is quite a relative term...
comments that won't make any sense in a real world program, may be
appropriate in a 'getting started with' guide example:

# this is another way to increment variable 'a'
a += 1

we know a comment like that is totally useless (and thus harmful) to
any programmer (makes me think "thanks, but i knew that already"), but
it's perfectly appropriate if you're introducing that += operator for
the first time to a newbie.

You could even say that most string literals are best made English-
only:

print "Ciao Mondo!"

it's better written:

print _("Hello World!")

or with any other means to allow the i18n of the output. The Italian
version should be implemented with a .po file or whatever.

Yet, we support non-ascii encodings for source files. That's in order
to give authors more freedom. And freedom comes at a price, of course,
as non-ascii string literals, comments and identifiers are all harmful
to some extents and in some contexts.

What I fail to see is a context in which it makes sense to allow non-
ascii literals and non-ascii comments but _not_ non-ascii identifiers.
Or a context in which it makes sense to rule out non-ascii identifiers
but not string literals and comments. E.g. would you accept a patch
with comments you don't understand (or even that you are not able to
display correctly)? How can you make sure the patch is correct, if you
can't read and understand the string literals it adds?

My point being that most public open source projects already have
plenty of good reasons to enforce an English-only, ascii-only policy
on source files. I don't think that allowing non-ascii identifiers at
the language level would hinder their ability to enforce such a policy
more than allowing non-ascii comments or literals did.

OTOH, I won't be able to contribute much to a project that already
uses, say, Chinese for comments and strings. Even if I manage to
display the source code correctly here, still I won't understand much
of it. So I'm not losing much by allowing them to use Chinese for
identifiers too.
And whether it was a mistake on their part not to choose an "English
only, ascii only" policy is their call, not ours, IMHO.
Very well written.

+1

Stefan
May 14 '07 #71

P: n/a
Alexander Schmolck <a.********@gmail.com> wrote:
scheme (major implementations such as PLT and the upcoming standard),
the most popular common lisp implementations, haskell[1], fortress[2],
perl 6 and I should imagine (but haven't checked) all new java or .NET
based languages (F#, IronPython, JavaFX, Groovy, etc.) as well -- the
same goes for XML-based languages.
Just to confirm that: IronPython does accept non-ascii identifiers. From
"Differences between IronPython and CPython":
IronPython will compile files whose identifiers use non-ASCII
characters if the file has an encoding comment such as "# -*- coding:
utf-8 -*-". CPython will not compile such a file in any case.
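For illustration, a source file of the kind described might look like the following minimal sketch (the identifier is made up; note that Python 3 does accept such files today, while the CPython of the time did not):

```python
# -*- coding: utf-8 -*-
# A file with a non-ASCII identifier under an explicit encoding
# declaration, as the quoted IronPython documentation describes.
größe = 42  # "size" in German; an illustrative name, not from the thread
print(größe)
```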
May 14 '07 #72

P: n/a
Duncan Booth wrote:
Alexander Schmolck <a.********@gmail.com> wrote:
>scheme (major implementations such as PLT and the upcoming standard),
the most popular common lisp implementations, haskell[1], fortress[2],
perl 6 and I should imagine (but haven't checked) all new java or .NET
based languages (F#, IronPython, JavaFX, Groovy, etc.) as well -- the
same goes for XML-based languages.

Just to confirm that: IronPython does accept non-ascii identifiers. From
"Differences between IronPython and CPython":
>IronPython will compile files whose identifiers use non-ASCII
characters if the file has an encoding comment such as "# -*- coding:
utf-8 -*-". CPython will not compile such a file in any case.
Sounds like CPython would better follow IronPython here.

Stefan
May 14 '07 #73

P: n/a
On May 14, 4:30 am, Nick Craig-Wood <n...@craig-wood.com> wrote:
>
A variable called OHM etc!
--
Nick Craig-Wood <n...@craig-wood.com> -- http://www.craig-wood.com/nick
Then can 'lambda' -> 'λ' be far behind? (I know this is a keyword
issue, not covered by this PEP, but I also sense that the 'lambda'
keyword has always been ranklesome.)

In my own personal English-only experience, I've thought that it would
be helpful to the adoption of pyparsing if I could distribute class
name translations, since so much of my design goal of pyparsing is
that it be somewhat readable as in:

integer = Word(nums)

is 'an integer is a word composed of numeric digits'.

By distributing a translation file, such as:

Palabra = Word
Grupo = Group
etc.

a Spanish-speaker could write their own parser using:

numero = Palabra(nums)

and this would still pass the "fairly easy-to-read" test, for that
user. While my examples don't use any non-ASCII characters, I'm sure
the issue would come up fairly quickly.
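The translation-file idea above can be sketched with a stand-in class (the `Word` class here is a dummy, not the real pyparsing one; `Palabra` and `numero` are the hypothetical Spanish names from the post):

```python
# Stand-in for a pyparsing-style class, just to show the aliasing idea.
class Word:
    def __init__(self, chars):
        self.chars = chars

nums = "0123456789"  # pyparsing exposes a similar constant

# The distributed "translation file" would simply alias the class names:
Palabra = Word  # Spanish for "word"

# A Spanish-speaking user could then write:
numero = Palabra(nums)
print(type(numero).__name__)  # the alias is the very same class
```

Since aliasing is plain assignment, no change to the library itself is needed.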

As to the responder who suggested not mixing ASCII/Latin with, say,
Hebrew in any given identifier, this is not always possible. On a
business trip to Israel, I learned that there are many terms that do
not have Hebrew correspondents, and so Hebrew technical literature is
sprinkled with English terms in Latin characters. This is especially
interesting to watch being typed on a terminal, as the Hebrew
characters are written on the screen right-to-left, and then an
English word is typed by switching the editor to left-to-right mode.
The cursor remains in the same position and the typed Latin characters
push out to the left as they are typed. Then typing in right-to-left
mode is resumed, just to the left of the Latin characters just
entered.

-- Paul

May 14 '07 #74

P: n/a
Stefan Behnel <st******************@web.de> wrote:
>Just to confirm that: IronPython does accept non-ascii identifiers.
From "Differences between IronPython and CPython":
>>IronPython will compile files whose identifiers use non-ASCII
characters if the file has an encoding comment such as "# -*-
coding: utf-8 -*-". CPython will not compile such a file in any
case.

Sounds like CPython would better follow IronPython here.
I cannot find any documentation which says exactly which non-ASCII
characters IronPython will accept.
I would guess that it probably follows C# in general, but it doesn't
follow C# identifier syntax exactly (in particular the leading @ to
quote keywords is not supported).

The C# identifier syntax from http://msdn2.microsoft.com/en-us/lib...70(VS.71).aspx
I think it differs from the PEP only in also allowing the Cf class of characters:

identifier:
    available-identifier
    @ identifier-or-keyword
available-identifier:
    An identifier-or-keyword that is not a keyword
identifier-or-keyword:
    identifier-start-character identifier-part-characters(opt)
identifier-start-character:
    letter-character
    _ (the underscore character U+005F)
identifier-part-characters:
    identifier-part-character
    identifier-part-characters identifier-part-character
identifier-part-character:
    letter-character
    decimal-digit-character
    connecting-character
    combining-character
    formatting-character
letter-character:
    A Unicode character of classes Lu, Ll, Lt, Lm, Lo, or Nl
    A unicode-escape-sequence representing a character of classes Lu, Ll, Lt, Lm, Lo, or Nl
combining-character:
    A Unicode character of classes Mn or Mc
    A unicode-escape-sequence representing a character of classes Mn or Mc
decimal-digit-character:
    A Unicode character of the class Nd
    A unicode-escape-sequence representing a character of the class Nd
connecting-character:
    A Unicode character of the class Pc
    A unicode-escape-sequence representing a character of the class Pc
formatting-character:
    A Unicode character of the class Cf
    A unicode-escape-sequence representing a character of the class Cf

For information on the Unicode character classes mentioned above, see
The Unicode Standard, Version 3.0, section 4.5.
May 14 '07 #75

P: n/a
On May 13, 5:44 pm, "Martin v. Löwis" <mar...@v.loewis.de> wrote:

In summary, this PEP proposes to allow non-ASCII letters as
identifiers in Python. If the PEP is accepted, the following
identifiers would also become valid as class, function, or
variable names: Löffelstiel, changé, ошибка, or 売り*
(hoping that the latter one means "counter").
I am strongly against this PEP. The serious problems and huge costs
already explained by others are not balanced by the possibility of
using non-butchered identifiers in non-ASCII alphabets, especially
considering that one can write any language, in its full Unicode
glory, in the strings and comments of suitably encoded source files.
The diatribe about cross language understanding of Python code is IMHO
off topic; if one doesn't care about international readers, using
annoying alphabets for identifiers has only a marginal impact. It's
the same situation as IRIs (a bad idea) versus HTML text (happily
Unicode).
- should non-ASCII identifiers be supported? why?
No, they are useless.
- would you use them if it was possible to do so? in what cases?
No, never.
Being Italian, I'm sometimes tempted to use accented vowels in my
code, but I restrain myself because of the possibility of annoying
foreign readers and the difficulty of convincing every text editor I
use to preserve them.
Python code is written by many people in the world who are not familiar
with the English language, or even well-acquainted with the Latin
writing system. Such developers often desire to define classes and
functions with names in their native languages, rather than having to
come up with an (often incorrect) English translation of the concept
they want to name.
The described set of users includes linguistically intolerant people
who don't accept the use of suitable languages instead of their own,
and of compromised but readable spelling instead of the one they
prefer.
Most "people in the world who are not familiar with the English
language" are much more mature than that, even when they don't write
for international readers.
The syntax of identifiers in Python will be based on the Unicode
standard annex UAX-31 [1]_, with elaboration and changes as defined
below.
Not providing an explicit listing of allowed characters is inexcusable
sloppiness.
The XML standard is an example of how listings of large parts of the
Unicode character set can be provided clearly, exactly and (almost)
concisely.
``ID_Start`` is defined as all characters having one of the general
categories uppercase letters (Lu), lowercase letters (Ll), titlecase
letters (Lt), modifier letters (Lm), other letters (Lo), letter numbers
(Nl), plus the underscore (XXX what are "stability extensions" listed in
UAX 31).

``ID_Continue`` is defined as all characters in ``ID_Start``, plus
nonspacing marks (Mn), spacing combining marks (Mc), decimal number
(Nd), and connector punctuations (Pc).
Am I the first to notice how unsuitable these characters are? Many of
these would be utterly invisible ("variation selectors" are Mn) or
displayed out of sequence (overlays are Mn), or normalized away
(combining accents are Mn) or absurdly strange and ambiguous (roman
numerals are Nl, for instance).
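The category rule quoted above can be explored concretely with the unicodedata module. This is a rough sketch only: the actual PEP defers to UAX-31 and additionally applies NFC normalization, which this check ignores.

```python
import unicodedata

# Categories from the quoted PEP text (underscore handled separately).
ID_START = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}
ID_CONTINUE = ID_START | {"Mn", "Mc", "Nd", "Pc"}

def is_identifier(name):
    """Approximate check of the PEP's ID_Start/ID_Continue rule."""
    if not name:
        return False
    first, rest = name[0], name[1:]
    if first != "_" and unicodedata.category(first) not in ID_START:
        return False
    return all(c == "_" or unicodedata.category(c) in ID_CONTINUE
               for c in rest)

print(is_identifier("Löffelstiel"))  # True: all letters
print(is_identifier("1abc"))         # False: digits may not start a name
```

Running this against the Mn category does confirm the objection: e.g. variation selectors pass the check while being invisible in most renderings.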

Lorenzo Gatti
May 14 '07 #76

P: n/a
Stefan Behnel wrote:
Eric Brunel wrote:
>On Mon, 14 May 2007 11:00:29 +0200, Stefan Behnel
>>Any chance there are still kanji-enabled programmes around that were
not hit
by the bomb in this scenario? They might still be able to help you get
the
code "public".
Contrarily to what one might think seeing the great achievements of
open-source software, people willing to maintain public code and/or make
it evolve seem to be quite rare. If you add burdens on such people -
such as being able to read and write the language of the original code
writer, or forcing them to request a translation or transliteration from
someone else -, the chances are that they will become even rarer...

Ok, but then maybe that code just will not become Open Source. There's a
million reasons code cannot be made Open Source, licensing being one, lack of
resources being another, bad implementation and lack of documentation being
important also.

But that won't change by keeping Unicode characters out of source code.
Nope, but adding support for unicode glyphs in identifiers will only make
things worse, and we (free software authors/users/supporters)
definitely *don't* need this.
Now that we're at it, badly named english identifiers chosen by non-english
native speakers, for example, are a sure way to keep people from understanding
the code and thus from being able to contribute resources.
Broken English is certainly better than German or French or Italian when
it comes to sharing code.
I'm far from saying that all code should start using non-ASCII characters.
There are *very* good reasons why a lot of projects are well off with ASCII
and should obey the good advice of sticking to plain ASCII. But those are
mainly projects that are developed in English and use English documentation,
so there is not much of a risk to stumble into problems anyway.

I'm only saying that this shouldn't be a language restriction, as there
definitely *are* projects (I know some for my part) that can benefit from the
clarity of native language identifiers (just like English speaking projects
benefit from the English language).
As far as I'm concerned, I find "frenglish" source code (code with
identifiers in French) a total abomination. The fact is that all the
language (keywords, builtins, stdlib) *is* in English. Unless you
address that fact, your PEP is worthless (and even if you really plan to
do something about this, I still find it a very bad idea for reasons
already exposed).

The fact is also that anyone at least half-serious wrt/ CS will learn
technical English anyway. And, as others have already pointed out, learning
technical English is certainly not the most difficult part when it comes
to programming.
And yes, this includes spelling native
language identifiers in the native way to make them easy to read and fast to
grasp for those who maintain the code.
Yes, fine. So we end up with a code that's a mix of English (keywords,
builtins, stdlib, almost if not all third-part libs) and native
language. So, while native speakers will still have to deal with
English, non-native speakers won't be able to understand anything. Talk
about a great idea...
May 14 '07 #77

P: n/a
Stefan Behnel wrote:
Bruno Desthuilliers wrote:
>but CS is english-speaking, period.

That's a wrong assumption.
I've never met anyone *serious* about programming and yet unable to read
and write CS-oriented technical English.
I understand that people can have this impression
when they deal a lot with Open Source code, but I've seen a lot of places
where code was produced that was not written to become publicly available (and
believe me, it *never* will become Open Source).
Yeah, fine. This doesn't mean that each and every person who may have to
work on this code is a native speaker of the language used - or even
fluent enough with it.

May 14 '07 #78

P: n/a
On Mon, 14 May 2007 12:17:36 +0200, Stefan Behnel
<st******************@web.de> wrote:
Eric Brunel wrote:
>On Mon, 14 May 2007 11:00:29 +0200, Stefan Behnel
>>Any chance there are still kanji-enabled programmes around that were
not hit
by the bomb in this scenario? They might still be able to help you get
the
code "public".

Contrarily to what one might think seeing the great achievements of
open-source software, people willing to maintain public code and/or make
it evolve seem to be quite rare. If you add burdens on such people -
such as being able to read and write the language of the original code
writer, or forcing them to request a translation or transliteration from
someone else -, the chances are that they will become even rarer...

Ok, but then maybe that code just will not become Open Source. There's a
million reasons code cannot be made Open Source, licensing being one,
lack of
resources being another, bad implementation and lack of documentation
being
important also.

But that won't change by keeping Unicode characters out of source code.
Maybe; maybe not. This is one more reason preventing a piece of code from
becoming open-source. IMHO, there are already plenty of these reasons, and
I don't think we need a new one...
Now that we're at it, badly named english identifiers chosen by
non-english
native speakers, for example, are a sure way to keep people from
understanding
the code and thus from being able to contribute resources.
I wish we could have an option forbidding these also ;-) But now, maybe
some of my own code would no longer execute when it's turned on...
I'm far from saying that all code should start using non-ASCII
characters.
There are *very* good reasons why a lot of projects are well off with
ASCII
and should obey the good advice of sticking to plain ASCII. But those are
mainly projects that are developed in English and use English
documentation,
so there is not much of a risk to stumble into problems anyway.

I'm only saying that this shouldn't be a language restriction, as there
definitely *are* projects (I know some for my part) that can benefit
from the
clarity of native language identifiers (just like English speaking
projects
benefit from the English language). And yes, this includes spelling
native
language identifiers in the native way to make them easy to read and
fast to
grasp for those who maintain the code.
My point is only that I don't think you can tell right from the start that
a project you're working on will stay private forever. See Java for
instance: Sun said for quite a long time that it wasn't a good idea to
release Java as open-source and that it was highly unlikely to happen. But
it finally did...

You could tell that the rule should be that if the project has the
slightest chance of becoming open-source, or shared with people not
speaking the same language as the original coders, one should not use
non-ASCII identifiers. I'm personally convinced that *any* industrial
project falls into this category. So accepting non-ASCII identifiers is
just introducing a disaster waiting to happen.

But now, I have the same feeling about non-ASCII strings, and I - as a
project leader - won't ever accept a source file which has a "-*- coding
-*-" line specifying anything other than ascii... So even if I usually
don't buy the "we're already half-dirty, so why can't we be the dirtiest
possible" argument, I'd understand if this feature went into the language.
But I personally won't ever use it, and will forbid it to others whenever
I'm able to.
It should at least be an available option to use this feature.
If it's actually an option to the interpreter, I guess I'll just have to
alias python to 'python --ascii-only-please'...
--
python -c "print ''.join([chr(154 - ord(c)) for c in
'U(17zX(%,5.zmz5(17l8(%,5.Z*(93-965$l7+-'])"
May 14 '07 #79

P: n/a
Hi !
- should non-ASCII identifiers be supported? why?
- would you use them if it was possible to do so? in what cases?
Yes.
And, more: yes yes yes

Because:

1) when I connect Python to j(ava)Script, if the pages "connected"
contain objects with non-ascii characters, I can't use it; snif...

2) when I connect Python to databases, if there are fields (columns)
with accented letters, I can't use class properties to drive these
fields. Examples:
"cité" (french for "city")
"téléphone" (for phone)

And, because non-ASCII characters would be possible but not obligatory,
guys (snobs?) who want to stay in a pure-ASCII dimension still can.
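The database situation described above can be sketched as follows (the column names and values are illustrative): without non-ASCII identifiers, accented field names are reachable only through getattr and string keys, never through normal attribute syntax.

```python
# A toy row object whose attributes mirror database column names.
class Row:
    def __init__(self, **fields):
        self.__dict__.update(fields)

row = Row(**{"cit\u00e9": "Lyon", "t\u00e9l\u00e9phone": "0102030405"})

# "row.cité" would be a SyntaxError without the PEP; the only
# workaround is string-based access:
print(getattr(row, "cit\u00e9"))  # prints "Lyon"
```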

* sorry for my bad english *

--
@-salutations

Michel Claveau
May 14 '07 #80

P: n/a
Eric Brunel wrote:
You could tell that the rule should be that if the project has the
slightest chance of becoming open-source, or shared with people not
speaking the same language as the original coders, one should not use
non-ASCII identifiers. I'm personally convinced that *any* industrial
project falls into this category. So accepting non-ASCII identifiers is
just introducing a disaster waiting to happen.
Not at all. If the need arises, you just translate the whole thing. Contrary
to popular belief, this is a quick and easy thing to do.

So YAGNI applies, and even if you find that you do need it, you may still have
won on the balance! As the time saved by using your native language just might
outweigh the time spent translating.

- Anders
May 14 '07 #81

P: n/a
Hendrik van Rooyen wrote:
And we have been through the Macro thingy here, and the consensus
seemed to be that we don't want people to write their own dialects.
Macros create dialects that are understood only by the three people in your
project group. It's unreasonable to compare that to a "dialect" such as
Mandarin, which is exclusive to a tiny little clique of one billion people.

- Anders
May 14 '07 #82

P: n/a
On May 14, 9:53 am, Michel Claveau
<mcPas.De.S...@mclaveauPas.De.Spam.com> wrote:
- should non-ASCII identifiers be supported? why?
- would you use them if it was possible to do so? in what cases?

Yes.
And, more: yes yes yes

Because:

1) when I connect Python to j(ava)Script, if the pages "connected"
contain objects with non-ascii characters, I can't use it; snif...

2) when I connect Python to databases, if there are fields (columns)
with accented letters, I can't use class properties to drive these
fields. Examples:
"cité" (french for "city")
"téléphone" (for phone)

And, because non-ASCII characters would be possible but not obligatory,
guys (snobs?) who want to stay in a pure-ASCII dimension still can.

* sorry for my bad english *
Can a discussion about support for non-english identifiers (1)
conducted in a group where 99.9% of the posters are fluent
speakers of english (2), have any chance of being objective
or fair?

Although probably not-sufficient to overcome this built-in
bias, it would be interesting if some bi-lingual readers would
raise this issue in some non-english Python discussion
groups to see if the opposition to this idea is as strong
there as it is here.

(1) No quibbles about the distinction between non-english
and non-ascii please.
(2) Several posters have claimed non-native english speaker
status to bolster their position, but since they are clearly at
or near native-speaker levels of fluency, the fact that english is not
their native language is really irrelevant.

May 14 '07 #83

P: n/a
Neil Hodgson wrote:
Anton Vredegoor:
>Ouch! Now I seem to be disagreeing with the one who writes my editor.
What will become of me now?

It should be OK. I try to keep my anger under control and not cut
off the pixel supply at the first stirrings of dissent.
Thanks! I guess I won't have to make the obligatory Soviet Russia joke
now :-)
It may be an idea to provide some more help for multilingual text
such as allowing ranges of characters to be represented as hex escapes
or character names automatically. Then someone who only normally uses
ASCII can more easily audit patches that could contain non-ASCII characters.
Now that I read about IronPython already supporting some larger
character set I feel like I'm somewhat caught in a side effect of an
embrace and extend scheme.

A.
May 14 '07 #84

P: n/a
Jarek Zgoda schrieb:
Stefan Behnel wrote:
>>While I can read the code with Hebrew, Russian or Greek names
transliterated to ASCII, I would not be able to read such code in native.
Then maybe it was code that was not meant to be read by you?

OK, then. As a code obfuscation measure this would fit perfectly.
I actually meant it as a measure for clarity and readability for those who are
actually meant to *read* the code.

Stefan
May 14 '07 #85

P: n/a
On May 13, 5:44 pm, "Martin v. Löwis" <mar...@v.loewis.de> wrote:
- should non-ASCII identifiers be supported? why?
No. It's good convention to stick with english. And if we stick with
english, why should we need non-ASCII characters? Any non-ASCII
character makes code less readable. We never know if our code will
become public.
- would you use them if it was possible to do so? in what cases?
No. I don't see any uses. I'm Polish. Polish-english mix looks funny.

May 14 '07 #86

P: n/a
This pep is not technical, or at least not only technical. It has
larger implications about the model of society we want.

Let me explain with an analogy:
let's compare 'ascii english' to coca-cola.

It's available nearly everywhere.

It does not taste good at first try, and is especially
repulsive to young children.

It's cheap and you don't expect much of it.

You know you can drink some in case of real need.

Its imperialist connotation is widely accepted(?)

But it's not good as your favorite beverage, beer, wine, ...

The world is full of other possibilities. Think, in case
of necessity you could even have to drink tea with yak
butter in the Himalayas! In normal circumstances, you should
never see any, but in an extreme situation you may have to!

Where is the freedom in such a world where you could only drink coca?

I DON'T WANT TO HAVE TO DRINK COCA AT HOME ALL THE TIME.

and this pep is a glorious occasion to get free from it.
[disclaimer: coca is used here as the generic name it became,
and no real offense is intended]

--
Pierre
May 14 '07 #87

P: n/a

"Stefan Behnel" <st******************@web.de> wrote in message
news:46**************@web.de...
| Sounds like CPython would better follow IronPython here.

One could also turn the argument around and say that there is no need to
follow IronPython; people who want non-ASCII identifiers can just use
IronPython.


May 14 '07 #88

P: n/a
And Il1 O0 ?

--
@-salutations

Michel Claveau
May 14 '07 #89

P: n/a
In <mn***********************@mclaveauPas.De.Spam.com >, Michel Claveau
wrote:
And Il1 O0 ?
Hm, we should ban digits from identifier names. :-)

Ciao,
Marc 'BlackJack' Rintsch

May 14 '07 #90

P: n/a
Marc 'BlackJack' Rintsch schrieb:
In <mn***********************@mclaveauPas.De.Spam.com >, Michel Claveau
wrote:
>And Il1 O0 ?

Hm, we should ban digits from identifier names. :-)
Ah, good idea - and capital letters also. After all, they are rare enough in
English to just plain ignore their existence.

Stefan :)
May 14 '07 #91

P: n/a
On 2007-05-14, Stefan Behnel <st******************@web.de> wrote:
Marc 'BlackJack' Rintsch schrieb:
>In <mn***********************@mclaveauPas.De.Spam.com >, Michel Claveau
wrote:
>>And Il1 O0 ?

Hm, we should ban digits from identifier names. :-)

Ah, good idea - and capital letters also. After all, they are
rare enough in English to just plain ignore their existence.
And I don't really see any need for using more than two
characters. With just two letters (ignoring case, of course),
you can create 676 identifiers in any namespace. That's
certainly got to be enough. If not, adding a special character
suffix (e.g. $,%,#) to denote the data type should sufficiently
expand the namespace.

So, let's just silently ignore anything past the first two.
That way we'd be compatible with Commodore PET Basic.

[You don't want to know how long it took me to find all of the
name-collision bugs after porting a basic program from a CP/M
system which had a fairly sophisticated Basic compiler (no line
numbers, all the normal structured programming flow control
constructs) to a Commodore PET which had a really crappy BASIC
interpreter.]

--
Grant Edwards   grante at visi.com        Yow! Am I having fun yet?
May 14 '07 #92

P: n/a
Hi!

- should non-ASCII identifiers be supported? why?
- would you use them if it was possible to do so? in what cases?

Yes.

JScript can use letters with accents in identifiers
XML (1.1) can use letters with accents in tags
C# can use letters with accents in variables
SQL: MySQL/MS-Sql/Oracle/etc. can use accents in fields or requests
etc.
etc.

Python MUST make up for its lost time.
MCI


May 14 '07 #93

P: n/a
On May 14, 9:49 pm, Méta-MCI <enleverlesX.X...@XmclaveauX.com> wrote:
Hi!

- should non-ASCII identifiers be supported? why?
- would you use them if it was possible to do so? in what cases?

Yes.

JScript can use letters with accents in identifiers
XML (1.1) can use letters with accents in tags
C# can use letters with accents in variables
SQL: MySQL/MS-Sql/Oracle/etc. can use accents in fields or requests
etc.
etc.

Python MUST make up for its lost time.

MCI
And generally nobody uses it.
It sounds like "art for art's sake".

But OK. Maybe it'll be some impulse to learn some new languages.

+1 for this PEP

May 14 '07 #94

In <sl*****************@irishsea.home.craig-wood.com>, Nick Craig-Wood
wrote:
>My initial reaction is that it would be cool to use all those great
>symbols. A variable called OHM etc!

This is a nice candidate for homoglyph confusion. There's the Greek
letter omega (U+03A9) Ω and the SI unit symbol (U+2126) Ω, and I think
some omegas in the mathematical symbols area too.
Under the PEP, identifiers are converted to normal form NFC, and
we have

py> unicodedata.normalize("NFC", u"\u2126")
u'\u03a9'

So, OHM SIGN compares equal to GREEK CAPITAL LETTER OMEGA. It can't
be confused with it - it is equal to it by the proposed language
semantics.

Regards,
Martin
May 14 '07 #95

>Not providing an explicit listing of allowed characters is inexcusable
>sloppiness.
That is a deliberate part of the specification. It is intentional that
it does *not* specify a precise list, but instead defers that list
to the version of the Unicode standard used (in the unicodedata
module).
>The XML standard is an example of how listings of large parts of the
>Unicode character set can be provided clearly, exactly and (almost)
>concisely.
And, indeed, this is now recognized as one of the bigger mistakes
of the XML recommendation: they provide an explicit list, and fail
to consider characters that are unassigned. In XML 1.1, they try
to address this issue, by now allowing unassigned characters in
XML names even though it's not certain yet what those characters
mean (until they are assigned).
>``ID_Continue`` is defined as all characters in ``ID_Start``, plus
>nonspacing marks (Mn), spacing combining marks (Mc), decimal number
>(Nd), and connector punctuations (Pc).

>Am I the first to notice how unsuitable these characters are?
Probably. Nobody in the Unicode consortium noticed, but what
do they know about suitability of Unicode characters...
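For reference, the general categories named in the quoted excerpt can be
inspected with the unicodedata module (an illustrative sketch; the sample
characters are my own choices, not from the thread):

```python
import unicodedata

# One sample character for each ID_Continue-only category from the excerpt:
samples = {
    "\u0301": "Mn",  # COMBINING ACUTE ACCENT (nonspacing mark)
    "\u093e": "Mc",  # DEVANAGARI VOWEL SIGN AA (spacing combining mark)
    "\u0966": "Nd",  # DEVANAGARI DIGIT ZERO (decimal number)
    "_":      "Pc",  # LOW LINE (connector punctuation)
}
for ch, expected in samples.items():
    assert unicodedata.category(ch) == expected
print("all categories match")
```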

Regards,
Martin
May 14 '07 #96

Neil Hodgson schrieb:
Paul Rubin wrote:
>>Plenty of programming languages already support unicode identifiers,

Could you name a few? Thanks.

C#, Java, ECMAScript, Visual Basic.
Specification-wise, C99 and C++98 also support Unicode identifiers,
although many compilers still don't.

For dynamic languages, Groovy also supports it.

Regards,
Martin
May 14 '07 #97

Martin v. Löwis:
Specification-wise, C99 and C++98 also support Unicode identifiers,
although many compilers still don't.
Ada 2005 allows Unicode identifiers and even includes the constant
'π' in Ada.Numerics.
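Python eventually followed suit: PEP 3131 was accepted for Python 3.0, so
the same style of identifier works there too (a small sketch of my own,
not from the thread):

```python
import math

# Python 3 (with PEP 3131 accepted) allows Greek letters as identifiers:
π = math.pi
r = 2.0
area = π * r ** 2
assert abs(area - 12.566370614359172) < 1e-12
print("area of circle with r=2:", area)
```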

Neil
May 14 '07 #98

Stefan Behnel <st******************@web.de> writes:
>But then, where's the problem? Just stick to accepting only patches that are
>plain ASCII *for your particular project*.
There is no feature that has ever been proposed for Python, that cannot
be supported with this argument. If you don't like having a "go to"
statement added to Python, where's the problem? Just don't use it in
your particular project.
May 15 '07 #99

ZeD
Neil Hodgson wrote:
>Ada 2005 allows Unicode identifiers and even includes the constant
>'π' in Ada.Numerics.
this. is. cool.

(oh, and +1 for the pep)

--
Under construction
May 15 '07 #100
