PEP 3131: Supporting Non-ASCII Identifiers

Paul Boddie wrote:

Does it really happen, or do IBM's engineers in China
or India (for example) have to write everything strictly in ASCII?

I assume they simply use American keyboards. They're working for an American
company after all. That's like the call-center people in India who learn the
Super Bowl results by heart before they go to their cubicle, just to keep
North American callers from realising they are not connected to the shop
around the corner.

Stefan

May 15 '07 #207

Carsten Haese

On Tue, 2007-05-15 at 18:18 +0200, René Fleschenberg wrote:

Carsten Haese wrote:
Allowing people to use identifiers in their native language would
definitely be an advantage for people from such cultures. That's the use
case for this PEP. It's easy for Euro-centric people to say "just suck
it up and use ASCII", but the same people would probably starve to death
if they were suddenly teleported from Somewhere In Europe to rural China
which is so unimaginably different from what they know that it might
just as well be a different planet. "Learn English and use ASCII" is not
generally feasible advice in such cultures.

This is a very weak argument, IMHO. How do you want to use Python
without learning at least enough English to grasp a somewhat decent
understanding of the standard library? Let's face it: To do any "real"
programming, you need to know at least some English today, and I don't
see that changing anytime soon. And it is definitely not going to be
changed by allowing non-ASCII identifiers.

Even if it were impossible to do "real programming" in Python without
knowing English (which I will neither accept nor reject because I don't
have enough data either way), I don't think Python should be restricted
to "real" programming only. Python (the programming language) is an
inherently easy-to-learn language. I find it quite plausible that
somebody in China might want to teach their students programming before
teaching them English. The posts on this thread by a teacher from China
confirm this suspicion.

Once the students learn Python and realize that there are lots of Python
resources "out there" that are only in English, that will be a
motivation for them to learn English. Requiring all potential Python
programmers to learn English first (or assuming that they know English
already) is an unacceptable barrier of entry.

--
Carsten Haese
http://informixdb.sourceforge.net

May 15 '07 #208

Duncan Booth

Donn Cave <do**@u.washington.edu> wrote:

[Spanish in Brazil? Not as much as you might think.]

Sorry, temporary[*] brain failure; I really do know it is Portuguese.
[*] I hope.

May 15 '07 #209

John Nagle

There are really two issues here, and they're being
confused.

One is allowing non-English identifiers, which is a political
issue. The other is homoglyphs, two characters which look the same.
The latter is a real problem in a language like Python with implicit
declarations. If a maintenance programmer sees a variable name
and retypes it, they may silently create a new variable.
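The scenario in the paragraph above can be sketched as follows. This is only an illustration, assuming Python 3 (where non-ASCII identifiers are accepted); the names `latin_name`, `mixed_name` and `namespace` are made up for the example.

```python
# Two names that may render identically but are different identifiers.
# "total" is all Latin; "t\u043etal" has CYRILLIC SMALL LETTER O in
# the second position.
latin_name = "total"
mixed_name = "t\u043etal"

namespace = {}
exec(f"{latin_name} = 1", namespace)
exec(f"{mixed_name} = 2", namespace)  # no error: this binds a second name

# exec() adds "__builtins__" to the dict, so filter dunder keys out.
user_names = sorted(k for k in namespace if not k.startswith("__"))

print(latin_name == mixed_name)  # False
print(len(user_names))           # 2
print(namespace[latin_name])     # 1, the original binding was never touched
```

Because assignment creates a name implicitly, the second statement is not reported as an error; it simply leaves the first variable unchanged.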

If Unicode characters are allowed, they must be done under some
profile restrictive enough to prohibit homoglyphs. I'm not sure
if UTS-39, profile 2, "Highly Restrictive", solves this problem,
but it's a step in the right direction. This limits mixing of scripts
in a single identifier; you can't mix Hebrew and ASCII, for example,
which prevents problems with mixing right to left and left to right
scripts. Domain names have similar restrictions.

We have to have visually unique identifiers.

There's also an issue with implementations that interface
with other languages. Some Python implementations generate
C, Java, or LISP code. Even CPython will call C code.
The representation of external symbols needs to be standardized
across those interfaces.

John Nagle

May 15 '07 #210

John Nagle wrote:

There are really two issues here, and they're being
confused.

One is allowing non-English identifiers, which is a political
issue. The other is homoglyphs, two characters which look the same.
The latter is a real problem in a language like Python with implicit
declarations. If a maintenance programmer sees a variable name
and retypes it, they may silently create a new variable.

If Unicode characters are allowed, they must be done under some
profile restrictive enough to prohibit homoglyphs. I'm not sure
if UTS-39, profile 2, "Highly Restrictive", solves this problem,
but it's a step in the right direction. This limits mixing of scripts
in a single identifier; you can't mix Hebrew and ASCII, for example,
which prevents problems with mixing right to left and left to right
scripts. Domain names have similar restrictions.

We have to have visually unique identifiers.

As others stated before, this is unlikely to become a problem in practice.
Project-internal standards will usually define a specific language for a
project, in which case these issues will not arise. In general, programmers
from a specific language/script background will stick to that script and not
magically start typing foreign characters. And projects where multiple
languages are involved will have to define a target language anyway, most
likely (although not necessarily) English.

Note that adherence to a specific script can easily be checked programmatically
through Unicode ranges - if the need ever arises.
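A rough sketch of such a check, for illustration only. Python's standard library does not expose the Unicode Script property, so this version uses the first word of each character's Unicode name as an approximate script label; that shortcut, and the helper names `scripts_used` and `is_single_script`, are assumptions of this example rather than anything proposed in the thread.

```python
import unicodedata


def scripts_used(identifier):
    """Return the set of rough script labels for the letters in identifier."""
    labels = set()
    for ch in identifier:
        if not ch.isalpha():
            continue  # digits and underscores are shared by all scripts
        name = unicodedata.name(ch, "")
        # e.g. "LATIN SMALL LETTER O" -> "LATIN"
        labels.add(name.split(" ")[0] if name else "UNKNOWN")
    return labels


def is_single_script(identifier):
    """True if all letters in identifier carry the same rough script label."""
    return len(scripts_used(identifier)) <= 1


print(is_single_script("L\u00f6ffelstiel"))  # True: accented Latin is still Latin
print(is_single_script("t\u043etal"))        # False: Latin mixed with Cyrillic
```

A check this crude would also flag identifiers that legitimately combine scripts, such as Japanese names mixing kanji and hiragana, so a real project rule would need an allow-list of script combinations.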

Stefan

May 15 '07 #211

Thorsten Kampe

* Duncan Booth (15 May 2007 17:30:58 GMT)

Donn Cave <do**@u.washington.edu> wrote:
[Spanish in Brazil? Not as much as you might think.]

Sorry, temporary[*] brain failure; I really do know it is Portuguese.

Yes, you do. Spanish is what's been used in the United States, right?

Thorsten

May 15 '07 #212

René Fleschenberg wrote:

IMO, the burden of proof is on you. If this PEP has the potential to
introduce another hindrance for code-sharing, the supporters of this PEP
should be required to provide a "damn good reason" for doing so. So far,
you have failed to do that, in my opinion. All you have presented are
vague notions of rare and isolated use-cases.

You want to limit my freedom to use appealing names in my own language.

This alone should be enough to accept the PEP!

May 15 '07 #213

René Fleschenberg wrote:

Your example does not prove much. The fact that some people use
non-ASCII identifiers when they can does not at all prove that it would
be a serious problem for them if they could not.

So I have to make spelling mistakes in my code to please you?
--
Pierre

May 15 '07 #214

hello

I work for a large phone maker, and for a long time
we thought, very arrogantly, that our phones would be OK
for the whole world.

After all, using a phone takes so few words, and
some of them were even replaced with pictograms!
Everybody should be able to understand appel, bis,
renvoi, mévo, ...

Nowadays we make phones that speak Chinese, Korean, and Japanese.

Because we can do it, because graphics are cheaper
than they were, because it enlarges our market
(and also because some markets require it).

see the analogy?

of course, +1 for the pep
--
Pierre

May 15 '07 #215

rurpy

"Hendrik van Rooyen" <ma**@microcorp.co.za> wrote in message
news:ma***************************************@python.org...

<ru***@yahoo.com> wrote:

[I fixed the broken attribution in your quote]

(2) Several posters have claimed non-native English speaker
status to bolster their position, but since they are clearly at
or near native-speaker levels of fluency, that English is not
their native language is really irrelevant.

I dispute the irrelevance strongly - I am one of the group referred
to, and I am here on this group because it works for me - I am not
aware of an Afrikaans python group - but even if one were to
exist - who, aside from myself, would frequent it? - would I have
access to the likes of the effbot, Steve Holden, Alex Martelli,
Irmen de Jongh, Eric Brunel, Tim Golden, John Machin, Martin
v Loewis, the timbot and the Nicks, the Pauls and other Stevens?

I didn't say that your (as a fluent but non-native English
speaker) views are irrelevant, only that when you say,
"I am a native speaker of Afrikaans and I don't want
non-ASCII identifiers" it shouldn't carry any more weight
than if I (as a native English speaker) say the same
thing. (But I wouldn't of course :-).

My point was that this entire discussion is by English
speakers and that a consensus by such a group, that
non-English identifiers are bad, is neither surprising nor
legitimate.

- I somehow doubt it.

Fragmenting this resource into little national groups based
on language would be silly, if not downright stupid, and it seems
to me just as silly to allow native identifiers without also
allowing native reserved words, because you are just creating
a mess that is neither fish nor flesh if you do.

It already is fragmented. There is a Japanese Python users
group, complete with discussion forums, all in Japanese, not
English. Another poster said he was going to bring up this
issue on a French language discussion group.

How can you possibly propose that some authority should
decide what language a group of people should use to
discuss a common interest?!

And the downside to going the whole hog would be as follows:

Nobody would even want to look at my code if I write
"terwyl" instead of 'while', and "werknemer" instead of
"employee" - so where am I going to get help, and how,
once I am fully Python fit, can I contribute if I insist on
writing in a splinter language?

First, "while" is a keyword and will remain "while", so
that has nothing to do with anything.

If nobody wants to look at your code, it is not
the use of "werknemer" that is the cause. If you used
that as an identifier, then I assume you decided
your code was exclusively of interest to Afrikaans
speakers. Otherwise you would have used English
for that identifier. The point is that *you*
are in the best position to decide that, not the
designers of the language.

And while the Mandarin language group could be big enough
to be self sustaining, is that true of for example Finnish?

So I don't think my opinion on this is irrelevant just because
I misspent my youth reading books by Pelham Grenville
Wodehouse, amongst others.

And I also don't regard my own position as particularly unique
amongst python programmers that don't speak English as
their native language

Like I said, that English is not your native language is
irrelevant -- what matters is that you now speak English
fluently. Thus you are an English speaker arguing that
excluding non-English identifiers is not a problem.

May 15 '07 #216

Michel Claveau

Hi!

Yes, for legibility.

If letters with accents are possible:

d=dict(numéro=1234, name='Löwis', prénom='Martin', téléphone='+33123')

or

p1 = personn() #class
p1.numéro = 1234
p1.name='Löwis'
p1.prénom='Martin'
p1.téléphone='+33123'

Imagine the same code if accents are not possible...

Don't forget: we often have to connect to databases that already
exist.

--
@-salutations

Michel Claveau

May 15 '07 #217

Javier Bezos wrote:

>But having, for example, things like open() from the stdlib in your code
and then öffnen() as a name for functions/methods written by yourself is
just plain silly. It makes the code inconsistent and ugly without
significantly improving the readability for someone who speaks German
but not English.

Agreed. I always use English names (more or
less :-)), but this is not what the PEP is about.

We all know what the PEP is about (we can read). The point is: If we do
not *need* non-English/ASCII identifiers, we do not need the PEP. If the
PEP does not solve an actual *problem* and still introduces some
potential for *new* problems, it should be rejected. So far, the
"problem" seems to just not exist. The burden of proof is on those who
support the PEP.

--
René

May 15 '07 #218

rurpy

On May 15, 7:44 am, George Sakkis <george.sak...@gmail.com> wrote:

After 175 replies (and counting), the only thing that is clear is the
controversy around this PEP. Most people are very strong for or
against it, with little middle ground in between. I'm not saying that
every change must meet 100% acceptance, but here there is definitely a
strong opposition to it. Accepting this PEP would upset lots of people
as it seems, and it's interesting that quite a few are not even native
english speakers.

As I pointed out in a previous post,
http://groups.google.com/group/comp....9d327a6?hl=en&
http://groups.google.com/group/comp....25623c4?hl=en&
whether a person is or is not a native English speaker is
irrelevant -- what is relevant is their current ability with
English.
And my impression is that nearly all of the posts from people not
fluent in English (judging from grammar mistakes and such)
are in favor of the PEP.

May 15 '07 #219

rurpy

On May 15, 11:08 am, Carsten Haese <cars...@uniqsys.com> wrote:
snip

Once the students learn Python and realize that there are lots of Python
resources "out there" that are only in English, that will be a
motivation for them to learn English. Requiring all potential Python
programmers to learn English first (or assuming that they know English
already) is an unacceptable barrier of entry.

One of the big concerns seems to be a hypothesized
negative impact on code sharing. Nobody has considered
the positive impact resulting from making Python
more accessible to non-English speakers, some of
whom will go on to become willing and able to contribute
open "English python" code to the community. This
positive impact may well outweigh the negative.

May 15 '07 #220

rurpy

On May 15, 10:18 am, René Fleschenberg <r...@korteklippe.de> wrote:

Carsten Haese wrote:

Allowing people to use identifiers in their native language would
definitely be an advantage for people from such cultures. That's the use
case for this PEP. It's easy for Euro-centric people to say "just suck
it up and use ASCII", but the same people would probably starve to death
if they were suddenly teleported from Somewhere In Europe to rural China
which is so unimaginably different from what they know that it might
just as well be a different planet. "Learn English and use ASCII" is not
generally feasible advice in such cultures.

This is a very weak argument, IMHO. How do you want to use Python
without learning at least enough English to grasp a somewhat decent
understanding of the standard library? Let's face it: To do any "real"
programming, you need to know at least some English today, and I don't
see that changing anytime soon. And it is definitely not going to be
changed by allowing non-ASCII identifiers.

snip

Another way of framing this discussion could be, "should
Python continue to maintain a barrier to its use by non-English
speakers if it is not necessary?"

Virtually every guide to programming style I have ever read stresses
the importance of variable naming. For example, the Wikipedia article
"programming style" mentions variable naming right after layout
(indentation, etc) in importance:

"Appropriate choices for variable names are seen as the keystone
for good style. Poorly-named variables make code harder to read
and understand"

Even when English-as-non-native-language speakers can understand
English words, the level and speed of comprehension is often far below
that of their native language. Denying the ability to use native-language
identifiers puts these people at a significant disadvantage compared
to English speakers with regard to reading (their own!) code.
And the justification for this is the hypothetical case that someone
who doesn't understand that language *might* *someday* have to
read it. Besides the large number of programs that will never be
public (far larger than most of the worriers think, is my guess), even
in public programs this is not necessarily a disaster. A public application
written in "Chinese Python" might work perfectly and be completely
usable by me, even if it is difficult for me to understand. And why
should my difficulty count for more than a Chinese person's difficulty
in understanding my "English Python" application?

That Python keywords are English is unimportant -- they are a small
finite set that can be memorized. Identifiers are a large unbounded
set that can't be.

That the standard library code and documentation is in English
is irrelevant. One shouldn't need to read the standard library code
to use it. (That one sometimes has to is a Python flaw that should
be fixed -- not bandaided by requiring Python programmers to
know English).

There is no need to understand English to use the standard library.
Documentation has been, and (as Python becomes more popular) will continue
to be, translated into native languages. Here is Python standard library
documentation in Japanese:

http://www.python.jp/doc/release/lib/

While encouraging English in shared/public code is fine,
trying to enforce it by continuing to enforce ASCII-only identifiers
smacks to me of a "whites only country club" mentality.

Making Python more accessible to the world (the vast majority
of whom do not speak English) can only advance Python.

May 15 '07 #221

René Fleschenberg wrote:

We all know what the PEP is about (we can read).

BTW: who is this "we" if it doesn't include you?

Stefan

May 15 '07 #222

René Fleschenberg wrote:

Javier Bezos wrote:

>>But having, for example, things like open() from the stdlib in your code
and then öffnen() as a name for functions/methods written by yourself is
just plain silly. It makes the code inconsistent and ugly without
significantly improving the readability for someone who speaks German
but not English.
Agreed. I always use English names (more or
less :-)), but this is not what the PEP is about.

We all know what the PEP is about (we can read). The point is: If we do
not *need* non-English/ASCII identifiers, we do not need the PEP. If the
PEP does not solve an actual *problem* and still introduces some
potential for *new* problems, it should be rejected. So far, the
"problem" seems to just not exist. The burden of proof is on those who
support the PEP.

The main problem here seems to be proving the need of something to people who
do not need it themselves. So, if a simple "but I need it because a, b, c" is
not enough, what good is any further proof?

Stefan

May 15 '07 #223

MRAB

On May 15, 6:44 pm, John Nagle <n...@animats.com> wrote:

There are really two issues here, and they're being
confused.

One is allowing non-English identifiers, which is a political
issue. The other is homoglyphs, two characters which look the same.
The latter is a real problem in a language like Python with implicit
declarations. If a maintenance programmer sees a variable name
and retypes it, they may silently create a new variable.

If Unicode characters are allowed, they must be done under some
profile restrictive enough to prohibit homoglyphs. I'm not sure
if UTS-39, profile 2, "Highly Restrictive", solves this problem,
but it's a step in the right direction. This limits mixing of scripts
in a single identifier; you can't mix Hebrew and ASCII, for example,
which prevents problems with mixing right to left and left to right
scripts. Domain names have similar restrictions.

We have to have visually unique identifiers.

There's also an issue with implementations that interface
with other languages. Some Python implementations generate
C, Java, or LISP code. Even CPython will call C code.
The representation of external symbols needs to be standardized
across those interfaces.

Surely it should be possible programmatically to compare the visual
appearance of the characters and highlight ones which are similar, or
colour-code various subsets when required.
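One way such a highlighting tool might work is sketched below, as an illustration only. The table `CONFUSABLE_TO_PROTOTYPE` is a tiny hand-picked mapping invented for this example; a real tool would load a far larger data set, such as the confusables data published alongside UTS #39. The helpers `skeleton` and `lookalike_groups` are likewise hypothetical names.

```python
from collections import defaultdict

# Hypothetical, hand-picked table for illustration only.
CONFUSABLE_TO_PROTOTYPE = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
    "\u2160": "I",  # ROMAN NUMERAL ONE
    "l": "I",       # lowercase L, often drawn like capital I
    "1": "I",       # digit one
    "0": "O",       # digit zero
}


def skeleton(identifier):
    """Replace each character by its look-alike prototype, if it has one."""
    return "".join(CONFUSABLE_TO_PROTOTYPE.get(ch, ch) for ch in identifier)


def lookalike_groups(identifiers):
    """Group distinct identifiers whose skeletons are identical."""
    groups = defaultdict(set)
    for name in identifiers:
        groups[skeleton(name)].add(name)
    return [sorted(group) for group in groups.values() if len(group) > 1]


names = ["total", "t\u043etal", "count", "IIl0", "IlIO"]
for group in lookalike_groups(names):
    # Print escaped forms so the difference is visible in any font.
    print([ascii(name) for name in group])
```

An editor or lint step could run this over the identifiers in a module and colour or report every group it returns. Note that the same mechanism covers pure-ASCII look-alikes such as `IIl0` and `IlIO`.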

May 15 '07 #224

On Tue, 15 May 2007 12:17:09 +0200, René Fleschenberg wrote:

Steven D'Aprano wrote:
>How is that different from misreading "disk_burnt = True" as "disk_bumt
= True"? In the right (or perhaps wrong) font, like the ever-popular
Arial, the two can be visually indistinguishable. Or "call" versus
"cal1"?

That is the wrong question. The right question is: Why do you want to
introduce *more* possibilities to make such mistakes? Does this PEP solve
an actual problem, and if so, is that problem big enough to be worth the
introduction of these new risks and problems?

But they aren't new risks and problems, that's the point. So far, every
single objection raised ALREADY EXISTS in some form or another. There's
all this hysteria about the problems the proposed change will cause, but
those problems already exist. When was the last time a Black Hat tried to
smuggle in bad code by changing an identifier from xyz0 to xyzO?

I think it is not. I think that the problem only really applies to very
isolated use-cases.

Like the 5.5 billion people who speak no English.

So isolated that they do not justify a change to
mainline Python. If someone thinks that non-ASCII identifiers are really
needed, he could maintain a special Python branch that supports them. I
doubt that there would be a lot of demand for it.

Maybe so. But I guarantee without a shadow of a doubt that if the change
were introduced, people would use it -- even if right now they say they
don't want it.

--
Steven.

May 16 '07 #225

Matthew Woodcraft

Thorsten Kampe <th******@thorstenkampe.de> wrote:

>* René Fleschenberg (Tue, 15 May 2007 14:35:33 +0200)
>I am talking about the stdlib, not about the very few keywords Python
has. Are you going to re-write the standard library in your native
language so you can have a consistent use of natural language among your
code?

Why would I want to do that? It's not my code. Identifier names are
mine. If I use modules from standard library I use some "foreign
words". There's no problem in that.

It could even be an advantage. I sometimes find that I have to use a
'second best' name myself because I want to avoid the possible
confusion caused if I choose a name which has a well-known existing
use.

-M-

May 16 '07 #226

On Tue, 15 May 2007 09:09:30 +0200, Eric Brunel wrote:

Joke aside, this just means that I won't ever be able to program math in
Ada, because I have absolutely no idea on how to do a 'pi' character on
my keyboard.

Maybe you should find out then? Personal ignorance is never an excuse for
rejecting technology.
--
Steven

May 16 '07 #227

On Tue, 15 May 2007 12:01:57 +0200, René Fleschenberg wrote:

Marc 'BlackJack' Rintsch wrote:
>You find it in the sources by the line number from the traceback and
the letters can be copy'n'pasted if you don't know how to input them
with your keymap or keyboard layout.

Typing them is not the only problem. They might not even *display*
correctly if you don't happen to use a font that supports them.

Then maybe you should catch up to the 21st century and install some fonts
and a modern editor.
--
Steven.

May 16 '07 #228

On Tue, 15 May 2007 20:43:31 +1000, Aldo Cortesi wrote:

Thus spake Steven D'Aprano (st****@REMOVE.THIS.cybersource.com.au):

>Me, I try to understand a patch by reading it. Call me
old-fashioned.

I concur, Aldo. Indeed, if I _can't_ be sure I understand a patch, I
don't accept it -- I ask the submitter to make it clearer.

Yes, but there is a huge gulf between what Aldo originally said he does
("visual inspection") and *reading and understanding the code*.

Let's set aside the fact that you're guilty of sloppy quoting here,
since the phrase "visual inspection" is yours, not mine.

Yes, my bad, I apologize, that was sloppy of me. What you actually said
was "I can't reliably verify it by eye".

Regardless,
your interpretation of my words is just plain dumb. My phrasing was
intended to draw attention to the fact that one needs to READ code in
order to understand it. You know - with one's eyes. VISUALLY. And VISUAL
INSPECTION of code becomes unreliable if this PEP passes.

Notwithstanding my misquote, I find it ... amusing ... that after
hauling me over the coals for using the term "visual inspection", you're
not only using it, but shouting it.

Perhaps you aren't aware that doing something "by eye" is idiomatic
English for doing it quickly, roughly, imprecisely. It is the opposite of
taking the time and effort to do the job carefully and accurately. If you
measure something "by eye", you just look at it and take a guess.

So, as I said, if you're relying on VISUAL INSPECTION (your words _now_)
you're already vulnerable. Fortunately for you, you're not relying on
visual inspection, you are actually _reading_ and _comprehending_ the
code. That might even mean, in extreme cases, you sit down with pencil
and paper and sketch out the program flow to understand what it is doing.

Now that (I hope!) you understand why I said what I said, can we agree
that _understanding_ is critical to the process? If you don't understand
the code, you don't accept it. If somebody submits a patch with
identifiers like a9472302 and a9473202 you're going to reject it as too
difficult to understand.

How do non-ASCII identifiers change that situation? What will be
different?

>If I've understood Martin's post, the PEP states that identifiers are
converted to normal form. If two identifiers look the same, they will
be the same.

I'm sorry to have to tell you, but you understood Martin's post no
better than you did mine. There is no general way to detect homoglyphs
and "convert them to a normal form". Observe:

import unicodedata
print(repr(unicodedata.normalize("NFC", u"\u2160")))
print(u"\u2160")
print("I")

Yes, I observe two very different glyphs, as different as the ASCII
characters I and |. What do you see?
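For readers who want to check what normalization does and does not unify, here is a small sketch, independent of any font. It compares the NFC form used in the snippet above with NFKC, the form that PEP 3131 specifies for identifiers; the variable names are made up for the example.

```python
import unicodedata

roman_one = "\u2160"   # ROMAN NUMERAL ONE
cyrillic_a = "\u0430"  # CYRILLIC SMALL LETTER A

# NFC only composes canonical equivalents, so the Roman numeral is unchanged.
print(unicodedata.normalize("NFC", roman_one) == "I")    # False

# NFKC also folds compatibility characters; U+2160 becomes Latin "I".
print(unicodedata.normalize("NFKC", roman_one) == "I")   # True

# Letters from another script are not compatibility variants of Latin
# letters, so no normalization form maps Cyrillic "a" to Latin "a".
print(unicodedata.normalize("NFKC", cyrillic_a) == "a")  # False
```

So normalization removes some look-alikes, such as the Roman numeral, but not cross-script ones, which is the case both sides of this exchange are arguing about.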

So, a round 0 for reading comprehension this lesson, I'm afraid. Better
luck next time.

Ha ha, very funny.

So, let's summarize...

Non-ASCII identifiers are bad, because they are vulnerable to the exact
same problems as ASCII identifiers, only we're happy to live with those
problems if they are ASCII, and just install a font that makes I and l
look different, but we won't install a font that makes I and Ⅰ look
different, because that's too hard.

Well, you've convinced me. Obviously expecting Python programmers to cope
with something as complicated as installing a decent set of fonts is such
a major hurdle that people will abandon the language in droves, probably
taking up Haskell and Visual Basic and Lisp and all those other languages
that allow non-ASCII identifiers.

--
Steven.

May 16 '07 #229

On Tue, 15 May 2007 13:05:12 +0200, René Fleschenberg wrote:

Any program that uses non-English identifiers in Python is bound to
become gibberish, since it *will* be cluttered with English identifiers
all over the place anyway, whether you like it or not.

It won't be gibberish to the people who speak the language.
--
Steven.

May 16 '07 #230

On Tue, 15 May 2007 14:44:44 +0200, Anton Vredegoor wrote:

HYRY wrote:

>>- should non-ASCII identifiers be supported? why?
Yes. I have wanted this for years. I am Chinese, and I teach programming to
some 12-year-old children. The biggest problem is that we cannot use
Chinese words for the identifiers. As the program source becomes
longer, they always lose track of the program logic.

That is probably because they are just entering the developmental phase
of being able to use formal operational reasoning. I can understand that
they are looking for something to put the blame on but it is an error to
give in to the idea that it is hard for 12 year olds to learn a foreign
language. You realize that children learn new languages a lot faster
than adults?

Children soak up new languages between the ages of about one and four. By
12, they're virtually adults as far as learning new languages goes.

Again, it's probably not the language but the formal logic they have
problems with.

You have zero evidence for that, you're just applying your own
preconceptions and ignoring what HYRY has told you.

Please do *not* conclude that some child is not very good
at math or logic or programming when they are slow at first.

You're the one saying they're having problems with logic, not HYRY. He's
saying they are having problems with English.

--
Steven.

May 16 '07 #231

On Tue, 15 May 2007 11:58:35 +0200, René Fleschenberg wrote:

Unless you are 150% sure that there will *never* be the need for a
person who does not know your language of choice to be able to read or
modify your code, the language that "fits the environment best" is
English.

Just a touch of hyperbole perhaps?

You know, it may come as a surprise to some people that English is not
the only common language. In fact, it only ranks third, behind Mandarin
and Spanish, and just above Arabic. Although the exact number of speakers
varies according to the source you consult, the rankings are quite stable:
Mandarin, Spanish, then English. Any of those languages could equally
have claim to be the world's lingua franca.

And interestingly, with only one billion English speakers (as a first or
second language) in the world, and 5.5 billion people who don't speak
English, I think it's probably fair to say that it is a small minority
that speak English.
--
Steven.

May 16 '07 #232

On Tue, 15 May 2007 10:44:37 -0700, John Nagle wrote:

We have to have visually unique identifiers.

Well, Python has existed for years without such a requirement, so I think
"have to" is too strong a term.

Compare:

thisisareallylongbutcompletelylegalidentiferandnotvisuallyuniqueataglance

with

thisisareallylongbutcompletelylegalidentiferadnnotvisuallyuniqueataglance

I imagine, decades ago, people arguing against the introduction of long
identifiers because of the risk that their projects will be flooded with
Black Hats trying to slip one over them by using the vulnerability caused
by really long identifiers. I can just see people banging away on their
keyboard, swearing black and blue that identifiers of more than four
characters are completely unnecessary (who needs more than 450,000
variables in a program?) and will just cause the End Of Programming As We
Know It.

rn = m = None
IIl0 = IlIO = None

I'm sure that the Python community has zero sympathy for anyone
suggesting that Python should _enforce_ rules like "don't use a single l
as an identifier", even if they have complete sympathy with anybody who
has such a rule in their own projects.
--
Steven.

May 16 '07 #233

Aldo Cortesi

Thus spake Steven D'Aprano (st****@REMOVE.THIS.cybersource.com.au):

Perhaps you aren't aware that doing something "by eye" is idiomatic
English for doing it quickly, roughly, imprecisely. It is the opposite of
taking the time and effort to do the job carefully and accurately. If you
measure something "by eye", you just look at it and take a guess.

Well, Steve, speaking as someone not entirely unfamiliar with idiomatic
English, I can say with some confidence that that's complete and utter bollocks
(idiomatic usage for "nonsense", by the way). To do something "by eye" means
nothing more nor less than doing it visually. Unless you can provide a citation
to the contrary, please move on from this petty little point of yours, and try
to make a substantial technical argument instead.

So, as I said, if you're relying on VISUAL INSPECTION (your words _now_)
you're already vulnerable. Fortunately for you, you're not relying on
visual inspection, you are actually _reading_ and _comprehending_ the
code. That might even mean, in extreme cases, you sit down with pencil
and paper and sketch out the program flow to understand what it is doing.

Please, pick up a dictionary, and look up "visual" and "inspection", then
re-read my message. Ponder the fact that visual inspection is in fact a
necessary precursor to "reading" or "comprehending" code. Now, imagine reading
a piece of code where you can never be sure that a character is what it appears
to be...

If I've understood Martin's post, the PEP states that identifiers are
converted to normal form. If two identifiers look the same, they will
be the same.
I'm sorry to have to tell you, but you understood Martin's post no
better than you did mine. There is no general way to detect homoglyphs
and "convert them to a normal form". Observe:

import unicodedata
print(repr(unicodedata.normalize("NFC", u"\u2160")))
print(u"\u2160")
print("I")

Yes, I observe two very different glyphs, as different as the ASCII
characters I and |. What do you see?

I recommend that you gain a basic understanding of the relationship between
Unicode code points and the glyphs on your screen before attempting to argue
this point again. The particular glyph your current font-set translates the
character into is irrelevant. Indeed, the fact that there is font variation
from client to client is one of the more obvious problems with your technically
illiterate hope that one could homogenize characters so that everything that
looks the same has the same meaning. Fiddle around with your fontsets a bit -
you only have to find one combination where the two glyphs look the same to
prove my case...
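
A minimal sketch of the normalization point, assuming Python 3 and only the standard unicodedata module. The constant and helper names are illustrative and do not come from the thread. It suggests that NFC leaves U+2160 alone, that the compatibility form NFKC does fold it into a plain "I", and that no normalization form folds a Cyrillic "а" into a Latin "a", so look-alikes from different scripts remain distinct code points:

```python
import unicodedata

ROMAN_ONE = "\u2160"   # ROMAN NUMERAL ONE
LATIN_I = "\u0049"     # LATIN CAPITAL LETTER I
CYRILLIC_A = "\u0430"  # CYRILLIC SMALL LETTER A
LATIN_A = "\u0061"     # LATIN SMALL LETTER A


def normalizes_to(form, text, target):
    """Return True if `text` becomes `target` under the given normalization form."""
    return unicodedata.normalize(form, text) == target


for form in ("NFC", "NFD", "NFKC", "NFKD"):
    print(
        form,
        "roman one -> I:", normalizes_to(form, ROMAN_ONE, LATIN_I),
        "| cyrillic a -> latin a:", normalizes_to(form, CYRILLIC_A, LATIN_A),
    )

# The code point names make the difference visible even when the glyphs do not.
for ch in (ROMAN_ONE, LATIN_I, CYRILLIC_A, LATIN_A):
    print(ascii(ch), unicodedata.name(ch))
```

Printing the code point names rather than the glyphs sidesteps the font question entirely, which is the distinction the argument turns on.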

Regards,

Aldo

--
Aldo Cortesi
al**@nullcube.com
http://www.nullcube.com
Mob: 0419 492 863

May 16 '07 #234

I've made various comments to other people's responses, so I guess it is
time to actually respond to the PEP itself.

On Sun, 13 May 2007 17:44:39 +0200, Martin v. Löwis wrote:

PEP 1 specifies that PEP authors need to collect feedback from the
community. As the author of PEP 3131, I'd like to encourage comments to
the PEP included below, either here (comp.lang.python), or to
py*********@python.org

In summary, this PEP proposes to allow non-ASCII letters as identifiers
in Python. If the PEP is accepted, the following identifiers would also
become valid as class, function, or variable names: Löffelstiel, changé,
ошибка, or 売り場 (hoping that the latter one means "counter").

I believe this PEP differs from other Py3k PEPs in that it really
requires feedback from people with different cultural background to
evaluate it fully - most other PEPs are culture-neutral.

So, please provide feedback, e.g. perhaps by answering these questions:
- should non-ASCII identifiers be supported? why?
- would you use them if it was possible to do so? in what cases?

It seems to me that none of the objections to non-ASCII identifiers are
particularly strong. I've heard many accusations that they will introduce
"vulnerabilities", by analogy to unicode attacks in URLs, but I haven't
seen any credible explanations of how these vulnerabilities would work,
or how they are any different to existing threats. That's not to say that
there isn't a credible threat, but if there is, nobody has come close to
explaining it.

I would find it useful to be able to use non-ASCII characters for heavily
mathematical programs. There would be a closer correspondence between the
code and the mathematical equations if one could write Δ(µ*π) instead of
delta(mu*pi).
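
As a rough illustration, assuming a Python 3 interpreter (where non-ASCII identifiers are accepted) and a hypothetical delta that merely doubles its argument, since the post never says what delta computes, the two spellings might look like this:

```python
import math

# ASCII spelling.
pi = math.pi
mu = 0.5


def delta(x):
    # Hypothetical placeholder: the post does not define delta.
    return 2 * x


# Non-ASCII spelling of the same names.
π = math.pi
µ = 0.5


def Δ(x):
    # Same placeholder computation under a Greek name.
    return delta(x)


ascii_result = delta(mu * pi)
greek_result = Δ(µ * π)
print(ascii_result == greek_result)
```

Python 3 normalizes identifiers to NFKC while parsing, so the micro sign µ (U+00B5) and the Greek letter μ (U+03BC) end up naming the same variable.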

(Aside: I wonder what the Numeric crowd would say about this?)

--
Steven.

May 16 '07 #235

Gregor Horvath

We all know what the PEP is about (we can read). The point is: If we do
not *need* non-English/ASCII identifiers, we do not need the PEP. If the
PEP does not solve an actual *problem* and still introduces some
potential for *new* problems, it should be rejected. So far, the
"problem" seems to just not exist. The burden of proof is on those who
support the PEP.

A good product does not only react to problems but acts.

Solving current problems is only one thing. Great products are exploring
new ways, ideas and possibilities according to their underlying vision.

Python has a vision of being easy even for newbies to programming.
Making it easier for non native English speakers is a step forward in
this regard.

Gregor

May 16 '07 #236

Gregor Horvath

Ross Ridge wrote:

non-ASCII identifiers. While it's easy to find code where comments use
non-ASCII characters, I was never able to find a non-made up example
that used them in identifiers.

If comments are allowed to be non-English, then why are identifiers not?
This is inconsistent because there is a correlation between identifier
and comment.

The best identifier is one that needs no comment, because it
self-describes its content. Non-English identifiers enhance the
meaning of identifiers for some projects. So why forbid them? We are all
adults.

Gregor

May 16 '07 #237

Alex Martelli

Aldo Cortesi <al**@nullcube.com> wrote:

Thus spake Steven D'Aprano (st****@REMOVE.THIS.cybersource.com.au):

Perhaps you aren't aware that doing something "by eye" is idiomatic
English for doing it quickly, roughly, imprecisely. It is the opposite of
taking the time and effort to do the job carefully and accurately. If you
measure something "by eye", you just look at it and take a guess.

Well, Steve, speaking as someone not entirely unfamiliar with idiomatic
English, I can say with some confidence that that's complete and utter
bollocks (idiomatic usage for "nonsense", by the way). To do something "by
eye" means nothing more nor less than doing it visually. Unless you can
provide a citation to the contrary, please move on from this petty little
point of yours, and try to make a substantial technical argument instead.

I can't find any reference for Steven's alleged idiomatic use of "by
eye", either -- _however_, my wife Anna (an American from Minnesota)
came up with exactly the same meaning when I asked her if "by eye" had
any idiomatic connotations, so I suspect it is indeed there, at least in
the Midwest. Funniest, of course, is that the literal translation into
Italian, "a occhio", has a similar idiomatic meaning to _any_ native
speaker of Italian -- and THAT one is even in the Italian wikipedia!-)

I'll be the first to admit that this issue has nothing to do with the
substance of the argument (on which my wife, also my co-author of the
2nd ed of the Python Cookbook and a fellow PSF member, deeply agrees
with you, Aldo, and me), but natural language nuances and curios are my
third-from-the-top most consuming interest (after programming and...
Anna herself!-).

[[_Visual inspection_ plays a crucial role in many areas of engineering,
of course; for example, visual inspection of welds is a very reliable,
although costly, quality assurance process, particularly if you ensure
that the inspectors hold the top professional degrees from the American
Welding Society (if you're operating in the USA:-)]].
Alex

May 16 '07 #238

sjdevnull

Steven D'Aprano wrote:

On Tue, 15 May 2007 12:01:57 +0200, Rene Fleschenberg wrote:

Marc 'BlackJack' Rintsch wrote:
You find it in the sources by the line number from the traceback and
the letters can be copy'n'pasted if you don't know how to input them
with your keymap or keyboard layout.
Typing them is not the only problem. They might not even *display*
correctly if you don't happen to use a font that supports them.

Then maybe you should catch up to the 21st century and install some fonts
and a modern editor.

It's not just about fonts installed on my desktop. I still do a _lot_
of debugging/code browsing remotely over terminal connections. I
still often have to sit down at someone else's machine and help them
troubleshoot, often going through the stack trace for whatever package
they're using--and I don't have control over which fonts they decide
to install. Even simple high-bit latin1 characters differ on vanilla
Windows machines vs. vanilla Linux/Mac machines. I even sometimes
read code snippets on email lists and websites from my handheld, which
is sadly still memory-limited enough that I'm really unlikely to
install anything approaching a full set of Unicode fonts.

May 16 '07 #239

sjdevnull

Steven D'Aprano wrote:

I've made various comments to other people's responses, so I guess it is
time to actually respond to the PEP itself.

On Sun, 13 May 2007 17:44:39 +0200, Martin v. Lo:wis wrote:

PEP 1 specifies that PEP authors need to collect feedback from the
community. As the author of PEP 3131, I'd like to encourage comments to
the PEP included below, either here (comp.lang.python), or to
py*********@python.org

In summary, this PEP proposes to allow non-ASCII letters as identifiers
in Python. If the PEP is accepted, the following identifiers would also
become valid as class, function, or variable names: Lo:ffelstiel, change,
oshibka, or ***ri*** (hoping that the latter one means "counter").

I believe this PEP differs from other Py3k PEPs in that it really
requires feedback from people with different cultural background to
evaluate it fully - most other PEPs are culture-neutral.

So, please provide feedback, e.g. perhaps by answering these questions:
- should non-ASCII identifiers be supported? why?
- would you use them if it was possible to do so? in what cases?

It seems to me that none of the objections to non-ASCII identifiers are
particularly strong. I've heard many accusations that they will introduce
"vulnerabilities", by analogy to unicode attacks in URLs, but I haven't
seen any credible explanations of how these vulnerabilities would work,
or how they are any different to existing threats. That's not to say that
there isn't a credible threat, but if there is, nobody has come close to
explaining it.

I would find it useful to be able to use non-ASCII characters for heavily
mathematical programs. There would be a closer correspondence between the
code and the mathematical equations if one could write D(u*p) instead of
delta(mu*pi).

Just as one risk here:
When reading the above on Google groups, it showed up as "if one could
write ?(u*p)..."
When quoting it for response, it showed up as "could write D(u*p)".

I'm sure that the symbol you used was neither a capital letter d nor a
question mark.

Using identifiers that are so prone to corruption when posting in a
rather popular forum seems dangerous to me--and I'd guess that a lot
of source code highlighters, email lists, etc have similar problems.
I'd even be surprised if some programming tools didn't have similar
problems.
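
A small sketch of this failure mode, assuming Python 3. It is not a reconstruction of what Google Groups actually did (the "D(u*p)" form looks like a transliteration, which is not modeled here), and the function names are illustrative. It compares a lossy ASCII fallback, which produces the question marks, with an escaped fallback that can be reversed:

```python
FORMULA = "\u0394(\u00b5*\u03c0)"  # the expression from the quoted post


def lossy_ascii(text):
    """Replace every non-ASCII character with '?'; the original is unrecoverable."""
    return text.encode("ascii", errors="replace").decode("ascii")


def escaped_ascii(text):
    """Replace every non-ASCII character with a backslash escape; reversible."""
    return text.encode("ascii", errors="backslashreplace").decode("ascii")


def unescape(text):
    """Undo escaped_ascii for text that contained no literal backslashes."""
    return text.encode("ascii").decode("unicode_escape")


print(lossy_ascii(FORMULA))
print(escaped_ascii(FORMULA))
print(unescape(escaped_ascii(FORMULA)) == FORMULA)
```

The escaped form is ugly but unambiguous, which is roughly the trade-off a tool has to make when it cannot rely on the channel preserving the characters.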

May 16 '07 #240

Grant Edwards

On 2007-05-16, Alex Martelli <al***@mac.com> wrote:

Aldo Cortesi <al**@nullcube.com> wrote:

Thus spake Steven D'Aprano (st****@REMOVE.THIS.cybersource.com.au):

Perhaps you aren't aware that doing something "by eye" is idiomatic
English for doing it quickly, roughly, imprecisely. It is the opposite of
taking the time and effort to do the job carefully and accurately. If you
measure something "by eye", you just look at it and take a guess.

Well, Steve, speaking as someone not entirely unfamiliar with idiomatic
English, I can say with some confidence that that's complete and utter
bollocks (idiomatic usage for "nonsense", by the way). To do something "by
eye" means nothing more nor less than doing it visually. Unless you can
provide a citation to the contrary, please move on from this petty little
point of yours, and try to make a substantial technical argument instead.

I can't find any reference for Steven's alleged idiomatic use of "by
eye", either -- _however_, my wife Anna (an American from Minnesota)
came up with exactly the same meaning when I asked her if "by eye" had
any idiomatic connotations, so I suspect it is indeed there, at least in
the Midwest.

That's what it means to me (I'm also from the upper midwest).
One also hears the phrase "eyeball it" in the same context:
"You don't need to measure that, just eyeball it."

--
Grant Edwards grante Yow! BARBARA STANWYCK
at makes me nervous!!
visi.com

May 16 '07 #241

Aldo Cortesi

Thus spake Alex Martelli (al***@mac.com):

I can't find any reference for Steven's alleged idiomatic use of "by
eye", either -- _however_, my wife Anna (an American from Minnesota)
came up with exactly the same meaning when I asked her if "by eye" had
any idiomatic connotations, so I suspect it is indeed there, at least in
the Midwest. Funniest, of course, is that the literal translation into
Italian, "a occhio", has a similar idiomatic meaning to _any_ native
speaker of Italian -- and THAT one is even in the Italian wikipedia!-)

I'll be the first to admit that this issue has nothing to do with the
substance of the argument (on which my wife, also my co-author of the
2nd ed of the Python Cookbook and a fellow PSF member, deeply agrees
with you, Aldo, and me), but natural language nuances and curios are my
third-from-the-top most consuming interest (after programming and...
Anna herself!-).

I must admit to a fascination with language myself - I even have a degree in
English literature to prove it! To be fair to Steven, I've asked some of my
colleagues here in Sydney about their reactions to the phrase "by eye", and
none of them have yet come up with anything that has the strong pejorative
taint Steven gave it. At any rate, it's clear that the phrase is not well
defined anywhere (not even in the OED), and I'm sure there are substantial
regional variations in interpretation.

In cases like these, however, context is paramount, so I will quote sentences
that started this petty bickering:

The security implications have not been sufficiently explored. I don't want
to be in a situation where I need to mechanically "clean" code (say, from a
submitted patch) with a tool because I can't reliably verify it by eye.

Surely, in context, the meaning is clear? "By eye" here means nothing more nor
less than a literal reading suggests. Taking these sentences to be an argument
for a slip-shod, careless approach to code, as Steven did, is surely perverse.

Regards,

Aldo

--
Aldo Cortesi
al**@nullcube.com
http://www.nullcube.com
Mob: 0419 492 863

May 16 '07 #242

rurpy

On May 15, 3:28 pm, René Fleschenberg <r...@korteklippe.de> wrote:

We all know what the PEP is about (we can read). The point is: If we do
not *need* non-English/ASCII identifiers, we do not need the PEP. If the
PEP does not solve an actual *problem* and still introduces some
potential for *new* problems, it should be rejected. So far, the
"problem" seems to just not exist. The burden of proof is on those who
support the PEP.

I'm not sure how you conclude that no problem exists.
- Meaningful identifiers are critical in creating good code.
- Non-English speakers cannot create or understand
English identifiers, hence can't create good code nor
easily grok existing code.
Considering the vastly greater number of non-English
speakers in the world, who are thus unable to use
Python effectively, this seems like a problem to me.

That all programmers know enough English to create and
understand English identifiers is currently speculation or
based on tiny personally observed samples.

I will add my own personal observation supporting the
opposite. A Japanese programmer friend was working
on a project last fall for a large Japanese company in
Japan. A lot of their programming was outsourced to
Korea. While the liaison people on both sides communicated
in a mixture of English and Japanese, my understanding
was that almost all the programmers spoke almost
no English. The language used was Java. I don't know
how they handled identifiers but I have no reason to
believe they were English (though they may have been
transliterated Japanese).

Now that too is a tiny personally observed sample,
so it carries no more weight than the others. But it
is enough to make me question the original assertion
that all programmers know English.

It's a big world and there are a lot of people out there.
Drawing conclusions based on 5 or 50 or 500 personal
contacts is pretty risky, particularly when being wrong
means putting up major barriers to Python use for
huge numbers of people.

May 16 '07 #243

Terry Reedy

"Aldo Cortesi" <al**@nullcube.comwrote in message
news:20********************@nullcube.com...
| I must admit to a fascination with language myself - I even have a degree in
| English literature to prove it! To be fair to Steven, I've asked some of my
| colleagues here in Sydney about their reactions to the phrase "by eye", and
| none of them have yet come up with anything that has the strong pejorative
| taint Steven gave it. At any rate, it's clear that the phrase is not well
| defined anywhere (not even in the OED), and I'm sure there are substantial
| regional variations in interpretation.

As a native American, yes, 'by eye' is sometimes, maybe even often, used
with a pejorative intent.

| In cases like these, however, context is paramount, so I will quote sentences
| that started this petty bickering:

However, in this context
|
| The security implications have not been sufficiently explored. I don't want
| to be in a situation where I need to mechanically "clean" code (say, from a
| submitted patch) with a tool because I can't reliably verify it by eye.

I read it just as Aldo claims.

| Surely, in context, the meaning is clear? "By eye" here means nothing more nor
| less than a literal reading suggests. Taking these sentences to be an argument
| for a slip-shod, careless approach to code, as Steven did, is surely perverse.

Perhaps because in this context, it is not at all clear what the 'more
exact' method would be.

Terry Jan Reedy

May 16 '07 #244

ru***@yahoo.com wrote:

On May 15, 3:28 pm, René Fleschenberg <r...@korteklippe.de> wrote:
We all know what the PEP is about (we can read). The point is: If we do
not *need* non-English/ASCII identifiers, we do not need the PEP. If the
PEP does not solve an actual *problem* and still introduces some
potential for *new* problems, it should be rejected. So far, the
"problem" seems to just not exist. The burden of proof is on those who
support the PEP.

It *does* solve a huge problem: I have to use degenerate French, with
orthographic mistakes, or choose from a small subset of words, in order
to use only ASCII. I'm limited in my expression, and I resent this
every day!

This is true even if commercial French programmers don't object to
the PEP because they have to use English in their own work. This
is something I really cannot understand.

It's an everyday problem for millions of people!

And yes, sometimes I publish code (rarely), even if it uses French
identifiers, because someone looking for a real solution *does*
prefer an existing solution to nothing.
--
Pierre

May 16 '07 #245

Steven D'Aprano wrote:

But they aren't new risks and problems, that's the point. So far, every
single objection raised ALREADY EXISTS in some form or another.

No. The problem "The traceback shows function names having characters
that do not display on most systems' screens" for example does not exist
today, to the best of my knowledge. And "in some form or another"
basically means that the PEP would create more possibilities for things
to go wrong. That things can already go wrong today does not mean that
it does not matter if we create more occasions where things can go wrong
even worse.

There's
all this hysteria about the problems the proposed change will cause, but
those problems already exist. When was the last time a Black Hat tried to
smuggle in bad code by changing an identifier from xyz0 to xyzO?

Agreed, I don't think intended malicious use of the proposed feature
would be a big problem.

I think it is not. I think that the problem only really applies to very
isolated use-cases.

Like the 5.5 billion people who speak no English.

No. The X people who speak "no English" and program in Python. I think X
actually is very low (close to zero), because programming in Python
virtually does require you to know some English, whether you can use
non-ASCII characters in identifiers or not. It is naive to believe that
you can program in Python without understanding any English once you can
use your native characters in identifiers. That will not happen. Please
understand that: You basically *must* know some English to program in
Python, and the reason for that is not that you cannot use non-ASCII
identifiers.

I admit that there may be occasions where you have domain-specific terms
that are hard to translate into English for a programmer. But is it
really not feasible to use an ASCII transliteration in these cases? This
does not seem to have been such a big problem so far, or else we would
have seen more discussions about it, I think.

So isolated that they do not justify a change to
mainline Python. If someone thinks that non-ASCII identifiers are really
needed, he could maintain a special Python branch that supports them. I
doubt that there would be a lot of demand for it.

Maybe so. But I guarantee without a shadow of a doubt that if the change
were introduced, people would use it -- even if right now they say they
don't want it.

May 16 '07 #246

Steven D'Aprano wrote:

Any program that uses non-English identifiers in Python is bound to
become gibberish, since it *will* be cluttered with English identifiers
all over the place anyway, whether you like it or not.

It won't be gibberish to the people who speak the language.

May 16 '07 #247

Gregor Horvath wrote:

If comments are allowed to be non-English, then why are identifiers not?

I don't need to be able to type in the exact characters of a comment in
order to properly change the code, and if a comment does not display on
my screen correctly, I am not as fscked as badly as when an identifier
does not display (e.g. in a traceback).
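
One way to soften the typing and display problem, sketched here under the assumption of Python 3 and its standard tokenize module, is to list every non-ASCII identifier in a source file together with an ASCII-escaped spelling that can be read, searched for, or pasted on any terminal. The function name and the sample source are hypothetical:

```python
import io
import tokenize


def non_ascii_names(source):
    """Return (name, escaped_name, first_line) for each non-ASCII identifier."""
    first_seen = {}
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        is_name = tok.type == tokenize.NAME
        if is_name and any(ord(ch) > 127 for ch in tok.string):
            # Remember only the first line on which the identifier appears.
            first_seen.setdefault(tok.string, tok.start[0])
    return [
        (name, name.encode("unicode_escape").decode("ascii"), line)
        for name, line in sorted(first_seen.items())
    ]


# Sample source written with escapes so this listing itself stays ASCII.
SAMPLE = "L\u00f6ffelstiel = 1\ncount = L\u00f6ffelstiel + 1\n"

for name, escaped, line in non_ascii_names(SAMPLE):
    print("line", line, "->", escaped)
```

This does not make an undisplayable traceback readable by itself; it only gives a reviewer an ASCII handle for each name.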

--
René

May 16 '07 #248

Gregor Horvath

today, to the best of my knowledge. And "in some form or another"
basically means that the PEP would create more possibilities for things
to go wrong. That things can already go wrong today does not mean that
it does not matter if we create more occasions where things can go wrong
even worse.

Following this logic we should not add any new features at all, because
all of them can go wrong and can be used the wrong way.

I love Python because it does not dictate how to do things.
I do not need an ASCII dictator; I can judge for myself when to use this
feature and when to avoid it, like any other feature.

Gregor

May 16 '07 #249