PEP 3131: Supporting Non-ASCII Identifiers

PEP 1 specifies that PEP authors need to collect feedback from the
community. As the author of PEP 3131, I'd like to encourage comments
to the PEP included below, either here (comp.lang.python), or to
py*********@python.org

In summary, this PEP proposes to allow non-ASCII letters as
identifiers in Python. If the PEP is accepted, the following
identifiers would also become valid as class, function, or
variable names: Löffelstiel, changé, ошибка, or 売り場
(hoping that the latter one means "counter").

I believe this PEP differs from other Py3k PEPs in that it really
requires feedback from people with different cultural background
to evaluate it fully - most other PEPs are culture-neutral.

So, please provide feedback, e.g. perhaps by answering these
questions:
- should non-ASCII identifiers be supported? why?
- would you use them if it was possible to do so? in what cases?

Regards,
Martin
PEP: 3131
Title: Supporting Non-ASCII Identifiers
Version: $Revision: 55059 $
Last-Modified: $Date: 2007-05-01 22:34:25 +0200 (Di, 01 Mai 2007) $
Author: Martin v. Löwis <ma****@v.loewis.de>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 1-May-2007
Python-Version: 3.0
Post-History:
Abstract
========

This PEP proposes supporting non-ASCII letters (such as accented
characters, Cyrillic, Greek, Kanji, etc.) in Python identifiers.

Rationale
=========

Python code is written by many people in the world who are not familiar
with the English language, or even well-acquainted with the Latin
writing system. Such developers often desire to define classes and
functions with names in their native languages, rather than having to
come up with an (often incorrect) English translation of the concept
they want to name.

For some languages, common transliteration systems exist (in particular,
for the Latin-based writing systems). For other languages, users have
greater difficulty using Latin to write their native words.

Common Objections
=================

Some objections are often raised against proposals similar to this one.

People claim that they will not be able to use a library if to do so
they have to use characters they cannot type on their keyboards.
However, it is the choice of the designer of the library to decide on
various constraints for using the library: people may not be able to use
the library because they cannot get physical access to the source code
(because it is not published), or because licensing prohibits usage, or
because the documentation is in a language they cannot understand. A
developer wishing to make a library widely available needs to make a
number of explicit choices (such as publication, licensing, language
of documentation, and language of identifiers). It should always be the
choice of the author to make these decisions - not the choice of the
language designers.

In particular, projects wishing to have wide usage may want
to establish a policy that all identifiers, comments, and documentation
are written in English (see the GNU coding style guide for an example of
such a policy). Restricting the language to ASCII-only identifiers does
not force comments and documentation to be in English, or the identifiers
actually to be English words, so an additional policy is necessary
anyway.

Specification of Language Changes
=================================

The syntax of identifiers in Python will be based on the Unicode
standard annex UAX-31 [1]_, with elaboration and changes as defined
below.

Within the ASCII range (U+0001..U+007F), the valid characters for
identifiers are the same as in Python 2.5. This specification only
introduces additional characters from outside the ASCII range. For
other characters, the classification uses the version of the Unicode
Character Database as included in the ``unicodedata`` module.

The identifier syntax is ``<ID_Start> <ID_Continue>*``.

``ID_Start`` is defined as all characters having one of the general
categories uppercase letters (Lu), lowercase letters (Ll), titlecase
letters (Lt), modifier letters (Lm), other letters (Lo), letter numbers
(Nl), plus the underscore (XXX what are "stability extensions" listed in
UAX 31).

``ID_Continue`` is defined as all characters in ``ID_Start``, plus
nonspacing marks (Mn), spacing combining marks (Mc), decimal number
(Nd), and connector punctuations (Pc).
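
The two character classes above can be sketched with the ``unicodedata``
module. This is only an illustration of the draft rules as worded here, not
the parser's actual implementation; the helper names (``is_id_start``,
``is_id_continue``, ``is_draft_identifier``) are made up for this example, and
the open question about the UAX 31 "stability extensions" is ignored.

```python
import unicodedata

# General categories named in the draft text above.
ID_START_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}
ID_CONTINUE_CATEGORIES = ID_START_CATEGORIES | {"Mn", "Mc", "Nd", "Pc"}


def is_id_start(ch):
    """True if ch may begin an identifier under the draft rules."""
    return ch == "_" or unicodedata.category(ch) in ID_START_CATEGORIES


def is_id_continue(ch):
    """True if ch may appear after the first character."""
    return is_id_start(ch) or unicodedata.category(ch) in ID_CONTINUE_CATEGORIES


def is_draft_identifier(name):
    """Check name against <ID_Start> <ID_Continue>* (hypothetical helper)."""
    if not name:
        return False
    return is_id_start(name[0]) and all(is_id_continue(c) for c in name[1:])
```

With these definitions, the sample names from the announcement (for example
``is_draft_identifier("ошибка")``) are accepted, while a name starting with a
digit is rejected because ``Nd`` is only in ``ID_Continue``.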

All identifiers are converted into the normal form NFC while parsing;
comparison of identifiers is based on NFC.
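
To illustrate why normalization matters, here is a small sketch using
``unicodedata.normalize``. The same visible name can be stored either with a
precomposed character or with a base letter plus a combining mark; the helper
``same_identifier`` is a hypothetical name used only for this example.

```python
import unicodedata

composed = "chang\u00e9"      # 'é' as the single code point U+00E9
decomposed = "change\u0301"   # 'e' followed by U+0301 COMBINING ACUTE ACCENT


def same_identifier(a, b):
    """Compare two names the way the draft describes: after NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# The raw strings differ, but they name the same identifier after NFC.
raw_equal = composed == decomposed
nfc_equal = same_identifier(composed, decomposed)
```

Without the normalization step, two source files typed in different editors
could define what looks like the same name but bind two distinct ones.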

Policy Specification
====================

As an addition to the Python Coding style, the following policy is
prescribed: All identifiers in the Python standard library MUST use
ASCII-only identifiers, and SHOULD use English words wherever feasible.

As an option, this specification can be applied to Python 2.x. In that
case, ASCII-only identifiers would continue to be represented as byte
string objects in namespace dictionaries; identifiers with non-ASCII
characters would be represented as Unicode strings.

Implementation
==============

The following changes will need to be made to the parser:

1. If a non-ASCII character is found in the UTF-8 representation of the
source code, a forward scan is made to find the first ASCII
non-identifier character (e.g. a space or punctuation character)

2. The entire UTF-8 string is passed to a function to normalize the
string to NFC, and then verify that it follows the identifier syntax.
No such callout is made for pure-ASCII identifiers, which continue to
be parsed the way they are today.

3. If this specification is implemented for 2.x, reflective libraries
(such as pydoc) must be verified to continue to work when Unicode
strings appear in ``__dict__`` slots as keys.
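
Steps 1 and 2 above might look roughly like the following. This is a sketch
under simplifying assumptions: it works on a decoded ``str`` rather than on
the UTF-8 bytes the real tokenizer sees, and ``scan_token`` and
``normalize_and_verify`` are hypothetical names, not functions in CPython.

```python
import unicodedata

ASCII_IDENT_CHARS = set(
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "0123456789_"
)
START = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}
CONTINUE = START | {"Mn", "Mc", "Nd", "Pc"}


def scan_token(source, start):
    """Step 1: scan forward to the first ASCII non-identifier character."""
    end = start
    while end < len(source):
        ch = source[end]
        if ord(ch) < 128 and ch not in ASCII_IDENT_CHARS:
            break
        end += 1
    return source[start:end], end


def normalize_and_verify(token):
    """Step 2: normalize to NFC, then check <ID_Start> <ID_Continue>*."""
    name = unicodedata.normalize("NFC", token)
    if not name:
        raise SyntaxError("empty identifier")
    first_ok = name[0] == "_" or unicodedata.category(name[0]) in START
    rest_ok = all(
        c == "_" or unicodedata.category(c) in CONTINUE for c in name[1:]
    )
    if not (first_ok and rest_ok):
        raise SyntaxError("invalid identifier: %r" % token)
    return name
```

Note that the scan in step 1 deliberately swallows every non-ASCII character,
so a token containing, say, a currency sign is only rejected in step 2.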

References
==========

.. [1] http://www.unicode.org/reports/tr31/

Copyright
=========

This document has been placed in the public domain.
May 13 '07
In <f2**********@rumours.uwaterloo.ca>, Ross Ridge wrote:
<ma****@v.loewis.de> wrote:
>>So, please provide feedback, e.g. perhaps by answering these
questions:
- should non-ASCII identifiers be supported? why?

Ross Ridge wrote:
>I think the biggest argument against this PEP is how little similar
features are used in other languages

Carsten Haese <ca*****@uniqsys.com> wrote:
>>That observation is biased by your limited sample.

No. I've actually looked hard to find examples of source code that use
non-ASCII identifiers. While it's easy to find code where comments use
non-ASCII characters, I was never able to find a non-made up example
that used them in identifiers.
I think you have to search examples of ASCII sources with transliterated
identifiers too, because the authors may have skipped the transliteration
if they could have written the non-ASCII characters in the first place.

And then I dare to guess that much of that code is not open source. One
example is macros in office programs like spreadsheets. Often those are
written by semi-professional programmers or even end users with
transliterated identifiers. If the OpenOffice API weren't so
"javaesque", this would be a good use case for code with non-ASCII
identifiers.

Ciao,
Marc 'BlackJack' Rintsch
May 15 '07 #201
Nick Craig-Wood <ni**@craig-wood.com> wrote:
>b) Unicode characters would creep into the public interface of public
libraries. I think this would be a step back for the homogeneous
nature of the python community.
One could decree that having a non-ASCII character in an identifier
would have the same meaning as a leading underscore 9-)

--
\S -- si***@chiark.greenend.org.uk -- http://www.chaos.org.uk/~sion/
"Frankly I have no feelings towards penguins one way or the other"
-- Arthur C. Clarke
her nu becomeþ se bera eadward ofdun hlæddre heafdes bæce bump bump bump
May 15 '07 #202
On 15 May, 17:41, Stefan Behnel <stefan.behnel-n05...@web.de> wrote:
>
[javac -encoding Latin-1 Hallo.java]
From a Python perspective, I would rather call this behaviour broken. Do I
really have to pass the encoding as a command line option to the compiler?
They presumably weighed up the alternatives and decided that the most
convenient approach (albeit for the developers of Java) was to provide
such a compiler option. Meanwhile, developers get to write their
identifiers in the magic platform encoding, which isn't generally a
great idea but probably works well enough for some people - their
editor lets them write their programs in some writing system and the
Java compiler happens to choose the same writing system when reading
the file - although I wouldn't want to rely on such things myself.
Alternatively, they can do what Python programmers do now and specify
the encoding, albeit on the command line.

However, what I want to see is how people deal with such issues when
sharing their code: what are their experiences and what measures do
they mandate to make it all work properly? You can see some
discussions about various IDEs mandating UTF-8 as the default
encoding, along with UTF-8 being the required encoding for various
kinds of special Java configuration files. Is this because
heterogeneous technical environments even within the same cultural
environment cause too many problems?
I find Python's source encoding much cleaner here, and even more so when the
default encoding becomes UTF-8.
Yes, it should reduce confusion at a technical level. But what about
the tools, the editors, and so on? If every computing environment had
decent UTF-8 support, wouldn't it be easier to say that everything has
to be in UTF-8? Perhaps the developers of Java decided that the rules
should be deliberately vague to accommodate people who don't want to
think about encodings but still want to be able to use Windows Notepad
(or whatever) to write software in their own writing system.

And then, what about patterns of collaboration between groups who have
been able to exchange software with "localised" identifiers for a
number of years? Does it really happen, or do IBM's engineers in China
or India (for example) have to write everything strictly in ASCII? Do
people struggle with characters they don't understand or does copy/
paste work well enough when dealing with such code?

Paul

May 15 '07 #203
"René Fleschenberg" <rene@.de> wrote in message
news:46***********************@newsspool1.arcor-online.net...
This is a very weak argument, IMHO. How do you want to use Python
without learning at least enough English to grasp a somewhat decent
understanding of the standard library?
By heart. I know a few _very good_ programmers
who are unable to understand an English text.
Knowing English helps, of course, but is not
required at all. Of course, they don't know how
to name identifiers in English, but it happens
they _cannot_ give them proper Spanish names,
either (I'm from Spain).

+1 for the PEP, definitely.
But having, for example, things like open() from the stdlib in your code
and then öffnen() as a name for functions/methods written by yourself is
just plain silly. It makes the code inconsistent and ugly without
significantly improving the readability for someone who speaks German
but not English.
Agreed. I always use English names (more or
less :-)), but this is not what the PEP is about.

Javier
----------------------------------
http://www.texytipografia.com


May 15 '07 #204
Stefan Behnel <st******************@web.de> wrote:
>I don't think all identifiers in the stdlib are
a) well chosen
b) correct English words
Never mind the standard library, by my count about 20% of keywords
and builtins (excluding exception types) are either not correct
English words ('elif', 'chr') or have some kind of mismatch between
their meaning and the usual English usage ('hex', 'intern').

The discussion on readability and natural language identifiers reminds
me of my first job in programming: looking after a pile of Fortran77
from the mid-80s. Case-insensitive, with different coders having
different preferences (sometimes within the same module), and using
more than four characters on an identifier considered shocking. Of
course you got identifiers which were unintelligible, and it wasn't
a great situation, but we coped and the whole thing didn't fall over
in a complete heap.

--
\S -- si***@chiark.greenend.org.uk -- http://www.chaos.org.uk/~sion/
"Frankly I have no feelings towards penguins one way or the other"
-- Arthur C. Clarke
her nu becomeþ se bera eadward ofdun hlæddre heafdes bæce bump bump bump
May 15 '07 #205
Marc 'BlackJack' Rintsch <bj****@gmx.net> wrote:
>I think you have to search examples of ASCII sources with transliterated
identifiers too, because the authors may have skipped the transliteration
if they could have written the non-ASCII characters in the first place.
The point of my search was to look for code that actually used non-ASCII
characters in languages that actually supported it (mainly Java at the
time). The point wasn't to create more speculation about what programmers
might or might not do, but to find out what they were actually doing.
>And then I dare to guess that much of that code is not open source.
Lots of non-open source code makes it on to the Internet in the form of
code snippets. You don't have to guess what closed-source programmers are
actually doing either.

Ross Ridge

--
l/ // Ross Ridge -- The Great HTMU
[oo][oo] rr****@csclub.uwaterloo.ca
-()-/()/ http://www.csclub.uwaterloo.ca/~rridge/
db //
May 15 '07 #206
Paul Boddie wrote:
Does it really happen, or do IBM's engineers in China
or India (for example) have to write everything strictly in ASCII?
I assume they simply use American keyboards. They're working for an American
company after all. That's like the call-center people in India who learn the
Super Bowl results by heart before they go to their cubicle, just to keep
North American callers from realising they are not connected to the shop
around the corner.

Stefan
May 15 '07 #207
On Tue, 2007-05-15 at 18:18 +0200, René Fleschenberg wrote:
Carsten Haese wrote:
Allowing people to use identifiers in their native language would
definitely be an advantage for people from such cultures. That's the use
case for this PEP. It's easy for Euro-centric people to say "just suck
it up and use ASCII", but the same people would probably starve to death
if they were suddenly teleported from Somewhere In Europe to rural China
which is so unimaginably different from what they know that it might
just as well be a different planet. "Learn English and use ASCII" is not
generally feasible advice in such cultures.

This is a very weak argument, IMHO. How do you want to use Python
without learning at least enough English to grasp a somewhat decent
understanding of the standard library? Let's face it: To do any "real"
programming, you need to know at least some English today, and I don't
see that changing anytime soon. And it is definitely not going to be
changed by allowing non-ASCII identifiers.
Even if it were impossible to do "real programming" in Python without
knowing English (which I will neither accept nor reject because I don't
have enough data either way), I don't think Python should be restricted
to "real" programming only. Python (the programming language) is an
inherently easy-to-learn language. I find it quite plausible that
somebody in China might want to teach their students programming before
teaching them English. The posts on this thread by a teacher from China
confirm this suspicion.

Once the students learn Python and realize that there are lots of Python
resources "out there" that are only in English, that will be a
motivation for them to learn English. Requiring all potential Python
programmers to learn English first (or assuming that they know English
already) is an unacceptable barrier of entry.

--
Carsten Haese
http://informixdb.sourceforge.net
May 15 '07 #208
Donn Cave <do**@u.washington.edu> wrote:
[Spanish in Brazil? Not as much as you might think.]
Sorry, temporary[*] brain failure, I really do know it is Portuguese.
[*] I hope.
May 15 '07 #209
There are really two issues here, and they're being
confused.

One is allowing non-English identifiers, which is a political
issue. The other is homoglyphs, two characters which look the same.
The latter is a real problem in a language like Python with implicit
declarations. If a maintenance programmer sees a variable name
and retypes it, they may silently create a new variable.
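
A minimal sketch of the retyping hazard described above, using a plain dict to
stand in for a namespace (the names ``LATIN_A``, ``CYRILLIC_A`` and ``assign``
are made up for the illustration). Latin "a" and Cyrillic "а" look alike in
most fonts but are different code points, and NFC normalization does not
unify them.

```python
import unicodedata

LATIN_A = "a"            # U+0061 LATIN SMALL LETTER A
CYRILLIC_A = "\u0430"    # U+0430 CYRILLIC SMALL LETTER A


def assign(namespace, name, value):
    """Mimic an implicit declaration: binding a name creates it if missing."""
    namespace[name] = value


namespace = {}
assign(namespace, "tot" + LATIN_A + "l", 1)
assign(namespace, "tot" + CYRILLIC_A + "l", 2)  # retyped with the other 'a'

# Two distinct bindings now exist, although both names render alike.
distinct_names = len(namespace)
unified_by_nfc = unicodedata.normalize("NFC", CYRILLIC_A) == LATIN_A
```

The second assignment silently creates a new entry instead of updating the
first one, which is the failure mode a maintenance programmer would hit.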

If Unicode characters are allowed, they must be done under some
profile restrictive enough to prohibit homoglyphs. I'm not sure
if UTS-39, profile 2, "Highly Restrictive", solves this problem,
but it's a step in the right direction. This limits mixing of scripts
in a single identifier; you can't mix Hebrew and ASCII, for example,
which prevents problems with mixing right to left and left to right
scripts. Domain names have similar restrictions.

We have to have visually unique identifiers.

There's also an issue with implementations that interface
with other languages. Some Python implementations generate
C, Java, or LISP code. Even CPython will call C code.
The representation of external symbols needs to be standardized
across those interfaces.

John Nagle
May 15 '07 #210
John Nagle wrote:
There are really two issues here, and they're being
confused.

One is allowing non-English identifiers, which is a political
issue. The other is homoglyphs, two characters which look the same.
The latter is a real problem in a language like Python with implicit
declarations. If a maintenance programmer sees a variable name
and retypes it, they may silently create a new variable.

If Unicode characters are allowed, they must be done under some
profile restrictive enough to prohibit homoglyphs. I'm not sure
if UTS-39, profile 2, "Highly Restrictive", solves this problem,
but it's a step in the right direction. This limits mixing of scripts
in a single identifier; you can't mix Hebrew and ASCII, for example,
which prevents problems with mixing right to left and left to right
scripts. Domain names have similar restrictions.

We have to have visually unique identifiers.
As others stated before, this is unlikely to become a problem in practice.
Project-internal standards will usually define a specific language for a
project, in which case these issues will not arise. In general, programmers
from a specific language/script background will stick to that script and not
magically start typing foreign characters. And projects where multiple
languages are involved will have to define a target language anyway, most
likely (although not necessarily) English.

Note that adherence to a specific script can easily be checked programmatically
through Unicode ranges - if the need ever arises.
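
A rough sketch of such a range-based check, assuming a small hand-written
table of code point ranges. The table is deliberately incomplete and the names
(``SCRIPT_RANGES``, ``script_of``, ``is_single_script``) are hypothetical; a
real tool would take script data from the Unicode Character Database instead.

```python
# Illustrative, deliberately incomplete table of code point ranges.
SCRIPT_RANGES = {
    "Latin": [(0x0041, 0x005A), (0x0061, 0x007A), (0x00C0, 0x024F)],
    "Greek": [(0x0370, 0x03FF)],
    "Cyrillic": [(0x0400, 0x04FF)],
    "Hebrew": [(0x0590, 0x05FF)],
}


def script_of(ch):
    """Return the script name for ch, or None if it is not in the table."""
    cp = ord(ch)
    for script, ranges in SCRIPT_RANGES.items():
        if any(lo <= cp <= hi for lo, hi in ranges):
            return script
    return None  # digits, underscore, or a script missing from the table


def scripts_used(identifier):
    """Collect the set of known scripts appearing in an identifier."""
    return {s for s in map(script_of, identifier) if s is not None}


def is_single_script(identifier):
    """True if the identifier draws on at most one known script."""
    return len(scripts_used(identifier)) <= 1
```

A project could run a check like ``is_single_script`` over its identifiers in
a commit hook to flag names that mix, for example, Latin and Cyrillic letters.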

Stefan
May 15 '07 #211
* Duncan Booth (15 May 2007 17:30:58 GMT)
Donn Cave <do**@u.washington.edu> wrote:
[Spanish in Brazil? Not as much as you might think.]

Sorry, temporary[*] brain failure, I really do know it is Portuguese.
Yes, you do. Spanish is what's been used in the United States, right?

Thorsten
May 15 '07 #212
René Fleschenberg wrote:
IMO, the burden of proof is on you. If this PEP has the potential to
introduce another hindrance for code-sharing, the supporters of this PEP
should be required to provide a "damn good reason" for doing so. So far,
you have failed to do that, in my opinion. All you have presented are
vague notions of rare and isolated use-cases.
you want to limit my liberty of using appealing names in my language.

this alone should be enough to accept the pep!
May 15 '07 #213
René Fleschenberg wrote:
Your example does not prove much. The fact that some people use
non-ASCII identifiers when they can does not at all prove that it would
be a serious problem for them if they could not.
i have to make spelling mistakes in my code to please you?
--
Pierre
May 15 '07 #214
hello

i work for a large phone maker, and for a long time
we thought, very arrogantly, our phones would be ok
for the whole world.

After all, using a phone takes so few words, and
some of them were even replaced with pictograms!
everybody should be able to understand appel, bis,
renvoi, mévo, ...

nowadays we make Chinese, Korean, Japanese talking
phones.

because we can do it, because graphics are cheaper
than they were, because it augments our market.
(also because some markets require it)

see the analogy?

of course, +1 for the pep
--
Pierre
May 15 '07 #215

"Hendrik van Rooyen" <ma**@microcorp.co.za> wrote in message
news:ma***************************************@python.org...
<ru***@yahoo.com> wrote:
[I fixed the broken attribution in your quote]
(2) Several posters have claimed non-native english speaker
status to bolster their position, but since they are clearly at
or near native-speaker levels of fluency, that english is not
their native language is really irrelevant.

I dispute the irrelevance strongly - I am one of the group referred
to, and I am here on this group because it works for me - I am not
aware of an Afrikaans python group - but even if one were to
exist - who, aside from myself, would frequent it? - would I have
access to the likes of the effbot, Steve Holden, Alex Martelli,
Irmen de Jongh, Eric Brunel, Tim Golden, John Machin, Martin
v Loewis, the timbot and the Nicks, the Pauls and other Stevens?
I didn't say that your (as a fluent but non-native English
speaker) views are irrelevant, only that when you say,
"I am a native speaker of Afrikaans and I don't want non-
ascii identifiers" it shouldn't carry any more weight
that if I (as a native English speaker) say the same
thing. (But I wouldn't of course :-).

My point was that this entire discussion is by English
speakers and that a consensus by such a group, that
non-English identifiers are bad, is neither surprising nor
legitimate.
- I somehow doubt it.

Fragmenting this resource into little national groups based
on language would be silly, if not downright stupid, and it seems
to me just as silly to allow native identifiers without also
allowing native reserved words, because you are just creating
a mess that is neither fish nor flesh if you do.
It already is fragmented. There is a Japanese Python users
group, complete with discussion forums, all in Japanese, not
English. Another poster said he was going to bring up this
issue on a French language discussion group.

How can you possibly propose that some authority should
decide what language a group of people should use to
discuss a common interest?!
And the downside to going the whole hog would be as follows:

Nobody would even want to look at my code if I write
"terwyl" instead of 'while', and "werknemer" instead of
"employee" - so where am I going to get help, and how,
once I am fully Python fit, can I contribute if I insist on
writing in a splinter language?
First "while" is a keyword and will remain "while" so
that has nothing to do with anything.

If nobody wants to look at your code, it is not
the use of "werknemer" that is the cause. If you used
that as an identifier, then I assume you decided
your code was exclusively of interest to Afrikaans
speakers. Otherwise you would have used English
for that identifier. The point is that *you*
are in the best position to decide that, not the
designers of the language.
And while the Mandarin language group could be big enough
to be self sustaining, is that true of for example Finnish?

So I don't think my opinion on this is irrelevant just because
I misspent my youth reading books by Pelham Grenfell
Wodehouse, amongst others.

And I also don't regard my own position as particularly unique
amongst python programmers that don't speak English as
their native language
Like I said, that English is not your native language is
irrelevant -- what matters is that you now speak English
fluently. Thus you are an English speaker arguing that
excluding non-English identifiers is not a problem.
May 15 '07 #216
Hi!

Yes, for legibility.

If letters with accents are possible:

d=dict(numéro=1234, name='Löwis', prénom='Martin', téléphone='+33123')

or

p1 = personn() #class
p1.numéro = 1234
p1.name='Löwis'
p1.prénom='Martin'
p1.téléphone='+33123'

Imagine the same code if accents are not possible...

Don't forget: we must often connect to databases that already
exist


--
@-salutations

Michel Claveau
May 15 '07 #217
Javier Bezos wrote:
>But having, for example, things like open() from the stdlib in your code
and then öffnen() as a name for functions/methods written by yourself is
just plain silly. It makes the code inconsistent and ugly without
significantly improving the readability for someone who speaks German
but not English.

Agreed. I always use English names (more or
less :-)), but this is not what the PEP is about.
We all know what the PEP is about (we can read). The point is: If we do
not *need* non-English/ASCII identifiers, we do not need the PEP. If the
PEP does not solve an actual *problem* and still introduces some
potential for *new* problems, it should be rejected. So far, the
"problem" seems to just not exist. The burden of proof is on those who
support the PEP.

--
René
May 15 '07 #218
On May 15, 7:44 am, George Sakkis <george.sak...@gmail.com> wrote:
After 175 replies (and counting), the only thing that is clear is the
controversy around this PEP. Most people are very strong for or
against it, with little middle ground in between. I'm not saying that
every change must meet 100% acceptance, but here there is definitely a
strong opposition to it. Accepting this PEP would upset lots of people
as it seems, and it's interesting that quite a few are not even native
english speakers.
As I pointed out in a previous post,
http://groups.google.com/group/comp....9d327a6?hl=en&
http://groups.google.com/group/comp....25623c4?hl=en&
whether a person is or is not a native English speaker is
irrelevant -- what is relevant is their current ability with
English.
And my impression is that nearly all of the posts from people not
fluent in English (judging from grammar mistakes and such)
are in favor of the PEP.

May 15 '07 #219
On May 15, 11:08 am, Carsten Haese <cars...@uniqsys.com> wrote:
snip
Once the students learn Python and realize that there are lots of Python
resources "out there" that are only in English, that will be a
motivation for them to learn English. Requiring all potential Python
programmers to learn English first (or assuming that they know English
already) is an unacceptable barrier of entry.
One of the big concerns seems to be a hypothesized
negative impact on code sharing. Nobody has considered
the positive impact resulting from making Python
more accessible to non-English speakers, some of
whom will go on to become willing and able to contribute
open "English python" code to the community. This
positive impact may well outweigh the negative.
May 15 '07 #220
On May 15, 10:18 am, René Fleschenberg <r...@korteklippe.de> wrote:
Carsten Haese wrote:
Allowing people to use identifiers in their native language would
definitely be an advantage for people from such cultures. That's the use
case for this PEP. It's easy for Euro-centric people to say "just suck
it up and use ASCII", but the same people would probably starve to death
if they were suddenly teleported from Somewhere In Europe to rural China
which is so unimaginably different from what they know that it might
just as well be a different planet. "Learn English and use ASCII" is not
generally feasible advice in such cultures.

This is a very weak argument, IMHO. How do you want to use Python
without learning at least enough English to grasp a somewhat decent
understanding of the standard library? Let's face it: To do any "real"
programming, you need to know at least some English today, and I don't
see that changing anytime soon. And it is definitely not going to be
changed by allowing non-ASCII identifiers.
snip

Another way of framing this discussion could be, "should
Python continue to maintain a barrier to its use by non-English
speakers if it is not necessary?"

Virtually every guide to programming style I have ever read stresses
the importance of variable naming. For example, the Wikipedia article
"programming style" mentions variable naming right after layout
(indentation, etc) in importance:

"Appropriate choices for variable names are seen as the keystone
for good style. Poorly-named variables make code harder to read
and understand"

Even when English-as-non-native-language speakers can understand
English words, the level and speed of comprehension is often far below
that of their native language. Denying the ability to use native-language
identifiers puts these people at a significant disadvantage compared
to English speakers with regard to reading (their own!) code.
And the justification for this is the hypothetical case that someone
who doesn't understand that language *might* *someday* have to
read it. Besides the large number of programs that will never be
public (far larger than most of the worriers think is my guess), even
in public programs this is not necessarily a disaster. A public
application written in "Chinese Python" might work perfectly and be
completely usable by me, even if it is difficult for me to understand.
And why should my difficulty count for more than a Chinese person's
difficulty in understanding my "English Python" application?

That Python keywords are English is unimportant -- they are a small
finite set that can be memorized. Identifiers are a large unbounded
set that can't be.

That the standard library code and documentation is in English
is irrelevant. One shouldn't need to read the standard library code
to use it. (That one sometimes has to is a Python flaw that should
be fixed -- not bandaided by requiring Python programmers to
know English).

There is no need to understand English to use the standard library.
Documentation has been and will (as Python becomes more popular) be
translated into native languages. Here is Python standard library
documentation in Japanese:

http://www.python.jp/doc/release/lib/

While encouraging English in shared/public code is fine,
trying to enforce it by continuing to enforce ASCII-only identifiers
smacks to me of a "whites only country club" mentality.

Making Python more accessible to the world (the vast majority
of whom do not speak English) can only advance Python.

May 15 '07 #221
René Fleschenberg wrote:
We all know what the PEP is about (we can read).
BTW: who is this "we" if it doesn't include you?

Stefan
May 15 '07 #222
René Fleschenberg wrote:
Javier Bezos wrote:
>>But having, for example, things like open() from the stdlib in your code
and then öffnen() as a name for functions/methods written by yourself is
just plain silly. It makes the code inconsistent and ugly without
significantly improving the readability for someone who speaks German
but not English.
Agreed. I always use English names (more or
less :-)), but this is not what the PEP is about.

We all know what the PEP is about (we can read). The point is: If we do
not *need* non-English/ASCII identifiers, we do not need the PEP. If the
PEP does not solve an actual *problem* and still introduces some
potential for *new* problems, it should be rejected. So far, the
"problem" seems to just not exist. The burden of proof is on those who
support the PEP.
The main problem here seems to be proving the need of something to people who
do not need it themselves. So, if a simple "but I need it because a, b, c" is
not enough, what good is any further proof?

Stefan
May 15 '07 #223
On May 15, 6:44 pm, John Nagle <n...@animats.com> wrote:
There are really two issues here, and they're being
confused.

One is allowing non-English identifiers, which is a political
issue. The other is homoglyphs, two characters which look the same.
The latter is a real problem in a language like Python with implicit
declarations. If a maintenance programmer sees a variable name
and retypes it, they may silently create a new variable.

If Unicode characters are allowed, they must be done under some
profile restrictive enough to prohibit homoglyphs. I'm not sure
if UTS-39, profile 2, "Highly Restrictive", solves this problem,
but it's a step in the right direction. This limits mixing of scripts
in a single identifier; you can't mix Hebrew and ASCII, for example,
which prevents problems with mixing right to left and left to right
scripts. Domain names have similar restrictions.

We have to have visually unique identifiers.

There's also an issue with implementations that interface
with other languages. Some Python implementations generate
C, Java, or LISP code. Even CPython will call C code.
The representation of external symbols needs to be standardized
across those interfaces.
Surely it should be possible programmatically to compare the visual
appearance of the characters and highlight ones which are similar, or
colour-code various subsets when required.
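
A rough sketch of the kind of programmatic check suggested here, flagging identifiers that mix scripts. It assumes that the first word of a character's Unicode name (LATIN, CYRILLIC, GREEK, ...) is a good enough stand-in for its script; that is a simplification, not the UTS-39 profile mentioned above, and the function names are made up for illustration.

```python
import unicodedata


def scripts_in(identifier):
    """Return rough script labels for the letters in identifier.

    Assumption: the first word of the Unicode character name
    (e.g. "LATIN SMALL LETTER A") approximates the script.
    """
    scripts = set()
    for ch in identifier:
        if ch.isalpha():
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts


def is_mixed_script(identifier):
    """True if the letters of identifier come from more than one script."""
    return len(scripts_in(identifier)) > 1


print(is_mixed_script("counter"))       # False
print(is_mixed_script("c\u043eunter"))  # True: U+043E is CYRILLIC SMALL LETTER O
```

An editor could use a check like this to highlight suspicious names. As written it would also flag a Japanese name that combines kanji and kana, so a usable rule would need a list of script combinations that are allowed together.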

May 15 '07 #224
On Tue, 15 May 2007 12:17:09 +0200, René Fleschenberg wrote:
Steven D'Aprano schrieb:
>How is that different from misreading "disk_burnt = True" as "disk_bumt
= True"? In the right (or perhaps wrong) font, like the ever-popular
Arial, the two can be visually indistinguishable. Or "call" versus
"cal1"?

That is the wrong question. The right question is: Why do you want to
introduce *more* possibilities to do such mistakes? Does this PEP solve
an actual problem, and if so, is that problem big enough to be worth the
introduction of these new risks and problems?
But they aren't new risks and problems, that's the point. So far, every
single objection raised ALREADY EXISTS in some form or another. There's
all this hysteria about the problems the proposed change will cause, but
those problems already exist. When was the last time a Black Hat tried to
smuggle in bad code by changing an identifier from xyz0 to xyzO?
I think it is not. I think that the problem only really applies to very
isolated use-cases.
Like the 5.5 billion people who speak no English.

So isolated that they do not justify a change to
mainline Python. If someone thinks that non-ASCII identifiers are really
needed, he could maintain a special Python branch that supports them. I
doubt that there would be a lot of demand for it.
Maybe so. But I guarantee with a shadow of a doubt that if the change
were introduced, people would use it -- even if right now they say they
don't want it.

--
Steven.
May 16 '07 #225
Thorsten Kampe <th******@thorstenkampe.de> wrote:
>* René Fleschenberg (Tue, 15 May 2007 14:35:33 +0200)
>I am talking about the stdlib, not about the very few keywords Python
has. Are you going to re-write the standard library in your native
language so you can have a consistent use of natural language among your
code?
Why would I want to do that? It's not my code. Identifier names are
mine. If I use modules from standard library I use some "foreign
words". There's no problem in that.
It could even be an advantage. I sometimes find that I have to use a
'second best' name myself because I want to avoid the possible
confusion caused if I choose a name which has a well-known existing
use.

-M-

May 16 '07 #226
On Tue, 15 May 2007 09:09:30 +0200, Eric Brunel wrote:
Joke aside, this just means that I won't ever be able to program math in
ADA, because I have absolutely no idea on how to do a 'pi' character on
my keyboard.
Maybe you should find out then? Personal ignorance is never an excuse for
rejecting technology.
--
Steven
May 16 '07 #227
On Tue, 15 May 2007 12:01:57 +0200, René Fleschenberg wrote:
Marc 'BlackJack' Rintsch schrieb:
>You find it in the sources by the line number from the traceback and
the letters can be copy'n'pasted if you don't know how to input them
with your keymap or keyboard layout.

Typing them is not the only problem. They might not even *display*
correctly if you don't happen to use a font that supports them.
Then maybe you should catch up to the 21st century and install some fonts
and a modern editor.
--
Steven.
May 16 '07 #228
On Tue, 15 May 2007 20:43:31 +1000, Aldo Cortesi wrote:
Thus spake Steven D'Aprano (st****@REMOVE.THIS.cybersource.com.au):
>Me, I try to understand a patch by reading it. Call me
old-fashioned.

I concur, Aldo. Indeed, if I _can't_ be sure I understand a patch, I
don't accept it -- I ask the submitter to make it clearer.


Yes, but there is a huge gulf between what Aldo originally said he does
("visual inspection") and *reading and understanding the code*.

Let's set aside the fact that you're guilty of sloppy quoting here,
since the phrase "visual inspection" is yours, not mine.
Yes, my bad, I apologize, that was sloppy of me. What you actually said
was "I can't reliably verify it by eye".
Regardless,
your interpretation of my words is just plain dumb. My phrasing was
intended to draw attention to the fact that one needs to READ code in
order to understand it. You know - with one's eyes. VISUALLY. And VISUAL
INSPECTION of code becomes unreliable if this PEP passes.
Notwithstanding my misquote, I find it ... amusing ... that after
hauling me over the coals for using the term "visual inspection", you're
not only using it, but shouting it.

Perhaps you aren't aware that doing something "by eye" is idiomatic
English for doing it quickly, roughly, imprecisely. It is the opposite of
taking the time and effort to do the job carefully and accurately. If you
measure something "by eye", you just look at it and take a guess.

So, as I said, if you're relying on VISUAL INSPECTION (your words _now_)
you're already vulnerable. Fortunately for you, you're not relying on
visual inspection, you are actually _reading_ and _comprehending_ the
code. That might even mean, in extreme cases, you sit down with pencil
and paper and sketch out the program flow to understand what it is doing.

Now that (I hope!) you understand why I said what I said, can we agree
that _understanding_ is critical to the process? If you don't understand
the code, you don't accept it. If somebody submits a patch with
identifiers like a9472302 and a9473202 you're going to reject it as too
difficult to understand.

How do non-ASCII identifiers change that situation? What will be
different?
>If I've understood Martin's post, the PEP states that identifiers are
converted to normal form. If two identifiers look the same, they will
be the same.

I'm sorry to have to tell you, but you understood Martin's post no
better than you did mine. There is no general way to detect homoglyphs
and "convert them to a normal form". Observe:

import unicodedata
print repr(unicodedata.normalize("NFC", u"\u2160"))
print u"\u2160"
print "I"
Yes, I observe two very different glyphs, as different as the ASCII
characters I and |. What do you see?
So, a round 0 for reading comprehension this lesson, I'm afraid. Better
luck next time.
Ha ha, very funny.

So, let's summarize...

Non-ASCII identifiers are bad, because they are vulnerable to the exact
same problems as ASCII identifiers, only we're happy to live with those
problems if they are ASCII, and just install a font that makes I and l
look different, but we won't install a font that makes I and Ⅰ look
different, because that's too hard.

Well, you've convinced me. Obviously expecting Python programmers to cope
with something as complicated as installing a decent set of fonts is such
a major hurdle that people will abandon the language in droves, probably
taking up Haskell and Visual Basic and Lisp and all those other languages
that allow non-ASCII identifiers.


--
Steven.
May 16 '07 #229
On Tue, 15 May 2007 13:05:12 +0200, René Fleschenberg wrote:
Any program that uses non-English identifiers in Python is bound to
become gibberish, since it *will* be cluttered with English identifiers
all over the place anyway, whether you like it or not.
It won't be gibberish to the people who speak the language.
--
Steven.
May 16 '07 #230
On Tue, 15 May 2007 14:44:44 +0200, Anton Vredegoor wrote:
HYRY wrote:
>>- should non-ASCII identifiers be supported? why?
Yes. I want this for years. I am Chinese, and teaching some 12 years
old children learning programming. The biggest problem is we cannot use
Chinese words for the identifiers. As the program source becomes
longer, they always lost their thought about the program logic.

That is probably because they are just entering the developmental phase
of being able to use formal operational reasoning. I can understand that
they are looking for something to put the blame on but it is an error to
give in to the idea that it is hard for 12 year olds to learn a foreign
language. You realize that children learn new languages a lot faster
than adults?
Children soak up new languages between the ages of about one and four. By
12, they're virtually adults as far as learning new languages.

Again, it's probably not the language but the formal logic they have
problems with.
You have zero evidence for that, you're just applying your own
preconceptions and ignoring what HYRY has told you.

Please do *not* conclude that some child is not very good
at math or logic or programming when they are slow at first.
You're the one saying they're having problems with logic, not HYRY. He's
saying they are having problems with English.

--
Steven.
May 16 '07 #231
On Tue, 15 May 2007 11:58:35 +0200, René Fleschenberg wrote:

Unless you are 150% sure that there will *never* be the need for a
person who does not know your language of choice to be able to read or
modify your code, the language that "fits the environment best" is
English.
Just a touch of hyperbole perhaps?

You know, it may come as a surprise to some people that English is not
the only common language. In fact, it only ranks third, behind Mandarin
and Spanish, and just above Arabic. Although the exact number of speakers
varies according to the source you consult, the rankings are quite stable:
Mandarin, Spanish, then English. Any of those languages could equally
have claim to be the world's lingua franca.

And interestingly, with only one billion English speakers (as a first or
second language) in the world, and 5.5 billion people who don't speak
English, I think it's probably fair to say that it is a small minority
that speak English.
--
Steven.
May 16 '07 #232
On Tue, 15 May 2007 10:44:37 -0700, John Nagle wrote:
We have to have visually unique identifiers.
Well, Python has existed for years without such a requirement, so I think
"have to" is too strong a term.

Compare:

thisisareallylongbutcompletelylegalidentiferandnotvisuallyuniqueataglance

with

thisisareallylongbutcompletelylegalidentiferadnnotvisuallyuniqueataglance

I imagine, decades ago, people arguing against the introduction of long
identifiers because of the risk that their projects will be flooded with
Black Hats trying to slip one over them by using the vulnerability caused
by really long identifiers. I can just see people banging away on their
keyboard, swearing black and blue that identifiers of more than four
characters are completely unnecessary (who needs more than 450,000
variables in a program?) and will just cause the End Of Programming As We
Know It.

rn = m = None
IIl0 = IlIO = None

I'm sure that the Python community has zero sympathy for anyone
suggesting that Python should _enforce_ rules like "don't use a single l
as an identifier", even if they have complete sympathy with anybody who
has such a rule in their own projects.
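
For a project that does want such a rule locally, here is a minimal sketch of a look-alike check for ASCII names. The folding table is hand-picked for the examples in this thread (l/I/1/|, O/0, rn/m); it is an assumption for illustration, not a standard confusables list, and the function names are hypothetical.

```python
# Hypothetical, hand-picked folding table covering only the cases
# mentioned in this thread; not a standard confusables list.
_FOLD = str.maketrans({"I": "l", "1": "l", "|": "l", "O": "0"})


def skeleton(name):
    """Fold characters that look alike in many fonts onto one form."""
    return name.translate(_FOLD).replace("rn", "m")


def look_alike(a, b):
    """True if two different names fold to the same skeleton."""
    return a != b and skeleton(a) == skeleton(b)


print(look_alike("IIl0", "IlIO"))              # True
print(look_alike("rn", "m"))                   # True
print(look_alike("disk_burnt", "disk_bumt"))   # True
print(look_alike("counter", "pointer"))        # False
```

A project linter could run this over the names in a patch and warn on any pair that collides, which keeps the policy in the project rather than in the language.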
--
Steven.
May 16 '07 #233
Thus spake Steven D'Aprano (st****@REMOVE.THIS.cybersource.com.au):
Perhaps you aren't aware that doing something "by eye" is idiomatic
English for doing it quickly, roughly, imprecisely. It is the opposite of
taking the time and effort to do the job carefully and accurately. If you
measure something "by eye", you just look at it and take a guess.
Well, Steve, speaking as someone not entirely unfamiliar with idiomatic
English, I can say with some confidence that that's complete and utter bollocks
(idiomatic usage for "nonsense", by the way). To do something "by eye" means
nothing more nor less than doing it visually. Unless you can provide a citation
to the contrary, please move on from this petty little point of yours, and try
to make a substantial technical argument instead.
So, as I said, if you're relying on VISUAL INSPECTION (your words _now_)
you're already vulnerable. Fortunately for you, you're not relying on
visual inspection, you are actually _reading_ and _comprehending_ the
code. That might even mean, in extreme cases, you sit down with pencil
and paper and sketch out the program flow to understand what it is doing.
Please, pick up a dictionary, and look up "visual" and "inspection", then
re-read my message. Ponder the fact that visual inspection is in fact a
necessary precursor to "reading" or "comprehending" code. Now, imagine reading
a piece of code where you can never be sure that a character is what it appears
to be...
If I've understood Martin's post, the PEP states that identifiers are
converted to normal form. If two identifiers look the same, they will
be the same.
I'm sorry to have to tell you, but you understood Martin's post no
better than you did mine. There is no general way to detect homoglyphs
and "convert them to a normal form". Observe:

import unicodedata
print repr(unicodedata.normalize("NFC", u"\u2160"))
print u"\u2160"
print "I"

Yes, I observe two very different glyphs, as different as the ASCII
characters I and |. What do you see?
I recommend that you gain a basic understanding of the relationship between
Unicode code points and the glyphs on your screen before attempting to argue
this point again. The particular glyph your current font-set translates the
character into is irrelevant. Indeed, the fact that there is font variation
from client to client is one of the more obvious problems with your technically
illiterate hope that one could homogenize characters so that everything that
looks the same has the same meaning. Fiddle around with your fontsets a bit -
you only have to find one combination where the two glyphs look the same to
prove my case...
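
The code-point side of this argument can be checked without looking at any font. The sketch below is written in Python 3 syntax (the snippet quoted above is Python 2) and uses only the standard unicodedata module. It shows that NFC leaves U+2160 alone, that the compatibility form NFKC does fold it to a plain "I", and that no normalization form unifies Cyrillic "а" with Latin "a".

```python
import unicodedata

roman_one = "\u2160"   # ROMAN NUMERAL ONE
latin_i = "I"          # LATIN CAPITAL LETTER I
cyrillic_a = "\u0430"  # CYRILLIC SMALL LETTER A
latin_a = "a"          # LATIN SMALL LETTER A

# NFC keeps the Roman numeral as its own code point.
print(unicodedata.normalize("NFC", roman_one) == roman_one)   # True

# NFKC applies compatibility mappings and yields the Latin letter.
print(unicodedata.normalize("NFKC", roman_one) == latin_i)    # True

# Letters from different scripts are never merged by normalization,
# however similar they look in a given font.
for form in ("NFC", "NFD", "NFKC", "NFKD"):
    print(form, unicodedata.normalize(form, cyrillic_a) == latin_a)  # False each time
```

So normalization removes some look-alikes, depending on the form chosen, but cross-script pairs remain distinct identifiers.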



Regards,

Aldo

--
Aldo Cortesi
al**@nullcube.com
http://www.nullcube.com
Mob: 0419 492 863
May 16 '07 #234
I've made various comments to other people's responses, so I guess it is
time to actually respond to the PEP itself.

On Sun, 13 May 2007 17:44:39 +0200, Martin v. Löwis wrote:
PEP 1 specifies that PEP authors need to collect feedback from the
community. As the author of PEP 3131, I'd like to encourage comments to
the PEP included below, either here (comp.lang.python), or to
py*********@python.org

In summary, this PEP proposes to allow non-ASCII letters as identifiers
in Python. If the PEP is accepted, the following identifiers would also
become valid as class, function, or variable names: Löffelstiel, changé,
ошибка, or 売り場 (hoping that the latter one means "counter").

I believe this PEP differs from other Py3k PEPs in that it really
requires feedback from people with different cultural background to
evaluate it fully - most other PEPs are culture-neutral.

So, please provide feedback, e.g. perhaps by answering these questions:
- should non-ASCII identifiers be supported? why?
- would you use them if it was possible to do so? in what cases?
It seems to me that none of the objections to non-ASCII identifiers are
particularly strong. I've heard many accusations that they will introduce
"vulnerabilities", by analogy to unicode attacks in URLs, but I haven't
seen any credible explanations of how these vulnerabilities would work,
or how they are any different to existing threats. That's not to say that
there isn't a credible threat, but if there is, nobody has come close to
explaining it.

I would find it useful to be able to use non-ASCII characters for heavily
mathematical programs. There would be a closer correspondence between the
code and the mathematical equations if one could write Δ(µ*π) instead of
delta(mu*pi).

(Aside: I wonder what the Numeric crowd would say about this?)
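
For illustration, this is roughly what such code looks like in Python 3, where non-ASCII identifiers are accepted. The body of Δ is a hypothetical stand-in, since the post does not say what delta computes, and the value given to µ is arbitrary.

```python
import math

π = math.pi
µ = 0.5  # arbitrary illustrative value


def Δ(x):
    """Hypothetical stand-in for whatever delta means in the equations."""
    return x - 1.0


print(Δ(µ * π))
```

One detail worth knowing: Python 3 applies NFKC normalization to identifiers while parsing, so the micro sign µ (U+00B5) and the Greek letter μ (U+03BC) end up naming the same variable.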


--
Steven.
May 16 '07 #235
René Fleschenberg schrieb:
>
We all know what the PEP is about (we can read). The point is: If we do
not *need* non-English/ASCII identifiers, we do not need the PEP. If the
PEP does not solve an actual *problem* and still introduces some
potential for *new* problems, it should be rejected. So far, the
"problem" seems to just not exist. The burden of proof is on those who
support the PEP.
A good product does not only react to problems but acts.

Solving current problems is only one thing. Great products are exploring
new ways, ideas and possibilities according to their underlying vision.

Python has a vision of being easy even for newbies to programming.
Making it easier for non native English speakers is a step forward in
this regard.

Gregor
May 16 '07 #236
Ross Ridge schrieb:
non-ASCII identifiers. While it's easy to find code where comments use
non-ASCII characters, I was never able to find a non-made up example
that used them in identifiers.
If comments are allowed to be non-English, then why are identifiers not?
This is inconsistent because there is a correlation between identifier
and comment.

The best identifier is one that needs no comment, because it
self-describes its content. Non-English identifiers enhance the
meaning of identifiers for some projects. So why forbid them? We are all
adults.

Gregor
May 16 '07 #237
Aldo Cortesi <al**@nullcube.com> wrote:
Thus spake Steven D'Aprano (st****@REMOVE.THIS.cybersource.com.au):
Perhaps you aren't aware that doing something "by eye" is idiomatic
English for doing it quickly, roughly, imprecisely. It is the opposite of
taking the time and effort to do the job carefully and accurately. If you
measure something "by eye", you just look at it and take a guess.

Well, Steve, speaking as someone not entirely unfamiliar with idiomatic
English, I can say with some confidence that that's complete and utter
bollocks (idiomatic usage for "nonsense", by the way). To do something "by
eye" means nothing more nor less than doing it visually. Unless you can
provide a citation to the contrary, please move on from this petty little
point of yours, and try to make a substantial technical argument instead.
I can't find any reference for Steven's alleged idiomatic use of "by
eye", either -- _however_, my wife Anna (an American from Minnesota)
came up with exactly the same meaning when I asked her if "by eye" had
any idiomatic connotations, so I suspect it is indeed there, at least in
the Midwest. Funniest, of course, is that the literal translation into
Italian, "a occhio", has a similar idiomatic meaning to _any_ native
speaker of Italian -- and THAT one is even in the Italian wikipedia!-)

I'll be the first to admit that this issue has nothing to do with the
substance of the argument (on which my wife, also my co-author of the
2nd ed of the Python Cookbook and a fellow PSF member, deeply agrees
with you, Aldo, and me), but natural language nuances and curios are my
third-from-the-top most consuming interest (after programming and...
Anna herself!-).

[[_Visual inspection_ plays a crucial role in many areas of engineering,
of course; for example, visual inspection of welds is a very reliable,
although costly, quality assurance process, particularly if you ensure
that the inspectors hold the top professional degrees from the American
Welding Society (if you're operating in the USA:-)]].
Alex
May 16 '07 #238
Steven D'Aprano wrote:
On Tue, 15 May 2007 12:01:57 +0200, Rene Fleschenberg wrote:
Marc 'BlackJack' Rintsch schrieb:
You find it in the sources by the line number from the traceback and
the letters can be copy'n'pasted if you don't know how to input them
with your keymap or keyboard layout.
Typing them is not the only problem. They might not even *display*
correctly if you don't happen to use a font that supports them.

Then maybe you should catch up to the 21st century and install some fonts
and a modern editor.
It's not just about fonts installed on my desktop. I still do a _lot_
of debugging/code browsing remotely over terminal connections. I
still often have to sit down at someone else's machine and help them
troubleshoot, often going through the stack trace for whatever package
they're using--and I don't have control over which fonts they decide
to install. Even simple high-bit latin1 characters differ on vanilla
Windows machines vs. vanilla Linux/Mac machines. I even sometimes
read code snippets on email lists and websites from my handheld, which
is sadly still memory-limited enough that I'm really unlikely to
install anything approaching a full set of Unicode fonts.

May 16 '07 #239
Steven D'Aprano wrote:
I've made various comments to other people's responses, so I guess it is
time to actually respond to the PEP itself.

On Sun, 13 May 2007 17:44:39 +0200, Martin v. Lo:wis wrote:
PEP 1 specifies that PEP authors need to collect feedback from the
community. As the author of PEP 3131, I'd like to encourage comments to
the PEP included below, either here (comp.lang.python), or to
py*********@python.org

In summary, this PEP proposes to allow non-ASCII letters as identifiers
in Python. If the PEP is accepted, the following identifiers would also
become valid as class, function, or variable names: Lo:ffelstiel, change,
oshibka, or ***ri*** (hoping that the latter one means "counter").

I believe this PEP differs from other Py3k PEPs in that it really
requires feedback from people with different cultural background to
evaluate it fully - most other PEPs are culture-neutral.

So, please provide feedback, e.g. perhaps by answering these questions:
- should non-ASCII identifiers be supported? why? - would you use them
if it was possible to do so? in what cases?

It seems to me that none of the objections to non-ASCII identifiers are
particularly strong. I've heard many accusations that they will introduce
"vulnerabilities", by analogy to unicode attacks in URLs, but I haven't
seen any credible explanations of how these vulnerabilities would work,
or how they are any different to existing threats. That's not to say that
there isn't a credible threat, but if there is, nobody has come close to
explaining it.

I would find it useful to be able to use non-ASCII characters for heavily
mathematical programs. There would be a closer correspondence between the
code and the mathematical equations if one could write D(u*p) instead of
delta(mu*pi).
Just as one risk here:
When reading the above on Google groups, it showed up as "if one could
write ?(u*p)..."
When quoting it for response, it showed up as "could write D(u*p)".

I'm sure that the symbol you used was neither a capital letter d nor a
question mark.

Using identifiers that are so prone to corruption when posting in a
rather popular forum seems dangerous to me--and I'd guess that a lot
of source code highlighters, email lists, etc have similar problems.
I'd even be surprised if some programming tools didn't have similar
problems.
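
The "?" form of this corruption is easy to reproduce: it is what happens whenever text passes through a stage that only handles ASCII and replaces whatever it cannot encode. A small sketch using only built-in string methods follows; the "D(u*p)" form would need a transliteration table and is not reproduced here.

```python
expr = "\u0394(\u00b5*\u03c0)"  # the expression from the quoted post

# An ASCII-only stage that substitutes unencodable characters.
lossy = expr.encode("ascii", errors="replace").decode("ascii")
print(lossy)  # ?(?*?)

# A stage that handles UTF-8 end to end preserves the text.
intact = expr.encode("utf-8").decode("utf-8")
print(intact == expr)  # True
```

Once the characters have been replaced there is no way to recover them, so three different identifiers all become "?".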

May 16 '07 #240
On 2007-05-16, Alex Martelli <al***@mac.com> wrote:
Aldo Cortesi <al**@nullcube.com> wrote:
>Thus spake Steven D'Aprano (st****@REMOVE.THIS.cybersource.com.au):
Perhaps you aren't aware that doing something "by eye" is idiomatic
English for doing it quickly, roughly, imprecisely. It is the opposite of
taking the time and effort to do the job carefully and accurately. If you
measure something "by eye", you just look at it and take a guess.

Well, Steve, speaking as someone not entirely unfamiliar with idiomatic
English, I can say with some confidence that that's complete and utter
bollocks (idiomatic usage for "nonsense", by the way). To do something "by
eye" means nothing more nor less than doing it visually. Unless you can
provide a citation to the contrary, please move on from this petty little
point of yours, and try to make a substantial technical argument instead.

I can't find any reference for Steven's alleged idiomatic use of "by
eye", either -- _however_, my wife Anna (an American from Minnesota)
came up with exactly the same meaning when I asked her if "by eye" had
any idiomatic connotations, so I suspect it is indeed there, at least in
the Midwest.
That's what it means to me (I'm also from the upper midwest).
One also hears the phrase "eyeball it" in the same context:
"You don't need to measure that, just eyeball it."

--
Grant Edwards (grante at visi.com)
Yow! BARBARA STANWYCK makes me nervous!!
May 16 '07 #241

Thus spake Alex Martelli (al***@mac.com):
I can't find any reference for Steven's alleged idiomatic use of "by
eye", either -- _however_, my wife Anna (an American from Minnesota)
came up with exactly the same meaning when I asked her if "by eye" had
any idiomatic connotations, so I suspect it is indeed there, at least in
the Midwest. Funniest, of course, is that the literal translation into
Italian, "a occhio", has a similar idiomatic meaning to _any_ native
speaker of Italian -- and THAT one is even in the Italian wikipedia!-)

I'll be the first to admit that this issue has nothing to do with the
substance of the argument (on which my wife, also my co-author of the
2nd ed of the Python Cookbook and a fellow PSF member, deeply agrees
with you, Aldo, and me), but natural language nuances and curios are my
third-from-the-top most consuming interest (after programming and...
Anna herself!-).
I must admit to a fascination with language myself - I even have a degree in
English literature to prove it! To be fair to Steven, I've asked some of my
colleagues here in Sydney about their reactions to the phrase "by eye", and
none of them have yet come up with anything that has the strong pejorative
taint Steven gave it. At any rate, it's clear that the phrase is not well
defined anywhere (not even in the OED), and I'm sure there are substantial
regional variations in interpretation.

In cases like these, however, context is paramount, so I will quote sentences
that started this petty bickering:
The security implications have not been sufficiently explored. I don't want
to be in a situation where I need to mechanically "clean" code (say, from a
submitted patch) with a tool because I can't reliably verify it by eye.
Surely, in context, the meaning is clear? "By eye" here means nothing more nor
less than a literal reading suggests. Taking these sentences to be an argument
for a slip-shod, careless approach to code, as Steven did, is surely perverse.

Regards,


Aldo


--
Aldo Cortesi
al**@nullcube.com
http://www.nullcube.com
Mob: 0419 492 863
May 16 '07 #242
On May 15, 3:28 pm, René Fleschenberg <r...@korteklippe.de> wrote:
We all know what the PEP is about (we can read). The point is: If we do
not *need* non-English/ASCII identifiers, we do not need the PEP. If the
PEP does not solve an actual *problem* and still introduces some
potential for *new* problems, it should be rejected. So far, the
"problem" seems to just not exist. The burden of proof is on those who
support the PEP.
I'm not sure how you conclude that no problem exists.
- Meaningful identifiers are critical in creating good code.
- Non-English speakers cannot create or understand
English identifiers, hence can't create good code nor
easily grok existing code.
Considering the vastly greater number of non-English
speakers in the world, the fact that they are thus unable
to use Python effectively seems like a problem to me.

That all programmers know enough English to create and
understand English identifiers is currently speculation, or
based on tiny personally observed samples.

I will add my own personal observation supporting the
opposite. A Japanese programmer friend was working
on a project last fall for a large Japanese company in
Japan. A lot of their programming was outsourced to
Korea. While the liaison people on both sides communicated
in a mixture of English and Japanese, my understanding
was that almost all the programmers spoke almost
no English. The language used was Java. I don't know
how they handled identifiers but I have no reason to
believe they were English (though they may have been
transliterated Japanese).

Now that too is a tiny personally observed sample,
so it carries no more weight than the others. But it
is enough to make me question the original assertion
that all programmers know English.

It's a big world and there are a lot of people out there.
Drawing conclusions based on 5 or 50 or 500 personal
contacts is pretty risky, particularly when being wrong
means putting up major barriers to Python use for
huge numbers of people.

May 16 '07 #243

"Aldo Cortesi" <al**@nullcube.comwrote in message
news:20********************@nullcube.com...
| I must admit to a fascination with language myself - I even have a degree in
| English literature to prove it! To be fair to Steven, I've asked some of my
| colleagues here in Sydney about their reactions to the phrase "by eye", and
| none of them have yet come up with anything that has the strong pejorative
| taint Steven gave it. At any rate, it's clear that the phrase is not well
| defined anywhere (not even in the OED), and I'm sure there are substantial
| regional variations in interpretation.

As a native American, yes, 'by eye' is sometimes, maybe even often used
with a pejorative intent.

| In cases like these, however, context is paramount, so I will quote sentences
| that started this petty bickering:

However, in this context
|
| The security implications have not been sufficiently explored. I don't want
| to be in a situation where I need to mechanically "clean" code (say, from a
| submitted patch) with a tool because I can't reliably verify it by eye.

I read it just as Aldo claims.

| Surely, in context, the meaning is clear? "By eye" here means nothing more nor
| less than a literal reading suggests. Taking these sentences to be an argument
| for a slip-shod, careless approach to code, as Steven did, is surely perverse.

Perhaps because in this context, it is not at all clear what the 'more
exact' method would be.

Terry Jan Reedy

May 16 '07 #244
ru***@yahoo.com a écrit :
On May 15, 3:28 pm, René Fleschenberg <r...@korteklippe.de> wrote:
>We all know what the PEP is about (we can read). The point is: If we do
not *need* non-English/ASCII identifiers, we do not need the PEP. If the
PEP does not solve an actual *problem* and still introduces some
potential for *new* problems, it should be rejected. So far, the
"problem" seems to just not exist. The burden of proof is on those who
support the PEP.

It *does* solve a huge problem: I have to use degenerate French, with
orthographic mistakes, or choose from a small subset of words, in order
to use only ASCII. I'm limited in my expression, and I resent this
every day!

This is true even if commercial French programmers don't object
to the PEP because they have to use English in their own work. This
is something I really cannot understand.

It's an everyday problem, for millions of people!

And yes, sometimes I publish code (rarely), even if it uses French
identifiers, because someone looking for a real solution *does*
prefer an existing solution to nothing.
--
Pierre
May 16 '07 #245
Steven D'Aprano schrieb:
But they aren't new risks and problems, that's the point. So far, every
single objection raised ALREADY EXISTS in some form or another.
No. The problem "The traceback shows function names having characters
that do not display on most systems' screens" for example does not exist
today, to the best of my knowledge. And "in some form or another"
basically means that the PEP would create more possibilities for things
to go wrong. That things can already go wrong today does not mean that
it does not matter if we create more occasions where things can go wrong
even worse.
There's
all this hysteria about the problems the proposed change will cause, but
those problems already exist. When was the last time a Black Hat tried to
smuggle in bad code by changing an identifier from xyz0 to xyzO?
Agreed, I don't think intended malicious use of the proposed feature
would be a big problem.
>>I think it is not. I think that the problem only really applies to very
>>isolated use-cases.

>Like the 5.5 billion people who speak no English.
No. The X people who speak "no English" and program in Python. I think X
actually is very low (close to zero), because programming in Python
virtually does require you to know some English, whether you can use
non-ASCII characters in identifiers or not. It is naive to believe that
you can program in Python without understanding any English once you can
use your native characters in identifiers. That will not happen. Please
understand that: You basically *must* know some English to program in
Python, and the reason for that is not that you cannot use non-ASCII
identifiers.

I admit that there may be occasions where you have domain-specific terms
that are hard to translate into English for a programmer. But is it
really not feasible to use an ASCII transliteration in these cases? This
does not seem to have been such a big problem so far, or else we would
have seen more discussions about it, I think.
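
A minimal sketch of what such an ASCII transliteration could look like, assuming only the standard `unicodedata` module. The helper name `to_ascii_identifier` is hypothetical and not part of the PEP, and the approach only covers accented Latin letters; names written in other scripts (Cyrillic, Kanji) would need a real transliteration table:

```python
import unicodedata


def to_ascii_identifier(name):
    """Strip diacritics from `name` (hypothetical helper for illustration)."""
    # NFKD splits an accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", name)
    # Dropping everything outside ASCII removes the combining marks.
    return decomposed.encode("ascii", "ignore").decode("ascii")


print(to_ascii_identifier("Löffelstiel"))  # Loffelstiel
print(to_ascii_identifier("changé"))       # change
# Cyrillic letters have no ASCII base letter, so nothing survives here.
print(repr(to_ascii_identifier("ошибка")))  # ''
```

The last call shows the limit of this shortcut: for non-Latin scripts the result is empty, so the transliteration has to be done by hand or with a dedicated mapping.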
>>So isolated that they do not justify a change to
>>mainline Python. If someone thinks that non-ASCII identifiers are really
>>needed, he could maintain a special Python branch that supports them. I
>>doubt that there would be a lot of demand for it.

>Maybe so. But I guarantee with a shadow of a doubt that if the change
>were introduced, people would use it -- even if right now they say they
>don't want it.
Well, that is exactly what I would like to avoid ;-)

--
René
May 16 '07 #246
Steven D'Aprano wrote:
>>Any program that uses non-English identifiers in Python is bound to
>>become gibberish, since it *will* be cluttered with English identifiers
>>all over the place anyway, whether you like it or not.

>It won't be gibberish to the people who speak the language.
Hmmm, did you read my posting? In my experience, it will. I wonder: is
English an acquired language for you?

--
René
May 16 '07 #247
Gregor Horvath wrote:
>If comments are allowed to be non-English, then why are identifiers not?
I don't need to be able to type in the exact characters of a comment in
order to properly change the code, and if a comment does not display on
my screen correctly, I am not as badly fscked as when an identifier
does not display (e.g. in a traceback).
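
To illustrate the traceback concern, here is a minimal sketch that assumes a Python 3 interpreter, where PEP 3131 identifiers are accepted. The function name `zähler` and the helper `ascii_safe_traceback` are made up for this example; the `backslashreplace` error handler is one way to keep such a traceback legible on an ASCII-only terminal:

```python
import traceback


def zähler():
    # Hypothetical function with a non-ASCII name; it fails on purpose.
    raise ValueError("boom")


def ascii_safe_traceback():
    """Return the traceback of a failing zähler() call as pure ASCII text."""
    try:
        zähler()
    except ValueError:
        text = traceback.format_exc()
    # Non-ASCII characters become escapes such as \xe4 instead of garbage.
    return text.encode("ascii", "backslashreplace").decode("ascii")


print(ascii_safe_traceback())
```

The escaped name stays unambiguous, but a reader still has to map the escape back to the character before searching for the identifier in the source.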

--
René
May 16 '07 #248
René Fleschenberg wrote:
>today, to the best of my knowledge. And "in some form or another"
>basically means that the PEP would create more possibilities for things
>to go wrong. That things can already go wrong today does not mean that
>it does not matter if we create more occasions where things can go wrong
>even worse.
Following this logic we should not add any new features at all, because
all of them can go wrong and can be used the wrong way.

I love Python because it does not dictate how to do things.
I do not need an ASCII dictator; I can judge for myself when to use this
feature and when to avoid it, like any other feature.

Gregor
May 16 '07 #249
ru***@yahoo.com wrote:
>I'm not sure how you conclude that no problem exists.
>- Meaningful identifiers are critical in creating good code.
I agree.
>- Non-English speakers cannot create or understand
>English identifiers, hence can't create good code nor
>easily grok existing code.
I agree that this is a problem, but please understand that this problem is
_not_ solved by allowing non-ASCII identifiers!
>Considering the vastly greater number of non-English
>speakers in the world, who are thus unable to use
>Python effectively, seems like a problem to me.
Yes, but this problem is not really addressed by the PEP. If you want to
do something about this:
1) Translate documentation.
2) Create a way to internationalize the standard library (and possibly
the language keywords, too). Ideally, create a general standardized way
to internationalize code, possibly similar to how people
internationalize strings today.

When that is done, non-ASCII identifiers could become useful. But of
course, doing that might create a host of other problems.
>That all programmers know enough English to create and
>understand English identifiers is currently speculation or
>based on tiny personally observed samples.
It is based on a look at the current Python environment. You do *at
least* have the problem that the standard library uses English names.
This assumes that there is documentation in the native language that is
good enough (i.e. almost as good as the official one), which I can tell
is not the case for German.

--
René
May 16 '07 #250
