PEP 3131: Supporting Non-ASCII Identifiers

Gregor Horvath wrote:

René Fleschenberg wrote:

>today, to the best of my knowledge. And "in some form or another"
basically means that the PEP would create more possibilities for things
to go wrong. That things can already go wrong today does not mean that
it does not matter if we create more occasions where things can go wrong
even worse.

Following this logic we should not add any new features at all, because
all of them can go wrong and can be used the wrong way.

No, that does not follow from my logic. What I say is: When thinking
about whether to add a new feature, the potential benefits should be
weighed against the potential problems. I see some potential problems
with this PEP and very little potential benefits.

I love Python because it does not dictate how to do things.
I do not need an ASCII-Dictator, I can judge myself when to use this
feature and when to avoid it, like any other feature.

*That* logic can be used to justify the introduction of *any* feature.

--
René

May 16 '07 #252

On Tue, 15 May 2007 17:35:11 +0200, Stefan Behnel
<st******************@web.de> wrote:

Eric Brunel wrote:
>On Tue, 15 May 2007 15:57:32 +0200, Stefan Behnel
>>In-house developers are rather for this PEP as they see the advantage of
expressing concepts in the way the "non-techies" talk about it.

No: I *am* an "in-house" developer. The argument is not
public/open-source against private/industrial. As I said in some of my
earlier posts, any code can pass through many people in its life, people
not having the same language. I dare to say that starting a project
today in any other language than english is almost irresponsible: the
chances that it will get at least read by people not talking the same
language as the original coders are very close to 100%, even if it
always stays "private".

Ok, so I'm an Open-Source guy who happens to work in-house. And I'm a
supporter of PEP 3131. I admit that I was simplifying in my round-up. :)

But I would say that "irresponsible" is a pretty self-centered word in this
context. Can't you imagine that those who take the "irresponsible" decisions
of working on (and starting) projects in "another language than English" are
maybe as responsible as you are when you take the decision of starting a
project in English, but in a different context? It all depends on the specific
constraints of the project, i.e. environment, developer skills, domain, ...

The more complex an application domain, the more important is clear and
correct domain terminology. And software developers just don't have that. They
know their own domain (software development with all those concepts, languages
and keywords), but there is a reason why they develop software for those who
know the complex professional domain in detail but do not know how to develop
software. And it's a good idea to name things in a way that is consistent with
those who know the professional domain.

That's why keywords are taken from the domain of software development and
identifiers are taken (mostly) from the application domain. And that's why I
support PEP 3131.

You keep eluding the question: even if the decisions made at the project
start seem quite sensible *at that time*, if the project ends up
maintained in Korea, you *will have* to translate all your identifiers to
something displayable, understandable and typable by (almost) anyone,
a.k.a ASCII-English... Since - as I already said - I'm quite convinced
that any application bigger than the average quick-n-dirty throwaway
script is highly likely to end up in a different country than its original
coders', you'll end up losing the time you appeared to have gained in the
beginning. That's what I called "irresponsible" (even if I admit that the
word was a bit strong...).

Anyway, concerning the PEP, I've finally "put some water in my wine" as we
say in French, and I'm not so strongly against it now... Not for the
reasons you give (so we can continue our flame war on this ;-) ), but
mainly considering Python's usage in a learning context: this is a valid
reason why non-ASCII identifiers should be supported. I just wish I'll get
a '--ascii-only' switch on my Python interpreter (or any other means to
forbid non-ASCII identifiers and/or strings and/or comments).
--
python -c "print ''.join([chr(154 - ord(c)) for c in
'U(17zX(%,5.zmz5(17l8(%,5.Z*(93-965$l7+-'])"

May 16 '07 #253

René Fleschenberg wrote:

ru***@yahoo.com wrote:
>I'm not sure how you conclude that no problem exists.
- Meaningful identifiers are critical in creating good code.

I agree.

>- Non-english speakers can not create or understand
english identifiers hence can't create good code nor
easily grok existing code.

I agree that this is a problem, but please understand that this problem is
_not_ solved by allowing non-ASCII identifiers!

Well, as I said before, there are three major differences between the stdlib
and keywords on one hand and identifiers on the other hand. Ignoring arguments
does not make them any less true.

So, the problem is partly tackled by the people who face it by writing
degenerated transliterations and language mix in identifiers, but it would be
*solved* by means of the language if Unicode identifiers were available.

Stefan

May 16 '07 #254

Marc 'BlackJack' Rintsch

In <46**************@web.de>, Stefan Behnel wrote:

René Fleschenberg wrote:
>We all know what the PEP is about (we can read). The point is: If we do
not *need* non-English/ASCII identifiers, we do not need the PEP. If the
PEP does not solve an actual *problem* and still introduces some
potential for *new* problems, it should be rejected. So far, the
"problem" seems to just not exist. The burden of proof is on those who
support the PEP.

The main problem here seems to be proving the need of something to people who
do not need it themselves. So, if a simple "but I need it because a, b, c" is
not enough, what good is any further proof?

Maybe all the (potential) programmers that can't understand english and
would benefit from the ability to use non-ASCII characters in identifiers
could step up and take part in this debate. In an english speaking
newsgroup… =:o)

There are potential users of Python who don't know much english or no
english at all. This includes kids, old people, people from countries
that have "letters" that are not that easy to transliterate like european
languages, people who just want to learn Python for fun or to customize
their applications like office suites or GIS software with a Python
scripting option.

Some people here seem to think the user base is or should be only from the
computer science domain. Yes, if you are a programming professional it
may be mandatory to be able to write english identifiers, comments and
documentation, but there are not just programming professionals out there.

Ciao,
Marc 'BlackJack' Rintsch

May 16 '07 #255

René Fleschenberg wrote:

Gregor Horvath wrote:
>If comments are allowed to be non-English, then why are identifiers not?

I don't need to be able to type in the exact characters of a comment in
order to properly change the code, and if a comment does not display on
my screen correctly, I am not fscked as badly as when an identifier
does not display (e.g. in a traceback).

Then get tools that match your working environment.

Stefan

May 16 '07 #256

gatti

Martin v. Lowis wrote:

Lorenzo Gatti wrote:
>Not providing an explicit listing of allowed characters is inexcusable
sloppiness.

That is a deliberate part of the specification. It is intentional that
it does *not* specify a precise list, but instead defers that list
to the version of the Unicode standard used (in the unicodedata
module).
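
To see which Unicode version a given interpreter defers to, one can query the unicodedata module mentioned above. This is only a minimal sketch, assuming a Python 3 interpreter (where PEP 3131 is in effect); the sample names are illustrations, not anything taken from the PEP.

```python
import unicodedata

# Version of the Unicode Character Database bundled with this interpreter.
# The set of characters allowed in identifiers follows from this database,
# so the printed value differs between Python releases.
print(unicodedata.unidata_version)

# str.isidentifier() reports whether a string is a valid identifier
# according to the language definition.
samples = ["open", "öffnen", "moi_même", "2fast", "with space"]
results = {name: name.isidentifier() for name in samples}
print(results)
```

The point of the design is visible here: no character list is hard-coded in the program, the answer comes from whatever database the interpreter ships with.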

Ok, maybe you considered listing characters but you earnestly decided
to follow an authority; but this reliance on the Unicode standard is
not a merit: it defers to an external entity (UAX 31 and the Unicode
database) a foundation of Python syntax.
The obvious purpose of Unicode Annex 31 is defining a framework for
parsing the identifiers of arbitrary programming languages, it's only,
in its own words, "specifications for recommended defaults for the use
of Unicode in the definitions of identifiers and in pattern-based
syntax". It suggests an orderly way to add tens of thousands of exotic
characters to programming language grammars, but it doesn't prove it
would be wise to do so.

You seem to like Unicode Annex 31, but keep in mind that:
- it has very limited resources (only the Unicode standard, i.e. lists
and properties of characters, and not sensible programming language
design, software design, etc.)
- it is culturally biased in favour of supporting as much of the
Unicode character set as possible, disregarding the practical
consequences and assuming without discussion that programming language
designers want to do so
- it is also culturally biased towards the typical Unicode patterns of
providing well explained general algorithms, ensuring forward
compatibility, and relying on existing Unicode standards (in this
case, character types) rather than introducing new data (but the
character list of Table 3 is unavoidable); the net result is caring
even less for actual usage.

>The XML standard is an example of how listings of large parts of the
Unicode character set can be provided clearly, exactly and (almost)
concisely.

And, indeed, this is now recognized as one of the bigger mistakes
of the XML recommendation: they provide an explicit list, and fail
to consider characters that are unassigned. In XML 1.1, they try
to address this issue, by now allowing unassigned characters in
XML names even though it's not certain yet what those characters
mean (until they are assigned).

XML 1.1 is, for practical purposes, not used except by mistake. I
challenge you to show me XML languages or documents of some importance
that need XML 1.1 because they use non-ASCII names.
XML 1.1 is supported by many tools and standards because of buzzword
compliance, enthusiastic obedience to the W3C and low cost of
implementation, but this doesn't mean that its features are an
improvement over XML 1.0.

>>``ID_Continue`` is defined as all characters in ``ID_Start``, plus
nonspacing marks (Mn), spacing combining marks (Mc), decimal number
(Nd), and connector punctuations (Pc).
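
The quoted rule can be approximated with the general categories that unicodedata exposes. This is a rough sketch, not the interpreter's actual check: the helper names are hypothetical, the ID_Start category list (Lu, Ll, Lt, Lm, Lo, Nl plus the underscore) is taken from the PEP text rather than from this thread, and the sketch ignores the Other_ID_Start and Other_ID_Continue exceptions as well as normalization.

```python
import unicodedata

# Categories for ID_Start as listed in PEP 3131 (assumption: the
# Other_ID_Start exceptions are deliberately left out of this sketch).
ID_START_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}

# Extra categories that the quoted definition adds for ID_Continue.
ID_CONTINUE_EXTRA = {"Mn", "Mc", "Nd", "Pc"}


def approx_id_start(ch):
    """Approximate test: may this character begin an identifier?"""
    return ch == "_" or unicodedata.category(ch) in ID_START_CATEGORIES


def approx_id_continue(ch):
    """Approximate test: may this character appear after the first one?"""
    return approx_id_start(ch) or unicodedata.category(ch) in ID_CONTINUE_EXTRA


# A few characters from the categories under dispute.
for ch in ["é", "7", "\u0301", "\u0903", "\u203f", "-"]:
    print(repr(ch), unicodedata.category(ch),
          approx_id_start(ch), approx_id_continue(ch))
```

The combining acute accent (U+0301, category Mn) and the undertie (U+203F, category Pc) are the kind of characters the objection below is about: both are accepted after the first position.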

Am I the first to notice how unsuitable these characters are?

Probably. Nobody in the Unicode consortium noticed, but what
do they know about suitability of Unicode characters...

Don't be silly. These characters are suitable for writing text, not
for use in identifiers; the fact that UAX 31 allows them merely proves
how disconnected from actual programming language needs that document
is.

In typical word processing, what characters are used is the editor's
problem and the only thing that matters is the correctness of the
printed result; program code is much more demanding, as it needs to do
more (exact comparisons, easy reading...) with less (straightforward
keyboard inputs and monospaced fonts instead of complex input systems
and WYSIWYG graphical text). The only way to work with program text
successfully is limiting its complexity.
Hard to input characters, hard to see characters, ambiguities and
uncertainty in the sequence of characters, sets of hard to distinguish
glyphs and similar problems are unacceptable.

It seems I'm not the first to notice a lot of Unicode characters that
are unsuitable for identifiers. Appendix I of the XML 1.1 standard
recommends to avoid variation selectors, interlinear annotations (I
missed them...), various decomposable characters, and "names which are
nonsensical, unpronounceable, hard to read, or easily confusable with
other names".
The whole appendix I is a clear admission of self-defeat, probably the
result of committee compromises. Do you think you could do better?

Regards,
Lorenzo Gatti

May 16 '07 #257
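
The concerns above about decomposable characters and hard-to-distinguish glyphs can be made concrete. The following is a small sketch, assuming a Python 3 interpreter, which normalizes identifiers to NFKC while parsing as PEP 3131 specifies; the helper name nfkc and the variable names are only illustrations.

```python
import unicodedata


def nfkc(text):
    """Apply the normalization form that PEP 3131 prescribes for identifiers."""
    return unicodedata.normalize("NFKC", text)


# 1. Decomposable characters: one code point vs. base letter + combining mark.
composed = "\u00e9"      # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT
same_after_nfkc = nfkc(composed) == nfkc(decomposed)

# 2. Compatibility characters: MICRO SIGN folds to GREEK SMALL LETTER MU.
micro_folds_to_mu = nfkc("\u00b5") == "\u03bc"

# 3. Because the parser normalizes identifiers, both spellings name one variable.
namespace = {}
exec("\u00b5 = 1\nresult = \u03bc + 1", namespace)

# 4. Look-alikes from different scripts are NOT unified: Latin "a" and
#    CYRILLIC SMALL LETTER A (U+0430) remain two distinct identifiers.
lookalikes = {}
exec("a = 1\n\u0430 = 2", lookalikes)

print(same_after_nfkc, micro_folds_to_mu, namespace["result"])
print(lookalikes["a"], lookalikes["\u0430"])
```

Normalization removes the ambiguity between composed and decomposed spellings, but it does nothing about glyphs from different scripts that merely look alike, which is the part of the objection that remains open.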

Christophe

sj*******@yahoo.com wrote:

Steven D'Aprano wrote:
>I would find it useful to be able to use non-ASCII characters for heavily
mathematical programs. There would be a closer correspondence between the
code and the mathematical equations if one could write D(u*p) instead of
delta(mu*pi).

Just as one risk here:
When reading the above on Google groups, it showed up as "if one could
write ?(u*p)..."
When quoting it for response, it showed up as "could write D(u*p)".

I'm sure that the symbol you used was neither a capital letter d nor a
question mark.

Using identifiers that are so prone to corruption when posting in a
rather popular forum seems dangerous to me--and I'd guess that a lot
of source code highlighters, email lists, etc have similar problems.
I'd even be surprised if some programming tools didn't have similar
problems.

So, it was google groups that continuously corrupted the good UTF-8
posts by force converting them to ISO-8859-1?

Of course, there's also the possibility that it is a problem on *your*
side so, to be fair I've launched google groups and looked for this
thread. And of course the result was that Steven's post displayed
perfectly. I didn't try to reply to it of course, no need to clutter
that thread anymore than it is.

--
Δ(µ*π)

May 16 '07 #258
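
The kind of corruption argued about in this exchange is easy to reproduce. The sketch below assumes Python 3 and shows two separate failure modes: UTF-8 bytes decoded as ISO-8859-1 (the conversion named above), and a lossy conversion to ASCII. It is an illustration of the mechanism only and makes no claim about what Google Groups actually did.

```python
original = "Δ(µ*π)"

# The text as it travels over the wire when encoded correctly.
utf8_bytes = original.encode("utf-8")

# Failure mode 1: the receiver decodes the UTF-8 bytes as ISO-8859-1.
# Every non-ASCII character turns into two unrelated characters.
mojibake = utf8_bytes.decode("latin-1")

# Failure mode 2: a gateway converts to ASCII and replaces what it cannot
# represent with a question mark.
ascii_lossy = original.encode("ascii", errors="replace").decode("ascii")

# Failure mode 1 is reversible as long as no byte was dropped on the way;
# failure mode 2 is not, the information is gone.
repaired = mojibake.encode("latin-1").decode("utf-8")

print(len(original), len(mojibake))
print(ascii_lossy)
print(repaired == original)
```

Which side is at fault in a given case depends on where the wrong decoding happens, which is exactly what the two posters disagree about.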

On Tue, 15 May 2007 21:07:30 +0200, Pierre Hanser
<ha****@club-internet.fr> wrote:

hello

i work for a large phone maker, and for a long time
we thought, very arrogantly, our phones would be ok
for the whole world.

After all, using a phone uses so few words, and
some of them were even replaced with pictograms!
everybody should be able to understand appel, bis,
renvoi, mévo, ...

nowadays we make Chinese, Korean, Japanese talking
phones.

because we can do it, because graphics are cheaper
than they were, because it augments our market.
(also because some markets require it)

see the analogy?

Absolutely not: you're talking about internationalization of the
user-interface here, not about the code. There are quite simple ways to
ensure users will see the displays in their own language, even if the
source code is the same for everyone. But your source code will not
automagically translate itself to the language of the guy who'll have to
maintain it or make it evolve. So the analogy actually seems to work
backwards: if you want any coder to be able to read/understand/edit your
code, just don't write it in your own language...
--
python -c "print ''.join([chr(154 - ord(c)) for c in
'U(17zX(%,5.zmz5(17l8(%,5.Z*(93-965$l7+-'])"

May 16 '07 #259

Stefan Behnel wrote:

Then get tools that match your working environment.

Integration with existing tools *is* something that a PEP should
consider. This one does not do that sufficiently, IMO.

--
René

May 16 '07 #260

sj*******@yahoo.com wrote:

I even sometimes
read code snippets on email lists and websites from my handheld, which
is sadly still memory-limited enough that I'm really unlikely to
install anything approaching a full set of Unicode fonts.

One of the arguments against this PEP was that it seemed to be impossible to
find either transliterated identifiers in code or native identifiers in Java
code using a web search. So it is very unlikely that you will need to upgrade
your handheld as it is very unlikely for you to stumble into such code.

Stefan

May 16 '07 #261

René Fleschenberg wrote:

Gregor Horvath wrote:
>René Fleschenberg wrote:

>>today, to the best of my knowledge. And "in some form or another"
basically means that the PEP would create more possibilities for things
to go wrong. That things can already go wrong today does not mean that
it does not matter if we create more occasions where things can go wrong
even worse.
Following this logic we should not add any new features at all, because
all of them can go wrong and can be used the wrong way.

No, that does not follow from my logic. What I say is: When thinking
about whether to add a new feature, the potential benefits should be
weighed against the potential problems. I see some potential problems
with this PEP and very little potential benefits.

>I love Python because it does not dictate how to do things.
I do not need an ASCII-Dictator, I can judge myself when to use this
feature and when to avoid it, like any other feature.

*That* logic can be used to justify the introduction of *any* feature.

*Your* logic can be used to justify dropping *any* feature.

Stefan

May 16 '07 #262

Stefan Behnel wrote:

>>- Non-english speakers can not create or understand
english identifiers hence can't create good code nor
easily grok existing code.
I agree that this is a problem, but please understand that this problem is
_not_ solved by allowing non-ASCII identifiers!

Well, as I said before, there are three major differences between the stdlib
and keywords on one hand and identifiers on the other hand. Ignoring arguments
does not make them any less true.

BTW: Please stop replying to my postings by E-Mail (in Thunderbird, use
"Reply" instead of "Reply to all").

I agree that keywords are a different matter in many respects, but the
only difference between stdlib interfaces and other interfaces is that
the stdlib interfaces are part of the stdlib. That's it. You are still
ignoring the fact that, contrary to what has been suggested in this
thread, it is _not_ possible to write "German" or "Chinese" Python
without cluttering it up with many many English terms. It's not only the
stdlib, but also many many third party libraries. Show me one real
Python program that is feasibly written without throwing in tons of
English terms.

Now, very special environments (what I called "rare and isolated"
earlier) like special learning environments for children are a different
matter. It should be ok if you have to use a specially patched Python
branch there, or have to use an interpreter option that enables the
suggested behaviour. For general programming, it IMO is a bad idea.

--
René

May 16 '07 #263

Marc 'BlackJack' Rintsch wrote:

There are potential users of Python who don't know much english or no
english at all. This includes kids, old people, people from countries
that have "letters" that are not that easy to transliterate like european
languages, people who just want to learn Python for fun or to customize
their applications like office suites or GIS software with a Python
scripting option.

Make it an interpreter option that can be turned on for those cases.

--
René

May 16 '07 #264

Eric Brunel wrote:

reason why non-ASCII identifiers should be supported. I just wish I'll
get a '--ascii-only' switch on my Python interpreter (or any other means
to forbid non-ASCII identifiers and/or strings and/or comments).

I could certainly live with that as it would be the right way around. Support
Unicode by default, but allow those who require the lowest common denominator
to enforce it.

Stefan

May 16 '07 #265
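
The '--ascii-only' switch wished for above is not something this thread shows to exist, but the check itself can be approximated outside the interpreter. Below is a minimal sketch, assuming Python 3.7 or later (for str.isascii); the function name non_ascii_identifiers is hypothetical, and the sketch only inspects names, not strings or comments.

```python
import io
import tokenize


def non_ascii_identifiers(source):
    """Return (line, column, name) for every name token with non-ASCII characters."""
    found = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        # NAME covers identifiers and keywords; keywords are always ASCII,
        # so only user-chosen names can be reported here.
        if tok.type == tokenize.NAME and not tok.string.isascii():
            found.append((tok.start[0], tok.start[1], tok.string))
    return found


example = "def öffnen(pfad):\n    return open(pfad)\n"
print(non_ascii_identifiers(example))
```

A project that needs the lowest common denominator could run such a check in a commit hook or test suite, which leaves the default behaviour untouched for everyone else.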

Stefan Behnel wrote:

*Your* logic can be used to justify dropping *any* feature.

No. I am considering both the benefits and the problems. You just happen
to not like the outcome of my considerations [again, please don't reply
by E-Mail, I read the NG].

--
René

May 16 '07 #266

On Wed, 16 May 2007 02:14:58 +0200, Steven D'Aprano
<st****@REMOVE.THIS.cybersource.com.au> wrote:

On Tue, 15 May 2007 09:09:30 +0200, Eric Brunel wrote:

>Joke aside, this just means that I won't ever be able to program math in
ADA, because I have absolutely no idea on how to do a 'pi' character on
my keyboard.

Maybe you should find out then? Personal ignorance is never an excuse for
rejecting technology.

My "personal ignorance" is fine, thank you; how is yours?: there is no
keyboard *on Earth* allowing to type *all* characters in the whole Unicode
set. So my keyboard may just happen to provide no means at all to type a
greek 'pi', as it doesn't provide any to type Chinese, Japanese, Korean,
Russian, Hebrew, or whatever character set that is not in usage in my
country. And so are all keyboards all over the world.

Have I made my point clear or do you require some more explanations?
--
python -c "print ''.join([chr(154 - ord(c)) for c in
'U(17zX(%,5.zmz5(17l8(%,5.Z*(93-965$l7+-'])"

May 16 '07 #267

René Fleschenberg wrote:

Stefan Behnel wrote:

>>>- Non-english speakers can not create or understand
english identifiers hence can't create good code nor
easily grok existing code.
I agree that this is a problem, but please understand that this problem is
_not_ solved by allowing non-ASCII identifiers!
Well, as I said before, there are three major differences between the stdlib
and keywords on one hand and identifiers on the other hand. Ignoring arguments
does not make them any less true.

I agree that keywords are a different matter in many respects, but the
only difference between stdlib interfaces and other interfaces is that
the stdlib interfaces are part of the stdlib. That's it. You are still
ignoring the fact that, contrary to what has been suggested in this
thread, it is _not_ possible to write "German" or "Chinese" Python
without cluttering it up with many many English terms. It's not only the
stdlib, but also many many third party libraries. Show me one real
Python program that is feasibly written without throwing in tons of
English terms.

Now, very special environments (what I called "rare and isolated"
earlier) like special learning environments for children are a different
matter. It should be ok if you have to use a specially patched Python
branch there, or have to use an interpreter option that enables the
suggested behaviour. For general programming, it IMO is a bad idea.

Ok, let me put it differently.

You *do not* design Python's keywords. You *do not* design the stdlib. You *do
not* design the concepts behind all that. You *use* them as they are. So you
can simply take the identifiers they define and use them the way the docs say.
You do not have to understand these names, they don't have to be words, they
don't have to mean anything to you. They are just tools. Even if you do not
understand English, they will not get in your way. You just learn them.

But you *do* design your own software. You *do* design its concepts. You *do*
design its APIs. You *do* choose its identifiers. And you want them to be
clear and telling. You want them to match your (or your clients') view of the
application. You do not care about the naming of the tools you use inside. But
you do care about clarity and readability in *your own software*.

See the little difference here?

Stefan

May 16 '07 #268

René Fleschenberg wrote:

Marc 'BlackJack' Rintsch wrote:
>There are potential users of Python who don't know much english or no
english at all. This includes kids, old people, people from countries
that have "letters" that are not that easy to transliterate like european
languages, people who just want to learn Python for fun or to customize
their applications like office suites or GIS software with a Python
scripting option.

Make it an interpreter option that can be turned on for those cases.

No. Make "ASCII-only" an interpreter option that can be turned on for the
cases where it is really required.

Stefan

May 16 '07 #269

Ben

On May 15, 11:25 pm, Stefan Behnel <stefan.behnel-n05...@web.de>
wrote:

René Fleschenberg wrote:
Javier Bezos wrote:
>But having, for example, things like open() from the stdlib in your code
and then öffnen() as a name for functions/methods written by yourself is
just plain silly. It makes the code inconsistent and ugly without
significantly improving the readability for someone who speaks German
but not English.
Agreed. I always use English names (more or
less :-)), but this is not what the PEP is about.

We all know what the PEP is about (we can read). The point is: If we do
not *need* non-English/ASCII identifiers, we do not need the PEP. If the
PEP does not solve an actual *problem* and still introduces some
potential for *new* problems, it should be rejected. So far, the
"problem" seems to just not exist. The burden of proof is on those who
support the PEP.

The main problem here seems to be proving the need of something to people who
do not need it themselves. So, if a simple "but I need it because a, b, c" is
not enough, what good is any further proof?

Stefan

For what it's worth, I can only speak English (bad English schooling!)
and I'm definitely +1 on the PEP. Anyone using tools from the last 5
years can handle UTF-8

Cheers,
Ben

May 16 '07 #270

Gregor Horvath

René Fleschenberg wrote:

>I love Python because it does not dictate how to do things.
I do not need an ASCII-Dictator, I can judge myself when to use this
feature and when to avoid it, like any other feature.

*That* logic can be used to justify the introduction of *any* feature.

No. That logic can only be used to justify the introduction of a feature
that brings freedom.

Who are we to dictate the whole python world how to spell an identifier?

Gregor

May 16 '07 #271

sjdevnull

Ben wrote:

On May 15, 11:25 pm, Stefan Behnel <stefan.behnel-n05...@web.de>
wrote:
Rene Fleschenberg wrote:
Javier Bezos wrote:
>>But having, for example, things like open() from the stdlib in your code
>>and then o:ffnen() as a name for functions/methods written by yourself is
>>just plain silly. It makes the code inconsistent and ugly without
>>significantly improving the readability for someone who speaks German
>>but not English.
>Agreed. I always use English names (more or
>less :-)), but this is not what the PEP is about.

We all know what the PEP is about (we can read). The point is: If we do
not *need* non-English/ASCII identifiers, we do not need the PEP. If the
PEP does not solve an actual *problem* and still introduces some
potential for *new* problems, it should be rejected. So far, the
"problem" seems to just not exist. The burden of proof is on those who
support the PEP.
The main problem here seems to be proving the need of something to people who
do not need it themselves. So, if a simple "but I need it because a, b, c" is
not enough, what good is any further proof?

Stefan

For what it's worth, I can only speak English (bad English schooling!)
and I'm definitely +1 on the PEP. Anyone using tools from the last 5
years can handle UTF-8

The falsehood of the last sentence is why I'm moderately against this
PEP. Even examples within this thread don't display correctly on
several of the machines I have access to (all of which are less than
5 year old OS/browser environments). It strikes me as similar to the
arguments for quoted-printable in the early 1990s, claiming that
everyone can view it or will be able to soon--and even a decade
_after_ "everyone can deal with latin1 just fine" it was still causing
massive headaches.

May 16 '07 #272

sjdevnull

Christophe wrote:

sj*******@yahoo.com wrote:
Steven D'Aprano wrote:
I would find it useful to be able to use non-ASCII characters for heavily
mathematical programs. There would be a closer correspondence between the
code and the mathematical equations if one could write D(u*p) instead of
delta(mu*pi).
Just as one risk here:
When reading the above on Google groups, it showed up as "if one could
write ?(u*p)..."
When quoting it for response, it showed up as "could write D(u*p)".

I'm sure that the symbol you used was neither a capital letter d nor a
question mark.

Using identifiers that are so prone to corruption when posting in a
rather popular forum seems dangerous to me--and I'd guess that a lot
of source code highlighters, email lists, etc have similar problems.
I'd even be surprised if some programming tools didn't have similar
problems.

So, it was google groups that continuously corrupted the good UTF-8
posts by force converting them to ISO-8859-1?

Of course, there's also the possibility that it is a problem on *your*
side

Well, that's part of the point isn't it? It seems incredibly naive to
me to think that you could use whatever symbol was intended and have
it show up, and the "well fix your machine!" argument doesn't fly. A
lot of the time programmers have to look at stack traces on end-user's
machines (whatever they may be) to help debug. They have to look at
code on the (GUI-less) production servers over a terminal link. They
have to use all kinds of environments where they can't install the
latest and greatest fonts. Promoting code that becomes very hard to
read and debug in real situations seems like a sound negative to me.

May 16 '07 #273
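
One common workaround for the terminal and stack-trace situation described above is to escape everything outside ASCII before displaying it. This is a small sketch, assuming Python 3; the helper name ascii_safe is hypothetical, and it addresses only the display problem, not the typing problem.

```python
def ascii_safe(text):
    """Render text using only ASCII, escaping every other character."""
    return text.encode("ascii", errors="backslashreplace").decode("ascii")


message = "NameError: name 'öffnen' is not defined"
print(ascii_safe(message))
```

The escaped form is unambiguous and survives any terminal, though it is arguably harder to read than a transliteration, so it supports the objection as much as it answers it.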

sjdevnull

Stefan Behnel wrote:

sj*******@yahoo.com wrote:
I even sometimes
read code snippets on email lists and websites from my handheld, which
is sadly still memory-limited enough that I'm really unlikely to
install anything approaching a full set of Unicode fonts.

One of the arguments against this PEP was that it seemed to be impossible to
find either transliterated identifiers in code or native identifiers in Java
code using a web search. So it is very unlikely that you will need to upgrade
your handheld as it is very unlikely for you to stumble into such code.

Sure, if the feature isn't going to be used then it won't present
problems. I can't really see much of an argument for a PEP that isn't
going to be used, though, and if it is used then it's worthwhile to
think about the implications of having code that many common systems
simply can't deal with (either displaying it incorrectly or actually
corrupting files that pass through them).

May 16 '07 #274

Steven D'Aprano wrote:

>Unless you are 150% sure that there will *never* be the need for a
person who does not know your language of choice to be able to read or
modify your code, the language that "fits the environment best" is
English.

Just a touch of hyperbole perhaps?

You know, it may come as a surprise to some people that English is not
the only common language. In fact, it only ranks third, behind Mandarin
and Spanish, and just above Arabic. Although the exact number of speakers
varies according to the source you consult, the rankings are quite stable:
Mandarin, Spanish, then English. Any of those languages could equally
have claim to be the world's lingua franca.

For a language to be a (or the) lingua franca, the sheer number of
people who speak it is actually not as important as you seem to think.
Its use as an international exchange language is the decisive criterion
-- definitely not true for Mandarin, and for Spanish not nearly as much
as for English.

Also, there can be different "linguae francae" for different fields.
English definitely is the lingua franca of programming. But that is
actually off topic. Programming languages are not the same as natural
languages. I was talking about program code, not about works of literature.

--
René

May 16 '07 #275

Neil Hodgson

Lorenzo Gatti:

Ok, maybe you considered listing characters but you earnestly decided
to follow an authority; but this reliance on the Unicode standard is
not a merit: it defers to an external entity (UAX 31 and the Unicode
database) a foundation of Python syntax.

PEP 3131 uses a similar definition to C# except that PEP 3131
disallows formatting characters (category Cf). See section 9.4.2 of
http://www.ecma-international.org/pu...s/Ecma-334.htm

Neil

May 16 '07 #276

"Méta-MCI" <enl...XmcX@Xm..uX.com> wrote:

Hi!

- should non-ASCII identifiers be supported? why?
- would you use them if it was possible to do so? in what cases?

Yes.

JScript can use letters with accents in identifiers
XML (1.1) can use letters with accents in tags
C# can use letters with accents in variables
SQL: MySQL/MS-Sql/Oracle/etc. can use accents in fields or request
etc.
etc.

Python MUST make up for its lost time.

All those lemmings are jumping over a cliff!
I must hurry to keep up!

- Hendrik

May 16 '07 #277

"Eric Brunel" <e..l@pr...ev.com> wrote:

>So what? Does it mean that it's acceptable for the standard library and
keywords to be in English only, but the very same restriction on
user-defined identifiers is out of the question? Why? If I can use my own
language in my identifiers, why can't I write:

classe MaClasse:
    définir __init__(moi_même, maListe):
        moi_même.monDictionnaire = {}
        pour i dans maListe:
            moi_même.monDictionnaire[i] = Rien

For a French-speaking person, this is far more readable than:

class MaClasse:
    def __init__(self, maListe):
        self.monDictionnaire = {}
        for i in maListe:
            self.monDictionnaire[i] = None

Now, *this* is mixing apples and peaches... And this would look even
weirder with a non-indo-european language...

I don't have any French, but I support this point absolutely - having
native identifiers is NFG if you can't also have native reserved words.

You may be stuck with English sentence construction though. - Would
be hard, I would imagine, to let the programmer change the word order,
or to incorporate something as weird as the standard double negative
in Afrikaans...

We say things that translate literally to: "I am not a big man not.", and it
is completely natural, so the if statements should follow the pattern.

- Hendrik

May 16 '07 #278

"Stefan Behnel" <ste...l-******@web.de> wrote:

..:) This is not about "technical" English, this is about domain specific

>English. How big is your knowledge about, say, biological terms or banking
terms in English? Would you say you're capable of modelling an application
from the domain of biology, well specified in a large German document, in
perfect English terms?

And: why would you want to do that?

Possibly because it looks better and reads easier than
a dog ugly mix of perfectly good German words
all mixed up with English keywords in an English
style of sentence construction?

- Hendrik

--
"Hier sind wir unter uns" ;-)

May 16 '07 #279

"HYRY" <ru..88@gmail.com> wrote:

If non-ASCII identifiers become a reality, I think it will be the best
gift for children who do not know English.

How do you feel about the mix of English keywords and Chinese?
How do the English-like "sentences" look to a Chinese speaker?

Would you support the extension of this PEP to include Chinese
Keywords?

Would that be a lesser or greater gift?

- Hendrik

May 16 '07 #280

<ru***@yahoo.com> wrote:

"Hendrik van Rooyen" <m...l@m,,,.co.za> wrote in message
news:ma***************************************@python.org...

<ru***@yahoo.com> wrote:

[I fixed the broken attribution in your quote]

Sorry about that - I deliberately fudge email addys...

First "while" is a keyword and will remain "while" so
that has nothing to do with anything.

I think this cuts right down to why I oppose the PEP.
It is not so much for technical reasons as for aesthetic
ones - I find reading a mix of languages horrible, and I am
kind of surprised by the strength of my own reaction.

If I try to analyse my feelings, I think that really the PEP
does not go far enough, in a sense, and from memory
it seems to me that only E Brunel, R Fleschenberg and
to a lesser extent the Martellibot seem to somehow think
in a similar way as I do, but I seem to have an extreme
case of the disease...

And the summaries of reasons for and against have left
out objections based on this feeling of ugliness of mixed
language.

Interestingly, the people who seem to think a bit like that all
seem to be non native English speakers who are fluent in
English.

While the support seems to come from people whose English
is perfectly adequate, but who are unsure to the extent that they
apologise for their "bad" English.

Is this a pattern that you have identified? - I don't know.

I still don't like the thought of the horrible mix of "foreign"
identifiers and English keywords, coupled with the English
sentence construction. And that, in a nutshell, is the main
reason for my rather vehement opposition to this PEP.

The other stuff about sharing and my inability to even type
the OP's name correctly with the umlaut is kind of secondary
to this feeling of revulsion.

"Beautiful is better than ugly"

- Hendrik

May 16 '07 #281

Christophe

sj*******@yahoo.com wrote:

Christophe wrote:
>sj*******@yahoo.com wrote:
>>Steven D'Aprano wrote:
I would find it useful to be able to use non-ASCII characters for heavily
mathematical programs. There would be a closer correspondence between the
code and the mathematical equations if one could write D(u*p) instead of
delta(mu*pi).
Just as one risk here:
When reading the above on Google groups, it showed up as "if one could
write ?(u*p)..."
When quoting it for response, it showed up as "could write D(u*p)".

I'm sure that the symbol you used was neither a capital letter d nor a
question mark.

Using identifiers that are so prone to corruption when posting in a
rather popular forum seems dangerous to me--and I'd guess that a lot
of source code highlighters, email lists, etc have similar problems.
I'd even be surprised if some programming tools didn't have similar
problems.
So, it was google groups that continuously corrupted the good UTF-8
posts by force converting them to ISO-8859-1?

Of course, there's also the possibility that it is a problem on *your*
side

Well, that's part of the point isn't it? It seems incredibly naive to
me to think that you could use whatever symbol was intended and have
it show up, and the "well fix your machine!" argument doesn't fly. A
lot of the time programmers have to look at stack traces on end-user's
machines (whatever they may be) to help debug. They have to look at
code on the (GUI-less) production servers over a terminal link. They
have to use all kinds of environments where they can't install the
latest and greatest fonts. Promoting code that becomes very hard to
read and debug in real situations seems like a sound negative to me.

Who displays stack frames? Your code. Whose code includes unicode
identifiers? Your code. Whose fault is it to create a stack trace
display procedure that cannot handle unicode? You. Even if you don't
make use of them, you still have to fix the stack trace display
procedure because the exception error message can include unicode text
*today*

You should know that displaying and editing UTF-8 text as if it was
latin-1 works very very well.

Also, Terminals have support for UTF-8 encodings already. Or you could
always use kate+fish to edit your script on the distant server without
such problems (fish is a KDE protocol used to access a computer with ssh
as if it was a hard disk and kate is the standard text/code editor) It's
a matter of tools.

May 16 '07 #282

Hendrik van Rooyen wrote:

"Beautiful is better than ugly"

Good point. Today's transliteration of German words into ASCII identifiers
definitely looks ugly. Time for this PEP to be accepted.

Stefan

May 16 '07 #283

Gregor Horvath

sj*******@yahoo.com wrote:

code on the (GUI-less) production servers over a terminal link. They
have to use all kinds of environments where they can't install the
latest and greatest fonts. Promoting code that becomes very hard to
read and debug in real situations seems like a sound negative to me.

If someone wants to debug a Chinese program, he has in almost all cases
obviously already installed the correct fonts and his machine can handle
unicode.

Maybe yours and mine cannot, but I doubt that we are going to debug a
Chinese program.

I have debugged German programs (not Python) with Unicode characters in
them for years and had no problem at all, because my customers and I
obviously all have German machines.

Gregor

May 16 '07 #284

Gregor Horvath wrote:

>*That* logic can be used to justify the introduction of *any* feature.

No. That logic can only be used to justify the introduction of a feature
that brings freedom.

That is any feature that you are not forced to use. So let's get gotos
and the like. Every programming language dictates some things. This is
not a bad thing.

--
René

May 16 '07 #285

sj*******@yahoo.com wrote:

Stefan Behnel wrote:
>sj*******@yahoo.com wrote:
>>I even sometimes
read code snippets on email lists and websites from my handheld, which
is sadly still memory-limited enough that I'm really unlikely to
install anything approaching a full set of Unicode fonts.
One of the arguments against this PEP was that it seemed to be impossible to
find either transliterated identifiers in code or native identifiers in Java
code using a web search. So it is very unlikely that you will need to upgrade
your handheld as it is very unlikely for you to stumble into such code.

Sure, if the feature isn't going to be used then it won't present
problems.

Thing is, this feature *is* going to be used. Just not by projects that you
are likely to stumble into. Most OpenSource projects will continue to stick to
English-only, and posts to English-speaking newsgroups will also stick to
English. But Closed-Source programs and posts to non-English newsgroups *can*
use this feature if their developers want. And you still wouldn't even notice.

Stefan

May 16 '07 #286

Gregor Horvath

Hendrik van Rooyen wrote:

It is not so much for technical reasons as for aesthetic
ones - I find reading a mix of languages horrible, and I am
kind of surprised by the strength of my own reaction.

This is a matter of taste.
In some programs I use German identifiers (not unicode). I and others
like the mix. My customers can understand the code better. (They are
only reading it)

>
"Beautiful is better than ugly"

Correct.
But why do you think you should impose your taste on all of us?

With this logic you should all drive Alfa Romeos!

Gregor

May 16 '07 #287

HYRY

How do you feel about the mix of English keywords and Chinese?

How do the English-like "sentences" look to a Chinese speaker?

Would you support the extension of this PEP to include Chinese
Keywords?

Would that be a lesser or greater gift?

Because the students can remember some English words, mixing
characters is not a problem. But it is difficult for them to express their
own thoughts or logic in English or in Pinyin (which only marks the
pronunciation of the Chinese characters).
In my experience, mixing Chinese-character identifiers with
English keywords is very easy to read.
Because of the large difference between Chinese characters and ASCII
characters, I can quickly distinguish my identifiers from keywords and
library words.

May 16 '07 #288

Neil Hodgson

Eric Brunel:

... there is no
keyboard *on Earth* allowing to type *all* characters in the whole
Unicode set.

My keyboard in conjunction with the operating system (US English
keyboard on a Windows XP system) allows me to type characters from any
language. I haven't learned how to type these all quickly but I can get
through a session of testing Japanese input by myself. It's a matter of
turning on different keyboard layouts through the "Text Services and
Input Languages" control panel. Then there are small windows called
Input Method Editors that provide a mapping from your input to the
target language. Other platforms provide similar services.

Neil

May 16 '07 #289

"Stefan Behnel" <st******************@web.de> wrote:

Hendrik van Rooyen wrote:
"Beautiful is better than ugly"

Good point. Today's transliteration of German words into ASCII identifiers
definitely looks ugly. Time for this PEP to be accepted.

Nice out of context quote. :-)

Now look me in the eye and tell me that you find
the mix of proper German and English keywords
beautiful.

And I will call you a liar.

- Hendrik

May 16 '07 #290

Stefan Behnel wrote:

>Now, very special environments (what I called "rare and isolated"
earlier) like special learning environments for children are a different
matter. It should be ok if you have to use a specially patched Python
branch there, or have to use an interpreter option that enables the
suggested behaviour. For general programming, it IMO is a bad idea.

Ok, let me put it differently.

You *do not* design Python's keywords. You *do not* design the stdlib. You *do
not* design the concepts behind all that. You *use* them as they are. So you
can simply take the identifiers they define and use them the way the docs say.
You do not have to understand these names, they don't have to be words, they
don't have to mean anything to you. They are just tools. Even if you do not
understand English, they will not get in your way. You just learn them.

I claim that this is *completely unrealistic*. When learning Python, you
*do* learn the actual meanings of English terms like "open",
"exception", "if" and so on if you did not know them before. It would be
extremely foolish not to do so. You do care about these names and you do
want to know their meaning if you want to write anything more in your
life than a 10-line throw-away script.

But you *do* design your own software. You *do* design its concepts. You *do*
design its APIs. You *do* choose its identifiers. And you want them to be
clear and telling. You want them to match your (or your clients) view of the
application. You do not care about the naming of the tools you use inside. But
you do care about clarity and readability in *your own software*.

I do care about the naming of my tools. I care a lot. Part of why I like
Python is that it resisted the temptation to clutter the syntax up with
strange symbols like Perl. And I do dislike the decorator syntax, for
example.

Also, your distinction between "inside" and "your own" is nonsense,
because the "inside" does heavily leak into the "own". It is impossible
to write "your own software" with clarity and readability by your
definition (i.e. in your native language). Any real Python program is a
mix of identifiers you designed yourself and identifiers you did not
design yourself. And I think the ones chosen by yourself are even often
in the minority. It is not feasible in practice to just learn what the
"other" identifiers do without understanding their names. Not for
general programming. The standard library is already too big for that,
and most real programs use not only the standard library, but also third
party libraries that have English APIs.

--
René

May 16 '07 #291

Christophe wrote:

Who displays stack frames? Your code.

Wrong.

Whose code includes unicode
identifiers? Your code.

Wrong.

Whose fault is it to create a stack trace
display procedure that cannot handle unicode? You.

Wrong. If you never have to deal with other people's code,
congratulations to you. Many other people have to. And no, I can usually
not just tell the person to fix his code. I need to deal with it.

Even if you don't
make use of them, you still have to fix the stack trace display
procedure because the exception error message can include unicode text
*today*

The error message can, but at least the function names and other
identifiers can not.

You should know that displaying and editing UTF-8 text as if it was
latin-1 works very very well.

No, this only works for those characters that are in the ASCII range.
For all the other characters it does not work well at all.

Also, Terminals have support for UTF-8 encodings already.

Some have, some have not. And you not only need a terminal that can
handle UTF-8 data, you also need a font that has a glyph for all the
characters you need to handle, and you may also need a way to actually
enter those characters with your keyboard.

--
René

May 16 '07 #292

Christophe

Christophe wrote:
>You should know that displaying and editing UTF-8 text as if it was
latin-1 works very very well.

No, this only works for those characters that are in the ASCII range.
For all the other characters it does not work well at all.

This alone shows you don't know enough about UTF-8 to talk about it.
UTF-8 will NEVER use < 128 chars to describe multibyte chars. When you
parse a UTF-8 file, each space is a space, each \n is an end of line and
each 'Z' is a 'Z'.
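
That property can be checked directly. The following is a small sketch, assuming Python 3; the sample text is made up for illustration. It confirms that the bytes below 0x80 in a UTF-8 stream are exactly the ASCII characters of the text, and that every byte of a multibyte sequence has its high bit set.

```python
text = "déf größe(売り場):\n    pass Z\n"
data = text.encode("utf-8")

# Bytes below 0x80 in the UTF-8 stream are exactly the ASCII characters of the text.
ascii_from_bytes = bytes(b for b in data if b < 0x80).decode("ascii")
ascii_from_text = "".join(ch for ch in text if ord(ch) < 0x80)
print(ascii_from_bytes == ascii_from_text)

# Every byte that belongs to a multibyte sequence is 0x80 or above.
for ch in text:
    encoded = ch.encode("utf-8")
    if len(encoded) > 1:
        assert all(b >= 0x80 for b in encoded)

# So splitting the raw bytes on a newline finds the same line breaks as the text.
print(data.count(b"\n") == text.count("\n"))
```

This only shows that the ASCII structure of a file survives; it says nothing about how the non-ASCII characters are displayed.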

>Also, Terminals have support for UTF-8 encodings already.

Some have, some have not. And you not only need a terminal that can
handle UTF-8 data, you also need a font that has a glyph for all the
characters you need to handle, and you may also need a way to actually
enter those characters with your keyboard.

Ever heard of the famous "cut/paste"? I use it all the time, even when
handling standard ASCII english code. It greatly cuts down my ability to
make some typo while writing code.

May 16 '07 #293

Christophe wrote:

René Fleschenberg wrote:
>Christophe wrote:
>>You should know that displaying and editing UTF-8 text as if it was
latin-1 works very very well.

No, this only works for those characters that are in the ASCII range.
For all the other characters it does not work well at all.

This alone shows you don't know enough about UTF-8 to talk about it.
UTF-8 will NEVER use < 128 chars to describe multibyte chars. When you
parse a UTF-8 file, each space is a space, each \n is an end of line and
each 'Z' is a 'Z'.

So? Does that mean that you can just display UTF-8 "as if it was
Latin-1"? No, it does not. It means you can do that for exactly those
characters that are in the ASCII range. For all the others, you can not.
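
A short sketch of what happens to the other characters, assuming Python 3 and using a made-up sample string: the UTF-8 bytes are deliberately decoded with the wrong codec, which is what a Latin-1 viewer effectively does.

```python
name = "René"
utf8_bytes = name.encode("utf-8")

# Misreading the UTF-8 bytes as Latin-1 turns the two-byte "é" into two characters.
misread = utf8_bytes.decode("latin-1")
print(misread)  # RenÃ©

# The ASCII part survives unchanged; only the non-ASCII character is damaged.
print(misread[:3] == name[:3])

# The damage is reversible as long as no byte was dropped or replaced on the way.
print(misread.encode("latin-1").decode("utf-8") == name)
```

The text stays editable around the damaged characters, but it no longer reads as intended.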

--
René

May 16 '07 #294

René Fleschenberg wrote:

Stefan Behnel wrote:

>>Now, very special environments (what I called "rare and isolated"
earlier) like special learning environments for children are a different
matter. It should be ok if you have to use a specially patched Python
branch there, or have to use an interpreter option that enables the
suggested behaviour. For general programming, it IMO is a bad idea.
Ok, let me put it differently.

You *do not* design Python's keywords. You *do not* design the stdlib. You *do
not* design the concepts behind all that. You *use* them as they are. So you
can simply take the identifiers they define and use them the way the docs say.
You do not have to understand these names, they don't have to be words, they
don't have to mean anything to you. They are just tools. Even if you do not
understand English, they will not get in your way. You just learn them.

I claim that this is *completely unrealistic*. When learning Python, you
*do* learn the actual meanings of English terms like "open",

Fine, then go ahead and learn their actual meaning in two languages (Python
and English). My point is: you don't have to. You only need to understand
their meaning in Python. Whether or not English can help here or can be useful
in your later life is completely off-topic.

>But you *do* design your own software. You *do* design its concepts. You *do*
design its APIs. You *do* choose its identifiers. And you want them to be
clear and telling. You want them to match your (or your clients) view of the
application. You do not care about the naming of the tools you use inside. But
you do care about clarity and readability in *your own software*.

I do care about the naming of my tools. I care a lot. Part of why I like
Python is that it resisted the temptation to clutter the syntax up with
strange symbols like Perl. And I do dislike the decorator syntax, for
example.

Also, your distinction between "inside" and "your own" is nonsense,
because the "inside" does heavily leak into the "own". It is impossible
to write "your own software" with clarity and readability by your
definition (i.e. in your native language). Any real Python program is a
mix of identifiers you designed yourself and identifiers you did not
design yourself. And I think the ones chosen by yourself are even often
in the minority. It is not feasible in practice to just learn what the
"other" identifiers do without understanding their names. Not for
general programming. The standard library is already too big for that,
and most real programs use not only the standard library, but also third
party libraries that have English APIs.

Ok, I think the difference here is that I have practical experience with
developing that way and I am missing native identifiers in my daily work. You
don't have that experience and therefore do not feel that need. And you know
what? That's perfectly fine. I'm not criticising that at all. All I'm
criticising is that people without need for this feature are trying to prevent
those who need it and want to use it *where it is appropriate* from actually
getting this feature into the language.

Stefan

May 16 '07 #295

André

"Years ago", I wrote RUR-PLE (a python learning environment based on
Karel the Robot).
Someone mentioned using RUR-PLE to teach programming in Chinese to
kids. Here's a little text extracted from the English lessons (and an
even smaller one from the Turkish one). I believe that this is
relevant to this discussion.
==========
While the creators of Reeborg designed him so that he obeys
instructions in English, they realised that not everyone understands
English. So, they gave him the ability to easily learn a second
language. For example, if we want to tell someone to "move forward" in
French, we would say "avance". We can tell Reeborg that "avance" is a
synonym of "move" simply by writing
avance = move.
The order here is important; the known command has to be on the right,
and the new one has to be on the left. Note that we don't have any
parentheses "()" appearing since the parentheses would tell Reeborg
that we want him to obey an instruction; here, we are simply teaching
him a new word. When we want Reeborg to follow the new instruction, we
will use avance().
[snip]

If you want, you can also teach Reeborg a synonym for turn_off. Or,
you may give synonyms in a language other than French if you prefer,
even creating your own language. Then, watch Reeborg as he obeys
instructions written in your language.
[snip]
Note that, if English is not your favourite language, you can always
create a synonym in your language, as long as you define it first,
before using it. However, the synonym you introduce must use the
English alphabet (letters without any accents). For example, in
French, one might define vire_a_gauche = turn_left and use
vire_a_gauche() to instruct the robot to turn left.

----------(this last paragraph, now translated in Turkish)

Eğer İngilizce sizin favori diliniz değilse komutları her zaman kendi
dilinizde de tanımlayabilirsiniz, ancak kendi dilinizde tanımladığınız
komutları oluştururken yalnızca İngiliz alfabesindeki 26 harfi
kullanabilirsiniz. Örneğin Türkçede sola dönüş için sola_don =
turn_left kullanılmalıdır (ö yerine o kullanılmış dikkat ediniz). Bu
tanımlamayı yaptıktan sonra Reeborg'u sola döndürmek için sola_don()
komutunu kullanabilirsiniz.
=================
I don't read Turkish, but I notice the number 26 there (plus many
accented letters in the text), suspecting it refers to a small English
alphabet. It always bugged me that I could not have proper robot
commands in French.
While I would not use any non-ascii characters in my coding project
(because I like to be able to get bug reports [and patch!] from
others), I would love to be able to rewrite the lessons for RUR-PLE
using commands in proper French, rather than the bastardized purely
ascii based version. And I suspect it would be even more important in
Chinese...
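
The synonym mechanism from the lesson can be sketched in a few lines. This assumes a Python 3 interpreter, where accented identifiers are accepted; `move` and `turn_left` below are hypothetical stand-ins that only record what was called, not RUR-PLE's real robot commands.

```python
# Hypothetical stand-ins for the robot commands; they only log the call.
log = []

def move():
    log.append("move")

def turn_left():
    log.append("turn_left")

# Teaching a synonym: no parentheses, so the function is bound, not called.
avance = move
vire_a_gauche = turn_left   # ASCII-only spelling used in the current lessons

# With non-ASCII identifiers the proper French spelling works too.
vire_à_gauche = turn_left

avance()
vire_à_gauche()
print(log)
```

Both spellings end up bound to the same function object, so the lessons could offer either one.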

André

May 16 '07 #296

On Wed, 16 May 2007 12:22:01 +0200, Neil Hodgson
<ny*****************@gmail.com> wrote:

Eric Brunel:

>... there is no keyboard *on Earth* allowing to type *all* characters
in the whole Unicode set.

My keyboard in conjunction with the operating system (US English
keyboard on a Windows XP system) allows me to type characters from any
language. I haven't learned how to type these all quickly but I can get
through a session of testing Japanese input by myself. It's a matter of
turning on different keyboard layouts through the "Text Services and
Input Languages" control panel. Then there are small windows called
Input Method Editors that provide a mapping from your input to the
target language. Other platforms provide similar services.

Funny you talk about Japanese, a language I'm a bit familiar with and for
which I actually know some input methods. The thing is, these only work if
you know the transcription to the latin alphabet of the word you want to
type, which closely matches its pronunciation. So if you don't know that
売り場 is pronounced "uriba" for example, you have absolutely no way of
entering the word. Even if you could choose among a list of characters,
are you aware that there are almost 2000 "basic" Chinese characters used
in the Japanese language? And if I'm not mistaken, there are several tens
of thousands of characters in the Chinese language itself. This makes typing
them virtually impossible if you don't know the language and/or have the
correct keyboard.
--
python -c "print ''.join([chr(154 - ord(c)) for c in
'U(17zX(%,5.zmz5(17l8(%,5.Z*(93-965$l7+-'])"

May 16 '07 #297

Carsten Haese

On Wed, 16 May 2007 09:12:40 +0200, René Fleschenberg wrote

The X people who speak "no English" and program in Python. I
think X actually is very low (close to zero), because programming in
Python virtually does require you to know some English, whether you
can use non-ASCII characters in identifiers or not. It is naive to
believe that you can program in Python without understanding any
English once you can use your native characters in identifiers. That
will not happen. Please understand that: You basically *must* know
some English to program in Python, and the reason for that is not
that you cannot use non-ASCII identifiers.

There is evidence against your assertions that knowing some English is a
prerequisite for programming in Python and that people won't use non-ASCII
identifiers if they could. Go read the posts by "HYRY" on this thread, a
teacher from China, who teaches his students programming in Python, and they
don't know any English. They *do* use non-ASCII identifiers, and then they use
a cleanup script the teacher wrote to replace the identifiers with ASCII
identifiers so that they can actually run their programs. This disproves your
assertion on both counts.
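
HYRY's actual script was not posted, so the following is only a guess at what such a cleanup step could look like, written for Python 3 with the standard `tokenize` module. The function name `asciify_identifiers` and the generated `id_N` names are hypothetical.

```python
import io
import tokenize

def asciify_identifiers(source):
    """Replace each non-ASCII identifier with a generated ASCII name."""
    mapping = {}
    tokens = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        text = tok.string
        if tok.type == tokenize.NAME and not text.isascii():
            if text not in mapping:
                mapping[text] = "id_%d" % len(mapping)
            text = mapping[text]
        tokens.append((tok.type, text))
    # Two-element tuples make untokenize() regenerate the spacing itself,
    # so the output is valid code but not laid out like the input.
    return tokenize.untokenize(tokens), mapping

student_code = "größe = 3\ndoppelt = größe * 2\n"
cleaned, mapping = asciify_identifiers(student_code)

namespace = {}
exec(cleaned, namespace)
print(mapping, namespace["doppelt"])
```

A real tool would also have to avoid clashes between generated names and existing ASCII names, and decide what to do with non-ASCII text in strings and comments, which this sketch leaves untouched.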

-Carsten

May 16 '07 #298

Ross Ridge

Ross Ridge wrote:

non-ASCII identifiers. While it's easy to find code where comments use
non-ASCII characters, I was never able to find a non-made up example
that used them in identifiers.

Gregor Horvath <gh@gregor-horvath.com> wrote:

>If comments are allowed to be non-English, then why are identifiers not?

In the code I was looking at identifiers were allowed to use non-ASCII
characters. For whatever reason, the programmers chose not to use non-ASCII
identifiers even though they had no problem using non-ASCII characters
in comments.

Ross Ridge

--
l/ // Ross Ridge -- The Great HTMU
[oo][oo] rr****@csclub.uwaterloo.ca
-()-/()/ http://www.csclub.uwaterloo.ca/~rridge/
db //

May 16 '07 #299