# Allowing zero-dimensional subscripts

Hello,

I discovered that I needed a small change to the Python grammar. I
would like to hear what you think about it.

In two lines:
Currently, the expression "x[]" is a syntax error.
I suggest that it will be evaluated like "x[()]", just as "x[a, b]" is
evaluated like "x[(a, b)]" right now.

In a few more words: Currently, an object can be subscripted by a few
elements, separated by commas. It is evaluated as if the object was
subscripted by a tuple containing those elements. I suggest that an
object will also be subscriptable with no elements at all, and it will
be evaluated as if the object was subscripted by an empty tuple.

It involves no backwards incompatibilities, since we are dealing with
the legalization of a currently illegal syntax.

It is consistent with the current syntax. Consider that these
identities currently hold:

x[i, j, k] <--> x[(i, j, k)]
x[i, j] <--> x[(i, j)]
x[i, ] <--> x[(i, )]
x[i] <--> x[(i)]

I suggest that the next identity will hold too:

x[] <--> x[()]
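
One way to check these identities on a present-day interpreter is to compare the parse trees built for each pair. This is only a sketch, not part of the patch: the helper names `same_ast` and `is_syntax_error` are made up for illustration, and it assumes Python 3 with the standard `ast` module.

```python
import ast

def same_ast(left, right):
    """True if both expressions parse to the same tree (source positions ignored)."""
    return ast.dump(ast.parse(left, mode="eval")) == ast.dump(ast.parse(right, mode="eval"))

def is_syntax_error(source):
    """True if the expression does not compile at all."""
    try:
        compile(source, "<example>", "eval")
    except SyntaxError:
        return True
    return False

pairs = [
    ("x[i, j, k]", "x[(i, j, k)]"),
    ("x[i, j]", "x[(i, j)]"),
    ("x[i, ]", "x[(i, )]"),
    ("x[i]", "x[(i)]"),
]
identities_hold = all(same_ast(left, right) for left, right in pairs)

# The proposed fifth identity cannot be checked the same way,
# because its left-hand side is rejected by the parser.
empty_is_error = is_syntax_error("x[]")
```

The four existing pairs compare equal, while `x[]` never gets as far as producing a tree.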

I need this in order to be able to refer to zero-dimensional arrays
nicely. In NumPy, you can have arrays with a different number of
dimensions. In order to refer to a value in a two-dimensional array,
you write a[i, j]. In order to refer to a value in a one-dimensional
array, you write a[i]. You can also have a zero-dimensional array,
which holds a single value (a scalar). To refer to its value, you
currently need to write a[()], which is unexpected - the user may not
even know that when he writes a[i, j] he constructs a tuple, so he
won't guess the a[()] syntax. If my suggestion is accepted, he will be
able to write a[] in order to refer to the value, as expected. It will
even work without changing the NumPy package at all!

In the normal use of NumPy, you usually don't encounter
zero-dimensional arrays. However, I'm designing another library for
managing multi-dimensional arrays of data. Its purpose is similar to
that of a spreadsheet - analyze data and preserve the relations between
a source of a calculation and its destination. In such an environment
you may have a lot of multi-dimensional arrays - for example, the sales
of several products over several time periods. But you may also have a
lot of zero-dimensional arrays, that is, single values - for example,
the income tax. I want the access to the zero-dimensional arrays to be
consistent with the access to the multi-dimensional arrays. Just using
the name of the zero-dimensional array to obtain its value isn't going
to work - the array and the value it contains have to be distinguished.

I have tried to change CPython to support it, and it was fairly easy.
You can see the diff against the current SVN here:
http://python.pastebin.com/768317
The test suite passes without changes, as expected. I didn't include
diffs of autogenerated files. I know almost nothing about the AST, so I
would appreciate it if someone who is familiar with the AST will check
to see if I did it right. It does seem to work, though.

Have a good day,
Noam Raphael

Jun 8 '06 #1

<sp*******@gmail.com> wrote in message
> Hello,
>
> I discovered that I needed a small change to the Python grammar. I
> would like to hear what you think about it.
>
> In two lines:
> Currently, the expression "x[]" is a syntax error.
> I suggest that it will be evaluated like "x[()]", just as "x[a, b]" is
> evaluated like "x[(a, b)]" right now.
>
> In a few more words: Currently, an object can be subscripted by a few
> elements, separated by commas. It is evaluated as if the object was
> subscripted by a tuple containing those elements.

It is not 'as if'. 'a,b' *is* a tuple and the object *is* subscripted by a
tuple.
Adding () around the non-empty tuple adds nothing except a bit of noise.

>>> dis(compile('x[a,b]','','eval'))
  0           0 LOAD_NAME                0 (x)
              3 LOAD_NAME                1 (a)
              6 LOAD_NAME                2 (b)
              9 BUILD_TUPLE              2
             12 BINARY_SUBSCR
             13 RETURN_VALUE
>>> dis(compile('x[(a,b)]','','eval'))
  0           0 LOAD_NAME                0 (x)
              3 LOAD_NAME                1 (a)
              6 LOAD_NAME                2 (b)
              9 BUILD_TUPLE              2
             12 BINARY_SUBSCR
             13 RETURN_VALUE

Parens around non-empty tuples are only needed for precedence grouping, the
same as in (a+b)*c.

> I suggest that an
> object will also be subscriptable with no elements at all, and it will
> be evaluated as if the object was subscripted by an empty tuple.

Again, there would be no 'as if' about it. You are suggesting that the
parens around a tuple nothing be optional in this particular context.
While logically possible, Guido decided that tuple nothings should *always*
be distinguished from other nothings and set off from surrounding code by
parentheses. The reason is to avoid ambiguity and catch errors. I think
this is overall a good choice.

Terry Jan Reedy

Jun 8 '06 #2
Hello,

Terry Reedy wrote:
>> In a few more words: Currently, an object can be subscripted by a few
>> elements, separated by commas. It is evaluated as if the object was
>> subscripted by a tuple containing those elements.
>
> It is not 'as if'. 'a,b' *is* a tuple and the object *is* subscripted by a
> tuple.
> Adding () around the non-empty tuple adds nothing except a bit of noise.

It doesn't necessarily matter, but technically, it is not "a tuple".
The "1, 2" in "x[1, 2]" isn't evaluated according to the same rules as
in "x = 1, 2" - for example, you can have "x[1, 2:3:4, ..., 5]", which
isn't a legal tuple outside of square braces - in fact, it even isn't
legal inside parens: "x[(1, 2:3:4, ..., 5)]" isn't legal syntax.

Noam

Jun 8 '06 #3

<sp*******@gmail.com> wrote in message
> Terry Reedy wrote:
>>> In a few more words: Currently, an object can be subscripted by a few
>>> elements, separated by commas. It is evaluated as if the object was
>>> subscripted by a tuple containing those elements.
>> It is not 'as if'. 'a,b' *is* a tuple and the object *is* subscripted by a
>> tuple.
>> Adding () around the non-empty tuple adds nothing except a bit of noise.
>
> It doesn't necessarily matter, but technically, it is not "a tuple".

Tell that to the compiler. Here is the code again.

>>> dis(compile('x[a,b]','','eval'))
  0           0 LOAD_NAME                0 (x)
              3 LOAD_NAME                1 (a)
              6 LOAD_NAME                2 (b)
              9 BUILD_TUPLE              2
             12 BINARY_SUBSCR
             13 RETURN_VALUE
>>> dis(compile('x=a,b','','single'))
  1           0 LOAD_NAME                0 (a)
              3 LOAD_NAME                1 (b)
              6 BUILD_TUPLE              2
              9 STORE_NAME               2 (x)
             12 LOAD_CONST               0 (None)
             15 RETURN_VALUE

The same exact code to build the tuple a,b.

> The "1, 2" in "x[1, 2]" isn't evaluated according to the same rules as
> in "x = 1, 2"

>>> dis(compile('x[1,2]','','eval'))
  0           0 LOAD_NAME                0 (x)
              3 LOAD_CONST               2 ((1, 2))
              6 BINARY_SUBSCR
              7 RETURN_VALUE
>>> dis(compile('x=1,2','','single'))
  1           0 LOAD_CONST               3 ((1, 2))
              3 STORE_NAME               0 (x)
              6 LOAD_CONST               2 (None)
              9 RETURN_VALUE

Same exact tuple literal. The tuple rules are the same.

> - for example, you can have "x[1, 2:3:4, ..., 5]", which
> isn't a legal tuple outside of square braces - in fact, it even isn't
> legal inside parens: "x[(1, 2:3:4, ..., 5)]" isn't legal syntax.

Yes, slice and ellipsis literals are only valid directly inside brackets.
And it is definitely worth knowing about them and that this is one place
where a tuple cannot be parenthesized. But once they are accepted, the
slice and ellipsis objects are members of the resulting tuple like any
other.

>>> dis(compile("x[1, 2:3:4, ..., 5]", '','eval'))
  0           0 LOAD_NAME                0 (x)
              3 LOAD_CONST               0 (1)
              6 LOAD_CONST               1 (2)
              9 LOAD_CONST               2 (3)
             12 LOAD_CONST               3 (4)
             15 BUILD_SLICE              3
             18 LOAD_CONST               4 (Ellipsis)
             21 LOAD_CONST               5 (5)
             24 BUILD_TUPLE              4
             27 BINARY_SUBSCR
             28 RETURN_VALUE

So I do not see any point or usefulness in saying that a tuple subscript is
not what it is.
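
The same point can be checked at runtime rather than in bytecode. This is a sketch with a made-up class, `ShowKey`, whose `__getitem__` simply hands back whatever it receives; it assumes a Python 3 interpreter.

```python
class ShowKey:
    """Hypothetical helper: returns the object passed to __getitem__."""
    def __getitem__(self, key):
        return key

x = ShowKey()
key = x[1, 2:3:4, ..., 5]

# The slice and ellipsis literals end up as ordinary members of the tuple,
# so the same key can be built by hand outside the brackets.
built_by_hand = (1, slice(2, 3, 4), Ellipsis, 5)
```

Here `key` and `built_by_hand` compare equal, and `type(key)` is plain `tuple`.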

Terry Jan Reedy

Jun 9 '06 #4
Hello,

Terry Reedy wrote:
> So I do not see any point or usefulness in saying that a tuple subscript is
> not what it is.

I know that a tuple is *constructed*. The question is, is this,
conceptually, the feature that allows you to omit the parentheses of a
tuple in some cases. If we see this as the same feature, it's
reasonable that "nothing" won't be seen as an empty tuple, just like "a
= " doesn't mean "a = ()".

However, if we see this as a different feature, which allows
multidimensional subscript by constructing a tuple behind the scenes,
constructing an empty tuple for x[] seems very reasonable to me. Since
in some cases you can't have the parentheses at all, I think that x[]
makes sense.

Noam

Jun 9 '06 #5
sp*******@gmail.com wrote:
> Hello,
>
> Terry Reedy wrote:
>> So I do not see any point or usefulness in saying that a tuple subscript is
>> not what it is.
>
> I know that a tuple is *constructed*. The question is, is this,
> conceptually, the feature that allows you to omit the parentheses of a
> tuple in some cases. If we see this as the same feature, it's
> reasonable that "nothing" won't be seen as an empty tuple, just like "a
> = " doesn't mean "a = ()".
>
> However, if we see this as a different feature, which allows
> multidimensional subscript by constructing a tuple behind the scenes,
> constructing an empty tuple for x[] seems very reasonable to me. Since
> in some cases you can't have the parentheses at all, I think that x[]
> makes sense.

Hey, I have an idea, why don't we look at the language reference manual
instead of imagining how we think it might work!

In section 3.2 we find:
"""
Tuples
The items of a tuple are arbitrary Python objects. Tuples of two or more
items are formed by comma-separated lists of expressions. A tuple of one
item (a `singleton') can be formed by affixing a comma to an expression
(an expression by itself does not create a tuple, since parentheses must
be usable for grouping of expressions). An empty tuple can be formed by
an empty pair of parentheses.
"""

So it seems that your speculation is false. Section 2.6 specifically
defines "[" and "]" as delimiters. Section 5.3.2 defines a subscription
(a term I've not really grown to love, but what the heck) as

subscription ::= primary "[" expression_list "]"

and section 5.12, which defines expression_list, explicitly says

"""An expression list containing at least one comma yields a tuple.""".

So it would appear that while your change might be very convenient to
allow you to refer to scalar values as zero-dimensional arrays, it
doesn't really fit into Python's conceptual framework. Sorry.

One further point: if you really do conceptualize scalars as
zero-dimensional arrays, where is the value conceptually stored?

regards
Steve
--
Steve Holden +44 150 684 7255 +1 800 494 3119
Holden Web LLC/Ltd http://www.holdenweb.com
Love me, love my blog http://holdenweb.blogspot.com
Recent Ramblings http://del.icio.us/steve.holden

Jun 9 '06 #6
sp*******@gmail.com wrote:

-0 from me, but it's definitely a PEP-able proposal.

suggestion: turn your post into a pre-PEP and post it somewhere, post
the patch to the patch tracker, and post a brief heads-up to python-dev,
and see what happens.

(you probably have to do all that today if you want this to go into 2.5,
though...)

</F>

Jun 9 '06 #7

<sp*******@gmail.com> wrote in message
> The question is, is this,
> conceptually, the feature that allows you to omit the parentheses of a
> tuple in some cases.

To repeat: tuples are defined by commas. There are no 'parentheses of a
tuple', except for empty tuples, to be omitted. Consider: a+b is a sum; in
(a+b)*c, the sum is parenthesized to avoid confusion with a+b*c. Like
other expressions, tuples are parenthesized when needed to avoid similar
confusion. So (1,2)+(3,4) needs parens because 1,2+3,4 would be something
different. In both examples, parens are used to reverse normal precedence
relations.
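
A quick sketch of the two expressions above, evaluated as written (the variable names are only for illustration):

```python
# Parenthesized: two tuples are concatenated.
grouped = (1, 2) + (3, 4)       # (1, 2, 3, 4)

# Unparenthesized: the comma binds loosest, so this is the
# three-element tuple (1, 2 + 3, 4).
ungrouped = 1, 2 + 3, 4         # (1, 5, 4)
```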

Terry Jan Reedy

Jun 9 '06 #8
Steve Holden wrote:
> Hey, I have an idea, why don't we look at the language reference manual
> instead of imagining how we think it might work!

I don't know. Sounds risky.

> In section 3.2 we find:
> """
> Tuples
> The items of a tuple are arbitrary Python objects. Tuples of two or more
> items are formed by comma-separated lists of expressions. A tuple of one
> item (a `singleton') can be formed by affixing a comma to an expression
> (an expression by itself does not create a tuple, since parentheses must
> be usable for grouping of expressions). An empty tuple can be formed by
> an empty pair of parentheses.
> """
>
> So it seems that your speculation is false. Section 2.6 specifically
> defines "[" and "]" as delimiters. Section 5.3.2 defines a subscription
> (a term I've not really grown to love, but what the heck) as
>
> subscription ::= primary "[" expression_list "]"
>
> and section 5.12, which defines expression_list, explicitly says
>
> """An expression list containing at least one comma yields a tuple.""".
>
> So it would appear that while your change might be very convenient to
> allow you to refer to scalar values as zero-dimensional arrays, it
> doesn't really fit into Python's conceptual framework. Sorry.

Yes, that would appear to be so. You would have a point... if the
documentation were correct. Only it's not.

According to the reference manual, the rule for an expression_list is:

expression_list ::= expression ( "," expression )* [","]

But take the following legal Python subscripted array:

a[1:2,...,3:4]

Is "1:2" an expression? How about "..."? When I enter 1:2 at the
Python prompt, I get a syntax error. The fact is, the documentation
here is either wrong or simplified or both. (I don't think it's a big
deal, actually: the real grammar has lots of complexity to handle
tricky cases that would needlessly complicate the reference manual for
a human reader.) So let's look at an excerpt of the actual Python grammar
(from 2.4.3). You'll be happy to know subscription isn't used. :)

trailer: '(' [arglist] ')' | '[' subscriptlist ']' | '.' NAME
subscriptlist: subscript (',' subscript)* [',']
sliceop: ':' [test]
subscript: '.' '.' '.' | test | [test] ':' [test] [sliceop]
testlist: test (',' test)* [',']

Clearly, the grammar rule used for list subscript is different from the
one used for list of expressions (for some reason, what an ordinary
person would call an expression is called a "test" in the grammar,
whereas "expr" is a non-short-circuiting expression).

So there's a regular way to create non-empty tuples, and a subscript
way.

And there's a regular way to create an empty tuple... but not a
subscript way.

So I'd say this change fits the conceptual framework of the tuple quite
well; in fact, it makes subscript tuples more parallel to their regular
counterparts.

> One further point: if you really do conceptualize scalars as
> zero-dimensional arrays, where is the value conceptually stored?

Think of it this way: an array with n-dimensions of length 3 would have
3**n total entries. How many entries would a 0-dimensional array have?
3**0 == 1.

Numeric has had zero-dimensional arrays for a long time, and has had no
problem storing them. Think of the rule for accessing an element of an
array: it's a base pointer + sum (indices*stride) for all indices. Now
generalize it down to zero: there are no indices, so the scalar is
stored at the base pointer.
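
The addressing rule can be written out as a small sketch. The function names `element_count` and `element_offset` are hypothetical, the base address and strides are invented numbers, and this is plain Python arithmetic rather than Numeric's actual implementation.

```python
import math

def element_count(shape):
    """Number of entries in an array of the given shape; the empty product is 1."""
    return math.prod(shape)

def element_offset(base, indices, strides):
    """Base pointer plus the sum of index * stride over every dimension."""
    return base + sum(i * s for i, s in zip(indices, strides))

# 2-D array of shape (3, 3) with 8-byte items and row-major strides (24, 8).
two_d = element_offset(1000, (1, 2), (24, 8))   # 1000 + 1*24 + 2*8

# 0-D array: no indices and no strides, so the sum is empty
# and the single value sits at the base address itself.
zero_d = element_offset(1000, (), ())
```
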
Carl Banks

Jun 9 '06 #9
Carl Banks wrote:
> Steve Holden wrote:
>> Hey, I have an idea, why don't we look at the language reference manual
>> instead of imagining how we think it might work!
>
> I don't know. Sounds risky.
>
>> In section 3.2 we find:
>> """
>> Tuples
>> The items of a tuple are arbitrary Python objects. Tuples of two or more
>> items are formed by comma-separated lists of expressions. A tuple of one
>> item (a `singleton') can be formed by affixing a comma to an expression
>> (an expression by itself does not create a tuple, since parentheses must
>> be usable for grouping of expressions). An empty tuple can be formed by
>> an empty pair of parentheses.
>> """
>>
>> So it seems that your speculation is false. Section 2.6 specifically
>> defines "[" and "]" as delimiters. Section 5.3.2 defines a subscription
>> (a term I've not really grown to love, but what the heck) as
>>
>> subscription ::= primary "[" expression_list "]"
>>
>> and section 5.12, which defines expression_list, explicitly says
>>
>> """An expression list containing at least one comma yields a tuple.""".
>>
>> So it would appear that while your change might be very convenient to
>> allow you to refer to scalar values as zero-dimensional arrays, it
>> doesn't really fit into Python's conceptual framework. Sorry.
>
> Yes, that would appear to be so. You would have a point... if the
> documentation were correct. Only it's not.
>
> According to the reference manual, the rule for an expression_list is:
>
> expression_list ::= expression ( "," expression )* [","]
>
> But take the following legal Python subscripted array:
>
> a[1:2,...,3:4]

But the element inside the brackets there isn't an expression-list,
it's a slicing (see section 5.3.2).

> Is "1:2" an expression? How about "..."? When I enter 1:2 at the

1:2 is a short slice.

... is an ellipsis.

Neither of these elements is allowed in non-subscripting contexts.

> Python prompt, I get a syntax error. The fact is, the documentation
> here is either wrong or simplified or both. (I don't think it's a big
> deal, actually: the real grammar has lots of complexity to handle
> tricky cases that would needlessly complicate the reference manual for
> a human reader.) So let's look at an excerpt of the actual Python grammar
> (from 2.4.3). You'll be happy to know subscription isn't used. :)
>
> trailer: '(' [arglist] ')' | '[' subscriptlist ']' | '.' NAME
> subscriptlist: subscript (',' subscript)* [',']
> sliceop: ':' [test]
> subscript: '.' '.' '.' | test | [test] ':' [test] [sliceop]
> testlist: test (',' test)* [',']

The simplification of the grammar is explicitly documented:

"""
Rather than further complicating the syntax, this is disambiguated by
defining that in this case the interpretation as a subscription takes
priority over the interpretation as a slicing (this is the case if the
slice list contains no proper slice nor ellipses). Similarly, when the
slice list has exactly one short slice and no trailing comma, the
interpretation as a simple slicing takes priority over that as an
extended slicing.
"""

> Clearly, the grammar rule used for list subscript is different from the
> one used for list of expressions (for some reason, what an ordinary
> person would call an expression is called a "test" in the grammar,
> whereas "expr" is a non-short-circuiting expression).
>
> So there's a regular way to create non-empty tuples, and a subscript
> way.
>
> And there's a regular way to create an empty tuple... but not a
> subscript way.
>
> So I'd say this change fits the conceptual framework of the tuple quite
> well; in fact, it makes subscript tuples more parallel to their regular
> counterparts.

Although this debate is beginning to make me sound like one, I am really
not a language lawyer. However, I should point out that what you are
describing as a "tuple" should more correctly be described as a
"slice-list" once you include slices or an ellipsis as elements.
Slicings are described, as I am fairly sure you know, in section 5.3.3.

>> One further point: if you really do conceptualize scalars as
>> zero-dimensional arrays, where is the value conceptually stored?
>
> Think of it this way: an array with n-dimensions of length 3 would have
> 3**n total entries. How many entries would a 0-dimensional array have?
> 3**0 == 1.
>
> Numeric has had zero-dimensional arrays for a long time, and has had no
> problem storing them. Think of the rule for accessing an element of an
> array: it's a base pointer + sum (indices*stride) for all indices. Now
> generalize it down to zero: there are no indices, so the scalar is
> stored at the base pointer.

I can see that, and it doesn't seem unreasonable. Fortunately your
persistence has goaded me into determining the point that *did* seem
unreasonable to me: you were falsely trying to equate slicings and tuples.

Having said all of which, there probably *is* a case for proposing that
an empty slicing become syntactically acceptable, so why not write the
PEP and go for it?

But be quick: feature freeze for 2.5b1 looms ...

regards
Steve

Jun 9 '06 #10
Sybren Stuvel wrote:
> Just curious: how would you initialize 'x' in such a case? If I simply
> say 'x = []' and then try to index it with x[1, 2], I get "TypeError:
> list indices must be integers".

that's up to the x implementation to decide, of course:

>>> class MyContainer:
...     def __getitem__(self, index):
...         return index
...
>>> x = MyContainer()
>>> x[1]
1
>>> x[1, 2]
(1, 2)
>>> x[(1, 2, 3)]
(1, 2, 3)
>>> x[()]
()

noam's proposal is to make this work:

>>> x[]
()

(but should it really result in an empty tuple? wouldn't None be a bit
more Pythonic?)

</F>

Jun 9 '06 #11
Hello,

Sybren Stuvel wrote:
> I think it's ugly to begin with. In math, one would write simply 'x'
> to denote an unsubscribed (unsubscripted?) 'x'. And another point, why
> would one call __getitem__ without an item to call?

I think that in this case, mathematical notation is different from
python concepts.

If I create a zero-dimensional array, with the value 5, like this:
a = array(5)
I refer to the array object as "a", and to the int it stores as "a[]".

For example, I can change the value it holds by writing a[] = 8

Writing "a = 8" would have a completely different meaning - create a
new name, a, pointing at a new int, 8.
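
A minimal sketch of that distinction, using a made-up `ZeroDim` class in place of a real array type and spelling the subscript as `[()]`, since `[]` is not valid syntax today:

```python
class ZeroDim:
    """Hypothetical zero-dimensional container holding a single value."""
    def __init__(self, value):
        self._value = value

    def __getitem__(self, key):
        if key != ():
            raise IndexError("a zero-dimensional array takes no indices")
        return self._value

    def __setitem__(self, key, value):
        if key != ():
            raise IndexError("a zero-dimensional array takes no indices")
        self._value = value

a = ZeroDim(5)
b = a                    # a second name for the same container
a[()] = 8                # changes the stored value, visible through b as well
seen_through_b = b[()]

a = 8                    # rebinds the name a; the container is untouched
still_in_b = b[()]
```

After the last line, `a` is just the int 8, while `b` still refers to the container and still holds 8.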

Noam

Jun 9 '06 #12
Hello,

Fredrik Lundh wrote:
> (but should it really result in an empty tuple? wouldn't None be a bit
> more Pythonic?)

I don't think it would. First of all, x[()] already has the desired
meaning in numpy. But I think it's the right thing - if you think of
what's inside the brackets as a list of subscripts, one for each
dimension, which is translated to a call to __getitem__ or __setitem__
with a tuple of objects representing the subscripts, then an empty
tuple is what you want to represent no subscripts.

Of course, one item without a comma doesn't make a tuple, but I see
this as the special case - just like parentheses with any number of
commas are interpreted as tuples, except for parentheses with one item
without a comma.

(By the way, thanks for the tips for posting a PEP - I'll try to do it
quickly.)

Noam

Jun 9 '06 #13
On 2006-06-08, sp*******@gmail.com <sp*******@gmail.com> wrote:
> Hello,
>
> Terry Reedy wrote:
>>> In a few more words: Currently, an object can be subscripted by a few
>>> elements, separated by commas. It is evaluated as if the object was
>>> subscripted by a tuple containing those elements.
>> It is not 'as if'. 'a,b' *is* a tuple and the object *is* subscripted by a
>> tuple.
>> Adding () around the non-empty tuple adds nothing except a bit of noise.
>
> It doesn't necessarily matter, but technically, it is not "a tuple".

Yes it is.

> The "1, 2" in "x[1, 2]" isn't evaluated according to the same rules as
> in "x = 1, 2"

I was pretty sure it was.

> - for example, you can have "x[1, 2:3:4, ..., 5]", which
> isn't a legal tuple outside of square braces

Yes it is, it just is illegal notation outside square brackets.
You could achieve approximately the same effect with

i = 1, slice(2,3,4), Ellipsis, 5
x[i]

> - in fact, it even isn't
> legal inside parens: "x[(1, 2:3:4, ..., 5)]" isn't legal syntax.

But what is illegal is the notation, not the value.

--
Antoon Pardon
Jun 9 '06 #14
Hello,

Following Fredrik's suggestion, I wrote a pre-PEP. It's available on
the wiki, at http://wiki.python.org/moin/EmptySubscriptListPEP and I
also copied it to this message.

Have a good day,
Noam

PEP: XXX
Title: Allow Empty Subscript List Without Parentheses
Version: $Revision$
Last-Modified: $Date$
Author: Noam Raphael <sp*******@gmail.com>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 09-Jun-2006
Python-Version: 2.5?
Post-History: 30-Aug-2002

Abstract
========

This PEP proposes allowing an empty subscript list, for
example ``x[]``, which is currently a syntax error. In that case,
an empty tuple would be passed as the argument to
the __getitem__ and __setitem__ methods. This is consistent with the
current behaviour of passing a tuple with n elements to those methods
when a subscript list of length n is used, if it includes a comma.


Specification
=============

The Python grammar specifies that inside the square brackets trailing
an expression, a list of "subscripts", separated by commas, should be
given. If the list consists of a single subscript without a trailing
comma, a single object (an ellipsis, a slice or any other object) is
passed to the resulting __getitem__ or __setitem__ call. If the list
consists of many subscripts, or of a single subscript with a trailing
comma, a tuple is passed to the resulting __getitem__ or __setitem__
call, with an item for each subscript.

Here is the formal definition of the grammar:

::

    trailer: '(' [arglist] ')' | '[' subscriptlist ']' | '.' NAME
    subscriptlist: subscript (',' subscript)* [',']
    subscript: '.' '.' '.' | test | [test] ':' [test] [sliceop]
    sliceop: ':' [test]

This PEP proposes allowing an empty subscript list, with nothing
inside the square brackets. It would result in an empty tuple being
passed to the resulting __getitem__ or __setitem__ call.

The change in the grammar is to make "subscriptlist" in the first
quoted line optional:

::

    trailer: '(' [arglist] ')' | '[' [subscriptlist] ']' | '.' NAME


Motivation
==========

This suggestion allows you to refer to zero-dimensional arrays
elegantly. In NumPy, you can have arrays with a different number of
dimensions. In
order to refer to a value in a two-dimensional array, you write
``a[i, j]``. In order to refer to a value in a one-dimensional array,
you write ``a[i]``. You can also have a zero-dimensional array, which
holds a single value (a scalar). To refer to its value, you currently
need to write ``a[()]``, which is unexpected - the user may not even
know that when he writes ``a[i, j]`` he constructs a tuple, so he
won't guess the ``a[()]`` syntax. If the suggestion is accepted, the
user will be able to write ``a[]`` in order to refer to the value, as
expected. It will even work without changing the NumPy package at all!

In the normal use of NumPy, you usually don't encounter
zero-dimensional arrays. However, the author of this PEP is designing
another library for managing multi-dimensional arrays of data. Its
purpose is similar to that of a spreadsheet - to analyze data and
preserve the relations between a source of a calculation and its
destination. In such an environment you may have many
multi-dimensional arrays - for example, the sales of several products
over several time periods. But you may also have several
zero-dimensional arrays, that is, single values - for example, the
income tax rate. It is desired that the access to the zero-dimensional
arrays will be consistent with the access to the multi-dimensional
arrays. Just using the name of the zero-dimensional array to obtain
its value isn't going to work - the array and the value it contains
have to be distinguished.


Rationale
=========

Passing an empty tuple to the __getitem__ or __setitem__ call was
chosen because it is consistent with passing a tuple of n elements
when a subscript list of n elements is used. Also, it will make NumPy
and similar packages work as expected for zero-dimensional arrays
without any changes.

Another hint for consistency: Currently, these equivalences hold:

::

    x[i, j, k] <--> x[(i, j, k)]
    x[i, j] <--> x[(i, j)]
    x[i, ] <--> x[(i, )]
    x[i] <--> x[(i)]

If this PEP is accepted, another equivalence will hold:

::

    x[] <--> x[()]


Backwards Compatibility
=======================

This change is fully backwards compatible, since it only assigns a
meaning to a previously illegal syntax.


Reference Implementation
========================

Available as SF Patch no. 1503556.
(and also in http://python.pastebin.com/768317 )

It passes the Python test suite, but currently doesn't provide
additional tests or documentation.


Copyright
=========

This document has been placed in the public domain.

Jun 9 '06 #15

Steve Holden wrote:
> Carl Banks wrote:
>> Steve Holden wrote:
>>> Hey, I have an idea, why don't we look at the language reference manual
>>> instead of imagining how we think it might work!
>>
>> I don't know. Sounds risky.
>>
>>> In section 3.2 we find:
>>> """
>>> Tuples
>>> The items of a tuple are arbitrary Python objects. Tuples of two or more
>>> items are formed by comma-separated lists of expressions. A tuple of one
>>> item (a `singleton') can be formed by affixing a comma to an expression
>>> (an expression by itself does not create a tuple, since parentheses must
>>> be usable for grouping of expressions). An empty tuple can be formed by
>>> an empty pair of parentheses.
>>> """
>>>
>>> So it seems that your speculation is false. Section 2.6 specifically
>>> defines "[" and "]" as delimiters. Section 5.3.2 defines a subscription
>>> (a term I've not really grown to love, but what the heck) as
>>>
>>> subscription ::= primary "[" expression_list "]"
>>>
>>> and section 5.12, which defines expression_list, explicitly says
>>>
>>> """An expression list containing at least one comma yields a tuple.""".
>>>
>>> So it would appear that while your change might be very convenient to
>>> allow you to refer to scalar values as zero-dimensional arrays, it
>>> doesn't really fit into Python's conceptual framework. Sorry.
>>
>> Yes, that would appear to be so. You would have a point... if the
>> documentation were correct. Only it's not.
>>
>> According to the reference manual, the rule for an expression_list is:
>>
>> expression_list ::= expression ( "," expression )* [","]
>>
>> But take the following legal Python subscripted array:
>>
>> a[1:2,...,3:4]
>
> But the element inside the brackets there isn't an expression-list,
> it's a slicing (see section 5.3.2).

Section 5.3.2 says an expression-list is what's inside the brackets
(you quoted this rule yourself). Section 5.12 says an expression-list
consists of comma-separated expressions. But 1:2 and ... aren't
expressions. I only brought this up to point out that the docs were
not exactly correct, and in particular that they swept an important
distinction (for this nit-picky discussion) under the rug.

[snip]
>> trailer: '(' [arglist] ')' | '[' subscriptlist ']' | '.' NAME
>> subscriptlist: subscript (',' subscript)* [',']
>> sliceop: ':' [test]
>> subscript: '.' '.' '.' | test | [test] ':' [test] [sliceop]
>> testlist: test (',' test)* [',']
>
> The simplification of the grammar is explicitly documented:

[snip]

Nice to know.

>> Clearly, the grammar rule used for list subscript is different from the
>> one used for list of expressions (for some reason, what an ordinary
>> person would call an expression is called a "test" in the grammar,
>> whereas "expr" is a non-short-circuiting expression).
>>
>> So there's a regular way to create non-empty tuples, and a subscript
>> way.
>>
>> And there's a regular way to create an empty tuple... but not a
>> subscript way.
>>
>> So I'd say this change fits the conceptual framework of the tuple quite
>> well; in fact, it makes subscript tuples more parallel to their regular
>> counterparts.
>
> Although this debate is beginning to make me sound like one, I am really
> not a language lawyer. However, I should point out that what you are
> describing as a "tuple" should more correctly be described as a
> "slice-list" once you include slices or an ellipsis as elements.
> Slicings are described, as I am fairly sure you know, in section 5.3.3.

No, I don't think I am. I was careful to distinguish between grammar
rules and tuples. subscriptlist and testlist are grammar rules.
"Evaluation" of both of these guys can create a tuple. The two grammar
rules have exactly the same relationship to the tuple: a testlist with
a comma creates a tuple, a subscriptlist with a comma creates a tuple.

But note neither testlist nor subscriptlist can create an empty tuple.
You need a special case for that--only there's no special case
for subscripts. (The visual effect, as OP noted, is that you need
parentheses for the empty tuple, but never need them for any other
subscript.)

One further point: if you really do conceptualize scalars as
zero-dimensional arrays, where is the value conceptually stored?

Think of it this way: an array with n-dimensions of length 3 would have
3**n total entries. How many entries would a 0-dimensional array have?
3**0 == 1.

Numeric has had zero-dimensional arrays for a long time, and has had no
problem storing them. Think of the rule for accessing an element of an
array: it's a base pointer + sum (indices*stride) for all indices. Now
generalize it down to zero: there are no indices, so the scalar is
stored at the base pointer.
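
The counting argument and the addressing rule can both be sketched in a few lines of Python. The function names `entry_count` and `element_offset` are assumptions chosen for this illustration, and strides are counted in elements rather than bytes; this is not how Numeric or NumPy expose the computation.

```python
import math


def entry_count(shape):
    """Total number of entries: the product of the lengths of all dimensions."""
    return math.prod(shape)  # the empty product is 1


def element_offset(indices, strides):
    """Offset from the base pointer: the sum of index * stride over all dimensions."""
    return sum(i * s for i, s in zip(indices, strides))


print(entry_count((3, 3)))             # 9
print(entry_count(()))                 # 1
print(element_offset((1, 2), (3, 1)))  # 5
print(element_offset((), ()))          # 0
```

With no dimensions the product is empty, so there is exactly one entry, and the sum is empty, so that entry sits at offset zero from the base pointer.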

I can see that, and it doesn't seem unreasonable. Fortunately your
persistence has goaded me into determining the point that *did* seem
unreasonable to me: you were falsely trying to equate slicings and tuples.

The OP might have been; I wasn't. But that's fair enough; everyone
should be clear on what's happening behind the scenes.

Having said all of which, there probably *is* a case for proposing that
an empty slicing become syntactically acceptable, so why not write the
PEP and go for it?

Well, I'm only +0 on it, so I'll leave it to the OP. I was mainly
concerned with how the language reference was obscuring what I felt
were important distinctions.
Carl Banks

Jun 9 '06 #16
Fredrik Lundh wrote:
noam's proposal is to make this work:
>>> x[]

()

(but should it really result in an empty tuple? wouldn't None be a bit
more Pythonic?)

How would you index a 2-D array? With a 2-tuple.
How would you index a 1-D array? With a 1-tuple.
How would you index a 0-D array? ...
Carl Banks

Jun 9 '06 #17
sp*******@gmail.com wrote:
However, I'm designing another library for
managing multi-dimensional arrays of data. Its purpose is similar to
that of a spreadsheet - analyze data and preserve the relations between
a source of a calculation and its destination.

Sounds interesting. Will it be related at all to OLAP or the
Multi-Dimensional eXpressions language
(http://msdn2.microsoft.com/en-us/library/ms145506.aspx) ?

George

Jun 10 '06 #18
Carl Banks wrote:
How would you index a 2-D array? With a 2-tuple.
How would you index a 1-D array? With a 1-tuple.
How would you index a 0-D array? ...

array dimensions don't exist at the Python level. you're confusing
behaviour that a custom class may provide with Python's view of things.

(and None is of course the standard value for "not there")

</F>

Jun 10 '06 #19
Carl Banks wrote:
Think of it this way: an array with n dimensions of length 3 would have
3**n total entries. How many entries would a 0-dimensional array have?
3**0 == 1.

Er, hang on a minute. Along which dimension of this
0-dimensional array does it have a length of 3? :-)

--
Greg
Jun 10 '06 #20

greg wrote:
Carl Banks wrote:
Think of it this way: an array with n dimensions of length 3 would have
3**n total entries. How many entries would a 0-dimensional array have?
3**0 == 1.

Er, hang on a minute. Along which dimension of this
0-dimensional array does it have a length of 3? :-)

--
Greg

Against all zero of them... ;-)

Cheers,

--Tim

Jun 10 '06 #21
George Sakkis wrote:
sp*******@gmail.com wrote:
However, I'm designing another library for
managing multi-dimensional arrays of data. Its purpose is similar to
that of a spreadsheet - analyze data and preserve the relations between
a source of a calculation and its destination.

Sounds interesting. Will it be related at all to OLAP or the
Multi-Dimensional eXpressions language
(http://msdn2.microsoft.com/en-us/library/ms145506.aspx) ?

Thanks for the reference! I didn't know about any of these. It will
probably be interesting to learn from them. From a brief look at OLAP
on Wikipedia, my library may have similarities to it. I don't think it
will be related to Microsoft's language, because the language will
simply be Python, hopefully making it very easy to do whatever you like
with the data.

I posted to python-dev a message that (hopefully) better explains my
use for x[]. Here it is - I think that it also gives an idea of what it
will look like.

I'm talking about something similar to a spreadsheet in that it saves
data, calculation results, and the way to produce the results.
However, it is not similar to a spreadsheet in that the data isn't
saved in an infinite two-dimensional array with numerical indices.
Instead, the data is saved in a few "tables", each storing a different
kind of data. The tables may be with any desired number of dimensions,
and are indexed by meaningful indices, instead of by natural numbers.

For example, you may have a table called sales_data. It will store the
sales data in years from set([2003, 2004, 2005]), for car models from
set(['Subaru', 'Toyota', 'Ford']), for cities from set(['Jerusalem',
'Tel Aviv', 'Haifa']). To refer to the sales of Ford in Haifa in 2004,
you will simply write: sales_data[2004, 'Ford', 'Haifa']. If the table
is a source of data (that is, not calculated), you will be able to set
values by writing: sales_data[2004, 'Ford', 'Haifa'] = 1500.

Tables may be computed tables. For example, you may have a table which
holds for each year the total sales in that year, with the income tax
subtracted. It may be defined by a function like this:

lambda year: sum(sales_data[year, model, city] for model in models for
city in cities) / (1 + income_tax_rate)

Now, like in a spreadsheet, the function is kept, so that if you
change the data, the result will be automatically recalculated. So, if
you discovered a mistake in your data, you will be able to write:

sales_data[2004, 'Ford', 'Haifa'] = 2000

and total_sales[2004] will be automatically recalculated.

Now, note that the total_sales table depends also on the
income_tax_rate. This is a variable, just like sales_data. Unlike
sales_data, it's a single value. We should be able to change it, with
the result of all the cells of the total_sales table recalculated. But
how will we do it? We can write

income_tax_rate = 0.18

but it will have a completely different meaning. The way to make the
income_tax_rate changeable is to think of it as a 0-dimensional table.
It makes sense: sales_data depends on 3 parameters (year, model,
city), total_sales depends on 1 parameter (year), and income_tax_rate
depends on 0 parameters. That's the only difference. So, thinking of
it like this, we will simply write:

income_tax_rate[] = 0.18

Now the system can know that the income tax rate has changed, and
recalculate what's needed. We will also have to change the previous
function a tiny bit, to:

lambda year: sum(sales_data[year, model, city] for model in models for
city in cities) / (1 + income_tax_rate[])

But it's fine - it just makes it clearer that income_tax_rate[] is a
part of the model that may change its value.
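
For reference, the model described above can already be approximated in today's Python by writing `[()]` wherever the proposal would allow `[]`. The sketch below is only an illustration under stated assumptions: the `Table` and `ComputedTable` classes are hypothetical names, the computed table simply re-evaluates its function on every read instead of tracking dependencies, and the sales figures and the 0.25 tax rate are made-up values chosen so the arithmetic is exact.

```python
class Table:
    """Hypothetical source table: cells stored in a dict keyed by index tuples."""

    def __init__(self):
        self._cells = {}

    @staticmethod
    def _as_tuple(key):
        # x[a, b] and x[()] already pass a tuple; x[a] passes a bare value.
        return key if isinstance(key, tuple) else (key,)

    def __getitem__(self, key):
        return self._cells[self._as_tuple(key)]

    def __setitem__(self, key, value):
        self._cells[self._as_tuple(key)] = value


class ComputedTable:
    """Hypothetical computed table: calls its function again on each read."""

    def __init__(self, func):
        self._func = func

    def __getitem__(self, key):
        return self._func(*Table._as_tuple(key))


models = ['Subaru', 'Toyota', 'Ford']
cities = ['Jerusalem', 'Tel Aviv', 'Haifa']

sales_data = Table()
for model in models:
    for city in cities:
        sales_data[2004, model, city] = 100  # made-up figures

income_tax_rate = Table()      # a 0-dimensional table
income_tax_rate[()] = 0.25     # would be income_tax_rate[] = 0.25 under the proposal

total_sales = ComputedTable(
    lambda year: sum(sales_data[year, model, city]
                     for model in models for city in cities)
    / (1 + income_tax_rate[()])
)

print(total_sales[2004])   # 720.0
income_tax_rate[()] = 0.5  # changing the 0-dimensional cell changes the result
print(total_sales[2004])   # 600.0
```

Because the assignment goes through `__setitem__` rather than rebinding the name, a fuller implementation would have a hook at which to notice the change and invalidate dependent cells, which plain `income_tax_rate = 0.18` cannot offer.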

Have a good day,
Noam

Jun 10 '06 #22
sp*******@gmail.com wrote:
George Sakkis wrote:
sp*******@gmail.com wrote:
However, I'm designing another library for
managing multi-dimensional arrays of data. Its purpose is similar to
that of a spreadsheet - analyze data and preserve the relations between
a source of a calculation and its destination.

Sounds interesting. Will it be related at all to OLAP or the
Multi-Dimensional eXpressions language
(http://msdn2.microsoft.com/en-us/library/ms145506.aspx) ?

Thanks for the reference! I didn't know about any of these. It will
probably be interesting to learn from them. From a brief look at OLAP
on Wikipedia, my library may have similarities to it. I don't think it
will be related to Microsoft's language, because the language will
simply be Python, hopefully making it very easy to do whatever you like
with the data.

Glad it helped, I thought you were already familiar with these. As for
MDX, I didn't mean you should use it instead of python or implement it
at the syntax level, but whether you consider an API with similar
concepts. Given your description below, I think you should by all means
take a look at MDX focusing on the concepts (datacubes, dimensions,
hierarchies, levels, measures, etc.) and the functions, not its syntax.
Here's a decent manual I found online
http://support.sas.com/documentation...p_mdx_7002.pdf.

Regards,
George

Jun 10 '06 #23
