Bug in slice type

Bryan Olson

The Python slice type has one method 'indices', and reportedly:

This method takes a single integer argument /length/ and
computes information about the extended slice that the slice
object would describe if applied to a sequence of length
items. It returns a tuple of three integers; respectively
these are the /start/ and /stop/ indices and the /step/ or
stride length of the slice. Missing or out-of-bounds indices
are handled in a manner consistent with regular slices.

http://docs.python.org/ref/types.html

It behaves incorrectly when step is negative and the slice
includes the 0 index.

    class BuggerAll:

        def __init__(self, somelist):
            self.sequence = somelist[:]

        def __getitem__(self, key):
            if isinstance(key, slice):
                start, stop, step = key.indices(len(self.sequence))
                # print 'Slice says start, stop, step are:', start, stop, step
                return self.sequence[start : stop : step]

    print range(10) [None : None : -2]
    print BuggerAll(range(10))[None : None : -2]

The above prints:

[9, 7, 5, 3, 1]
[]

Un-commenting the print statement in __getitem__ shows:

Slice says start, stop, step are: 9 -1 -2

The slice object seems to think that -1 is a valid exclusive
bound, but when using it to actually slice, Python interprets
negative numbers as an offset from the high end of the sequence.

Good start-stop-step values are (9, None, -2), or (9, -11, -2),
or (-1, -11, -2). The latter two have the advantage of being
consistent with the documented behavior of returning three
integers.
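
The behavior described above can be reproduced with a minimal sketch. It is written for Python 3 (the post itself uses Python 2 syntax), and the variable names are only illustrative; it compares the triple reported by `slice.indices` with the candidate stop values listed in the post.

```python
# Python 3 sketch: what slice.indices() reports versus what slicing accepts.
seq = list(range(10))
key = slice(None, None, -2)

start, stop, step = key.indices(len(seq))   # reported triple

direct = seq[key]                 # the slice object applied as-is
naive = seq[start:stop:step]      # stop of -1 is read as "last item"
with_none = seq[9:None:-2]        # None means "run off the low end"
with_low_stop = seq[9:-11:-2]     # -11 is below -len(seq), so it clamps past index 0

print((start, stop, step))
print(direct, naive, with_none, with_low_stop)
```

Feeding the reported triple back into a slice is what loses the elements; the other two stop values give the same result as the direct slice.
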
--
--Bryan

Aug 10 '05

skip

Paul> Steve Holden <st***@holdenweb.com> writes:

A corrected find() that returns None on failure is a five-liner.

Paul> If I wanted to write five lines instead of one everywhere in a
Paul> Python program, I'd use Java.

+1 for QOTW.

Skip

Aug 27 '05 #51

Steve Holden

Paul Rubin wrote:

Steve Holden <st***@holdenweb.com> writes:
A corrected find() that returns None on failure is a five-liner.

If I wanted to write five lines instead of one everywhere in a Python
program, I'd use Java.

We are arguing about trivialities here. Let's stop before it gets
interesting :-)

regards
Steve
--
Steve Holden +44 150 684 7255 +1 800 494 3119
Holden Web LLC http://www.holdenweb.com/

Aug 27 '05 #52

Bryan Olson

Steve Holden wrote:

Paul Rubin wrote:
We are arguing about trivialities here. Let's stop before it gets
interesting :-)

Some of us are looking beyond the trivia of what string.find()
should return, at an unfortunate interaction of Python features,
brought on by the special-casing of negative indexes. The wart
bites novice or imperfect Python programmers in simple cases
such as string.find(), or when their subscripts accidentally
fall off the low end. It bites programmers who want to fully
implement Python slicing, because of the double-and-contradictory
interpretation of -1, as both an exclusive ending
bound and the index of the last element. It bites documentation
authors who naturally think of the non-negative subscript as
*the* index of a sequence item.
--
--Bryan

Aug 28 '05 #53

bearophileHUGS

I agree with Bryan Olson.
I think it's a kind of bug, and it has to be fixed, like a few other
things.

But I understand that this change could cause some problems for
already-written code...

Bye,
bearophile

Aug 28 '05 #54

Steve Holden

Bryan Olson wrote:

Steve Holden wrote:
> Paul Rubin wrote:
> We are arguing about trivialities here. Let's stop before it gets
> interesting :-)

Some of us are looking beyond the trivia of what string.find()
should return, at an unfortunate interaction of Python features,
brought on by the special-casing of negative indexes. The wart
bites novice or imperfect Python programmers in simple cases
such as string.find(), or when their subscripts accidentally
fall off the low end. It bites programmers who want to fully
implement Python slicing, because of the double-and-contradictory
interpretation of -1, as both an exclusive ending
bound and the index of the last element. It bites documentation
authors who naturally think of the non-negative subscript as
*the* index of a sequence item.

Sure. I wrote two days ago:
We might agree, before further discussion, that this isn't the most
elegant part of Python's design, and it's down to history that this tiny
little wart remains.

While I agree it's a trap for the unwary I still don't regard it as a
major wart. But I'm all in favor of discussions to make 3.0 a better
language.

regards
Steve
--
Steve Holden +44 150 684 7255 +1 800 494 3119
Holden Web LLC http://www.holdenweb.com/

Aug 28 '05 #55

Magnus Lycka

Robert Kern wrote:

If I may digress for a bit, my advisor is currently working on a project
that is processing seafloor depth datasets starting from a few decades
ago. A lot of this data was originally to be processed using FORTRAN
software, so in the idiom of much FORTRAN software from those days, 9999
is often used to mark missing data. Unfortunately, 9999 is a perfectly
valid datum in most of the unit systems used by the various datasets.

Now he has to find a grad student to trawl through the datasets and
clean up the really invalid 9999's (as well as other such fun tasks like
deciding if a dataset that says it's using feet is actually using meters).

I'm afraid this didn't end with FORTRAN. It's not that long ago
that I wrote a program for my wife that combined a data editor
with a graph display, so that she could clean up time lines with
length and weight data for children (from an international research
project performed during the 90's). 99cm is not unreasonable as a
length, but if you see it in a graph with other length measurements,
it's easy to spot most of the false ones, just like a mistyped year
in a date (common at the beginning of a new year).

Perhaps graphics can help this grad student too? It's certainly much
easier to spot deviations in curves than in an endless line of
numbers if the curves would normally be reasonably smooth.

Aug 29 '05 #56

Antoon Pardon

On 2005-08-27, Terry Reedy <tj*****@udel.edu> wrote:

"Paul Rubin" <"http://phr.cx"@NOSPAM.invalid> wrote in message
news:7x************@ruckus.brouhaha.com...
"Terry Reedy" <tj*****@udel.edu> writes:
The try/except pattern is a pretty basic part of Python's design. One
could say the same about clutter for *every* function or method that
raises an exception on invalid input. Should more or even all be
duplicated? Why just this one?

Someone must have thought str.find was worth having, or else it
wouldn't be in the library.

Well, Guido no longer thinks it worth having and emphatically agreed that
it should be added to one of the 'To be removed' sections of PEP 3000.

I think a properly implemented find is better than an index.

If we only have index, then asking for permission is no longer a
possibility. If we have a find that returns None, we can either
ask permission before we index or be forgiven by the exception
that is raised.
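
The two styles mentioned above might look roughly like this. It is a sketch in Python 3, and `find_or_none` is a hypothetical helper, not anything in the standard library.

```python
def find_or_none(text, sub, *bounds):
    """Hypothetical helper: like str.find, but None (not -1) means 'absent'."""
    position = text.find(sub, *bounds)
    return None if position == -1 else position


word = "buggy"

# Asking permission: test the result before indexing.
position = find_or_none(word, "w")
found_char = word[position] if position is not None else None

# Asking forgiveness: a None index cannot be mistaken for "last item".
try:
    word[find_or_none(word, "w")]
    outcome = "indexed silently"
except TypeError:
    outcome = "TypeError raised"

# For contrast, the -1 from plain find() silently selects the last character.
silent = word[word.find("w")]

print(found_char, outcome, silent)
```
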

--
Antoon Pardon

Aug 29 '05 #57

Antoon Pardon

On 2005-08-27, Steve Holden <st***@holdenweb.com> wrote:

If you want an exception from your code when 'w' isn't in the string you
should consider using index() rather than find.

Sometimes it is convenient to have the exception thrown at a later
time.

Otherwise, whatever find() returns you will have to have an "if" in
there to handle the not-found case.

And maybe the more convenient place for this "if" is in a whole different
part of your program, a part where using -1 as an invalid index isn't
at all obvious.

This just sounds like whining to me. If you want to catch errors, use a
function that will raise an exception rather than relying on the
invalidity of the result.

You always seem to look at such things in a very narrow scope. You never
seem to consider that various parts of a program have to work together.

So what happens if you have a module that is collecting string-index
pairs, collected from various other parts? In one part you
want to select the last letter, so you pythonically choose -1 as
index. In another part you get a result of find and are happy
with -1 as an indication for an invalid index. Then these
data meet.

--
Antoon Pardon

Aug 29 '05 #58

Robert Kern

Magnus Lycka wrote:

Robert Kern wrote:
If I may digress for a bit, my advisor is currently working on a project
that is processing seafloor depth datasets starting from a few decades
ago. A lot of this data was originally to be processed using FORTRAN
software, so in the idiom of much FORTRAN software from those days, 9999
is often used to mark missing data. Unfortunately, 9999 is a perfectly
valid datum in most of the unit systems used by the various datasets.

Now he has to find a grad student to trawl through the datasets and
clean up the really invalid 9999's (as well as other such fun tasks like
deciding if a dataset that says it's using feet is actually using meters).

I'm afraid this didn't end with FORTRAN. It's not that long ago
that I wrote a program for my wife that combined a data editor
with a graph display, so that she could clean up time lines with
length and weight data for children (from an international research
project performed during the 90's). 99cm is not unreasonable as a
length, but if you see it in a graph with other length measurements,
it's easy to spot most of the false ones, just like a mistyped year
in a date (common at the beginning of a new year).

Perhaps graphics can help this grad student too? It's certainly much
easier to spot deviations in curves than in an endless line of
numbers if the curves would normally be reasonably smooth.

Yes! In fact, that was the context of the discussion when my advisor
told me about this project. Another student had written an interactive
GUI for exploring bathymetry maps. My advisor: "That kind of thing would
be really great for this new project, etc. etc."

--
Robert Kern
rk***@ucsd.edu

"In the fields of hell where the grass grows high
Are the graves of dreams allowed to die."
-- Richard Harter

Aug 29 '05 #59

Steven Bethard

Antoon Pardon wrote:

I think a properly implemented find is better than an index.

See the current thread in python-dev[1], which proposes a new method,
str.partition(). I believe that Raymond Hettinger has shown that almost
all uses of str.find() can be more clearly represented with his
proposed function.
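
For readers who have not seen the proposal: str.partition() was later added to the language (Python 2.5). The sketch below, in Python 3, contrasts it with the find() style it was meant to replace; the two function names and the header-splitting task are illustrative assumptions, not taken from the python-dev thread.

```python
def split_header_find(line):
    """Split 'name: value' using find(); the -1 check is easy to forget."""
    colon = line.find(":")
    if colon == -1:
        return line, None
    return line[:colon], line[colon + 1:].strip()


def split_header_partition(line):
    """Same task with partition(); an empty separator signals 'not found'."""
    name, sep, value = line.partition(":")
    return (name, value.strip()) if sep else (name, None)


print(split_header_find("Host: example.org"))
print(split_header_partition("Host: example.org"))
print(split_header_partition("no colon here"))
```
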

STeVe

[1]http://mail.python.org/pipermail/python-dev/2005-August/055781.html

Aug 29 '05 #60

Steve Holden

Antoon Pardon wrote:

On 2005-08-27, Steve Holden <st***@holdenweb.com> wrote:
If you want an exception from your code when 'w' isn't in the string you
should consider using index() rather than find.

Sometimes it is convenient to have the exception thrown at a later
time.

Otherwise, whatever find() returns you will have to have an "if" in
there to handle the not-found case.

And maybe the more convenient place for this "if" is in a whole different
part of your program, a part where using -1 as an invalid index isn't
at all obvious.

This just sounds like whining to me. If you want to catch errors, use a
function that will raise an exception rather than relying on the
invalidity of the result.

You always seem to look at such things in a very narrow scope. You never
seem to consider that various parts of a program have to work together.

Or perhaps it's just that I try not to mix parts inappropriately.

So what happens if you have a module that is collecting string-index
pairs, collected from various other parts? In one part you
want to select the last letter, so you pythonically choose -1 as
index. In another part you get a result of find and are happy
with -1 as an indication for an invalid index. Then these
data meet.

That's when debugging has to start. Mixing data of such types is
somewhat inadvisable, don't you agree?

I suppose I can't deny that people do things like that, myself included,
but mixing data sets where -1 is variously an error flag and a valid
index is only going to lead to trouble when the combined data is used.

regards
Steve
--
Steve Holden +44 150 684 7255 +1 800 494 3119
Holden Web LLC http://www.holdenweb.com/

Aug 29 '05 #61

Terry Reedy

"Steve Holden" <st***@holdenweb.com> wrote in message
news:de**********@sea.gmane.org...

Antoon Pardon wrote:
So what happens if you have a module that is collecting string-index
pairs, collected from various other parts? In one part you
want to select the last letter, so you pythonically choose -1 as
index. In another part you get a result of find and are happy
with -1 as an indication for an invalid index. Then these
data meet.

That's when debugging has to start. Mixing data of such types is
somewhat inadvisable, don't you agree?

I suppose I can't deny that people do things like that, myself included,
but mixing data sets where -1 is variously an error flag and a valid
index is only going to lead to trouble when the combined data is used.

The fact that the -1 return *has* led to bugs in actual code is the
primary reason Guido has currently decided that find and rfind should go.
A careful review of current usages in the standard library revealed at
least a couple of bugs even there.

Terry J. Reedy

Aug 30 '05 #62

Paul Rubin

"Terry Reedy" <tj*****@udel.edu> writes:

The fact that the -1 return *has* led to bugs in actual code is the
primary reason Guido has currently decided that find and rfind should go.
A careful review of current usages in the standard library revealed at
least a couple of bugs even there.

Really it's x[-1]'s behavior that should go, not find/rfind.

Will socket.connect_ex also go? How about dict.get? Being able to
return some reasonable value for "failure" is a good thing, if failure
is expected. Exceptions are for unexpected, i.e., exceptional failures.

Aug 30 '05 #63

Antoon Pardon

On 2005-08-29, Steve Holden <st***@holdenweb.com> wrote:

Antoon Pardon wrote:
On 2005-08-27, Steve Holden <st***@holdenweb.com> wrote:

If you want an exception from your code when 'w' isn't in the string you
should consider using index() rather than find.

Sometimes it is convenient to have the exception thrown at a later
time.

Otherwise, whatever find() returns you will have to have an "if" in
there to handle the not-found case.

And maybe the more convenient place for this "if" is in a whole different
part of your program, a part where using -1 as an invalid index isn't
at all obvious.

This just sounds like whining to me. If you want to catch errors, use a
function that will raise an exception rather than relying on the
invalidity of the result.

You always seem to look at such things in a very narrow scope. You never
seem to consider that various parts of a program have to work together.

Or perhaps it's just that I try not to mix parts inappropriately.

I didn't know it was inappropriate to mix certain parts. Can you
give a list of modules in the standard library that I shouldn't mix?

So what happens if you have a module that is collecting string-index
pairs, collected from various other parts? In one part you
want to select the last letter, so you pythonically choose -1 as
index. In another part you get a result of find and are happy
with -1 as an indication for an invalid index. Then these
data meet.

That's when debugging has to start. Mixing data of such types is
somewhat inadvisable, don't you agree?

The type of both data is the same; it is a string-index pair in
both cases. The problem is that a module from the standard lib
uses a certain value to indicate an illegal index, a value that
is a very legal index in python in general.

I suppose I can't deny that people do things like that, myself included,

It is not about what people do. If this was about someone implementing
find himself and using -1 as an illegal index, I would certainly agree
that it was inadvisable to do so. Yet when this is what python with
its library offers the programmer, you seem reluctant to find fault with
it.

but mixing data sets where -1 is variously an error flag and a valid
index is only going to lead to trouble when the combined data is used.

Yet this is what python does: using -1 variously as an error flag and
a valid index. And when people complain about that, you say it sounds
like whining.

--
Antoon Pardon

Aug 30 '05 #64

Bryan Olson

Steve Holden wrote:

I'm all in favor of discussions to make 3.0 a better
language.

This one should definitely be two-phase. First, the non-code-breaking
change that replaces-and-deprecates the warty handling
of negative indexes, and later the removal of the old style. For
the former, there's no need to wait for an X.0 release; for the
latter, 3.0 may be too early.

The draft PEP went to the PEP editors a couple days ago. Haven't
heard back yet.
--
--Bryan

Aug 30 '05 #65

Antoon Pardon

On 2005-08-29, Steven Bethard <st************@gmail.com> wrote:

Antoon Pardon wrote:
I think a properly implemented find is better than an index.

See the current thread in python-dev[1], which proposes a new method,
str.partition(). I believe that Raymond Hettinger has shown that almost
all uses of str.find() can be more clearly represented with his
proposed function.

Do we really need this? As far as I understand, most of this
functionality is already provided by str.split and str.rsplit.

I think adding an optional third parameter 'full=False' to these
methods would be all that is useful here. If full was set
to True, split and rsplit would enforce that a list with
maxsplit + 1 elements was returned, filling up the list with
None's if necessary.

    head, sep, tail = str.partition(sep)

would then almost be equivalent to

    head, tail = str.split(sep, 1, True)

Code like the following:

    head, found, tail = result.partition(' ')
    if not found:
        break
    result = head + tail

Could be replaced by:

    head, tail = result.split(' ', 1, full = True)
    if tail is None:
        break
    result = head + tail

I also think that code like this:

    while tail:
        head, _, tail = tail.partition('.')
        mname = "%s.%s" % (m.__name__, head)
        m = self.import_it(head, mname, m)
        ...

Would probably better be written as follows:

    for head in tail.split('.'):
        mname = "%s.%s" % (m.__name__, head)
        m = self.import_it(head, mname, m)
        ...

Unless I'm missing something.
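
The 'full' parameter proposed above does not exist on str.split, so the idea can only be approximated with a helper. A minimal Python 3 sketch, where `split_full` is a hypothetical name and the padding rule follows the description in the post:

```python
def split_full(text, sep, maxsplit):
    """Hypothetical stand-in for the proposed text.split(sep, maxsplit, full=True)."""
    parts = text.split(sep, maxsplit)
    # Pad with None so the caller can always unpack maxsplit + 1 names.
    return parts + [None] * (maxsplit + 1 - len(parts))


head, tail = split_full("spam eggs ham", " ", 1)
lone_head, lone_tail = split_full("spam", " ", 1)

print(head, tail)
print(lone_head, lone_tail)
```

The sketch assumes a non-negative maxsplit; the proposal does not say what 'full' should mean when maxsplit is omitted.
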
--
Antoon Pardon
[1]http://mail.python.org/pipermail/python-dev/2005-August/055781.html

Aug 30 '05 #66

Terry Reedy

"Paul Rubin" <"http://phr.cx"@NOSPAM.invalid> wrote in message
news:7x************@ruckus.brouhaha.com...

Really it's x[-1]'s behavior that should go, not find/rfind.

I completely disagree; x[-1] as an abbreviation of x[len(x)-1] is extremely
useful, especially when 'x' is an expression instead of a name. But even
if -1 were not a legal subscript, I would still consider it a design error
for Python to mistype a non-numeric singleton indicator as an int. Such
mistyping is only necessary in a language like C that requires all return
values to be of the same type, even when the 'return value' is not really a
return value but an error signal. Python does not have that typing
restriction and should not act as if it does by copying C.

Will socket.connect_ex also go?

Not familiar with it.

How about dict.get?

A default value is not necessarily an error indicator. One can regard a
dict that is 'getted' as an infinite dict matching all keys with the
default except for a finite subset of keys, as recorded in the dict.

If the default is to be regarded as a 'Nothing to return' indicator, then that
indicator *must not* be in the dict. A recommended idiom is to then create
a new, custom subset of object which *cannot* be a value in the dict.
Return values can then safely be compared with that indicator by using the
'is' operator.
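
One common way to write that idiom is sketched below in Python 3. The names `_MISSING` and `describe` are illustrative assumptions; the point is only that the sentinel is a fresh object that cannot already be stored in the dict.

```python
_MISSING = object()  # hypothetical sentinel; never stored as a value


def describe(mapping, key):
    """Report a stored value, even None, separately from a missing key."""
    value = mapping.get(key, _MISSING)
    if value is _MISSING:          # identity test, as recommended above
        return "no entry"
    return "entry: %r" % (value,)


settings = {"timeout": None}
print(describe(settings, "timeout"))
print(describe(settings, "retries"))
```
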

In either case, .get is significantly different from .find.

Terry J. Reedy

Aug 30 '05 #67

Paul Rubin

"Terry Reedy" <tj*****@udel.edu> writes:

Really it's x[-1]'s behavior that should go, not find/rfind.

I completely disagree; x[-1] as an abbreviation of x[len(x)-1] is extremely
useful, especially when 'x' is an expression instead of a name.

There are other abbreviations possible, for example the one in the
proposed PEP at the beginning of this thread.

But even
if -1 were not a legal subscript, I would still consider it a design error
for Python to mistype a non-numeric singleton indicator as an int.

OK, .find should return None if the string is not found.

Aug 30 '05 #68

Bryan Olson

Terry Reedy wrote:

"Paul Rubin" wrote:
Really it's x[-1]'s behavior that should go, not find/rfind.

I completely disagree; x[-1] as an abbreviation of x[len(x)-1] is
extremely useful, especially when 'x' is an expression instead of a name.

Hear us out; your disagreement might not be so complete as you
think. From-the-far-end indexing is too useful a feature to
trash. If you look back several posts, you'll see that the
suggestion here is that the index expression should explicitly
call for it, rather than treat negative integers as a special
case.

I wrote up and sent off my proposal, and once the PEP-Editors
respond, I'll be pitching it on the python-dev list. Below is
the version I sent (not yet a listed PEP).
--
--Bryan

PEP: -1
Title: Improved from-the-end indexing and slicing
Version: $Revision: 1.00 $
Last-Modified: $Date: 2005/08/26 00:00:00 $
Author: Bryan G. Olson <br*********@acm.org>
Status: Draft
Type: Standards Track
Content-Type: text/plain
Created: 26 Aug 2005
Post-History:

Abstract

To index or slice a sequence from the far end, we propose
using a symbol, '$', to stand for the length, instead of
Python's current special-case interpretation of negative
subscripts. Where Python currently uses:

sequence[-i]

We propose:

sequence[$ - i]

Python's treatment of negative indexes as offsets from the
high end of a sequence causes minor obvious problems and
major subtle ones. This PEP proposes a consistent meaning
for indexes, yet still supports from-the-far-end
indexing. Use of new syntax avoids breaking existing code.

Specification

We propose a new style of slicing and indexing for Python
sequences. Instead of:

sequence[start : stop : step]

new-style slicing uses the syntax:

sequence[start ; stop ; step]

It works like current slicing, except that negative start or
stop values do not trigger from-the-high-end interpretation.
Omissions and 'None' work the same as in old-style slicing.

Within the square-brackets, the '$' symbol stands for the
length of the sequence. One can index from the high end by
subtracting the index from '$'. Instead of:

seq[3 : -4]

we write:

seq[3 ; $ - 4]

When square-brackets appear within other square-brackets,
the inner-most bracket-pair determines which sequence '$'
describes. The length of the next-outer sequence is denoted
by '$1', and the next-outer after that by '$2', and so on. The
symbol '$0' behaves identically to '$'. Resolution of $x is
syntactic; a callable object invoked within square brackets
cannot use the symbol to examine the context of the call.

The '$' notation also works in simple (non-slice) indexing.
Instead of:

seq[-2]

we write:

seq[$ - 2]

If we did not care about backward compatibility, new-style
slicing would define seq[-2] to be out-of-bounds. Of course
we do care about backward compatibility, and rejecting
negative indexes would break way too much code. For now,
simple indexing with a negative subscript (and no '$') must
continue to index from the high end, as a deprecated
feature. The presence of '$' always indicates new-style
indexing, so a programmer who needs a negative index to
trigger a range error can write:

seq[($ - $) + index]
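
The proposed syntax is not valid Python, but its intended semantics can be approximated with ordinary functions. The following Python 3 sketch is only an emulation under stated assumptions: `strict_index` and `strict_slice` are hypothetical names, `len(seq)` plays the role of '$', only step 1 is handled, and clamping out-of-range slice bounds is an assumption based on "works like current slicing".

```python
def strict_index(seq, index):
    """Emulate new-style indexing: a negative index is simply out of range."""
    if index < 0:
        raise IndexError("negative index %d is out of range" % index)
    return seq[index]


def strict_slice(seq, start, stop):
    """Emulate seq[start ; stop] (step 1 only): bounds are clamped, never wrapped."""
    n = len(seq)
    lo = min(max(start, 0), n)
    hi = min(max(stop, 0), n)
    return seq[lo:hi]


seq = list("abcdefghij")
n = len(seq)                              # plays the role of '$'

second_last = strict_index(seq, n - 2)    # seq[$ - 2]
middle = strict_slice(seq, 3, n - 4)      # seq[3 ; $ - 4]
low_clamped = strict_slice(seq, -3, 2)    # negative start clamps to 0

print(second_last, middle, low_clamped)
```
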

Motivation

From-the-far-end indexing is such a useful feature that we
cannot reasonably propose its removal; nevertheless Python's
current method, which is to treat a range of negative
indexes as special cases, is warty. The wart bites novice or
imperfect Pythoners by not raising an exception when they
need to know about a bug. For example, the following code
prints 'y' with no sign of error:

s = 'buggy'
print s[s.find('w')]

The wart becomes an even bigger problem with more
sophisticated use of Python sequences. What is the 'stop'
value for a slice when the step is negative and the slice
includes the zero index? An instance of Python's slice type
will report that the stop value is -1, but if we use this
stop value to slice, it gets misinterpreted as the last
index in the sequence. Here's an example:

    class BuggerAll:

        def __init__(self, somelist):
            self.sequence = somelist[:]

        def __getitem__(self, key):
            if isinstance(key, slice):
                start, stop, step = key.indices(len(self.sequence))
                # print 'Slice says start, stop, step are:', start, stop, step
                return self.sequence[start : stop : step]

    print range(10) [None : None : -2]
    print BuggerAll(range(10))[None : None : -2]

The above prints:

[9, 7, 5, 3, 1]
[]

Un-commenting the print statement in __getitem__ shows:

Slice says start, stop, step are: 9 -1 -2

The slice object seems to think that -1 is a valid exclusive
bound, but when using it to actually slice, Python
interprets the negative number as an offset from the high
end of the sequence.

Steven Bethard offered the simpler example:

py> range(10)[slice(None, None, -2)]
[9, 7, 5, 3, 1]
py> slice(None, None, -2).indices(10)
(9, -1, -2)
py> range(10)[9:-1:-2]
[]

The double-meaning of -1, as both an exclusive stopping
bound and an alias for the highest valid index, is just
plain whacked. So what should the slice object return? With
Python's current indexing/slicing, there is no value that
just works. 'None' will work as a stop value in a slice, but
index arithmetic will fail. The value 0 - (len(sequence) +
1) will work as a stop value, and slice arithmetic and
range() will happily use it, but the result is not what the
programmer probably intended.
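
One workaround available in current Python is to hand the triple from `indices()` to `range()` rather than back to a slice, since `range()` treats a negative stop as a plain number. A Python 3 sketch follows; the class name `SliceAware` is hypothetical and this is not part of the proposal.

```python
class SliceAware:
    """Python 3 sketch of a wrapper whose slices match those of a plain list."""

    def __init__(self, items):
        self.sequence = list(items)

    def __getitem__(self, key):
        if isinstance(key, slice):
            # range() does not reinterpret a negative stop as an offset
            # from the end, so (9, -1, -2) means 9, 7, 5, 3, 1 here.
            positions = range(*key.indices(len(self.sequence)))
            return [self.sequence[i] for i in positions]
        return self.sequence[key]


wrapped = SliceAware(range(10))
print(wrapped[::-2])
print(wrapped[3:-4])
```
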

The problem is subtle. A Python sequence starts at index
zero. There is some appeal to giving negative indexes a
useful interpretation, on the theory that they were invalid
as subscripts and thus useless otherwise. That theory is
wrong, because negative indexes were already useful, even
though not legal subscripts, and the reinterpretation often
breaks their existing use. Specifically, negative indexes are
useful in index arithmetic, and as exclusive stopping
bounds.

The problem is fixable. We propose that negative indexes not
be treated as a special case. To index from the far end of a
sequence, we use a syntax that explicitly calls for far-end
indexing.

Rationale

New-style slicing/indexing is designed to fix the problems
described above, yet live happily in Python along-side the
old style. The new syntax leaves the meaning of existing
code unchanged, and is even more Pythonic than current
Python.

Semicolons look a lot like colons, so the new semicolon
syntax follows the rule that things that are similar should
look similar. The semicolon syntax is currently illegal, so
its addition will not break existing code. Python is
historically tied to C, and the semicolon syntax is
evocative of the similar start-stop-step expressions of C's
'for' loop. JPython is tied to Java, which uses a similar
'for' loop syntax.

The '$' character currently has no place in a Python index,
so its new interpretation will not break existing code. We
chose it over other unused symbols because the usage roughly
corresponds to its meaning in the Python library's regular
expression module.

We expect use of the $0, $1, $2 ... syntax to be rare;
nevertheless, it has a Pythonic consistency. Thanks to Paul
Rubin for advocating it over the inferior multiple-$ syntax
that this author initially proposed.

Backwards Compatibility

To avoid breaking code, we use new syntax that is currently
illegal. The new syntax more-or-less looks like current
Python, which may help Python programmers adjust.

User-defined classes that implement the sequence protocol
are likely to work, unchanged, with new-style slicing.
'Likely' is not certain; we've found one subtle issue (and
there may be others):

Currently, user-defined classes can implement Python
subscripting and slicing without implementing Python's len()
function. In our proposal, the '$' symbol stands for the
sequence's length, so classes must be able to report their
length in order for $ to work within their slices and
indexes.

Specifically, to support new-style slicing, a class that
accepts index or slice arguments to any of:

__getitem__
__setitem__
__delitem__
__getslice__
__setslice__
__delslice__

must also consistently implement:

__len__

Sane programmers already follow this rule.

Copyright:

This document has been placed in the public domain.

Aug 30 '05 #69

Paul Rubin

Bryan Olson <fa*********@nowhere.org> writes:

Specifically, to support new-style slicing, a class that
accepts index or slice arguments to any of:

__getitem__
__setitem__
__delitem__
__getslice__
__setslice__
__delslice__

must also consistently implement:

__len__

Sane programmers already follow this rule.

It should be ok to use new-style slicing without implementing __len__
as long as you don't use $ in any slices. Using $ in a slice without
__len__ would throw a runtime error. I expect using negative
subscripts in old-style slices on objects with no __len__ also throws
an error.

Not every sequence needs __len__; for example, infinite sequences, or
sequences that implement slicing and subscripts by doing lazy
evaluation of iterators:


    digits_of_pi = memoize(generate_pi_digits()) # 3,1,4,1,5,9,2,...
    print digits_of_pi[5]   # computes 6 digits and prints '9'
    print digits_of_pi[$-5] # raises exception
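
A lazily evaluated, memoized sequence of this kind might be sketched as follows in Python 3. `LazySequence` is a hypothetical class, and a stream of squares stands in for the digits of pi; the class supports subscripts but deliberately defines no `__len__`.

```python
import itertools


class LazySequence:
    """Hypothetical memoizing wrapper: indexable, but has no __len__."""

    def __init__(self, iterable):
        self._source = iter(iterable)
        self._cache = []

    def __getitem__(self, index):
        if index < 0:
            # There is no length to measure from, so "from the end" is meaningless.
            raise IndexError("negative indexes are not supported")
        while len(self._cache) <= index:
            try:
                self._cache.append(next(self._source))
            except StopIteration:
                raise IndexError("index %d is past the end" % index) from None
        return self._cache[index]


squares = LazySequence(n * n for n in itertools.count())
print(squares[5])            # forces evaluation of the first six items
```
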

Aug 30 '05 #70

Antoon Pardon

On 2005-08-30, Terry Reedy <tj*****@udel.edu> wrote:

"Paul Rubin" <"http://phr.cx"@NOSPAM.invalid> wrote in message
news:7x************@ruckus.brouhaha.com...
Really it's x[-1]'s behavior that should go, not find/rfind.

I completely disagree; x[-1] as an abbreviation of x[len(x)-1] is extremely
useful, especially when 'x' is an expression instead of a name.

I don't think the ability to easily index sequences from the right is
in dispute. Just the fact that negative numbers on their own provide
this functionality.

Because I sometimes find it useful to have a sequence start and
end at arbitrary indexes, I have written a table class. So I
can have a table that is indexed from e.g. -4 to +6. So how am
I supposed to easily get at that last value?

--
Antoon Pardon

Aug 30 '05 #71

Robert Kern

Bryan Olson wrote:

Currently, user-defined classes can implement Python
subscripting and slicing without implementing Python's len()
function. In our proposal, the '$' symbol stands for the
sequence's length, so classes must be able to report their
length in order for $ to work within their slices and
indexes.

Specifically, to support new-style slicing, a class that
accepts index or slice arguments to any of:

__getitem__
__setitem__
__delitem__
__getslice__
__setslice__
__delslice__

must also consistently implement:

__len__

Sane programmers already follow this rule.

Incorrect. Some sane programmers have multiple dimensions they need to
index.

from Numeric import *
A = array([[0, 1], [2, 3], [4, 5]])
A[$-1, $-1]

The result of len(A) has nothing to do with the second $.
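
The same point can be shown without Numeric. In the Python 3 sketch below, `Grid` is a hypothetical two-dimensional container; its `len()` reports only the first dimension, so it cannot supply the length that the second '$' would need.

```python
class Grid:
    """Hypothetical two-dimensional container indexed as grid[row, column]."""

    def __init__(self, rows):
        self.rows = [list(row) for row in rows]
        self.shape = (len(self.rows), len(self.rows[0]))

    def __len__(self):
        return self.shape[0]      # by convention, the first dimension only

    def __getitem__(self, key):
        row, column = key
        return self.rows[row][column]


A = Grid([[0, 1], [2, 3], [4, 5]])

# What A[$-1, $-1] would have to mean: one length per dimension.
last = A[A.shape[0] - 1, A.shape[1] - 1]
print(len(A), A.shape, last)
```
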

--
Robert Kern
rk***@ucsd.edu

"In the fields of hell where the grass grows high
Are the graves of dreams allowed to die."
-- Richard Harter

Aug 30 '05 #72

Antoon Pardon

On 2005-08-30, Robert Kern <rk***@ucsd.edu> wrote:

Bryan Olson wrote:
Currently, user-defined classes can implement Python
subscripting and slicing without implementing Python's len()
function. In our proposal, the '$' symbol stands for the
sequence's length, so classes must be able to report their
length in order for $ to work within their slices and
indexes.

Specifically, to support new-style slicing, a class that
accepts index or slice arguments to any of:

__getitem__
__setitem__
__delitem__
__getslice__
__setslice__
__delslice__

must also consistently implement:

__len__

Sane programmers already follow this rule.

Incorrect. Some sane programmers have multiple dimensions they need to
index.

I don't see how that contradicts Bryan's statement.

from Numeric import *
A = array([[0, 1], [2, 3], [4, 5]])
A[$-1, $-1]

The result of len(A) has nothing to do with the second $.

But that is irrelevant to the question of whether or not sane
programmers follow Bryan's stated rule. That the second
$ has nothing to do with len(A) contradicts neither that
__len__ has to be implemented nor that sane programmers
already do so.

--
Antoon Pardon

Aug 30 '05 #73

Bryan Olson

Robert Kern wrote:

Bryan Olson wrote:

Currently, user-defined classes can implement Python
subscripting and slicing without implementing Python's len()
function. In our proposal, the '$' symbol stands for the
sequence's length, so classes must be able to report their
length in order for $ to work within their slices and
indexes.

Specifically, to support new-style slicing, a class that
accepts index or slice arguments to any of:

__getitem__
__setitem__
__delitem__
__getslice__
__setslice__
__delslice__

must also consistently implement:

__len__

Sane programmers already follow this rule.

Incorrect. Some sane programmers have multiple dimensions they need to
index.

from Numeric import *
A = array([[0, 1], [2, 3], [4, 5]])
A[$-1, $-1]

The result of len(A) has nothing to do with the second $.

I think you have a good observation there, but I'll stand by my
correctness.

My initial post considered re-interpreting tuple arguments, but
I abandoned that alternative after Steven Bethard pointed out
how much code it would break. Modules/classes would remain free
to interpret tuple arguments in any way they wish. I don't think
my proposal breaks any sane existing code.

Going forward, I would advocate that user classes which
implement their own kind of subscripting adopt the '$' syntax,
and interpret it as consistently as possible. For example, they
could respond to __len__() by returning a type that supports the
"Emulating numeric types" methods from the Python Language
Reference 3.3.7, and also allows the class's methods to tell
that it stands for the length of the dimension in question.
--
--Bryan

Aug 30 '05 #74

Robert Kern

Antoon Pardon wrote:

Op 2005-08-30, Robert Kern schreef <rk***@ucsd.edu>:
Bryan Olson wrote:
Currently, user-defined classes can implement Python
subscripting and slicing without implementing Python's len()
function. In our proposal, the '$' symbol stands for the
sequence's length, so classes must be able to report their
length in order for $ to work within their slices and
indexes.

Specifically, to support new-style slicing, a class that
accepts index or slice arguments to any of:

__getitem__
__setitem__
__delitem__
__getslice__
__setslice__
__delslice__

must also consistently implement:

__len__

Sane programmers already follow this rule.

Incorrect. Some sane programmers have multiple dimensions they need to
index.

I don't see how that contradicts Bryan's statement.
from Numeric import *
A = array([[0, 1], [2, 3], [4, 5]])
A[$-1, $-1]

The result of len(A) has nothing to do with the second $.

But that is irrelevant to the question of whether or not sane
programmers follow Bryan's stated rule. That the second
$ has nothing to do with len(A) contradicts neither that
__len__ has to be implemented nor that sane programmers
already do so.

Except that the *consistent* implementation is supposed to support the
interpretation of $. It clearly can't for multiple dimensions.

--
Robert Kern
rk***@ucsd.edu

"In the fields of hell where the grass grows high
Are the graves of dreams allowed to die."
-- Richard Harter

Aug 30 '05 #75

Robert Kern

Bryan Olson wrote:

Robert Kern wrote:
> from Numeric import *
> A = array([[0, 1], [2, 3], [4, 5]])
> A[$-1, $-1]
>
> The result of len(A) has nothing to do with the second $.

> I think you have a good observation there, but I'll stand by my
> correctness.

len() cannot be used to determine the value of $ in the context of
multiple dimensions.

> My initial post considered re-interpreting tuple arguments, but
> I abandoned that alternative after Steven Bethard pointed out
> how much code it would break. Modules/classes would remain free
> to interpret tuple arguments in any way they wish. I don't think
> my proposal breaks any sane existing code.

What it does do is provide a second way to do indexing from the end that
can't be extended to multiple dimensions.

> Going forward, I would advocate that user classes which
> implement their own kind of subscripting adopt the '$' syntax,
> and interpret it as consistently as possible.

How? You haven't proposed how an object gets the information that
$-syntax is being used. You've proposed a syntax and some semantics; you
also need to flesh out the pragmatics.

> For example, they
> could respond to __len__() by returning a type that supports the
> "Emulating numeric types" methods from the Python Language
> Reference 3.3.7, and also allows the class's methods to tell
> that it stands for the length of the dimension in question.

I have serious doubts about __len__() returning anything but a bona-fide
integer. We shouldn't need to use incredible hacks like that to support
a core language feature.
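
For reference, the Python 3 interpreters I am aware of enforce this: `len()` rejects a `__len__` result that cannot be interpreted as an integer. A small sketch, where `DollarLength` and `Weird` are hypothetical stand-ins for the kind of object suggested in the quoted text:

```python
class DollarLength:
    """Hypothetical marker object standing for 'the length of this axis'."""


class Weird:
    def __len__(self):
        return DollarLength()   # deliberately not an integer


try:
    len(Weird())
except TypeError as exc:
    outcome = "TypeError"
    print("len() refused:", exc)
else:
    outcome = "accepted"
```

So a class that wanted per-dimension lengths would have to expose them through some other attribute or method rather than through `len()`.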

--
Robert Kern
rk***@ucsd.edu

"In the fields of hell where the grass grows high
Are the graves of dreams allowed to die."
-- Richard Harter

Aug 30 '05 #76

phil hunt

On Tue, 30 Aug 2005 08:53:27 GMT, Bryan Olson <fa*********@nowhere.org> wrote:

Specifically, to support new-style slicing, a class that
accepts index or slice arguments to any of:

__getitem__
__setitem__
__delitem__
__getslice__
__setslice__
__delslice__

must also consistently implement:

__len__

Sane programmers already follow this rule.

Wouldn't it be more sensible to have an abstract IndexedCollection
superclass, which implements all the slicing stuff, then when someone
writes their own collection class they just have to implement
__len__ and __getitem__ and slicing works automatically?
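
One way such a superclass might look in current Python 3, as a rough sketch only. The class name follows the post, but the `get_item` hook is an assumption introduced here so that the base class can own `__getitem__`; subclasses supply `__len__` and `get_item`:

```python
class IndexedCollection:
    """Hypothetical base class: subclasses supply __len__ and get_item(i)."""

    def __len__(self):
        raise NotImplementedError

    def get_item(self, i):
        raise NotImplementedError

    def __getitem__(self, key):
        n = len(self)
        if isinstance(key, slice):
            # slice.indices() clips start, stop and step to the length n;
            # the result is meant to be fed to range().
            return [self.get_item(i) for i in range(*key.indices(n))]
        if key < 0:
            key += n
        if not 0 <= key < n:
            raise IndexError("index out of range")
        return self.get_item(key)


class Squares(IndexedCollection):
    def __init__(self, n):
        self._n = n

    def __len__(self):
        return self._n

    def get_item(self, i):
        return i * i


sq = Squares(10)
print(sq[-1])      # 81
print(sq[2:5])     # [4, 9, 16]
print(sq[::-3])    # [81, 36, 9, 0]
```

Passing the three values from `indices()` to `range()` rather than back into a slice sidesteps the negative-stop problem reported at the top of the thread.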
--
Email: zen19725 at zen dot co dot uk

Aug 30 '05 #77

Steve Holden

Antoon Pardon wrote:

Op 2005-08-29, Steve Holden schreef <st***@holdenweb.com>:
Antoon Pardon wrote:
Op 2005-08-27, Steve Holden schreef <st***@holdenweb.com>:
If you want an exception from your code when 'w' isn't in the string you
should consider using index() rather than find.
Sometimes it is convenient to have the exception thrown at a later
time.

Otherwise, whatever find() returns you will have to have an "if" in
there to handle the not-found case.
And maybe the more convenient place for this "if" is in a whole different
part of your program, a part where using -1 as an invalid index isn't
at all obvious.

This just sounds like whining to me. If you want to catch errors, use a
function that will raise an exception rather than relying on the
invalidity of the result.
You always seem to look at such things in a very narrow scope. You never
seem to consider that various parts of a program have to work together.

Or perhaps it's just that I try not to mix parts inappropriately.

I didn't know it was inappropriate to mix certain parts. Can you
give a list of modules in the standard library I shouldn't mix?

So what happens if you have a module that is collecting string-index
pairs, collected from various other parts? In one part you
want to select the last letter, so you pythonically choose -1 as
index. In another part you get a result of find and are happy
with -1 as an indication for an invalid index. Then these
data meet.

That's when debugging has to start. Mixing data of such types is
somewhat inadvisable, don't you agree?

The type of both data is the same, it is a string-index pair in
both cases. The problem is that a module from the standard lib
uses a certain value to indicate an illegal index, that has
a very legal value in python in general.

Since you are clearly feeling pedantic enough to beat this one to death
with a 2 x 4 please let me substitute "usages" for "types".

In the case of a find() result -1 *isn't* a string index, it's a failure
flag. Which is precisely why it should be filtered out of any set of
indexes. Once it's been inserted it can no longer be distinguished as a
failure indication.

I suppose I can't deny that people do things like that, myself included,

It is not about what people do. If this was about someone implementing
find himself and using -1 as an illegal index, I would certainly agree
that it was inadvisable to do so. Yet when this is what python with
its library offers the programmer, you seem reluctant to find fault with
it.

I've already admitted that the choice of -1 as a return value wasn't
smart. However you appear to be saying that it's sensible to mix return
values from find() with general-case index values. I'm saying that you
should do so only with caution. The fact that the naive user will often
not have the wisdom to apply such caution is what makes a change desirable.

but mixing data sets where -1 is variously an error flag and a valid
index is only going to lead to trouble when the combined data is used.

Yet this is what python does. Using -1 variously as an error flag and
a valid index and when people complain about that, you say it sounds like
whining.

What I am trying to say is that this doesn't make sense: if you want to
combine find() results with general-case indexes (i.e. both positive and
negative index values) it behooves you to strip out the -1's before you
do so. Any other behaviour is asking for trouble.
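
A sketch of the filtering described above, assuming a hypothetical shared list of (string, index) pairs; the helper name `record_find` is made up for illustration. The result of find() is checked where it is produced, so a failure never enters the list looking like the valid index -1:

```python
def record_find(pairs, text, needle):
    """Append (text, index) only if needle occurs in text."""
    index = text.find(needle)
    if index != -1:          # -1 from find() is a failure flag, not an index
        pairs.append((text, index))


pairs = [("spam", -1)]           # a deliberate -1: the last character
record_find(pairs, "eggs", "g")
record_find(pairs, "ham", "z")   # not found, so nothing is recorded

for text, index in pairs:
    print(text, index, text[index])
```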

regards
Steve
--
Steve Holden +44 150 684 7255 +1 800 494 3119
Holden Web LLC http://www.holdenweb.com/

Aug 30 '05 #78

Bengt Richter

On Tue, 30 Aug 2005 08:53:27 GMT, Bryan Olson <fa*********@nowhere.org> wrote:
[...]

Specification

We propose a new style of slicing and indexing for Python
sequences. Instead of:

sequence[start : stop : step]

new-style slicing uses the syntax:

sequence[start ; stop ; step]

I don't mind the semantics, but I don't like the semicolons ;-)

What about if when brackets trail as if attributes, it means
your-style slicing written with colons instead of semicolons?

sequence.[start : stop : step]

I think that would just be a tweak on the trailer syntax.
I just really dislike the semicolons ;-)

Regards,
Bengt Richter

Aug 30 '05 #79

Paul Rubin

bo**@oz.net (Bengt Richter) writes:

What about if when brackets trail as if attributes, it means
your-style slicing written with colons instead of semicolons?

sequence.[start : stop : step]

This is nice. It gets rid of the whole $1,$2,etc syntax as well.

Aug 30 '05 #80

Bengt Richter

On Tue, 30 Aug 2005 11:56:24 GMT, Bryan Olson <fa*********@nowhere.org> wrote:

Robert Kern wrote:
Bryan Olson wrote:

Currently, user-defined classes can implement Python
subscripting and slicing without implementing Python's len()
function. In our proposal, the '$' symbol stands for the
sequence's length, so classes must be able to report their
length in order for $ to work within their slices and
indexes.

Specifically, to support new-style slicing, a class that
accepts index or slice arguments to any of:

__getitem__
__setitem__
__delitem__
__getslice__
__setslice__
__delslice__

must also consistently implement:

__len__

Sane programmers already follow this rule.

Incorrect. Some sane programmers have multiple dimensions they need to
index.

from Numeric import *
A = array([[0, 1], [2, 3], [4, 5]])
A[$-1, $-1]

The result of len(A) has nothing to do with the second $.

I think you have a good observation there, but I'll stand by my
correctness.

My initial post considered re-interpreting tuple arguments, but
I abandoned that alternative after Steven Bethard pointed out
how much code it would break. Modules/classes would remain free
to interpret tuple arguments in any way they wish. I don't think
my proposal breaks any sane existing code.

Going forward, I would advocate that user classes which
implement their own kind of subscripting adopt the '$' syntax,
and interpret it as consistently as possible. For example, they
could respond to __len__() by returning a type that supports the
"Emulating numeric types" methods from the Python Language
Reference 3.3.7, and also allows the class's methods to tell
that it stands for the length of the dimension in question.

(OTTOMH ;-)
Perhaps the slice triple could be extended with a flag indicating
which of the other elements should have $ added to it, and $ would
take meaning from the subarray being indexed, not the whole. E.g.,

arr.[1:$-1, $-5:$-2]

would call arr.__getitem__((slice(1,-1,None,STOP), slice(-5,-2,None,START|STOP)))

(Hypothesizing bitmask constants START and STOP)
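
A rough sketch of how such flags could be resolved, assuming a hypothetical `FlaggedSlice` tuple in place of an extended built-in slice (the real slice type takes only three arguments). Each flagged bound has the length of its own axis added to it:

```python
from collections import namedtuple

START, STOP = 1, 2   # hypothetical bitmask constants, as in the post


class FlaggedSlice(namedtuple("FlaggedSlice", "start stop step flags")):
    """Stand-in for a slice carrying a '$' flag per bound (hypothetical)."""

    def resolve(self, length):
        start, stop = self.start, self.stop
        if self.flags & START:
            start += length      # the bound was written as $ + start
        if self.flags & STOP:
            stop += length       # the bound was written as $ + stop
        return slice(start, stop, self.step)


rows = [list(range(r * 10, r * 10 + 8)) for r in range(4)]   # 4 rows, 8 columns

# Corresponds to arr.[1:$-1, $-5:$-2]
row_sel = FlaggedSlice(1, -1, None, STOP)
col_sel = FlaggedSlice(-5, -2, None, START | STOP)

picked = [row[col_sel.resolve(len(row))]
          for row in rows[row_sel.resolve(len(rows))]]
print(picked)   # [[13, 14, 15], [23, 24, 25]]
```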

Regards,
Bengt Richter

Aug 30 '05 #81

Bengt Richter

On 30 Aug 2005 10:07:06 GMT, Antoon Pardon <ap*****@forel.vub.ac.be> wrote:

Op 2005-08-30, Terry Reedy schreef <tj*****@udel.edu>:

"Paul Rubin" <"http://phr.cx"@NOSPAM.invalid> wrote in message
news:7x************@ruckus.brouhaha.com...
Really it's x[-1]'s behavior that should go, not find/rfind.

I completely disagree, x[-1] as an abbreviation of x[len(x)-1] is extremely
useful, especially when 'x' is an expression instead of a name.

I don't think the ability to easily index sequences from the right is
in dispute. Just the fact that negative numbers on their own provide
this functionality.

Because I sometimes find it useful to have a sequence start and
end at arbitrary indexes, I have written a table class. So I
can have a table that is indexed from e.g. -4 to +6. So how am
I supposed to easily get at that last value?

Give it a handy property? E.g.,

table.as_python_list[-1]

Regards,
Bengt Richter

Aug 30 '05 #82

Antoon Pardon

Op 2005-08-30, Steve Holden schreef <st***@holdenweb.com>:

Antoon Pardon wrote:
Op 2005-08-29, Steve Holden schreef <st***@holdenweb.com>:
Antoon Pardon wrote:

Op 2005-08-27, Steve Holden schreef <st***@holdenweb.com>:
>If you want an exception from your code when 'w' isn't in the string you
>should consider using index() rather than find.
Sometimes it is convenient to have the exception thrown at a later
time.

>Otherwise, whatever find() returns you will have to have an "if" in
>there to handle the not-found case.
And maybe the more convenient place for this "if" is in a whole different
part of your program, a part where using -1 as an invalid index isn't
at all obvious.

>This just sounds like whining to me. If you want to catch errors, use a
>function that will raise an exception rather than relying on the
>invalidity of the result.
You always seem to look at such things in a very narrow scope. You never
seem to consider that various parts of a program have to work together.
Or perhaps it's just that I try not to mix parts inappropriately.

I didn't know it was inappropriate to mix certain parts. Can you
give a list of modules in the standard library I shouldn't mix?

So what happens if you have a module that is collecting string-index
pairs, collected from various other parts? In one part you
want to select the last letter, so you pythonically choose -1 as
index. In another part you get a result of find and are happy
with -1 as an indication for an invalid index. Then these
data meet.
That's when debugging has to start. Mixing data of such types is
somewhat inadvisable, don't you agree?

The type of both data is the same, it is a string-index pair in
both cases. The problem is that a module from the standard lib
uses a certain value to indicate an illegal index, that has
a very legal value in python in general.

Since you are clearly feeling pedantic enough to beat this one to death
with a 2 x 4 please let me substitute "usages" for "types".

But it's not my usage but python's usage.

In the case of a find() result -1 *isn't* a string index, it's a failure
flag. Which is precisely why it should be filtered out of any set of
indexes. Once it's been inserted it can no longer be distinguished as a
failure indication.

Which is precisely why it was such a bad choice in the first place.

If I need to write code like this:

var = str.find('.')
if var == -1:
    var = None

each time I want to store an index for later use, then surely '-1'
shouldn't have been used here.
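
The three lines above can be wrapped once, as in this sketch for current Python 3. The helper name `find_or_none` is an assumption, not something the standard library provides:

```python
def find_or_none(text, sub, *args):
    """Like str.find(), but return None instead of -1 on failure."""
    index = text.find(sub, *args)
    return None if index == -1 else index


name = "README"
print(name[name.find(".")])       # 'E': the failure flag silently picks the last character
print(find_or_none(name, "."))    # None

try:
    name[find_or_none(name, ".")]
except TypeError:
    print("a stored None cannot be mistaken for a valid index")
```

Note that None is still meaningful as a slice bound (`name[:None]` is the whole string), so this only protects plain indexing.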

I suppose I can't deny that people do things like that, myself included,

It is not about what people do. If this was about someone implementing
find himself and using -1 as an illegal index, I would certainly agree
that it was inadvisable to do so. Yet when this is what python with
its library offers the programmer, you seem reluctant to find fault with
it.

I've already admitted that the choice of -1 as a return value wasn't
smart. However you appear to be saying that it's sensible to mix return
values from find() with general-case index values.

I'm saying it should be possible without a problem. It is poor design
to return a legal value as an indication for an error flag.

I'm saying that you
should do so only with caution. The fact that the naive user will often
not have the wisdom to apply such caution is what makes a change desirable.

I don't think it is naive, if you expect that no legal value will be
returned as an error flag.

but mixing data sets where -1 is variously an error flag and a valid
index is only going to lead to trouble when the combined data is used.

Yet this is what python does. Using -1 variously as an error flag and
a valid index and when people complain about that, you say it sounds like
whining.

What I am trying to say is that this doesn't make sense: if you want to
combine find() results with general-case indexes (i.e. both positive and
negative index values) it behooves you to strip out the -1's before you
do so. Any other behaviour is asking for trouble.

I would say that choosing this particular return value as an error flag
was asking for trouble. My impression is that you are putting more
blame on the programmer who fails to take corrective action, instead
of on the design of find, which makes that corrective action needed
in the first place.

--
Antoon Pardon

Aug 31 '05 #83

Antoon Pardon

Op 2005-08-30, Bengt Richter schreef <bo**@oz.net>:

On 30 Aug 2005 10:07:06 GMT, Antoon Pardon <ap*****@forel.vub.ac.be> wrote:
Op 2005-08-30, Terry Reedy schreef <tj*****@udel.edu>:

"Paul Rubin" <"http://phr.cx"@NOSPAM.invalid> wrote in message
news:7x************@ruckus.brouhaha.com...

Really it's x[-1]'s behavior that should go, not find/rfind.

I completely disagree, x[-1] as an abbreviation of x[len(x)-1] is extremely
useful, especially when 'x' is an expression instead of a name.

I don't think the ability to easily index sequences from the right is
in dispute. Just the fact that negative numbers on their own provide
this functionality.

Because I sometimes find it useful to have a sequence start and
end at arbitrary indexes, I have written a table class. So I
can have a table that is indexed from e.g. -4 to +6. So how am
I supposed to easily get at that last value?

Give it a handy property? E.g.,

table.as_python_list[-1]

You're missing the point, I probably didn't make it clear.

It is not about the possibility of doing such a thing. It is
about python providing a frame for such things that work
in general without the need of extra properties in 'special'
cases.

--
Antoon Pardon

Aug 31 '05 #84

Bengt Richter

On 31 Aug 2005 07:26:48 GMT, Antoon Pardon <ap*****@forel.vub.ac.be> wrote:

Op 2005-08-30, Bengt Richter schreef <bo**@oz.net>:
On 30 Aug 2005 10:07:06 GMT, Antoon Pardon <ap*****@forel.vub.ac.be> wrote:
Op 2005-08-30, Terry Reedy schreef <tj*****@udel.edu>:

"Paul Rubin" <"http://phr.cx"@NOSPAM.invalid> wrote in message
news:7x************@ruckus.brouhaha.com...

> Really it's x[-1]'s behavior that should go, not find/rfind.

I completely disagree, x[-1] as an abbreviation of x[len(x)-1] is extremely
useful, especially when 'x' is an expression instead of a name.

I don't think the ability to easily index sequences from the right is
in dispute. Just the fact that negative numbers on their own provide
this functionality.

Because I sometimes find it useful to have a sequence start and
end at arbitrary indexes, I have written a table class. So I
can have a table that is indexed from e.g. -4 to +6. So how am
I supposed to easily get at that last value?

Give it a handy property? E.g.,

table.as_python_list[-1]

You're missing the point, I probably didn't make it clear.

It is not about the possibility of doing such a thing. It is
about python providing a frame for such things that work
in general without the need of extra properties in 'special'
cases.

How about interpreting seq[i] as an abbreviation of seq[i%len(seq)] ?
That would give a consistent interpretation of seq[-1] and no errors
for any value ;-)
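
A quick sketch of that interpretation in current Python 3; `cyclic_get` is a hypothetical helper name. Python's `%` returns a non-negative result for a positive modulus, so every integer maps onto a valid position. An empty sequence still fails, with ZeroDivisionError from the modulo:

```python
def cyclic_get(seq, i):
    """Return seq[i % len(seq)], so every integer index wraps around."""
    return seq[i % len(seq)]


letters = "abcde"
print(cyclic_get(letters, -1))    # 'e'
print(cyclic_get(letters, 5))     # 'a'
print(cyclic_get(letters, -12))   # 'd'
```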

Regards,
Bengt Richter

Aug 31 '05 #85

Antoon Pardon

Op 2005-08-31, Bengt Richter schreef <bo**@oz.net>:

On 31 Aug 2005 07:26:48 GMT, Antoon Pardon <ap*****@forel.vub.ac.be> wrote:
Op 2005-08-30, Bengt Richter schreef <bo**@oz.net>:
On 30 Aug 2005 10:07:06 GMT, Antoon Pardon <ap*****@forel.vub.ac.be> wrote:

Op 2005-08-30, Terry Reedy schreef <tj*****@udel.edu>:
>
> "Paul Rubin" <"http://phr.cx"@NOSPAM.invalid> wrote in message
> news:7x************@ruckus.brouhaha.com...
>
>> Really it's x[-1]'s behavior that should go, not find/rfind.
>
> I completely disagree, x[-1] as an abbreviation of x[len(x)-1] is extremely
> useful, especially when 'x' is an expression instead of a name.

I don't think the ability to easily index sequences from the right is
in dispute. Just the fact that negative numbers on their own provide
this functionality.

Because I sometimes find it useful to have a sequence start and
end at arbitrary indexes, I have written a table class. So I
can have a table that is indexed from e.g. -4 to +6. So how am
I supposed to easily get at that last value?

Give it a handy property? E.g.,

table.as_python_list[-1]

You're missing the point, I probably didn't make it clear.

It is not about the possibility of doing such a thing. It is
about python providing a frame for such things that work
in general without the need of extra properties in 'special'
cases.

How about interpreting seq[i] as an abbreviation of seq[i%len(seq)] ?
That would give a consistent interpretation of seq[-1] and no errors
for any value ;-)

But the question was not about having a consistent interpretation for
-1, but about an easy way to get the last value.

But I like your idea. I just think there should be two different ways
to index. Maybe use braces in one case.

seq{i} would be pure indexing, that throws exceptions if you
are out of bound

seq[i] would then be seq{i%len(seq)}
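
Since `seq{i}` is not valid Python, the two kinds of indexing can only be approximated with a method. In this sketch the `TwoWay` class and its `at()` method are hypothetical: `at(i)` plays the role of the strict `seq{i}`, and `[]` is defined in terms of it:

```python
class TwoWay:
    """Hypothetical wrapper: at() is strict, [] wraps around."""

    def __init__(self, items):
        self._items = list(items)

    def at(self, i):
        # Stands in for the proposed seq{i}: no negative or oversized indexes.
        if not 0 <= i < len(self._items):
            raise IndexError("index out of bounds: %d" % i)
        return self._items[i]

    def __getitem__(self, i):
        # Stands in for the proposed seq[i], defined as seq{i % len(seq)}.
        return self.at(i % len(self._items))


seq = TwoWay("abcde")
print(seq[-1])     # 'e': wraps around
print(seq.at(4))   # 'e': strict access with an explicit position

try:
    seq.at(-1)
except IndexError as exc:
    print("strict access refused:", exc)
```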

--
Antoon Pardon

Aug 31 '05 #86

Bryan Olson

Paul Rubin wrote:

Not every sequence needs __len__; for example, infinite sequences, or
sequences that implement slicing and subscripts by doing lazy
evaluation of iterators:

digits_of_pi = memoize(generate_pi_digits()) # 3,1,4,1,5,9,2,...
print digits_of_pi[5] # computes 6 digits and prints '9'
print digits_of_pi[$-5] # raises exception

Good point. I like the memoize thing, so here is one:

# Python 2 code, like the rest of the thread.
class memoize (object):
    """ Build a sequence from an iterable, evaluating as needed.
    """

    def __init__(self, iterable):
        self.it = iter(iterable)
        self.known = []

    def extend_(self, stop):
        # Pull items from the iterator until `stop` of them are cached.
        while len(self.known) < stop:
            self.known.append(self.it.next())

    def __getitem__(self, key):
        if isinstance(key, (int, long)):
            self.extend_(key + 1)
            return self.known[key]
        elif isinstance(key, slice):
            # Assumes start, stop and step are all given and non-negative.
            start, stop, step = key.start, key.stop, key.step
            stop = start + 1 + (stop - start - 1) // step * step
            self.extend_(stop)
            return self.known[start : stop : step]
        else:
            raise TypeError("Bad subscript type")
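
A sketch of the same idea adapted to current Python 3, under some assumptions of my own: the class is renamed `LazySequence`, omitted slice fields default to start 0 and step 1, and anything that would need a known length (negative indexes, an open-ended stop) is rejected rather than guessed at:

```python
import itertools


class LazySequence:
    """Cache items from an iterable on demand (hypothetical adaptation)."""

    def __init__(self, iterable):
        self._it = iter(iterable)
        self._known = []

    def _extend(self, stop):
        # Pull items until `stop` of them are cached or the iterator runs out.
        needed = stop - len(self._known)
        if needed > 0:
            self._known.extend(itertools.islice(self._it, needed))

    def __getitem__(self, key):
        if isinstance(key, int):
            if key < 0:
                raise IndexError("negative indexes need a known length")
            self._extend(key + 1)
            return self._known[key]   # IndexError if the iterator was too short
        if isinstance(key, slice):
            start = 0 if key.start is None else key.start
            step = 1 if key.step is None else key.step
            stop = key.stop
            if stop is None or start < 0 or stop < 0 or step <= 0:
                raise IndexError("only bounded, forward, non-negative slices")
            self._extend(stop)
            return self._known[start:stop:step]
        raise TypeError("Bad subscript type")


digits = LazySequence(iter([3, 1, 4, 1, 5, 9, 2, 6]))
print(digits[5])       # 9, after caching six items
print(digits[1:6:2])   # [1, 1, 9]
```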
--
--Bryan

Aug 31 '05 #87

Kay Schluehr

Bengt Richter wrote:

How about interpreting seq[i] as an abbreviation of seq[i%len(seq)] ?
That would give a consistent interpretation of seq[-1] and no errors
for any value ;-)

Cool, indexing becomes cyclic by default ;)

But maybe it's better to define it explicitly:

seq[!i] = seq[i%len(seq)]

Well, I don't like the latter definition very much because it
introduces special syntax for __getitem__. A better solution may be the
introduction of new syntax and arithmetic for positive and negative
infinite values. Sequencing has to be adapted to handle them.

The semantics follow from taking limits of divergent sequences:

!0 = lim n
     n->infinity

That enables consistent arithmetic:

!0+k = lim n+k      ->   !0
       n->infinity

!0/k = lim n/k      ->   !0 for k>0,
       n->infinity      -!0 for k<0,
                        ZeroDivisionError for k==0
etc.

In Python notation:

>>> !0
!0
>>> !0+1
!0
>>> !0>n # if n is int
True
>>> !0/!0
Traceback (...)
...
UndefinedValue
>>> !0 - !0
Traceback (...)
...
UndefinedValue
>>> -!0
-!0
>>> range(9)[4:!0] == range(9)[4:]
True
>>> range(9)[4:-!0:-1] == range(5)
True

Life can be simpler with unbound limits.
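
For comparison, a sketch of what current Python 3 already offers without new syntax. None serves as an open bound in slices, and floating-point infinity (`math.inf`) has part of the proposed arithmetic, although undefined results come back as NaN rather than as an exception, and a float is not accepted as a slice bound:

```python
import math

digits = list(range(9))

# None already acts as an unbounded limit in either direction.
print(digits[4:None] == digits[4:])                # True
print(digits[4:None:-1] == list(range(5))[::-1])   # True

# Floating-point infinity has some of the proposed arithmetic...
print(math.inf + 1 == math.inf)                    # True
print(math.inf > 10 ** 100)                        # True
print(math.isnan(math.inf - math.inf))             # True: NaN, not an exception

# ...but it is not accepted as a slice bound.
try:
    digits[4:math.inf]
except TypeError:
    print("float infinity cannot be used in a slice")
```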

Kay

Aug 31 '05 #88

Kay Schluehr

Bengt Richter wrote:

How about interpreting seq[i] as an abbreviation of seq[i%len(seq)] ?
That would give a consistent interpretation of seq[-1] and no errors
for any value ;-)

>>> !0
!0
>>> !0+1
!0
>>> !0>n # if n is int
True
>>> !0/!0
Traceback (...)
...
UndefinedValue
>>> !0 - !0
Traceback (...)
...
UndefinedValue
>>> -!0
-!0
>>> range(9)[4:!0] == range(9)[4:]
True
>>> range(9)[4:-!0:-1] == range(5)
True

Life can be simpler with unbound limits.

Kay

Aug 31 '05 #89

Ron Adam

Antoon Pardon wrote:

Op 2005-08-31, Bengt Richter schreef <bo**@oz.net>:
On 31 Aug 2005 07:26:48 GMT, Antoon Pardon <ap*****@forel.vub.ac.be> wrote:

Op 2005-08-30, Bengt Richter schreef <bo**@oz.net>:

On 30 Aug 2005 10:07:06 GMT, Antoon Pardon <ap*****@forel.vub.ac.be> wrote:
>Op 2005-08-30, Terry Reedy schreef <tj*****@udel.edu>:
>
>>"Paul Rubin" <"http://phr.cx"@NOSPAM.invalid> wrote in message
>>news:7x************@ruckus.brouhaha.com...
>>
>>
>>>Really it's x[-1]'s behavior that should go, not find/rfind.
>>
>>I completely disagree, x[-1] as an abbreviation of x[len(x)-1] is extremely
>>useful, especially when 'x' is an expression instead of a name.
>
>I don't think the ability to easily index sequences from the right is
>in dispute. Just the fact that negative numbers on their own provide
>this functionality.
>
>Because I sometimes find it useful to have a sequence start and
>end at arbitrary indexes, I have written a table class. So I
>can have a table that is indexed from e.g. -4 to +6. So how am
>I supposed to easily get at that last value?

Give it a handy property? E.g.,

table.as_python_list[-1]

You're missing the point, I probably didn't make it clear.

It is not about the possibility of doing such a thing. It is
about python providing a frame for such things that work
in general without the need of extra properties in 'special'
cases.

How about interpreting seq[i] as an abbreviation of seq[i%len(seq)] ?
That would give a consistent interpretation of seq[-1] and no errors
for any value ;-)

But the question was not about having a consistent interpretation for
-1, but about an easy way to get the last value.

But I like your idea. I just think there should be two different ways
to index. Maybe use braces in one case.

seq{i} would be pure indexing, that throws exceptions if you
are out of bound

seq[i] would then be seq{i%len(seq)}

The problem with negative indexes is that positive indexes are zero
based, but negative indexes are 1 based. Which leads to
non-symmetrical situations.

Note that you can insert an item before the first item using slices. But
not after the last item without using len(list) or some value larger
than len(list).

>>> a = list('abcde')
>>> a[len(a):len(a)] = ['end']
>>> a
['a', 'b', 'c', 'd', 'e', 'end']
>>> a[-1:-1] = ['last']
>>> a
['a', 'b', 'c', 'd', 'e', 'last', 'end'] # Second to last.
>>> a[100:100] = ['final']
>>> a
['a', 'b', 'c', 'd', 'e', 'last', 'end', 'final']

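
The asymmetry can be checked directly in current Python 3 with a short sketch. The left end has a usable number (0), while the right end has none, because -0 is the same integer as 0:

```python
a = list("abcde")

a[0:0] = ["first"]       # insert before the first item
a[len(a):] = ["end"]     # inserting after the last item needs len(a) or an open bound
print(a)

a[-1:-1] = ["last"]      # lands before the final item, not after it
print(a)

print(a[-0] is a[0])     # True: -0 is just 0, so it cannot name the right end
```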
Cheers,
Ron

Aug 31 '05 #90

Bengt Richter

On 31 Aug 2005 07:13:26 -0700, "Kay Schluehr" <ka**********@gmx.net> wrote:

Bengt Richter wrote:
How about interpreting seq[i] as an abbreviation of seq[i%len(seq)] ?
That would give a consistent interpretation of seq[-1] and no errors
for any value ;-)

Cool, indexing becomes cyclic by default ;)

But maybe it's better to define it explicitly:

seq[!i] = seq[i%len(seq)]

Well, I don't like the latter definition very much because it
introduces special syntax for __getitem__. A better solution may be the
introduction of new syntax and arithmetic for positive and negative
infinite values. Sequencing has to be adapted to handle them.

The semantics follow from taking limits of divergent sequences:

!0 = lim n
     n->infinity

That enables consistent arithmetic:

!0+k = lim n+k      ->   !0
       n->infinity

!0/k = lim n/k      ->   !0 for k>0,
       n->infinity      -!0 for k<0,
                        ZeroDivisionError for k==0
etc.

In Python notation:

>>> !0
!0
>>> !0+1
!0
>>> !0>n # if n is int
True
>>> !0/!0
Traceback (...)
...
UndefinedValue
>>> !0 - !0
Traceback (...)
...
UndefinedValue
>>> -!0
-!0
>>> range(9)[4:!0] == range(9)[4:]
True
>>> range(9)[4:-!0:-1] == range(5)
True

Interesting, but wouldn't that last line be
range(9)[4:-!0:-1] == range(5)[::-1]

Life can be simpler with unbound limits.

Hm, is "!0" a di-graph symbol for infinity?
What if we get full unicode on our screens? Should
it be rendered with unichr(0x221e) ? And how should
symbols be keyed in? Is there a standard mnemonic
way of using an ascii keyboard, something like typing
Japanese hiragana in some word processing programs?

I'm not sure about '!' since it already has some semantic
ties to negation and factorial and execution (not to mention
exclamation ;-) If !0 means infinity, what does !2 mean?

Just rambling ... ;-)

Regards,
Bengt Richter

Aug 31 '05 #91

Kay Schluehr

Bengt Richter wrote:

> range(9)[4:-!0:-1] == range(5)
> True

Interesting, but wouldn't that last line be
>>> range(9)[4:-!0:-1] == range(5)[::-1]

Oops. Yes, of course.

Life can be simpler with unbound limits.

Hm, is "!0" a di-graph symbol for infinity?
What if we get full unicode on our screens? Should
it be rendered with unichr(0x221e) ? And how should
symbols be keyed in? Is there a standard mnemonic
way of using an ascii keyboard, something like typing
Japanese hiragana in some word processing programs?

You can ask questions ;-)

I'm not sure about '!' since it already has some semantic
ties to negation and factorial and execution (not to mention
exclamation ;-) If !0 means infinity, what does !2 mean?

Just rambling ... ;-)

I'm not sure either. Probably Inf as a keyword is a much better choice.
The only std-library module I found that used Inf was Decimal where Inf
has the same meaning. Inf is quick to write ( just one more character
than !0 ) and easy to parse for human readers. Rewriting the above
statements/expressions leads to:

>>> Inf
Inf
>>> Inf+1
Inf
>>> Inf>n # if n is int
True
>>> Inf/Inf
Traceback (...)
...
UndefinedValue
>>> Inf - Inf
Traceback (...)
...
UndefinedValue
>>> -Inf
-Inf
>>> range(9)[4:Inf] == range(9)[4:]
True
>>> range(9)[4:-Inf:-1] == range(5)[::-1]
True

IMO it's still concise.
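
The decimal module's infinity can be tried out directly. This sketch for current Python 3 binds it to the name `Inf` purely for illustration. With the InvalidOperation trap enabled, the undefined cases raise an exception, much like the UndefinedValue tracebacks sketched above; a Decimal is still not usable as a slice bound:

```python
from decimal import Decimal, InvalidOperation, localcontext

Inf = Decimal("Infinity")

print(Inf + 1 == Inf)        # True
print(Inf > 10 ** 100)       # True
print(-Inf < 0)              # True

undefined = []
with localcontext() as ctx:
    ctx.traps[InvalidOperation] = True   # normally the default; set here to be explicit
    for label, operation in (("Inf - Inf", lambda: Inf - Inf),
                             ("Inf / Inf", lambda: Inf / Inf)):
        try:
            operation()
        except InvalidOperation:
            undefined.append(label)
            print(label, "raises InvalidOperation")
```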

Kay

Aug 31 '05 #92

Stefan Rank

[snipped a lot from others about indexing, slicing problems,
and the inadequacy of -1 as Not Found indicator]

on 31.08.2005 16:16 Ron Adam said the following:

The problem with negative indexes is that positive indexes are zero
based, but negative indexes are 1 based. Which leads to
non-symmetrical situations.

Hear, hear.

This is, for me, the root of the problem.

But changing the whole of Python to the (more natural and consistent)
one-based indexing style, for indexing from left and right, is...
difficult.

Sep 1 '05 #93

Fredrik Lundh

Ron Adam wrote:

The problem with negative indexes is that positive indexes are zero
based, but negative indexes are 1 based. Which leads to
non-symmetrical situations.

indices point to the "gap" between items, not to the items themselves.

positive indices start from the left end, negative indices from the right end.

straight indexing returns the item just to the right of the given gap (this is
what gives you the perceived asymmetry), slices return all items between
the given gaps.
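
The gap model can be checked with a short sketch in current Python 3; the helper name `gap` is made up for illustration. A negative number -k names the gap k places from the right end, and indexing at a gap returns the one-item slice that starts there:

```python
def gap(i, n):
    """Map a possibly negative gap number onto 0..n for n items."""
    return i + n if i < 0 else i


s = "abcde"
n = len(s)

# Gaps:  0   1   2   3   4   5
#        | a | b | c | d | e |
#       -5  -4  -3  -2  -1
for i in (0, 2, -1, -5):
    g = gap(i, n)
    # Indexing returns the item just to the right of gap g.
    print(i, g, s[i], s[g:g + 1])
```

The rightmost gap (5 here) has no negative name, which is why slices that reach the end leave the bound out.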

</F>

Sep 1 '05 #94

Terry Reedy

"Fredrik Lundh" <fr*****@pythonware.com> wrote in message
news:df**********@sea.gmane.org...

[slice] indices point to the "gap" between items, not to the items
themselves.

positive indices start from the left end, negative indices from the
right end.

straight indexing returns the item just to the right of the given gap
(this is what gives you the perceived asymmetry), slices return all
items between the given gaps.

Well said. In some languages, straight indexing returns the item to the
left instead. The different between items and gaps in seen in old
terminals and older screens versus modern gui screens. Then, a cursur sat
on top of or under a character space. Now, a cursur sits between chars.

Terry J. Reedy

Sep 1 '05 #95

Terry Reedy

"Stefan Rank" <st*********@ofai.at> wrote in message
news:43**************@ofai.at...

on 31.08.2005 16:16 Ron Adam said the following:

The problem with negative indexes is that positive indexes are zero-based,
but negative indexes are 1-based, which leads to non-symmetrical
situations.

Hear, hear.

This is, for me, the root of the problem.

The root of the problem is the misunderstanding of slice indexes and the
symmetry-breaking desire to denote an interval of length 1 by 1 number
instead of 2. Someday, I may look at the tutorial to see if I can suggest
improvements. In the meantime, see Fredrik's reply and my supplement
thereto and the additional explanation below.

But changing the whole of Python to the (more natural and consistent)
one-based indexing style, for indexing from left and right, is...
difficult.

Consider a mathematical axis

|_|_|_|_|...
0 1 2 3 4

The numbers represent points separating unit intervals and representing the
total count of intervals from the left. Count 'up' to the right is
standard practice. Intervals of length n are denoted by 2 numbers, a:b,
where b-a = n.

Now consider the 'tick marks' to be gui cursor positions. Characters go in
the spaces *between* the cursor positions. (Fixed versus variable space
representations are irrelevant here.) More generally, one can put 'items'
or 'item indicators' in the spaces to form general sequences rather than
just char strings.

It seems convenient to indicate a single char or item with a single number
instead of two. We could use the average coordinate, n.5. But that is a
nuisance, and the one number representation is about convenience, so we
round down or up, depending on the language. Each choice has pluses and
minuses; Python rounds down.

The axis above and Python iterables are potentially unbounded. But actual
strings and sequences are finite and have a right end also. Python then
gives the option of counting 'down' from that right end and makes the count
negative, as is standard. (But it does not make the string/sequence
circular).

One can devise slightly different sequence models, but the above is the one
used by Python. It is consistent and not buggy once understood. I hope
this clears some of the confusion seen in this thread.
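
The "round down" rule can be checked directly. A minimal sketch, written for Python 3: item k sits between tick marks k and k+1, so its average coordinate is k + 0.5, and rounding that down gives the single-number index whether you count up from the left end or down from the right end.

```python
import math

a = 'abcdefg'
n = len(a)

for k, item in enumerate(a):
    center = k + 0.5                           # item k lies between gaps k and k+1
    assert a[math.floor(center)] == item       # counting up from the left end
    assert a[math.floor(center - n)] == item   # counting down from the right end

# The middle item 'd' has coordinate 3.5 from the left and -3.5 from the right.
print(math.floor(3.5), math.floor(-3.5))
```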

Terry J. Reedy

Sep 1 '05 #96

Ron Adam

Fredrik Lundh wrote:

Ron Adam wrote:

The problem with negative indexes is that positive indexes are zero-based,
but negative indexes are 1-based, which leads to non-symmetrical
situations.

indices point to the "gap" between items, not to the items themselves.

So how do I express a -0? Which should point to the gap after the last
item.

straight indexing returns the item just to the right of the given gap (this is
what gives you the perceived asymmetry), slices return all items between
the given gaps.

If this were symmetrical, then positive indexes would return the value
to the right and negative indexes would return the value to the left.

Have you looked at negative steps? They also are not symmetrical.

All of the following get the center 'd' from the string.

a = 'abcdefg'
print a[3] # d 4 gaps from beginning
print a[-4] # d 5 gaps from end
print a[3:4] # d
print a[-4:-3] # d
print a[-4:4] # d
print a[3:-3] # d
print a[3:2:-1] # d These are symmetric?!
print a[-4:-5:-1] # d
print a[3:-5:-1] # d
print a[-4:2:-1] # d

This is why it confuses so many people. It's a shame too, because slice
objects could be so much more useful for indirectly accessing list
ranges. But I think this needs to be fixed first.

Cheers,
Ron

Sep 2 '05 #97

Terry Reedy

"Ron Adam" <rr*@ronadam.com> wrote in message
news:SX***************@tornado.tampabay.rr.com...

Fredrik Lundh wrote:

Ron Adam wrote:

The problem with negative indexes is that positive indexes are zero-based,
but negative indexes are 1-based, which leads to non-symmetrical
situations.

indices point to the "gap" between items, not to the items themselves.

So how do I express a -0?

You just did ;-) but I probably do not know what you mean.

Which should point to the gap after the last item.

The slice index of the gap after the last item is len(seq).

straight indexing returns the item just to the right of the given gap
(this is what gives you the perceived asymmetry), slices return all items
between the given gaps.

If this were symmetrical, then positive indexes would return the value
to the right and negative indexes would return the value to the left.

As I posted before (but perhaps it arrived after you sent this), one-number
indexing rounds down, introducing a slight asymmetry.

Have you looked at negative steps? They also are not symmetrical.

???
All of the following get the center 'd' from the string.

a = 'abcdefg'
print a[3] # d 4 gaps from beginning
print a[-4] # d 5 gaps from end

It is 3 and 4 gaps *from* the left and right end to the left side of the
'd'. You can also see the asymmetry as coming from rounding 3.5 and -3.5
down to 3 and down to -4.

print a[3:4] # d
print a[-4:-3] # d

These are symmetric, as we claimed.

print a[-4:4] # d

Here you count down past and up past the d.

print a[3:-3] # d

Here you count up to and down to the d. The count is one more when you
cross the d than when you do not. You do different actions, you get
different counts. I would not recommend mixing up and down counting to a
beginner, and not down and up counting to anyone who did not absolutely
have to.

print a[3:2:-1] # d These are symmetric?!
print a[-4:-5:-1] # d
print a[3:-5:-1] # d
print a[-4:2:-1] # d

The pattern seems to be: left-gap-index : farther-to-left-index : -1 is
somehow equivalent to left:right, but I never paid much attention to
strides and don't know the full rule.
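
For the simple case of in-range, non-negative bounds, that pattern can be written down explicitly. A sketch, written for Python 3; the helper name reversed_slice is hypothetical, and the identity is only claimed under the stated assumption 0 <= j < i < len(seq), not as the full stride rule.

```python
a = 'abcdefg'

def reversed_slice(seq, i, j):
    """Spell seq[i:j:-1] with a forward slice, assuming 0 <= j < i < len(seq)."""
    # seq[i:j:-1] visits items i, i-1, ..., j+1; a forward slice over the
    # same items runs from gap j+1 to gap i+1, then gets reversed.
    return seq[j + 1:i + 1][::-1]

# Exhaustive check over every pair that satisfies the assumption.
for i in range(len(a)):
    for j in range(i):
        assert reversed_slice(a, i, j) == a[i:j:-1]

print(reversed_slice(a, 3, 2))
```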

Stride slices are really a different subject from two-gap slicing. They
were introduced in the early years of Python specifically and only for
Numerical Python. The rules were those needed specifically for Numerical
Python arrays. They were made valid for general sequence use only a few
years ago. I would say that they are only for careful mid-level to expert
use by those who actually need them for their code.

Terry J. Reedy

Sep 2 '05 #98

Paul Rubin

Ron Adam <rr*@ronadam.com> writes:

All of the following get the center 'd' from the string.

a = 'abcdefg'
print a[3] # d 4 gaps from beginning
print a[-4] # d 5 gaps from end
print a[3:4] # d
print a[-4:-3] # d
print a[-4:4] # d
print a[3:-3] # d
print a[3:2:-1] # d These are symmetric?!
print a[-4:-5:-1] # d
print a[3:-5:-1] # d
print a[-4:2:-1] # d

+1 QOTW

Sep 2 '05 #99

Fredrik Lundh

Ron Adam wrote:

indices point to the "gap" between items, not to the items themselves.

So how do I express a -0? Which should point to the gap after the last
item.

that item doesn't exist when you're doing plain indexing, so being able
to express -0 would be pointless.

when you're doing slicing, you express it by leaving the value out, or by
using len(seq) or (in recent versions) None.
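
The three spellings of "the gap after the last item" can be compared side by side. A minimal sketch, written for Python 3; it also shows why -0 cannot do the job, since -0 is simply 0.

```python
seq = list('abcdefg')

tail_omitted = seq[-3:]           # stop value left out
tail_len = seq[-3:len(seq)]       # stop value given as len(seq)
tail_none = seq[-3:None]          # stop value given as None
tail_minus_zero = seq[-3:-0]      # -0 == 0, so this stops at the left end

print(tail_omitted, tail_len, tail_none, tail_minus_zero)
```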

straight indexing returns the item just to the right of the given gap (this is
what gives you the perceived asymmetry), slices return all items between
the given gaps.

If this were symmetrical, then positive indexes would return the value
to the right and negative indexes would return the value to the left.

the gap addressing is symmetrical, but indexing always picks the item to
the right.

Have you looked at negative steps? They also are not symmetrical.

print a[3:2:-1] # d These are symmetric?!

the gap addressing works as before, but to understand exactly what characters
you'll get, you have to realize that the slice is really a gap index generator. when
you use step=1, you can view slice as a "cut here and cut there, and return what's
in between". for other step sizes, you have to think in gap indexes (for which the
plain indexing rules apply).

and if you know range(), you already know how the indexes are generated for
various step sizes.

from the range documentation:

... returns a list of plain integers [start, start + step, start + 2 * step, ...].
If step is positive, the last element is the largest start + i * step less than
stop; if step is negative, the last element is the largest start + i * step
greater than stop.

or, in sequence terms (see http://docs.python.org/lib/typesseq.html )

(3) If i or j is negative, the index is relative to the end of the string: len(s) + i
or len(s) + j is substituted.

...

(5) The slice of s from i to j with step k is defined as the sequence of items
with index x = i + n*k for n in the range(0,(j-i)/k). In other words, the
indices are i, i+k, i+2*k, i+3*k and so on, stopping when j is reached
(but never including j).
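
Rules (3) and (5) can be applied by hand in a few lines. A sketch, written for Python 3; the function name slice_indexes is hypothetical, and it assumes explicit integer i, j, k with k != 0 and bounds that are in range once rule (3) has been applied (it does not reproduce the clamping of out-of-range values).

```python
def slice_indexes(i, j, k, length):
    """Indexes picked by s[i:j:k] for a sequence s of the given length."""
    # rule (3): a negative bound is relative to the end of the sequence
    if i < 0:
        i += length
    if j < 0:
        j += length
    # rule (5): i, i+k, i+2*k, ... stopping when j is reached, never including j
    return list(range(i, j, k))

a = 'abcdefg'
print(slice_indexes(3, 2, -1, len(a)))    # the a[3:2:-1] case
print(slice_indexes(3, -5, -1, len(a)))   # the a[3:-5:-1] case
```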

so in this case, you get

3 + 0*-1 = 3
3 + 1*-1 = 2 # which is your stop condition

so a[3:2:-1] is the same as a[3].

print a[-4:-5:-1] # d

same as a[-4]

print a[3:-5:-1] # d

now you're mixing addressing modes, which is a great way to confuse
yourself. if you normalize the gap indexes (rule 3 above), you'll get
a[3:2:-1] which is the same as your earlier example. you can use the
"indices" method to let Python do this for you:

>>> slice(3,-5,-1).indices(len(a))
(3, 2, -1)
>>> range(*slice(3,-5,-1).indices(len(a)))
[3]

print a[-4:2:-1] # d

same problem here; indices will tell you what that really means:

>>> slice(-4,2,-1).indices(len(a))
(3, 2, -1)
>>> range(*slice(-4,2,-1).indices(len(a)))
[3]

same example again, in other words. and same result.
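
The same check can be run over all four stepped slices at once. A minimal sketch, written for Python 3, where range is lazy and has to be wrapped in list to display the generated indexes; the variable names are chosen here for illustration.

```python
a = 'abcdefg'

# The four stepped slices from the quoted post, as (start, stop, step).
stepped = [(3, 2, -1), (-4, -5, -1), (3, -5, -1), (-4, 2, -1)]

# slice.indices() normalizes each triple for a sequence of len(a) items.
normalized = [slice(*s).indices(len(a)) for s in stepped]
picked = [list(range(*n)) for n in normalized]

for s, n, p in zip(stepped, normalized, picked):
    print(s, n, p, a[slice(*s)])
```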
This is why it confuses so many people. It's a shame too, because slice
objects could be so much more useful for indirectly accessing list
ranges. But I think this needs to be fixed first.

as with everything else in Python, if you use the wrong mental model, things
may look "asymmetrical" or "confusing" or "inconsistent". if you look at
how things really work, it's usually extremely simple and more often than
not internally consistent (since the designers have the "big picture", and
know what they tried to be consistent with; when slice steps were
added, existing slicing rules and range() were the obvious references).

it's of course pretty common that people who didn't read the documentation
very carefully and therefore adopted the wrong model will insist that Python
uses a buggy implementation of their model, rather than a perfectly consistent
implementation of the actual model. slices with non-standard step sizes are
obviously one such thing, immutable/mutable objects and the exact behaviour
of for-else, while-else, and try-else are others. as usual, being able to reset
your brain is the only thing that helps.

</F>

Sep 2 '05 #100
