Bug in slice type

Bryan Olson

The Python slice type has one method 'indices', and reportedly:

This method takes a single integer argument /length/ and
computes information about the extended slice that the slice
object would describe if applied to a sequence of length
items. It returns a tuple of three integers; respectively
these are the /start/ and /stop/ indices and the /step/ or
stride length of the slice. Missing or out-of-bounds indices
are handled in a manner consistent with regular slices.

http://docs.python.org/ref/types.html

It behaves incorrectly when step is negative and the slice
includes the 0 index.

class BuggerAll:

    def __init__(self, somelist):
        self.sequence = somelist[:]

    def __getitem__(self, key):
        if isinstance(key, slice):
            start, stop, step = key.indices(len(self.sequence))
            # print 'Slice says start, stop, step are:', start, stop, step
            return self.sequence[start : stop : step]

print range(10) [None : None : -2]
print BuggerAll(range(10))[None : None : -2]

The above prints:

[9, 7, 5, 3, 1]
[]

Un-commenting the print statement in __getitem__ shows:

Slice says start, stop, step are: 9 -1 -2

The slice object seems to think that -1 is a valid exclusive
bound, but when using it to actually slice, Python interprets
negative numbers as an offset from the high end of the sequence.

Good start-stop-step values are (9, None, -2), or (9, -11, -2),
or (-1, -11, -2). The latter two have the advantage of being
consistent with the documented behavior of returning three
integers.
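
For readers who want to keep slicing with the values from indices(), one possible workaround is to shift any negative value it returns by the sequence length, which yields tuples of the (9, -11, -2) kind suggested above. The sketch below uses Python 3 syntax (the thread itself is Python 2), and the helper name indices_for_slicing is made up for illustration; it is not part of Python.

```python
def indices_for_slicing(key, length):
    """Return (start, stop, step) that can be fed back into seq[start:stop:step].

    slice.indices() may return -1 to mean "just before index 0" when the step
    is negative. Subtracting the length pushes such a value out of bounds on
    the low side, so slicing no longer reads it as "the last item".
    """
    start, stop, step = key.indices(length)
    if start < 0:
        start -= length
    if stop < 0:
        stop -= length
    return start, stop, step


seq = list(range(10))
key = slice(None, None, -2)
start, stop, step = indices_for_slicing(key, len(seq))
print((start, stop, step))     # (9, -11, -2)
print(seq[start:stop:step])    # [9, 7, 5, 3, 1]
```

The adjustment relies on indices() only ever returning a negative value when the step is negative, in which case that value is -1.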
--
--Bryan

Aug 10 '05 #1

Steven Bethard

Bryan Olson wrote:

class BuggerAll:

    def __init__(self, somelist):
        self.sequence = somelist[:]

    def __getitem__(self, key):
        if isinstance(key, slice):
            start, stop, step = key.indices(len(self.sequence))
            # print 'Slice says start, stop, step are:', start, stop, step
            return self.sequence[start : stop : step]

print range(10) [None : None : -2]
print BuggerAll(range(10))[None : None : -2]

The above prints:

[9, 7, 5, 3, 1]
[]

Un-commenting the print statement in __getitem__ shows:

Slice says start, stop, step are: 9 -1 -2

The slice object seems to think that -1 is a valid exclusive
bound, but when using it to actually slice, Python interprets
negative numbers as an offset from the high end of the sequence.

Good start-stop-step values are (9, None, -2), or (9, -11, -2),
or (-1, -11, -2). The latter two have the advantage of being
consistent with the documented behavior of returning three
integers.

I suspect there's a reason that it's done this way, but I agree with you
that this seems strange. Have you filed a bug report on Sourceforge?

BTW, a simpler example of the same phenomenon is:

py> range(10)[slice(None, None, -2)]
[9, 7, 5, 3, 1]
py> slice(None, None, -2).indices(10)
(9, -1, -2)
py> range(10)[9:-1:-2]
[]

STeVe

Aug 11 '05 #2

Bryan Olson

Steven Bethard wrote:

I suspect there's a reason that it's done this way, but I agree with you
that this seems strange. Have you filed a bug report on Sourceforge?

I gather that the slice class is young, so my guess is bug. I
filed the report -- my first Sourceforge bug report.

BTW, a simpler example of the same phenomenon is:

py> range(10)[slice(None, None, -2)]
[9, 7, 5, 3, 1]
py> slice(None, None, -2).indices(10)
(9, -1, -2)
py> range(10)[9:-1:-2]
[]

Ah, thanks.
--
--Bryan

Aug 12 '05 #3

John Machin

Steven Bethard wrote:

Bryan Olson wrote:

class BuggerAll:

    def __init__(self, somelist):
        self.sequence = somelist[:]

    def __getitem__(self, key):
        if isinstance(key, slice):
            start, stop, step = key.indices(len(self.sequence))
            # print 'Slice says start, stop, step are:', start, stop, step
            return self.sequence[start : stop : step]

print range(10) [None : None : -2]
print BuggerAll(range(10))[None : None : -2]

The above prints:

[9, 7, 5, 3, 1]
[]

Un-commenting the print statement in __getitem__ shows:

Slice says start, stop, step are: 9 -1 -2

The slice object seems to think that -1 is a valid exclusive
bound, but when using it to actually slice, Python interprets
negative numbers as an offset from the high end of the sequence.

Good start-stop-step values are (9, None, -2), or (9, -11, -2),
or (-1, -11, -2). The latter two have the advantage of being
consistent with the documented behavior of returning three
integers.

I suspect there's a reason that it's done this way, but I agree with you
that this seems strange. Have you filed a bug report on Sourceforge?

BTW, a simpler example of the same phenomenon is:

py> range(10)[slice(None, None, -2)]
[9, 7, 5, 3, 1]
py> slice(None, None, -2).indices(10)
(9, -1, -2)
py> range(10)[9:-1:-2]
[]

>>> rt = range(10)
>>> rt[slice(None, None, -2)]
[9, 7, 5, 3, 1]
>>> rt[::-2]
[9, 7, 5, 3, 1]
>>> slice(None, None, -2).indices(10)
(9, -1, -2)
>>> [rt[x] for x in range(9, -1, -2)]
[9, 7, 5, 3, 1]

Looks good to me. indices has returned a usable (start, stop, step).
Maybe the docs need expanding.

Aug 12 '05 #4

Bryan Olson

John Machin wrote:

Steven Bethard wrote:

[...]

BTW, a simpler example of the same phenomenon is:

py> range(10)[slice(None, None, -2)]
[9, 7, 5, 3, 1]
py> slice(None, None, -2).indices(10)
(9, -1, -2)
py> range(10)[9:-1:-2]
[]

>>> rt = range(10)
>>> rt[slice(None, None, -2)]
[9, 7, 5, 3, 1]
>>> rt[::-2]
[9, 7, 5, 3, 1]
>>> slice(None, None, -2).indices(10)
(9, -1, -2)
>>> [rt[x] for x in range(9, -1, -2)]
[9, 7, 5, 3, 1]

Looks good to me. indices has returned a usable (start, stop, step).
Maybe the docs need expanding.

But not a usable [start: stop: step], which is what 'slice' is
all about.
--
--Bryan

Aug 12 '05 #5

Michael Hudson

Bryan Olson <fa*********@nowhere.org> writes:

The Python slice type has one method 'indices', and reportedly:

This method takes a single integer argument /length/ and
computes information about the extended slice that the slice
object would describe if applied to a sequence of length
items. It returns a tuple of three integers; respectively
these are the /start/ and /stop/ indices and the /step/ or
stride length of the slice. Missing or out-of-bounds indices
are handled in a manner consistent with regular slices.

http://docs.python.org/ref/types.html
It behaves incorrectly

In some sense; it certainly does what I intended it to do.

when step is negative and the slice includes the 0 index.

class BuggerAll:

    def __init__(self, somelist):
        self.sequence = somelist[:]

    def __getitem__(self, key):
        if isinstance(key, slice):
            start, stop, step = key.indices(len(self.sequence))
            # print 'Slice says start, stop, step are:', start, stop, step
            return self.sequence[start : stop : step]

But if that's what you want to do with the slice object, just write

start, stop, step = key.start, key.stop, key.step
return self.sequence[start : stop : step]

or even

return self.sequence[key]

What the values returned from indices are for is to pass to the
range() function, more or less. They're not intended to be
interpreted in the way things passed to __getitem__ are.

(Well, _actually_ the main motivation for writing .indices() was to
use it in unittests...)
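
A minimal sketch of the usage described above, in Python 3 syntax: the tuple from indices() is unpacked straight into range(), and the resulting positions are used for plain item access rather than for another slice. The class name SliceAware is hypothetical and only serves this illustration.

```python
class SliceAware:
    """Toy sequence wrapper that resolves slices through range()."""

    def __init__(self, somelist):
        self.sequence = list(somelist)

    def __getitem__(self, key):
        if isinstance(key, slice):
            # indices() yields arithmetic bounds, which is what range() expects.
            positions = range(*key.indices(len(self.sequence)))
            return [self.sequence[i] for i in positions]
        # Non-slice keys are handed to the underlying list unchanged.
        return self.sequence[key]


print(SliceAware(range(10))[::-2])    # [9, 7, 5, 3, 1]
```

Used this way, the -1 stop value is harmless, because range(9, -1, -2) treats it as an ordinary integer bound.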
print range(10) [None : None : -2]
print BuggerAll(range(10))[None : None : -2]

The above prints:

[9, 7, 5, 3, 1]
[]

Un-commenting the print statement in __getitem__ shows:

Slice says start, stop, step are: 9 -1 -2

The slice object seems to think that -1 is a valid exclusive
bound,

It is, when you're doing arithmetic, which is what the client code of
PySlice_GetIndicesEx() does; indices() is in turn a thin wrapper
around that function.

but when using it to actually slice, Python interprets negative
numbers as an offset from the high end of the sequence.

Good start-stop-step values are (9, None, -2), or (9, -11, -2),
or (-1, -11, -2). The latter two have the advantage of being
consistent with the documented behavior of returning three
integers.

I'm not going to change the behaviour. The docs probably aren't
especially clear, though.

Cheers,
mwh

--
(ps: don't feed the lawyers: they just lose their fear of humans)
-- Peter Wood, comp.lang.lisp

Aug 12 '05 #6

bryanjugglercryptographer

Michael Hudson wrote:

Bryan Olson writes:
In some sense; it certainly does what I intended it to do.
[...] I'm not going to change the behaviour. The docs probably aren't
especially clear, though.

The docs and the behavior contradict:

[...] these are the /start/ and /stop/ indices and the
/step/ or stride length of the slice [emphasis added].

I'm fine with your favored behavior. What do we do next to get
the doc fixed?
--
--Bryan

Aug 16 '05 #7

Michael Hudson

br***********************@yahoo.com writes:

Michael Hudson wrote:
Bryan Olson writes:
In some sense; it certainly does what I intended it to do.

[...]
I'm not going to change the behaviour. The docs probably aren't
especially clear, though.

The docs and the behavior contradict:

[...] these are the /start/ and /stop/ indices and the
/step/ or stride length of the slice [emphasis added].

I'm fine with your favored behavior. What do we do next to get
the doc fixed?

I guess one of us comes up with some less misleading words. It's not
totally obvious to me what to do, seeing as the returned values *are*
indices in a sense, just not the sense in which they are used in
Python. Any ideas?

Cheers,
mwh

--
First of all, email me your AOL password as a security measure. You
may find that won't be able to connect to the 'net for a while. This
is normal. The next thing to do is turn your computer upside down
and shake it to reboot it. -- Darren Tucker, asr

Aug 18 '05 #8

Steven Bethard

Michael Hudson wrote:

br***********************@yahoo.com writes:
I'm fine with your favored behavior. What do we do next to get
the doc fixed?

I guess one of us comes up with some less misleading words. It's not
totally obvious to me what to do, seeing as the returned values *are*
indices in a sense, just not the sense in which they are used in
Python. Any ideas?

Maybe you could replace:

"these are the start and stop indices and the step or stride length of
the slice"

with

"these are start, stop and step values suitable for passing to range or
xrange"

I wanted to say something about what happens with a negative stride, to
indicate that it produces (9, -1, -2) instead of (-1, -11, -2), but I
wasn't able to navigate the Python documentation well enough.

Looking at the Language Reference section on the slice type[1] (section
3.2), I find that "Missing or out-of-bounds indices are handled in a
manner consistent with regular slices." So I looked for the
documentation of "regular slices". My best guess was that this meant
looking at the Language Reference on slicings[2]. But all I could find
in this documentation about the "stride" argument was:

"The conversion of a proper slice is a slice object (see section 3.2)
whose start, stop and step attributes are the values of the expressions
given as lower bound, upper bound and stride, respectively, substituting
None for missing expressions."

This feels circular to me. Can someone help me find where the semantics
of a negative stride index are defined?

Steve

[1] http://docs.python.org/ref/types.html
[2] http://docs.python.org/ref/slicings.html

Aug 18 '05 #9

Steven Bethard

I wrote:

I wanted to say something about what happens with a negative stride, to
indicate that it produces (9, -1, -2) instead of (-1, -11, -2), but I
wasn't able to navigate the Python documentation well enough.

Looking at the Language Reference section on the slice type[1] (section
3.2), I find that "Missing or out-of-bounds indices are handled in a
manner consistent with regular slices." So I looked for the
documentation of "regular slices". My best guess was that this meant
looking at the Language Reference on slicings[2]. But all I could find
in this documentation about the "stride" argument was:

"The conversion of a proper slice is a slice object (see section 3.2)
whose start, stop and step attributes are the values of the expressions
given as lower bound, upper bound and stride, respectively, substituting
None for missing expressions."

This feels circular to me. Can someone help me find where the semantics
of a negative stride index are defined?

Well, I couldn't find where the general semantics of a negative stride
index are defined, but for sequences at least[1]:

"The slice of s from i to j with step k is defined as the sequence of
items with index x = i + n*k such that 0 <= n < (j-i)/k."

This seems to contradict list behavior though.
    range(10)[9:-1:-2] == []
But the values of n that satisfy
    0 <= n < (-1 - 9)/-2 = -10/-2 = 5
are 0, 1, 2, 3, 4, corresponding to the x values of 9, 7, 5, 3, 1. But
    [range(10)[x] for x in [9, 7, 5, 3, 1]] == [9, 7, 5, 3, 1]
Does this mean that there's a bug in the list object?
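
A small experiment may help locate the discrepancy. The sketch below (Python 3 syntax, with a hypothetical function name) applies the quoted definition literally, once with j taken exactly as written and once after first making negative i and j relative to the end of the sequence. It assumes explicit integer arguments and a non-zero step, and is only meant for small in-range examples like the one above.

```python
def items_by_definition(s, i, j, k, relative_to_end):
    """Collect s[i + n*k] for every n with 0 <= n < (j - i) / k."""
    if relative_to_end:
        # Treat a negative i or j as len(s) + i or len(s) + j before applying the formula.
        if i < 0:
            i += len(s)
        if j < 0:
            j += len(s)
    picked = []
    n = 0
    while n < (j - i) / k:      # true (real) division
        picked.append(s[i + n * k])
        n += 1
    return picked


s = list(range(10))
literal = items_by_definition(s, 9, -1, -2, relative_to_end=False)
adjusted = items_by_definition(s, 9, -1, -2, relative_to_end=True)
print(literal)     # [9, 7, 5, 3, 1]
print(adjusted)    # []
print(s[9:-1:-2])  # []
```

In this example the list agrees with the reading in which negative bounds are resolved first, which suggests the question is about the order in which the documented rules apply rather than about the list type.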

STeVe

[1] http://docs.python.org/lib/typesseq.html

Aug 18 '05 #10

Bryan Olson

Steven Bethard wrote:

Well, I couldn't find where the general semantics of a negative stride
index are defined, but for sequences at least[1]:

"The slice of s from i to j with step k is defined as the sequence of
items with index x = i + n*k such that 0 <= n < (j-i)/k."

This seems to contradict list behavior though. [...]

The conclusion is inescapable: Python's handling of negative
subscripts is a wart. Indexing from the high end is too useful
to give up, but it should be specified by the slicing/indexing
operation, not by the value of the index expression.

PPEP (Proposed Python Enhancement Proposal): New-Style Indexing

Instead of:

sequence[start : stop : step]

new-style slicing uses the syntax:

sequence[start ; stop ; step]

It works like current slicing, except that negative start or
stop values do not trigger from-the-high-end interpretation.
Omissions and None work the same as in old-style slicing.

Within the square-brackets, the '$' symbol stands for the length
of the sequence. One can index from the high end by subtracting
the index from '$'. Instead of:

seq[3 : -4]

we write:

seq[3 ; $ - 4]

When square-brackets appear within other square-brackets, the
inner-most bracket-pair determines which sequence '$' describes.
(Perhaps '$$' should be the length of the next containing
bracket pair, and '$$$' the next-out and...?)

So far, I don't think the proposal breaks anything; let's keep
it that way. The next bit is tricky...

Obviously '$' should also work in simple (non-slice) indexing.
Instead of:

seq[-2]

we write:

seq[$ - 2]

So really seq[-2] should be out-of-bounds. Alas, that would
break way too much code. For now, simple indexing with a
negative subscript (and no '$') should continue to index from
the high end, as a deprecated feature. The presence of '$'
always indicates new-style slicing, so a programmer who needs a
negative index to trigger a range error can write:

seq[($ - $) + index]
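
Since the proposed syntax does not exist, its slicing semantics can only be approximated with an ordinary function. The following Python 3 sketch is one reading of the rule "negative start or stop values do not trigger from-the-high-end interpretation": negative bounds are clamped like any other out-of-range value. The name new_style_slice is hypothetical, len(seq) stands in for '$', and the function always returns a list.

```python
def new_style_slice(seq, start=None, stop=None, step=None):
    """Slice without wrapping negative bounds around to the high end."""
    step = 1 if step is None else step
    if step == 0:
        raise ValueError("slice step cannot be zero")
    n = len(seq)
    if step > 0:
        low, high = 0, n              # legal range for both bounds
        default_start, default_stop = 0, n
    else:
        low, high = -1, n - 1         # -1 means "just before index 0"
        default_start, default_stop = n - 1, -1
    # Omitted bounds get their defaults; given bounds are clamped, never wrapped.
    start = default_start if start is None else min(max(start, low), high)
    stop = default_stop if stop is None else min(max(stop, low), high)
    return [seq[i] for i in range(start, stop, step)]


seq = list(range(10))
print(new_style_slice(seq, 3, len(seq) - 4))   # [3, 4, 5], like seq[3:-4]
print(new_style_slice(seq, 9, -1, -2))         # [9, 7, 5, 3, 1]
print(new_style_slice(seq, 3, -4))             # []
```

Under this reading the tuple (9, -1, -2) returned by slice.indices(10) would be directly usable as slice bounds.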

An Alternative Variant:

Suppose instead of using semicolons as the PPEP proposes, we use
commas, as in:

sequence[start, stop, step]

Commas are already in use to form tuples, and we let them do
just that. A slice is a subscript that is a tuple (or perhaps we
should allow any sequence). We could just as well write:

index_tuple = (start, stop, step)
sequence[index_tuple]

This variant *reduces* the number and complexity of rules that
define Python semantics. There is no special interpretation of
the comma, and no need for a distinct slice type.

The '$' character works as in the PPEP above. It is undefined
outside square brackets, but that makes no real difference; the
programmer can use len(sequence).

This variant might break some tricky code.
--
--Bryan

Aug 20 '05 #11

Steven Bethard

Bryan Olson wrote:

Steven Bethard wrote:
> Well, I couldn't find where the general semantics of a negative stride
> index are defined, but for sequences at least[1]:
>
> "The slice of s from i to j with step k is defined as the sequence of
> items with index x = i + n*k such that 0 <= n < (j-i)/k."
>
> This seems to contradict list behavior though. [...]
The conclusion is inescapable: Python's handling of negative
subscripts is a wart.

I'm not sure I'd go that far. Note that my confusion above was the
order of combination of points (3) and (5) on the page quoted above[1].
I think the problem is not the subscript handling so much as the
documentation thereof. I posted a message about this [2], and a
documentation patch based on that message [3].

[1] http://docs.python.org/lib/typesseq.html
[2] http://mail.python.org/pipermail/pyt...st/295260.html
[3] http://www.python.org/sf/1265100

Suppose instead of using semicolons as the PPEP proposes, we use
commas, as in:

sequence[start, stop, step]

This definitely won't work. This is already valid syntax, and is used
heavily by the numarray/numeric folks.

STeVe

Aug 20 '05 #12

Kay Schluehr

Steven Bethard wrote:

"The slice of s from i to j with step k is defined as the sequence of
items with index x = i + n*k such that 0 <= n < (j-i)/k."

This seems to contradict list behavior though.
range(10)[9:-1:-2] == []

No, both are correct. But we don't have to interpret the second slice
argument m as the limit j of the above definition. For positive values
of m the identity m == j holds. For negative values of m we have
j = max(0, i+m). This is consistent with the convenient negative indexing:

range(9)[-1] == range(9)[8]

If we remember that -1 is interpreted as an index, not as some limit,
the behaviour makes perfect sense.

Kay

Aug 21 '05 #13

Kay Schluehr

Bryan Olson wrote:

Steven Bethard wrote:
> Well, I couldn't find where the general semantics of a negative stride
> index are defined, but for sequences at least[1]:
>
> "The slice of s from i to j with step k is defined as the sequence of
> items with index x = i + n*k such that 0 <= n < (j-i)/k."
>
> This seems to contradict list behavior though. [...]

The conclusion is inescapable: Python's handling of negative
subscripts is a wart. Indexing from the high end is too useful
to give up, but it should be specified by the slicing/indexing
operation, not by the value of the index expression.

It is a Python gotcha, but the identity X[-1] == X[len(X)-1] holds and
is very useful IMO. If you want to slice to the bottom, take 0 as
bottom value. The docs have to be extended in this respect.

Kay

Aug 21 '05 #14

Paul Rubin

Bryan Olson <fa*********@nowhere.org> writes:

seq[3 : -4]

we write:

seq[3 ; $ - 4]

+1

When square-brackets appear within other square-brackets, the
inner-most bracket-pair determines which sequence '$' describes.
(Perhaps '$$' should be the length of the next containing
bracket pair, and '$$$' the next-out and...?)

Not sure. $1, $2, etc. might be better, or $<tag> like in regexps, etc.

So really seq[-2] should be out-of-bounds. Alas, that would
break way too much code. For now, simple indexing with a
negative subscript (and no '$') should continue to index from
the high end, as a deprecated feature. The presence of '$'
always indicates new-style slicing, so a programmer who needs a
negative index to trigger a range error can write:

seq[($ - $) + index]

+1

Commas are already in use to form tuples, and we let them do
just that. A slice is a subscript that is a tuple (or perhaps we
should allow any sequence). We could just as well write:

index_tuple = (start, stop, step)
sequence[index_tuple]

Hmm, tuples are hashable and are already valid indices to mapping
objects like dictionaries. Having slices means an object can
implement both the mapping and sequence interfaces. Whether that's
worth caring about, I don't know.

Aug 21 '05 #15

Bryan Olson

Paul Rubin wrote:

Bryan Olson writes:
seq[3 : -4]

we write:

seq[3 ; $ - 4]

+1

I think you're wrong about the "+1". I defined '$' to stand for
the length of the sequence (not the address of the last
element).

When square-brackets appear within other square-brackets, the
inner-most bracket-pair determines which sequence '$' describes.
(Perhaps '$$' should be the length of the next containing
bracket pair, and '$$$' the next-out and...?)

Not sure. $1, $2, etc. might be better, or $<tag> like in regexps, etc.

Sounds reasonable.

[...] Hmm, tuples are hashable and are already valid indices to mapping
objects like dictionaries. Having slices means an object can
implement both the mapping and sequence interfaces. Whether that's
worth caring about, I don't know.

Yeah, I thought that alternative might break people's code, and
it turns out it does.
--
--Bryan

Aug 24 '05 #16

Bryan Olson

Kay Schluehr wrote:

Bryan Olson wrote:
Steven Bethard wrote:
> Well, I couldn't find where the general semantics of a negative stride
> index are defined, but for sequences at least[1]:
>
> "The slice of s from i to j with step k is defined as the sequence of
> items with index x = i + n*k such that 0 <= n < (j-i)/k."
>
> This seems to contradict list behavior though. [...]
The conclusion is inescapable: Python's handling of negative
subscripts is a wart. Indexing from the high end is too useful
to give up, but it should be specified by the slicing/indexing
operation, not by the value of the index expression.

It is a Python gotcha, but the identity X[-1] == X[len(X)-1] holds and
is very useful IMO.

No question index-from-the-far-end is useful, but I think
special-casing some otherwise-out-of-bounds indexes is a
mistake.

Are there any cases in popular Python code where my proposal
would not allow as elegant a solution?

If you want to slice to the bottom, take 0 as
bottom value. The docs have to be extended in this respect.

I'm not sure what you mean. Slicing with a negative step and a
stop value of zero will not reach the bottom (unless the
sequence is empty). In general, Python uses inclusive beginning
bounds and exclusive ending bounds. (The rule is frequently
stated incorrectly as "inclusive lower bounds and exclusive
upper bounds," which fails to consider negative increments.)
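
A short Python 3 illustration of the point about bounds, using an arbitrary five-item list: the beginning bound is included and the ending bound is excluded, whichever direction the step runs.

```python
seq = list(range(5))

forward = seq[1:4]           # starts at index 1, stops before index 4
down_to_zero = seq[4:0:-1]   # starts at index 4, stops before index 0
to_the_bottom = seq[4::-1]   # an omitted stop runs through index 0

print(forward)         # [1, 2, 3]
print(down_to_zero)    # [4, 3, 2, 1]
print(to_the_bottom)   # [4, 3, 2, 1, 0]
```

With a negative step, a stop of 0 therefore leaves out the item at index 0; reaching it requires omitting the stop or passing None.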
--
--Bryan

Aug 24 '05 #17

Bryan Olson

Kay Schluehr wrote:

Steven Bethard wrote:
"The slice of s from i to j with step k is defined as the sequence of
items with index x = i + n*k such that 0 <= n < (j-i)/k."

This seems to contradict list behavior though.
range(10)[9:-1:-2] == []

No, both are correct. But we don't have to interpret the second slice
argument m as the limit j of the above definition.

Even if "we don't have to," it sure reads like we should.

For positive values of m the identity m == j holds. For negative values
of m we have j = max(0, i+m).

First, the definition from the doc is still ambiguous: Is the
division in

0 <= n < (j-i)/k

real division, or is it Python integer (flooring) division? It
matters.

Second, the rule Kay Schluehr states is wrong for either type
of division. Look at:

range(5)[4 : -6 : -2]

Since Python is so programmer-friendly, I wrote some code to
make the "look at" task easy:

slice_definition = """
The slice of s from i to j with step k is defined as the sequence of
items with index x = i + n*k such that 0 <= n < (j-i)/k.
"""

Kay_Schluehr_rule = """
For positive values of m the identity m==j holds. For negative values
of m we have j = max(0,i+m).
"""

def m_to_j(i, m):
    """ Compute slice_definition's 'j' according to Kay_Schluehr_rule
        when the slice of sequence is specified as,
            sequence[i : m : k].
    """
    if m > 0:
        j = m
    else:
        j = max(0, i + m)
    return j

def extract_slice(sequence, i, m, k, div_type='i'):
    """ Apply the slice definition with Kay Schluehr's rule to find
        what the slice should be. Pass div_type of 'i' to use integer
        division, or 'f' for float (~real) division, in the
        slice_definition expression,
            (j-i)/k.
    """
    j = m_to_j(i, m)
    result = []
    n = 0
    if div_type == 'i':
        end_bound = (j - i) / k
    else:
        assert div_type == 'f', "div_type must be 'i' or 'f'."
        end_bound = float(j - i) / k
    while n < end_bound:
        result.append(sequence[i + n * k])
        n += 1
    return result

def show(sequence, i, m, k):
    """ Print what happens, both actually and according to stated rules.
    """
    print "Checking: %s[%d : %d : %d]" % (sequence, i, m, k)
    print "actual :", sequence[i : m : k]
    print "Kay's rule, int division :", extract_slice(sequence, i, m, k)
    print "Kay's rule, real division:", extract_slice(sequence, i, m, k, 'f')
    print

show(range(5), 4, -6, -2)
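
For readers without a Python 2 interpreter, here is a condensed Python 3 sketch of the same check. It keeps only the real-division branch, and the function name kay_rule_slice is introduced here for illustration; it is not from the post above.

```python
def kay_rule_slice(seq, i, m, k):
    """Apply the quoted definition with j derived from m by the stated rule."""
    j = m if m > 0 else max(0, i + m)
    result = []
    n = 0
    while n < (j - i) / k:      # true (real) division
        result.append(seq[i + n * k])
        n += 1
    return result


seq = list(range(5))
actual = seq[4:-6:-2]
by_rule = kay_rule_slice(seq, 4, -6, -2)
print(actual)     # [4, 2, 0]
print(by_rule)    # [4, 2]
```

For this input the rule gives j = 0 and so stops before index 0, while the actual slice includes it.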

--
--Bryan

Aug 24 '05 #18

Bryan Olson

Steven Bethard wrote:

Bryan Olson wrote:
Steven Bethard wrote:
> Well, I couldn't find where the general semantics of a negative stride
> index are defined, but for sequences at least[1]:
>
> "The slice of s from i to j with step k is defined as the sequence of
> items with index x = i + n*k such that 0 <= n < (j-i)/k."
>
> This seems to contradict list behavior though. [...]
The conclusion is inescapable: Python's handling of negative
subscripts is a wart.

I'm not sure I'd go that far. Note that my confusion above was the
order of combination of points (3) and (5) on the page quoted above[1].
I think the problem is not the subscript handling so much as the
documentation thereof.

Any bug can be pseudo-fixed by changing the documentation to
conform to the behavior. Here, the doc clearly went wrong by
expecting Python's behavior to follow from a few consistent
rules. The special-case handling of negative indexes looks
handy, but raises more difficulties than people realized.

I believe my PPEP avoids the proliferation of special cases. The
one additional issue I've discovered is that user-defined types
that are to support __getitem__ and/or __setitem__ *must* also
implement __len__. Sensible sequence types already do, so I
don't think it's much of an issue.

This is already valid syntax, and is used
heavily by the numarray/numeric folks.

Yeah, I thought that variant might break some code. I didn't
know it would be that much. Forget that variant.
--
--Bryan

Aug 24 '05 #19

Robert Kern

Bryan Olson wrote:

Paul Rubin wrote:
> Bryan Olson writes:
>
>> seq[3 : -4]
>>
>>we write:
>>
>> seq[3 ; $ - 4]

>
> +1

I think you're wrong about the "+1". I defined '$' to stand for
the length of the sequence (not the address of the last
element).

By "+1" he means, "I like it." He's not correcting you.

--
Robert Kern
rk***@ucsd.edu

"In the fields of hell where the grass grows high
Are the graves of dreams allowed to die."
-- Richard Harter

Aug 24 '05 #20

Bryan Olson

Robert Kern wrote:

By "+1" he means, "I like it." He's not correcting you.

Ah, O.K. Thanks.
--
--Bryan

Aug 25 '05 #21

Bryan Olson

The doc for the find() method of string objects, which is
essentially the same as the string.find() function, states:

find(sub[, start[, end]])
Return the lowest index in the string where substring sub
is found, such that sub is contained in the range [start,
end). Optional arguments start and end are interpreted as
in slice notation. Return -1 if sub is not found.

Consider:

print 'Hello'.find('o')

or:

import string
print string.find('Hello', 'o')

The substring 'o' is found in 'Hello' at the index -1, and at
the index 4, and it is not found at any other index. Both the
locations found are in the range [start, end), and obviously -1
is less than 4, so according to the documentation, find() should
return -1.

What either of the above actually prints is:

4

which shows yet another bug resulting from Python's handling of
negative indexes. This one is clearly a documentation error, but
the real fix is to cure the wart so that Python's behavior is
consistent enough that we'll be able to describe it correctly.
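
A brief Python 3 illustration of why a not-found result that is also a legal index can be hazardous; the strings are arbitrary examples.

```python
text = 'Hello'

found = text.find('o')
missing = text.find('z')

print(found)            # 4
print(missing)          # -1
# Using the "not found" result as an index raises no error:
# it silently selects the last character.
print(text[missing])    # o
```

Code that forgets to test for -1 therefore keeps running with the wrong character instead of failing.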
--
--Bryan

Aug 25 '05 #22

Steve Holden

Bryan Olson wrote:

The doc for the find() method of string objects, which is
essentially the same as the string.find() function, states:

find(sub[, start[, end]])
Return the lowest index in the string where substring sub
is found, such that sub is contained in the range [start,
end). Optional arguments start and end are interpreted as
in slice notation. Return -1 if sub is not found.

Consider:

print 'Hello'.find('o')

or:

import string
print string.find('Hello', 'o')

The substring 'o' is found in 'Hello' at the index -1, and at
the index 4, and it is not found at any other index. Both the
locations found are in the range [start, end), and obviously -1
is less than 4, so according to the documentation, find() should
return -1.

What either of the above actually prints is:

4

which shows yet another bug resulting from Python's handling of
negative indexes. This one is clearly a documentation error, but
the real fix is to cure the wart so that Python's behavior is
consistent enough that we'll be able to describe it correctly.

Do you just go round looking for trouble?

As far as position reporting goes, it seems pretty clear that find()
will always report positive index values. In a five-character string
then -1 and 4 are effectively equivalent.

What on earth makes you call this a bug? And what are you proposing that
find() should return if the substring isn't found at all? please don't
suggest it should raise an exception, as index() exists to provide that
functionality.

regards
Steve
--
Steve Holden +44 150 684 7255 +1 800 494 3119
Holden Web LLC http://www.holdenweb.com/

Aug 25 '05 #23

Casey Hawthorne

>contained in the range [start, end)

Does range(start, end) generate negative integers in Python if start >= 0
and end >= start?

--
Regards,
Casey

Aug 25 '05 #24

On Thu, 25 Aug 2005 00:05:18 -0400
Steve Holden wrote:

What on earth makes you call this a bug? And what are you proposing that
find() should return if the substring isn't found at all? please don't
suggest it should raise an exception, as index() exists to provide that
functionality.

Returning -1 looks like a C-ism to me. It would be better to return None
when nothing is found.

index = "Hello".find("z")
if index is not None:
    # ...

Now it's too late for it, I know.

--
jk

Aug 25 '05 #25

Paul Rubin

Steve Holden <st***@holdenweb.com> writes:

As far as position reporting goes, it seems pretty clear that find()
will always report positive index values. In a five-character string
then -1 and 4 are effectively equivalent.

What on earth makes you call this a bug? And what are you proposing
that find() should return if the substring isn't found at all? please
don't suggest it should raise an exception, as index() exists to
provide that functionality.

Bryan is making the case that Python's use of negative subscripts to
measure from the end of sequences is bogus, and that it should be done
some other way instead. I've certainly had bugs in my own programs
related to that "feature".

Aug 25 '05 #26

Bryan Olson

Steve Holden asked:

Do you just go round looking for trouble?

In the course of programming, yes, absolutely.

As far as position reporting goes, it seems pretty clear that find()
will always report positive index values. In a five-character string
then -1 and 4 are effectively equivalent.

What on earth makes you call this a bug?

What you just said, versus what the doc says.

And what are you proposing that
find() should return if the substring isn't found at all? please don't
suggest it should raise an exception, as index() exists to provide that
functionality.

There are a number of good options. A legal index is not one of
them.
--
--Bryan

Aug 25 '05 #27

Antoon Pardon

Op 2005-08-25, Bryan Olson schreef <fa*********@nowhere.org>:

Steve Holden asked:
Do you just go round looking for trouble?

In the course of programming, yes, absolutely.

As far as position reporting goes, it seems pretty clear that find()
will always report positive index values. In a five-character string
then -1 and 4 are effectively equivalent.

What on earth makes you call this a bug?

What you just said, versus what the doc says.

And what are you proposing that
find() should return if the substring isn't found at all? please don't
suggest it should raise an exception, as index() exists to provide that
functionality.

There are a number of good options. A legal index is not one of
them.

IMO, with find a number of "features" of Python come together
that create an awkward situation.

1) 0 is a false value, but indexes start at 0 so you can't
return 0 to indicate nothing was found.

2) -1 is returned, which is both a true value and a legal
index.
It probably is too late now, but I always felt, find should
have returned None when the substring isn't found.
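
A small sketch of how the two points combine, written in Python 3
syntax (the string and variable names are made up for illustration):

```python
s = "buggy"

# Point 1: a match at position 0 is a false value.
at_start = s.find("b")          # 0
# Point 2: a miss is -1, which is a true value and a legal index.
missing = s.find("w")           # -1

# A naive truth test therefore gets both cases backwards.
naive_found_b = bool(at_start)  # False, although "b" is present
naive_found_w = bool(missing)   # True, although "w" is absent

# An explicit comparison against -1 gives the right answer.
found_b = at_start != -1
found_w = missing != -1
print(naive_found_b, naive_found_w, found_b, found_w)
```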

--
Antoon Pardon

Aug 26 '05 #28

Bryan Olson

Antoon Pardon wrote:

Bryan Olson schreef:
Steve Holden asked:
And what are you proposing that
find() should return if the substring isn't found at all? please don't
suggest it should raise an exception, as index() exists to provide that
functionality.

There are a number of good options. A legal index is not one of
them.

IMO, with find a number of "features" of Python come together
that create an awkward situation.

1) 0 is a false value, but indexes start at 0 so you can't
return 0 to indicate nothing was found.

2) -1 is returned, which is both a true value and a legal
index.

It probably is too late now, but I always felt, find should
have returned None when the substring isn't found.

None is certainly a reasonable candidate. The one-past-the-end
value, len(sequence), would be fine, and follows the preferred
idiom of C/C++. I don't see any elegant way to arrange for
successful finds always to return a true value and unsuccessful
calls to return a false value.

The really broken part is that unsuccessful searches return a
legal index.

My suggestion doesn't change what find() returns, and doesn't
break code. Negative one is a reasonable choice to represent an
unsuccessful search -- provided it is not a legal index. Instead
of changing what find() returns, we should heal the
special-case-when-index-is-negative-in-a-certain-range wart.
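
To illustrate the one-past-the-end alternative mentioned above, here
is a rough sketch in Python 3 syntax. The helper name find_or_end is
hypothetical; nothing like it exists in the standard library.

```python
def find_or_end(s, sub):
    """Hypothetical helper: like str.find, but len(s) on failure."""
    i = s.find(sub)
    return len(s) if i == -1 else i


s = "buggy"

# Slicing from the failure value yields an empty tail.
tail = s[find_or_end(s, "w"):]

# Indexing with the failure value fails loudly instead of returning 'y'.
try:
    s[find_or_end(s, "w")]
    raised = False
except IndexError:
    raised = True

print(repr(tail), raised)
```

Unlike -1, the value len(s) is never a legal index, so a forgotten
check shows up as an IndexError at the point of use.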
--
--Bryan

Aug 26 '05 #29

Rick Wotnaz

Bryan Olson <fa*********@nowhere.org> wrote in
news:3E**************@newssvr21.news.prodigy.com:

Steve Holden asked:
Do you just go round looking for trouble?

In the course of programming, yes, absolutely.
As far as position reporting goes, it seems pretty clear that
find() will always report positive index values. In a
five-character string then -1 and 4 are effectively
equivalent.

What on earth makes you call this a bug?

What you just said, versus what the doc says.
And what are you proposing that
find() should return if the substring isn't found at all?
please don't suggest it should raise an exception, as index()
exists to provide that functionality.

There are a number of good options. A legal index is not one of
them.

Practically speaking, what difference would it make? Supposing find
returned None for not-found. How would you use it in your code that
would make it superior to what happens now? In either case you
would have to test for the not-found state before relying on the
index returned, wouldn't you? Or do you have a use that would
eliminate that step?

--
rzed

Aug 26 '05 #30

Steve Holden

Bryan Olson wrote:

Antoon Pardon wrote:
> Bryan Olson schreef:
>
>>Steve Holden asked:
>>>And what are you proposing that
>>>find() should return if the substring isn't found at all? please don't
>>>suggest it should raise an exception, as index() exists to provide that
>>>functionality.
>>
>>There are a number of good options. A legal index is not one of
>>them.
>
> IMO, with find a number of "features" of Python come together
> that create an awkward situation.
>
> 1) 0 is a false value, but indexes start at 0 so you can't
> return 0 to indicate nothing was found.
>
> 2) -1 is returned, which is both a true value and a legal
> index.
>
> It probably is too late now, but I always felt, find should
> have returned None when the substring isn't found.

None is certainly a reasonable candidate. The one-past-the-end
value, len(sequence), would be fine, and follows the preferred
idiom of C/C++. I don't see any elegant way to arrange for
successful finds always to return a true value and unsuccessful
calls to return a false value.

The really broken part is that unsuccessful searches return a
legal index.

We might agree, before further discussion, that this isn't the most
elegant part of Python's design, and it's down to history that this tiny
little wart remains.
My suggestion doesn't change what find() returns, and doesn't
break code. Negative one is a reasonable choice to represent an
unsuccessful search -- provided it is not a legal index. Instead
of changing what find() returns, we should heal the
special-case-when-index-is-negative-in-a-certain-range wart.

What I don't understand is why you want it to return something that
isn't a legal index. Before using the result you always have to perform
a test to discriminate between the found and not found cases. So I don't
really see why this wart has put such a bug up your ass.

regards
Steve
--
Steve Holden +44 150 684 7255 +1 800 494 3119
Holden Web LLC http://www.holdenweb.com/

Aug 26 '05 #31

Bryan Olson

Steve Holden wrote:

Bryan Olson wrote:
Antoon Pardon wrote:
> It probably is too late now, but I always felt, find should
> have returned None when the substring isn't found.
None is certainly a reasonable candidate. [...]
The really broken part is that unsuccessful searches return a
legal index.

We might agree, before further discussion, that this isn't the most
elegant part of Python's design, and it's down to history that this tiny
little wart remains.

I don't think my proposal breaks historic Python code, and I
don't think it has the same kind of unfortunate subtle
consequences as the current indexing scheme. You may think the
wart is tiny, but the duct-tape* is available so let's cure it.
[*] http://www.google.com/search?as_q=warts+%22duct+tape%22

My suggestion doesn't change what find() returns, and doesn't
break code. Negative one is a reasonable choice to represent an
unsuccessful search -- provided it is not a legal index. Instead
of changing what find() returns, we should heal the
special-case-when-index-is-negative-in-a-certain-range wart.

What I don't understand is why you want it to return something that
isn't a legal index.

In this case, so that errors are caught as close to their
occurrence as possible. I see no good reason for the following
to happily print 'y'.

s = 'buggy'
print s[s.find('w')]
Before using the result you always have to perform
a test to discriminate between the found and not found cases. So I don't
really see why this wart has put such a bug up your ass.

The bug that got me was what a slice object reports as the
'stop' bound when the step is negative and the slice includes
index 0. Took me hours to figure out why my code was failing.

The double-meaning of -1, as both an exclusive stopping bound
and an alias for the highest valid index, is just plain whacked.
Unfortunately, as negative indexes are currently handled, there
is no it-just-works value that slice could return.
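
The behaviour can be reproduced as follows. This is a sketch in
Python 3 syntax, whereas the examples elsewhere in this thread are
Python 2. Expanding the triple with range() is shown only as one
possible workaround, not as a fix for the slice type itself.

```python
seq = list(range(10))
key = slice(None, None, -2)

start, stop, step = key.indices(len(seq))
# (start, stop, step) is (9, -1, -2); here -1 means "one before index 0".

direct = seq[key]
# Feeding the triple back into a slice reinterprets -1 as "last item",
# so the slice runs from index 9 to index 9 and selects nothing.
round_trip = seq[start:stop:step]
# range() treats -1 as a plain integer bound, so the indices come out right.
via_range = [seq[i] for i in range(start, stop, step)]

print(direct, round_trip, via_range)
```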
--
--Bryan

Aug 26 '05 #32

Reinhold Birkenfeld

Bryan Olson wrote:

Steve Holden wrote:
> Bryan Olson wrote:
>> Antoon Pardon wrote:
>> > It probably is too late now, but I always felt, find should
>> > have returned None when the substring isn't found.
>>
>> None is certainly a reasonable candidate. [...]
>> The really broken part is that unsuccessful searches return a
>> legal index.
>>

> We might agree, before further discussion, that this isn't the most
> elegant part of Python's design, and it's down to history that this tiny
> little wart remains.

I don't think my proposal breaks historic Python code, and I
don't think it has the same kind of unfortunate subtle
consequences as the current indexing scheme. You may think the
wart is tiny, but the duct-tape* is available so let's cure it.

[*] http://www.google.com/search?as_q=warts+%22duct+tape%22

Well, nobody stops you from posting this on python-dev and being
screamed at by Guido...

just-kidding-ly
Reinhold

Aug 26 '05 #33

Terry Reedy

"Bryan Olson" <fa*********@nowhere.org> wrote in message
news:7s***************@newssvr25.news.prodigy.net. ..

The double-meaning of -1, as both an exclusive stopping bound
and an alias for the highest valid index, is just plain whacked.

I agree in this sense: the use of any int as an error return is an
unPythonic *nix-Cism, which I believe was copied therefrom. Str.find is
redundant with the Pythonic exception-raising str.index and I think it
should be removed in Py3.

Therefore, I think changing it now is untimely and changing the language
because of it backwards.

Terry J. Reedy

Aug 26 '05 #34

Paul Rubin

"Terry Reedy" <tj*****@udel.edu> writes:

I agree in this sense: the use of any int as an error return is an
unPythonic *nix-Cism, which I believe was copied therefrom. Str.find is
redundant with the Pythonic exception-raising str.index and I think it
should be removed in Py3.

I like having it available so you don't have to clutter your code with
try/except if the substring isn't there. But it should not return a
valid integer index.

Aug 26 '05 #35

Terry Reedy

"Paul Rubin" <"http://phr.cx"@NOSPAM.invalid> wrote in message
news:7x************@ruckus.brouhaha.com...

"Terry Reedy" <tj*****@udel.edu> writes:
Str.find is
redundant with the Pythonic exception-raising str.index
and I think it should be removed in Py3.

I like having it available so you don't have to clutter your code with
try/except if the substring isn't there. But it should not return a
valid integer index.

The try/except pattern is a pretty basic part of Python's design. One
could say the same about clutter for *every* function or method that raises
an exception on invalid input. Should more or even all be duplicated? Why
just this one?

Terry J. Reedy

Aug 26 '05 #36

Torsten Bronger

Hallöchen!

"Terry Reedy" <tj*****@udel.edu> writes:

"Paul Rubin" <"http://phr.cx"@NOSPAM.invalid> wrote in message
news:7x************@ruckus.brouhaha.com...
"Terry Reedy" <tj*****@udel.edu> writes:
Str.find is redundant with the Pythonic exception-raising
str.index and I think it should be removed in Py3.

I like having it available so you don't have to clutter your code
with try/except if the substring isn't there. But it should not
return a valid integer index.

The try/except pattern is a pretty basic part of Python's design.
One could say the same about clutter for *every* function or
method that raises an exception on invalid input. Should more or
even all be duplicated? Why just this one?

Granted, try/except can be used for deliberate case discrimination
(which may even happen in the standard library in many places),
however, it is only the second most elegant method -- the most
elegant being "if". Where "if" does the job, it should be preferred
in my opinion.

Tschö,
Torsten.

--
Torsten Bronger, aquisgrana, europa vetus ICQ 264-296-646

Aug 26 '05 #37

Paul Rubin

"Terry Reedy" <tj*****@udel.edu> writes:

The try/except pattern is a pretty basic part of Python's design. One
could say the same about clutter for *every* function or method that raises
an exception on invalid input. Should more or even all be duplicated? Why
just this one?

Someone must have thought str.find was worth having, or else it
wouldn't be in the library.

Aug 26 '05 #38

Raymond Hettinger

Bryan Olson wrote:

The conclusion is inescapable: Python's handling of negative
subscripts is a wart. Indexing from the high end is too useful
to give up, but it should be specified by the slicing/indexing
operation, not by the value of the index expression.
PPEP (Proposed Python Enhancement Proposal): New-Style Indexing

Instead of:

sequence[start : stop : step]

new-style slicing uses the syntax:

sequence[start ; stop ; step]

<klingon>
Bah!
</klingon>

The pythonic way to handle negative slicing is to use reversed(). The
principle is that the mind more easily handles this in two steps:
specifying the range in a forward direction, and then reversing it.

IOW, it is easier to identify the included elements and see the
direction of:

reversed(xrange(1, 20, 2))

than it is for:

xrange(19, -1, -2)

See PEP 322 for discussion and examples:
http://www.python.org/peps/pep-0322.html
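
A quick check that the two spellings above select the same elements,
sketched in Python 3 syntax where range() takes the place of xrange().
The list example at the end is an added illustration, not something
taken from the PEP.

```python
# Forward range, then reversed: the included elements are easy to read off.
forward_then_reversed = list(reversed(range(1, 20, 2)))

# Single negative-step range: the stop value -1 has to be worked out by hand.
explicit_negative_step = list(range(19, -1, -2))

# The same two-step idea applied to a list slice.
seq = list(range(10))
one_step = seq[::-2]
two_step = list(reversed(seq[1::2]))

print(forward_then_reversed == explicit_negative_step, one_step == two_step)
```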

Raymond

Aug 26 '05 #39

Terry Reedy

"Paul Rubin" <"http://phr.cx"@NOSPAM.invalid> wrote in message
news:7x************@ruckus.brouhaha.com...

"Terry Reedy" <tj*****@udel.edu> writes:
The try/except pattern is a pretty basic part of Python's design. One
could say the same about clutter for *every* function or method that
raises
an exception on invalid input. Should more or even all be duplicated?
Why
just this one?

Someone must have thought str.find was worth having, or else it
wouldn't be in the library.

Well, Guido no longer thinks it worth having and emphatically agreed that
it should be added to one of the 'To be removed' sections of PEP 3000.

Terry J. Reedy

Aug 27 '05 #40

Steve Holden

Torsten Bronger wrote:

Hallöchen!

"Terry Reedy" <tj*****@udel.edu> writes:

"Paul Rubin" <"http://phr.cx"@NOSPAM.invalid> wrote in message
news:7x************@ruckus.brouhaha.com...

"Terry Reedy" <tj*****@udel.edu> writes:
Str.find is redundant with the Pythonic exception-raising
str.index and I think it should be removed in Py3.

I like having it available so you don't have to clutter your code
with try/except if the substring isn't there. But it should not
return a valid integer index.

The try/except pattern is a pretty basic part of Python's design.
One could say the same about clutter for *every* function or
method that raises an exception on invalid input. Should more or
even all be duplicated? Why just this one?

Granted, try/except can be used for deliberate case discrimination
(which may even happen in the standard library in many places),
however, it is only the second most elegant method -- the most
elegant being "if". Where "if" does the job, it should be preferred
in my opinion.

Of course. But once you (sensibly) decide to use an "if" then there
really isn't much difference between -1, None, () and sys.maxint as
a sentinel value, is there?

Which is what I've been trying to say all along.

regards
Steve
--
Steve Holden +44 150 684 7255 +1 800 494 3119
Holden Web LLC http://www.holdenweb.com/

Aug 27 '05 #41

Steve Holden

Bryan Olson wrote:

Steve Holden wrote:
> Bryan Olson wrote:
>> Antoon Pardon wrote:
>> > It probably is too late now, but I always felt, find should
>> > have returned None when the substring isn't found.
>>
>> None is certainly a reasonable candidate. [...]
>> The really broken part is that unsuccessful searches return a
>> legal index.
>>

> We might agree, before further discussion, that this isn't the most
> elegant part of Python's design, and it's down to history that this tiny
> little wart remains.

I don't think my proposal breaks historic Python code, and I
don't think it has the same kind of unfortunate subtle
consequences as the current indexing scheme. You may think the
wart is tiny, but the duct-tape* is available so let's cure it.

[*] http://www.google.com/search?as_q=warts+%22duct+tape%22

>> My suggestion doesn't change what find() returns, and doesn't
>> break code. Negative one is a reasonable choice to represent an
>> unsuccessful search -- provided it is not a legal index. Instead
>> of changing what find() returns, we should heal the
>> special-case-when-index-is-negative-in-a-certain-range wart.
>>
>>

> What I don't understand is why you want it to return something that
> isn't a legal index.

In this case, so that errors are caught as close to their
occurrence as possible. I see no good reason for the following
to happily print 'y'.

s = 'buggy'
print s[s.find('w')]
> Before using the result you always have to perform
> a test to discriminate between the found and not found cases. So I don't
> really see why this wart has put such a bug up your ass.

The bug that got me was what a slice object reports as the
'stop' bound when the step is negative and the slice includes
index 0. Took me hours to figure out why my code was failing.

The double-meaning of -1, as both an exclusive stopping bound
and an alias for the highest valid index, is just plain whacked.
Unfortunately, as negative indexes are currently handled, there
is no it-just-works value that slice could return.

If you want an exception from your code when 'w' isn't in the string you
should consider using index() rather than find.

Otherwise, whatever find() returns you will have to have an "if" in
there to handle the not-found case.

This just sounds like whining to me. If you want to catch errors, use a
function that will raise an exception rather than relying on the
invalidity of the result.
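
As a sketch of the difference (Python 3 syntax, using the 'buggy'
string from the example quoted above):

```python
s = "buggy"

# index() raises ValueError when the substring is absent, so the
# mistake surfaces at the search itself.
try:
    ch = s[s.index("w")]
except ValueError:
    ch = None

# find() returns -1 instead, and s[-1] quietly selects the last character.
silent = s[s.find("w")]

print(ch, silent)
```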

regards
Steve
--
Steve Holden +44 150 684 7255 +1 800 494 3119
Holden Web LLC http://www.holdenweb.com/

Aug 27 '05 #42

Robert Kern

Steve Holden wrote:

Of course. But once you (sensibly) decide to use an "if" then there
really isn't much difference between -1, None, () and sys.maxint as
a sentinel value, is there?

Sure there is. -1 is a valid index; None is not. -1 as a sentinel is
specific to str.find(); None is used all over Python as a sentinel.

If I may digress for a bit, my advisor is currently working on a project
that is processing seafloor depth datasets starting from a few decades
ago. A lot of this data was originally to be processed using FORTRAN
software, so in the idiom of much FORTRAN software from those days, 9999
is often used to mark missing data. Unfortunately, 9999 is a perfectly
valid datum in most of the unit systems used by the various datasets.

Now he has to find a grad student to trawl through the datasets and
clean up the really invalid 9999's (as well as other such fun tasks like
deciding if a dataset that says it's using feet is actually using meters).

I have already called "Not It."

--
Robert Kern
rk***@ucsd.edu

"In the fields of hell where the grass grows high
Are the graves of dreams allowed to die."
-- Richard Harter

Aug 27 '05 #43

Paul Rubin

Steve Holden <st***@holdenweb.com> writes:

Of course. But once you (sensibly) decide to use an "if" then there
really isn't much difference between -1, None, () and sys.maxint as
a sentinel value, is there?

Of course there is. -1 is (under Python's perverse semantics) a valid
subscript. sys.maxint is an artifact of Python's fixed-size int
datatype, which is fading away under int/long unification, so it's
something that soon won't exist and shouldn't be used. None and ()
are invalid subscripts so would be reasonable return values, unlike -1
and sys.maxint. Of those, None is preferable to () because of its
semantic connotations.

Aug 27 '05 #44

Paul Rubin

Steve Holden <st***@holdenweb.com> writes:

If you want an exception from your code when 'w' isn't in the string
you should consider using index() rather than find.

The idea is you expect w to be in the string. If w isn't in the
string, your code has a bug, and programs with bugs should fail as
early as possible so you can locate the bugs quickly and easily. That
is why, for example,

x = 'buggy'[None]

raises an exception instead of doing something stupid like returning 'g'.
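
A small sketch of which candidate sentinel values are rejected as
subscripts, in Python 3 syntax. The helper try_subscript is made up
for this illustration.

```python
def try_subscript(s, index):
    """Return s[index], or the string 'TypeError' if the subscript is rejected."""
    try:
        return s[index]
    except TypeError:
        return "TypeError"


# None and () are refused outright; -1 is accepted and selects the last character.
results = {repr(i): try_subscript("buggy", i) for i in (None, (), -1)}
print(results)
```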

Aug 27 '05 #45

Terry Reedy

"Paul Rubin" <"http://phr.cx"@NOSPAM.invalid> wrote in message
news:7x************@ruckus.brouhaha.com...

Steve Holden <st***@holdenweb.com> writes:
Of course. But once you (sensibly) decide to use an "if" then there
really isn't much difference between -1, None, () and sys.maxint as
a sentinel value, is there?

Of course there is. -1 is (under Python's perverse semantics) a valid
subscript. sys.maxint is an artifact of Python's fixed-size int
datatype, which is fading away under int/long unification, so it's
something that soon won't exist and shouldn't be used. None and ()
are invalid subscripts so would be reasonable return values, unlike -1
and sys.maxint. Of those, None is preferable to () because of its
semantic connotations.

I agree here that None is importantly different from -1 for the reason
stated. The use of -1 is, I am sure, a holdover from statically typed
languages (C, in particular) that require all return values to be of the
same type, even if the 'return value' is actually meant to indicate that
there is no valid return value.

Terry J. Reedy

Aug 27 '05 #46

Bryan Olson

Steve Holden wrote:

Bryan Olson wrote:
[...] I see no good reason for the following
to happily print 'y'.

s = 'buggy'
print s[s.find('w')]
> Before using the result you always have to perform
> a test to discriminate between the found and not found cases. So I don't
> really see why this wart has put such a bug up your ass.

The bug that got me was what a slice object reports as the
'stop' bound when the step is negative and the slice includes
index 0. Took me hours to figure out why my code was failing.

The double-meaning of -1, as both an exclusive stopping bound
and an alias for the highest valid index, is just plain whacked.
Unfortunately, as negative indexes are currently handled, there
is no it-just-works value that slice could return.

If you want an exception from your code when 'w' isn't in the string you
should consider using index() rather than find.

That misses the point. The code is a hypothetical example of
what novice or imperfect Pythoners might have to deal with.
The exception isn't really wanted; it's just vastly superior to
silently returning a nonsensical value.

Otherwise, whatever find() returns you will have to have an "if" in
there to handle the not-found case.

This just sounds like whining to me. If you want to catch errors, use a
function that will raise an exception rather than relying on the
invalidity of the result.

I suppose if you ignore the real problems and the proposed
solution, it might sound a lot like whining.
--
--Bryan

Aug 27 '05 #47

Steve Holden

Paul Rubin wrote:

Steve Holden <st***@holdenweb.com> writes:
If you want an exception from your code when 'w' isn't in the string
you should consider using index() rather than find.

The idea is you expect w to be in the string. If w isn't in the
string, your code has a bug, and programs with bugs should fail as
early as possible so you can locate the bugs quickly and easily. That
is why, for example,

x = 'buggy'[None]

raises an exception instead of doing something stupid like returning 'g'.

You did read the sentence you were replying to, didn't you?

regards
Steve
--
Steve Holden +44 150 684 7255 +1 800 494 3119
Holden Web LLC http://www.holdenweb.com/

Aug 27 '05 #48

Steve Holden

Terry Reedy wrote:

"Paul Rubin" <"http://phr.cx"@NOSPAM.invalid> wrote in message
news:7x************@ruckus.brouhaha.com...
Steve Holden <st***@holdenweb.com> writes:
Of course. But once you (sensibly) decide to use an "if" then there
really isn't much difference between -1, None, () and sys.maxint as
a sentinel value, is there?

Of course there is. -1 is (under Python's perverse semantics) a valid
subscript. sys.maxint is an artifact of Python's fixed-size int
datatype, which is fading away under int/long unification, so it's
something that soon won't exist and shouldn't be used. None and ()
are invalid subscripts so would be reasonable return values, unlike -1
and sys.maxint. Of those, None is preferable to () because of its
semantic connotations.

I agree here that None is importantly different from -1 for the reason
stated. The use of -1 is, I am sure, a holdover from statically typed
languages (C, in particular) that require all return values to be of the
same type, even if the 'return value' is actually meant to indicate that
there is no valid return value.

While I agree that it would have been more sensible to choose None in
find()'s original design, there's really no reason to go breaking
existing code just to fix it.

Guido has already agreed that find() can change (or even disappear) in
Python 3.0, so please let's just leave things as they are for now.

A corrected find() that returns None on failure is a five-liner.
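
Something along these lines would do, as a rough sketch in Python 3
syntax. The wrapper and its signature are hypothetical, not an
existing library function.

```python
def find(s, sub, start=0, end=None):
    """Hypothetical wrapper: like str.find, but returns None on failure."""
    if end is None:
        end = len(s)
    i = s.find(sub, start, end)
    return None if i == -1 else i


print(find("buggy", "g"), find("buggy", "w"))
```

Callers then test the result with "is None", and an unchecked None
used as a subscript raises TypeError rather than selecting the last
character.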

regards
Steve
--
Steve Holden +44 150 684 7255 +1 800 494 3119
Holden Web LLC http://www.holdenweb.com/

Aug 27 '05 #49

Paul Rubin

Steve Holden <st***@holdenweb.com> writes:

A corrected find() that returns None on failure is a five-liner.

If I wanted to write five lines instead of one everywhere in a Python
program, I'd use Java.

Aug 27 '05 #50
