Can I overload the compare (cmp()) function for a list's ([]) index function?

xkenneth

Looking to do something similar. I'm working with a lot of timestamps
and if they're within a couple of seconds I need them to be indexed and
removed from a list.
Is there any possible way to index with a custom cmp() function?

I assume it would be something like...

list.index(something,mycmp)

Thanks!

Sep 28 '07 #1

xkenneth

On Sep 28, 12:30 pm, xkenneth <xkenn...@gmail.com> wrote:

> Looking to do something similar. I'm working with a lot of timestamps
> and if they're within a couple of seconds I need them to be indexed and
> removed from a list.
> Is there any possible way to index with a custom cmp() function?
>
> I assume it would be something like...
>
> list.index(something,mycmp)
>
> Thanks!

Or can I just say...

list.index.__cmp__ = mycmp

and do it that way? I just want to make sure I'm not doing anything
evil.

Sep 28 '07 #2

Steven Bethard

ir****@gmail.com wrote:

> On Sep 28, 8:30 pm, xkenneth <xkenn...@gmail.com> wrote:
>> Looking to do something similar. I'm working with a lot of timestamps
>> and if they're within a couple of seconds I need them to be indexed and
>> removed from a list.
>> Is there any possible way to index with a custom cmp() function?
>>
>> I assume it would be something like...
>>
>> list.index(something,mycmp)
>>
>> Thanks!
>
> Wouldn't it be enough to get the items that are "within a couple of
> seconds" out of the list and into another list? Then you can process
> the other list however you want. Like this:
>
> def isNew(x):
>     return x < 5
>
> data = range(20)
> print data
> out, data = filter(isNew, data), filter(lambda x: not isNew(x), data)
> print out, data

Slightly off topic here, but these uses of filter will be slower than
the list comprehension equivalents::

    out = [x for x in data if x < 5]
    data = [x for x in data if x >= 5]

Here are sample timings::

    $ python -m timeit -s "data = range(20)" -s "def is_new(x): return x < 5" "filter(is_new, data)"
    100000 loops, best of 3: 5.05 usec per loop
    $ python -m timeit -s "data = range(20)" "[x for x in data if x < 5]"
    100000 loops, best of 3: 2.15 usec per loop

Functions like filter() and map() are really only more efficient when
you have an existing C-coded function, like ``map(str, items)``. Of
course, if the filter() code is clearer to you, feel free to use it, but
I find that most folks find list comprehensions easier to read than
map() and filter() code.
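
As a rough sketch of how the list-comprehension version might look for the timestamp case: the example below assumes the timestamps are plain floats (seconds) and that "within a couple of seconds" means an absolute difference of at most `tolerance` from some reference time. The name `split_near` and both of its parameters are hypothetical, and the code is written for Python 3 rather than the Python 2 used elsewhere in this thread.

```python
def split_near(timestamps, reference, tolerance=2.0):
    """Partition timestamps into (near, rest) relative to a reference time.

    Assumption: timestamps are numbers in seconds; tolerance is in seconds.
    """
    near = [t for t in timestamps if abs(t - reference) <= tolerance]
    rest = [t for t in timestamps if abs(t - reference) > tolerance]
    return near, rest


# 10.0, 11.5 and 9.0 are all within 2 seconds of 10.0; 20.0 is not.
near, rest = split_near([10.0, 11.5, 20.0, 9.0], reference=10.0)
print(near, rest)
```

This walks the list twice, like the two filter() calls above; for long lists a single explicit loop appending to two lists would avoid the second pass.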

STeVe

Sep 28 '07 #3

Paul Rubin

xkenneth <xk******@gmail.com> writes:

> Looking to do something similar. I'm working with a lot of timestamps
> and if they're within a couple of seconds I need them to be indexed and
> removed from a list.
> Is there any possible way to index with a custom cmp() function?

This sounds like you want itertools.groupby. What is the exact
requirement?
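
Since the exact requirement is never spelled out in the thread, the following is only one possible reading of the groupby suggestion: sort the timestamps and group them into fixed-width time buckets. It assumes numeric timestamps in seconds; `bucket_by_window` and its `window` parameter are hypothetical names, and the code targets Python 3.

```python
from itertools import groupby


def bucket_by_window(timestamps, window=2.0):
    """Group timestamps that fall into the same fixed-width time bucket.

    groupby only merges *adjacent* items with equal keys, so the input
    is sorted first.
    """
    ordered = sorted(timestamps)
    return [list(group)
            for _, group in groupby(ordered, key=lambda t: int(t // window))]


groups = bucket_by_window([7.5, 0.5, 2.1, 1.9, 7.0], window=2.0)
print(groups)
```

Note the limitation of fixed buckets: 1.9 and 2.1 are only 0.2 seconds apart but land in different groups because they straddle a bucket boundary. If that matters, a tolerance-based comparison like the ones discussed elsewhere in this thread may fit better.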

Sep 29 '07 #4

Hrvoje Niksic

xkenneth <xk******@gmail.com> writes:

> Looking to do something similar. I'm working with a lot of timestamps
> and if they're within a couple of seconds I need them to be indexed and
> removed from a list.
> Is there any possible way to index with a custom cmp() function?
>
> I assume it would be something like...
>
> list.index(something,mycmp)

The obvious option is reimplementing the functionality of index as an
explicit loop, such as:

def myindex(lst, something, mycmp):
    for i, el in enumerate(lst):
        if mycmp(el, something) == 0:
            return i
    raise ValueError("element not in list")

Looping in Python is slower than looping in C, but since you're
calling a Python function per element anyway, the loop overhead might
be negligible.
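
Applied to the original timestamp problem, the explicit loop might look like the sketch below. It is written for Python 3, uses a one-argument predicate instead of a cmp-style function, and assumes timestamps are floats in seconds; `index_where` and `pop_near` are hypothetical helper names, not anything from the standard library.

```python
def index_where(items, predicate):
    """Return the index of the first item for which predicate(item) is true."""
    for i, item in enumerate(items):
        if predicate(item):
            return i
    raise ValueError("no matching element in list")


def pop_near(timestamps, target, tolerance=2.0):
    """Remove and return the first timestamp within `tolerance` of `target`."""
    i = index_where(timestamps, lambda t: abs(t - target) <= tolerance)
    return timestamps.pop(i)


stamps = [100.0, 105.5, 110.0]
removed = pop_near(stamps, 106.0)  # 105.5 is within 2 seconds of 106.0
print(removed, stamps)
```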

A more imaginative way is to take advantage of the fact that index
uses the '==' operator to look for the item. You can create an object
whose == operator calls your comparison function and use that object
as the argument to list.index:

class Cmp(object):
    def __init__(self, item, cmpfun):
        self.item = item
        self.cmpfun = cmpfun
    def __eq__(self, other):
        return self.cmpfun(self.item, other) == 0

# list.index(Cmp(something, mycmp))

For example:

>>> def mycmp(s1, s2):
...     return cmp(s1.tolower(), s2.tolower())
...
>>> ['foo', 'bar', 'baz'].index(Cmp('bar', mycmp))
1
>>> ['foo', 'bar', 'baz'].index(Cmp('Bar', mycmp))
1
>>> ['foo', 'bar', 'baz'].index(Cmp('nosuchelement', mycmp))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: list.index(x): x not in list

The timeit module shows, somewhat surprisingly, that the first method
is ~1.5 times faster, even for larger lists.
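
For anyone who wants to repeat that measurement, here is one way it might be set up with the timeit module. This is a sketch for Python 3, where the built-in cmp() no longer exists, so a small stand-in is defined; the list contents, the list size and the repeat count are arbitrary assumptions, and the ratio you see may well differ from the ~1.5 quoted above.

```python
import timeit


def cmp(a, b):
    """Stand-in for Python 2's built-in cmp(), which Python 3 removed."""
    return (a > b) - (a < b)


def myindex(lst, something, mycmp):
    for i, el in enumerate(lst):
        if mycmp(el, something) == 0:
            return i
    raise ValueError("element not in list")


class Cmp:
    def __init__(self, item, cmpfun):
        self.item = item
        self.cmpfun = cmpfun

    def __eq__(self, other):
        return self.cmpfun(self.item, other) == 0


def mycmp(s1, s2):
    return cmp(s1.lower(), s2.lower())


# Worst case for both approaches: the match is the last element.
words = ["w%d" % i for i in range(1000)]
target = "W999"

loop_time = timeit.timeit(lambda: myindex(words, target, mycmp), number=200)
wrapper_time = timeit.timeit(lambda: words.index(Cmp(target, mycmp)), number=200)
print("explicit loop: %.4fs, Cmp wrapper: %.4fs" % (loop_time, wrapper_time))
```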

Sep 29 '07 #5

Gabriel Genellina

On Fri, 28 Sep 2007 14:36:54 -0300, xkenneth <xk******@gmail.com> wrote:

> On Sep 28, 12:30 pm, xkenneth <xkenn...@gmail.com> wrote:
>> Looking to do something similar. I'm working with a lot of timestamps
>> and if they're within a couple of seconds I need them to be indexed and
>> removed from a list.
>> Is there any possible way to index with a custom cmp() function?

The comparison is made by the list elements themselves (using their __eq__
or __cmp__), not by the index method or the list object.
So you would have to modify __cmp__ for all your timestamps (datetime.datetime, I
presume?), which is not very convenient. A workaround is to wrap the
object you are searching for in a new, different class: since the list
items won't know how to compare to it, Python will try reversing the
operands.
datetime objects are a bit special in this respect: they refuse to
compare to anything else unless the other object has a `timetuple`
attribute (see <http://docs.python.org/lib/datetime-date.html>, note (4)).

<code>
import datetime

class datetime_tol(object):
    # unused, just to trigger the reverse comparison to datetime objects
    timetuple = None
    default_tolerance = datetime.timedelta(0, 10)

    def __init__(self, dt, tolerance=None):
        if tolerance is None:
            tolerance = self.default_tolerance
        self.dt = dt
        self.tolerance = tolerance

    def __cmp__(self, other):
        tolerance = self.tolerance
        if isinstance(other, datetime_tol):
            tolerance = min(tolerance, other.tolerance)
            other = other.dt
        if not isinstance(other, datetime.datetime):
            return cmp(self.dt, other)
        delta = self.dt-other
        return -1 if delta<-tolerance else 1 if delta>tolerance else 0

def index_tol(dtlist, dt, tolerance=None):
    return dtlist.index(datetime_tol(dt, tolerance))

d1 = datetime.datetime(2007, 7, 18, 9, 20, 0)
d2 = datetime.datetime(2007, 7, 18, 9, 30, 25)
d3 = datetime.datetime(2007, 7, 18, 9, 30, 30)
d4 = datetime.datetime(2007, 7, 18, 9, 30, 35)
d5 = datetime.datetime(2007, 7, 18, 9, 40, 0)
L = [d1,d2,d3,d4,d5]

assert d3 in L
assert L.index(d3)==2
assert L.index(datetime_tol(d3))==1 # using 10sec tolerance
assert index_tol(L, d3)==1
assert index_tol(L, datetime.datetime(2007, 7, 18, 9, 43, 20),
                 datetime.timedelta(0, 5*60))==4 # 5 minutes tolerance
</code>
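
The class above depends on __cmp__, which Python 3 no longer calls. A minimal sketch of the same wrapping idea for Python 3 is given below; it defines only __eq__, which is all list.index needs. `NearDatetime` is a hypothetical name, and the 10-second default simply mirrors the tolerance used above. It relies on datetime's == returning NotImplemented for non-date operands, so that Python falls back to the wrapper's __eq__.

```python
import datetime


class NearDatetime:
    """Compares equal to any datetime within `tolerance` of `dt`."""

    def __init__(self, dt, tolerance=datetime.timedelta(seconds=10)):
        self.dt = dt
        self.tolerance = tolerance

    def __eq__(self, other):
        if not isinstance(other, datetime.datetime):
            return NotImplemented
        return abs(self.dt - other) <= self.tolerance


stamps = [
    datetime.datetime(2007, 7, 18, 9, 20, 0),
    datetime.datetime(2007, 7, 18, 9, 30, 25),
    datetime.datetime(2007, 7, 18, 9, 30, 30),
    datetime.datetime(2007, 7, 18, 9, 30, 35),
    datetime.datetime(2007, 7, 18, 9, 40, 0),
]
target = datetime.datetime(2007, 7, 18, 9, 30, 30)

# The first element within 10 seconds of the target is 9:30:25, at index 1.
first_near = stamps.index(NearDatetime(target))
print(first_near)
```

Unlike the __cmp__ version, this wrapper supports only equality tests, so it is suitable for index() and `in`, but not for sorting.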

--
Gabriel Genellina

Sep 29 '07 #6

alan.haffner

On Sep 28, 5:12 pm, Hrvoje Niksic <hnik...@xemacs.org> wrote:

> xkenneth <xkenn...@gmail.com> writes:
>> Looking to do something similar. I'm working with a lot of timestamps
>> and if they're within a couple of seconds I need them to be indexed and
>> removed from a list.
>> Is there any possible way to index with a custom cmp() function?
>>
>> I assume it would be something like...
>>
>> list.index(something,mycmp)
>
> The obvious option is reimplementing the functionality of index as an
> explicit loop, such as:
>
> def myindex(lst, something, mycmp):
>     for i, el in enumerate(lst):
>         if mycmp(el, something) == 0:
>             return i
>     raise ValueError("element not in list")
>
> Looping in Python is slower than looping in C, but since you're
> calling a Python function per element anyway, the loop overhead might
> be negligible.
>
> A more imaginative way is to take advantage of the fact that index
> uses the '==' operator to look for the item. You can create an object
> whose == operator calls your comparison function and use that object
> as the argument to list.index:
>
> class Cmp(object):
>     def __init__(self, item, cmpfun):
>         self.item = item
>         self.cmpfun = cmpfun
>     def __eq__(self, other):
>         return self.cmpfun(self.item, other) == 0
>
> # list.index(Cmp(something, mycmp))
>
> For example:
>
> >>> def mycmp(s1, s2):
> ...     return cmp(s1.tolower(), s2.tolower())
> ...
> >>> ['foo', 'bar', 'baz'].index(Cmp('bar', mycmp))
> 1
> >>> ['foo', 'bar', 'baz'].index(Cmp('Bar', mycmp))
> 1
> >>> ['foo', 'bar', 'baz'].index(Cmp('nosuchelement', mycmp))
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> ValueError: list.index(x): x not in list
>
> The timeit module shows, somewhat surprisingly, that the first method
> is ~1.5 times faster, even for larger lists.

Hrvoje,

That's fun! thx.

--Alan

The cut-and-paste version, with a minor fix: tolower() changed to lower().
# ----------------------------------------------
class Cmp(object):
    def __init__(self, item, cmpfun):
        self.item = item
        self.cmpfun = cmpfun
    def __eq__(self, other):
        return self.cmpfun(self.item, other) == 0

def mycmp(s1, s2):
    return cmp(s1.lower(), s2.lower())

print ['foo', 'bar', 'baz'].index(Cmp('bar', mycmp))
print ['foo', 'bar', 'baz'].index(Cmp('Bar', mycmp))
try:
    print ['foo', 'bar', 'baz'].index(Cmp('nosuchelement', mycmp))
except ValueError:
    print "Search String not found!"

# end example

Oct 11 '07 #7