473,473 Members | 1,857 Online
Bytes | Software Development & Data Engineering Community
Create Post

Home Posts Topics Members FAQ

Can I overload the compare (cmp()) function for a Lists ([]) index function?

Looking to do something similair. I'm working with alot of timestamps
and if they're within a couple seconds I need them to be indexed and
removed from a list.
Is there any possible way to index with a custom cmp() function?

I assume it would be something like...

list.index(something,mycmp)

Thanks!

Sep 28 '07 #1
6 2166
On Sep 28, 12:30 pm, xkenneth <xkenn...@gmail.comwrote:
Looking to do something similair. I'm working with alot of timestamps
and if they're within a couple seconds I need them to be indexed and
removed from a list.
Is there any possible way to index with a custom cmp() function?

I assume it would be something like...

list.index(something,mycmp)

Thanks!
or can i just say....

list.index.__cmp__ = mycmp

and do it that way? I just want to make sure I'm not doing anything
evil.

Sep 28 '07 #2
ir****@gmail.com wrote:
On Sep 28, 8:30 pm, xkenneth <xkenn...@gmail.comwrote:
>Looking to do something similair. I'm working with alot of timestamps
and if they're within a couple seconds I need them to be indexed and
removed from a list.
Is there any possible way to index with a custom cmp() function?

I assume it would be something like...

list.index(something,mycmp)

Thanks!

Wouldn't it be enough to get the items that are "within a couple of
seconds" out of the list and into another list. Then you can process
the other list however you want. Like this:

def isNew(x):
return x < 5

data = range(20)
print data
out, data = filter(isNew, data), filter(lambda x: not isNew(x), data)
print out, data
Slightly off topic here, but these uses of filter will be slower than
the list comprehension equivalents::

out = [x for x in data if x < 5]
data = [x for x in data if x >= 5]

Here are sample timings::

$ python -m timeit -s "data = range(20)" -s "def is_new(x): return x <
5" "filter(is_new, data)"
100000 loops, best of 3: 5.05 usec per loop
$ python -m timeit -s "data = range(20)" "[x for x in data if x < 5]"
100000 loops, best of 3: 2.15 usec per loop

Functions like filter() and map() are really only more efficient when
you have an existing C-coded function, like ``map(str, items)``. Of
course, if the filter() code is clearer to you, feel free to use it, but
I find that most folks find list comprehensions easier to read than
map() and filter() code.

STeVe
Sep 28 '07 #3
xkenneth <xk******@gmail.comwrites:
Looking to do something similair. I'm working with alot of timestamps
and if they're within a couple seconds I need them to be indexed and
removed from a list.
Is there any possible way to index with a custom cmp() function?
This sounds like you want itertools.groupby. What is the exact
requirement?
Sep 29 '07 #4
xkenneth <xk******@gmail.comwrites:
Looking to do something similair. I'm working with alot of timestamps
and if they're within a couple seconds I need them to be indexed and
removed from a list.
Is there any possible way to index with a custom cmp() function?

I assume it would be something like...

list.index(something,mycmp)
The obvious option is reimplementing the functionality of index as an
explicit loop, such as:

def myindex(lst, something, mycmp):
for i, el in enumerate(lst):
if mycmp(el, something) == 0:
return i
raise ValueError("element not in list")

Looping in Python is slower than looping in C, but since you're
calling a Python function per element anyway, the loop overhead might
be negligible.

A more imaginative way is to take advantage of the fact that index
uses the '==' operator to look for the item. You can create an object
whose == operator calls your comparison function and use that object
as the argument to list.index:

class Cmp(object):
def __init__(self, item, cmpfun):
self.item = item
self.cmpfun = cmpfun
def __eq__(self, other):
return self.cmpfun(self.item, other) == 0

# list.index(Cmp(something, mycmp))

For example:
>>def mycmp(s1, s2):
.... return cmp(s1.tolower(), s2.tolower())
>>['foo', 'bar', 'baz'].index(Cmp('bar', mycmp))
1
>>['foo', 'bar', 'baz'].index(Cmp('Bar', mycmp))
1
>>['foo', 'bar', 'baz'].index(Cmp('nosuchelement', mycmp))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: list.index(x): x not in list

The timeit module shows, somewhat surprisingly, that the first method
is ~1.5 times faster, even for larger lists.
Sep 29 '07 #5
En Fri, 28 Sep 2007 14:36:54 -0300, xkenneth <xk******@gmail.comescribi�:
On Sep 28, 12:30 pm, xkenneth <xkenn...@gmail.comwrote:
>Looking to do something similair. I'm working with alot of timestamps
and if they're within a couple seconds I need them to be indexed and
removed from a list.
Is there any possible way to index with a custom cmp() function?
The comparison is made by the list elements themselves (using their __eq__
or __cmp__), not by the index method nor the list object.
So you should modify __cmp__ for all your timestamps (datetime.datetime, I
presume?), but that's not very convenient. A workaround is to wrap the
object you are searching into a new, different class - since the list
items won't know how to compare to it, Python will try reversing the
operands.
datetime objects are a bit special in this behavior: they refuse to
compare to anything else unless the other object has a `timetuple`
attribute (see <http://docs.python.org/lib/datetime-date.htmlnote (4))

<code>
import datetime

class datetime_tol(object):
timetuple=None # unused, just to trigger the reverse comparison to
datetime objects
default_tolerance = datetime.timedelta(0, 10)

def __init__(self, dt, tolerance=None):
if tolerance is None:
tolerance = self.default_tolerance
self.dt = dt
self.tolerance = tolerance

def __cmp__(self, other):
tolerance = self.tolerance
if isinstance(other, datetime_tol):
tolerance = min(tolerance, other.tolerance)
other = other.dt
if not isinstance(other, datetime.datetime):
return cmp(self.dt, other)
delta = self.dt-other
return -1 if delta<-tolerance else 1 if delta>tolerance else 0

def index_tol(dtlist, dt, tolerance=None):
return dtlist.index(datetime_tol(dt, tolerance))
d1 = datetime.datetime(2007, 7, 18, 9, 20, 0)
d2 = datetime.datetime(2007, 7, 18, 9, 30, 25)
d3 = datetime.datetime(2007, 7, 18, 9, 30, 30)
d4 = datetime.datetime(2007, 7, 18, 9, 30, 35)
d5 = datetime.datetime(2007, 7, 18, 9, 40, 0)
L = [d1,d2,d3,d4,d5]

assert d3 in L
assert L.index(d3)==2
assert L.index(datetime_tol(d3))==1 # using 10sec tolerance
assert index_tol(L, d3)==1
assert index_tol(L, datetime.datetime(2007, 7, 18, 9, 43, 20),
datetime.timedelta(0, 5*60))==4 # 5 minutes tolerance
</code>

--
Gabriel Genellina

Sep 29 '07 #6
On Sep 28, 5:12 pm, Hrvoje Niksic <hnik...@xemacs.orgwrote:
xkenneth <xkenn...@gmail.comwrites:
Looking to do something similair. I'm working with alot of timestamps
and if they're within a couple seconds I need them to be indexed and
removed from a list.
Is there any possible way to index with a custom cmp() function?
I assume it would be something like...
list.index(something,mycmp)

The obvious option is reimplementing the functionality of index as an
explicit loop, such as:

def myindex(lst, something, mycmp):
for i, el in enumerate(lst):
if mycmp(el, something) == 0:
return i
raise ValueError("element not in list")

Looping in Python is slower than looping in C, but since you're
calling a Python function per element anyway, the loop overhead might
be negligible.

A more imaginative way is to take advantage of the fact that index
uses the '==' operator to look for the item. You can create an object
whose == operator calls your comparison function and use that object
as the argument to list.index:

class Cmp(object):
def __init__(self, item, cmpfun):
self.item = item
self.cmpfun = cmpfun
def __eq__(self, other):
return self.cmpfun(self.item, other) == 0

# list.index(Cmp(something, mycmp))

For example:
>def mycmp(s1, s2):

... return cmp(s1.tolower(), s2.tolower())>>['foo', 'bar', 'baz'].index(Cmp('bar', mycmp))
1
>['foo', 'bar', 'baz'].index(Cmp('Bar', mycmp))
1
>['foo', 'bar', 'baz'].index(Cmp('nosuchelement', mycmp))

Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: list.index(x): x not in list

The timeit module shows, somewhat surprisingly, that the first method
is ~1.5 times faster, even for larger lists.
Hrvoje,

That's fun! thx.

--Alan

the cut-n-paste version /w minor fix to 'lower'.
# ----------------------------------------------
class Cmp(object):
def __init__(self, item, cmpfun):
self.item = item
self.cmpfun = cmpfun
def __eq__(self, other):
return self.cmpfun(self.item, other) == 0
def mycmp(s1, s2):
return cmp(s1.lower(), s2.lower())
print ['foo', 'bar', 'baz'].index(Cmp('bar', mycmp))
print ['foo', 'bar', 'baz'].index(Cmp('Bar', mycmp))
try:
print ['foo', 'bar', 'baz'].index(Cmp('nosuchelement', mycmp))
except ValueError:
print "Search String not found!"

# end example

Oct 11 '07 #7

This thread has been closed and replies have been disabled. Please start a new discussion.

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.