473,382 Members | 1,386 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,382 software developers and data experts.

__lt__ slowing the "in" operator even if not called

I have some code which does a lot of "in" on lists containing objects
with no __eq__ defined.

It all goes fast until I add the __lt__() method: then I have a
slowdown comparable to the one I get using the overridden __eq__, while
the __lt__ method is never called.

Someone can explain me why?
I have here a simple program which shows the behaviour:

*-------------------------------------------------------------------*

#!/usr/bin/env python

import timing

class State(object):
def __init__(self, value):
self.v = value

def generate(self):
return [self.__class__(self.v+1)]

class StateLT(State):
def __lt__ (self, other):
print 'Foo'
return self.v < other.v

class StateEQ(State):
def __eq__ (self, other):
return self.v == other.v
def test(state_cls):
print state_cls

timing.start()
initial = state_cls(0)
l = [initial]
for i in xrange(3000):

successors = l[i].generate()
successors = [s for s in successors if s not in l]

l += successors
timing.finish()
print timing.milli()

if __name__ == '__main__':
test(State)
print ' '
test(StateLT)
print ' '
test(StateEQ)

*-------------------------------------------------------------------*

On my machine the output is:

<class '__main__.State'>
406

<class '__main__.StateLT'>
4982

<class '__main__.StateEQ'>
5395

Here you can see that even with only the __lt__ method it goes 10x
slower, but __lt__ is never called as "Foo" is not printed.

I use the python 2.3.5-8 package on a debian unstable, but even the
python 2.4.3-5 package shows the same behaviour.

Jun 14 '06 #1
9 2170
Le Mercredi 14 Juin 2006 22:57, Emanuele Aina a écrit*:
Here you can see that even with only the __lt__ method it goes 10x
slower, but __lt__ is never called as "Foo" is not printed.

No, that is not what it shows. The only thing it shows is that in operator is
slow for lists, and as it iterates over the entire list untill it find an
element, it is much slower when it fails a lot.
Use dict in operator instead.

Here is a program to show this in details :

import timing

class State(object):
def __init__(self, value):
self.v = value

class StateEQ(State):
def __eq__ (self, other):
if isinstance(other, State) :
return self.v == other.v
else :
return self.v == other

class StateLT(State) :
def __lt__ (self, other):
if isinstance(other, State) :
return self.v < other.v
else :
return self.v < other

class StateLTEQ(StateEQ, StateLT) : pass

def test(state_cls):
num_elem = 3*10**4
print
print state_cls

l = [state_cls(0)]
for i in xrange(num_elem): l.append(state_cls(i+1))
print
print "in operator at the beginning of list:",
timing.start()
for i in xrange(300): i in l
timing.finish()
print timing.milli()
print "in operator at the end of list:",
timing.start()
for i in xrange(num_elem-300, num_elem): i in l
timing.finish()
print timing.milli()
print
print "converting to dict :",
timing.start()
d = {}.fromkeys(l)
timing.finish()
print timing.milli()
print "in operator for a dict for %s elements:" % (num_elem*2),
timing.start()
for i in xrange(num_elem, num_elem*2): i in d
timing.finish()
print timing.milli()

if __name__ == '__main__':
test(State)
test(StateLT)
test(StateEQ)
test(StateLTEQ)
for which the output is :

<class '__main__.State'>

in operator at the beginning of list: 1983
in operator at the end of list: 2011

converting to dict : 82
in operator for a dict for 60000 elements: 12

<class '__main__.StateLT'>

in operator at the beginning of list: 3866
in operator at the end of list: 3871

converting to dict : 332
in operator for a dict for 60000 elements: 50

<class '__main__.StateEQ'>

in operator at the beginning of list: 173
in operator at the end of list: 28249

converting to dict : 79
in operator for a dict for 60000 elements: 14

<class '__main__.StateLTEQ'>

in operator at the beginning of list: 202
in operator at the end of list: 30472

converting to dict : 68
in operator for a dict for 60000 elements: 50
Every classes that define the __eq__ operator will find quickly the elements
if they are at the beginning of the list.
If they are at the end, the in oprerator is slower in these classes becauseof
the overhead of function calls (in StateLt there is also an overhead due to
internal lookup, IMO).
Converting to dicts and looking for keys is *really* and equally fast in all
cases, it does not depend on success or failure.


--
_____________

Maric Michaud
_____________

Aristote - www.aristote.info
3 place des tapis
69004 Lyon
Tel: +33 426 880 097
Jun 15 '06 #2
Maric Michaud spiegò:
Le Mercredi 14 Juin 2006 22:57, Emanuele Aina a écrit :
Here you can see that even with only the __lt__ method it goes 10x
slower, but __lt__ is never called as "Foo" is not printed.
No, that is not what it shows. The only thing it shows is that in operator is
slow for lists, and as it iterates over the entire list untill it find an
element, it is much slower when it fails a lot.


Yes, I now that "in" is slow for lists and that a dict is preferable,
but what I'm pointing out is that in one case is much slower without a
reason.

In my test, the one without __lt__ runs in ~500ms, while the one with
the __lt__ method runs in ~5000ms.

Just for comparision I've also tested with an overridden __eq__ and it
shows timings similar to the ones in the second test, but this time it
is expected, as "in" has to call __eq__ for every element in the list.

In the first test python uses the native equality (comparing memory
pointer) and it is fast because it is implemented in C in the
interpreter (no python function call).

In the last case python has to call __eq__ for every element, and a
python function call is obviuosly more expensive than the native
comparision.

But the __lt__ case leaves me wondering why it is slow as the __eq__
case, while using the native comparision.

What I asked is what is the difference between the first and the second
case?
Use dict in operator instead.
Yes, I know that a dict is the right datastructure for a fast "in", but
I need to keep track of the ordering, so I cannot use a dict.

Here is a program to show this in details :

import timing

class State(object):
def __init__(self, value):
self.v = value

class StateEQ(State):
def __eq__ (self, other):
if isinstance(other, State) :
return self.v == other.v
else :
return self.v == other

class StateLT(State) :
def __lt__ (self, other):
if isinstance(other, State) :
return self.v < other.v
else :
return self.v < other

class StateLTEQ(StateEQ, StateLT) : pass

def test(state_cls):
num_elem = 3*10**4
print
print state_cls

l = [state_cls(0)]
for i in xrange(num_elem): l.append(state_cls(i+1))
print
print "in operator at the beginning of list:",
timing.start()
for i in xrange(300): i in l
timing.finish()
print timing.milli()
print "in operator at the end of list:",
timing.start()
for i in xrange(num_elem-300, num_elem): i in l
timing.finish()
print timing.milli()
Sorry, but this works only when you override __eq__.

Without it it will alway scan the entire list, so you are comparing the
timings of two different things.
print
print "converting to dict :",
timing.start()
d = {}.fromkeys(l)
timing.finish()
print timing.milli()
print "in operator for a dict for %s elements:" % (num_elem*2),
timing.start()
for i in xrange(num_elem, num_elem*2): i in d
timing.finish()
print timing.milli()
As I said, that was only a test demo exploiting my observation, in my
real program I need to keep track of the order in which I add items in
the list, so I can't use a dict.
for which the output is :

<class '__main__.State'>

in operator at the beginning of list: 1983
in operator at the end of list: 2011

converting to dict : 82
in operator for a dict for 60000 elements: 12

<class '__main__.StateLT'>

in operator at the beginning of list: 3866
in operator at the end of list: 3871
Here python is using the equality comparator from "int", giving those
fast results.
I've substituted the ints with instances of state_cls:
- for i in xrange(300): i in l
+ for i in xrange(300): state_cls(i) in l

- for i in xrange(num_elem-300, num_elem): i in l
+ for i in xrange(num_elem-300, num_elem): state_cls(i) in l

Running these test, in the case of StateLT, now takes ~10000ms and
~9000ms.

[...]
Every classes that define the __eq__ operator will find quickly the elements
if they are at the beginning of the list.
If they are at the end, the in oprerator is slower in these classes because of
the overhead of function calls (in StateLt there is also an overhead due to ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ internal lookup, IMO).

^^^^^^^^^^^^^^^

Yes, that was my question! :)

But I hoped in a more exaustive answer: why python has to do this
lookup when the __lt__ method is not involved at all?

Jun 15 '06 #3
Le Jeudi 15 Juin 2006 14:14, Emanuele Aina a écrit*:
But I hoped in a more exaustive answer: why python has to do this
lookup when the __lt__ method is not involved at all?


It is not the case, that's what my program shows :

<class '__main__.StateEQ'>

in operator at the beginning of list: 173
in operator at the end of list: 28249 <- here

converting to dict : 79
in operator for a dict for 60000 elements: 14

<class '__main__.StateLTEQ'>

in operator at the beginning of list: 202
in operator at the end of list: 30472 <- and here
--
_____________

Maric Michaud
_____________

Aristote - www.aristote.info
3 place des tapis
69004 Lyon
Tel: +33 426 880 097
Jun 15 '06 #4
Maric Michaud continuò:
But I hoped in a more exaustive answer: why python has to do this
lookup when the __lt__ method is not involved at all?


It is not the case, that's what my program shows :

<class '__main__.StateEQ'>

in operator at the beginning of list: 173
in operator at the end of list: 28249 <- here

converting to dict : 79
in operator for a dict for 60000 elements: 14

<class '__main__.StateLTEQ'>

in operator at the beginning of list: 202
in operator at the end of list: 30472 <- and here


It is very obvious that these two have similiar timings, as both call
__eq__.

I asked why the State and StateLT don't give similar results, but
StateLT is noticeably slower.

Jun 15 '06 #5
Emanuele Aina a écrit :
Maric Michaud continuò:

But I hoped in a more exaustive answer: why python has to do this
lookup when the __lt__ method is not involved at all?


It is not the case, that's what my program shows :

<class '__main__.StateEQ'>

in operator at the beginning of list: 173
in operator at the end of list: 28249 <- here

converting to dict : 79
in operator for a dict for 60000 elements: 14

<class '__main__.StateLTEQ'>

in operator at the beginning of list: 202
in operator at the end of list: 30472 <- and here

It is very obvious that these two have similiar timings, as both call
__eq__.

I asked why the State and StateLT don't give similar results, but
StateLT is noticeably slower.


Maybe because when python tries to compare 2 elements, checks if __lt__
is defined, and if it is, it checks for __gt__ ( or maybe __lteq__ ) but
since it isn't defined, it fallbacks on __cmp__. So for each comparison,
if you define only __lt__ you have some additional lookups to go through.
Jun 15 '06 #6
Le Jeudi 15 Juin 2006 14:14, Emanuele Aina a écrit*:
Every classes that define the __eq__ operator will find quickly the
elements if they are at the beginning of the list.
If they are at the end, the in oprerator is slower in these classes
because of the overhead of function calls (in StateLt there is also an
overhead due to


* * * * * * * * * * * * * * * * * ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
internal lookup, IMO).


* ^^^^^^^^^^^^^^^

Yes, that was my question! :)

But I hoped in a more exaustive answer: why python has to do this
lookup when the __lt__ method is not involved at all?


Ah, ok sorry, got your point, I didn't notice it in first time.
There is effectively an overhead when the class defines at less one of __lt__,
__gt__, __neq__ ... special attributes (comparison attributes), it seems in
my example that this overhead is a ratio of 1,5x.

It is related to comparison, as the following code show :
import timing

class State(object):
def __init__(self, value):
self.v = value

class StateFoo(State) :
def foo(self) : pass

class StateNew(State) :
def __nonzero__(self) : return None

class StateGT(State) :
def __gt__(self) : return None

class StateGTLT(StateGT) :
def __lt__(self) : return None

def test(cls) :
print cls
inst = cls(5)
timing.start()
for i in xrange(10**6) : i == inst
timing.finish()
print timing.milli()

test(State)
test(StateFoo)
test(StateNew)
test(StateGT)
test(StateGTLT)

for which ouput is :

<class '__main__.State'>
589
<class '__main__.StateFoo'>
596
<class '__main__.StateNew'>
615
<class '__main__.StateGT'>
815
<class '__main__.StateGTLT'>
836

Can't say more about what in the internals of python produce this non
negligeable overhead, but looking at the source code of builtin comparison
(for new style classes as it seems we have not this overhead with old style
classes) should be the fist step...

--
_____________

Maric Michaud
_____________

Aristote - www.aristote.info
3 place des tapis
69004 Lyon
Tel: +33 426 880 097
Jun 15 '06 #7
Emanuele Aina wrote:
I have some code which does a lot of "in" on lists containing objects
with no __eq__ defined.

It all goes fast until I add the __lt__() method: then I have a
slowdown comparable to the one I get using the overridden __eq__, while
the __lt__ method is never called.

Someone can explain me why?


The list's __contains__ method is very simple

for (i = 0, cmp = 0 ; cmp == 0 && i < a->ob_size; ++i)
cmp = PyObject_RichCompareBool(el, PyList_GET_ITEM(a,
i),
Py_EQ);

The PyObject_RichCompareBool becomes more complex. The
relevant code occurs after the check for object identity. The next
step is

res = PyObject_RichCompare(v, w, op);

which goes into your case

/* If the types are equal, and not old-style instances, try to
get out cheap (don't bother with coercions etc.). */
if (v->ob_type == w->ob_type && !PyInstance_Check(v)) {
cmpfunc fcmp;
richcmpfunc frich = RICHCOMPARE(v->ob_type);
/* If the type has richcmp, try it first.
try_rich_compare
tries it two-sided, which is not needed since we've
a
single type only. */
if (frich != NULL) {
res = (*frich)(v, w, op);
if (res != Py_NotImplemented)
goto Done;
Py_DECREF(res);
}
/* No richcmp, or this particular richmp not
implemented.
Try 3-way cmp. */
fcmp = v->ob_type->tp_compare;

One new-style class defines '__lt__' while the other does not.

Here's where things get shaky for me. I think new-style classes
are instances of PyInstanceObject (and not PyClassObject). Instances
use 'instance_richcompare' to implement the rich comparison
between two instances. This function does the lookup for '__eq__'
in the two classes.

The tp_richcompare slot is set to instance_richcompare when any
of __lt__, __gt__, __eq_, etc. are defined in a new type. The macro-
based code looks like

TPSLOT("__lt__", tp_richcompare, slot_tp_richcompare,
richcmp_lt,
"x.__lt__(y) <==> x<y"),
TPSLOT("__le__", tp_richcompare, slot_tp_richcompare,
richcmp_le,
"x.__le__(y) <==> x<=y"),
TPSLOT("__eq__", tp_richcompare, slot_tp_richcompare,
richcmp_eq,
"x.__eq__(y) <==> x==y"),
TPSLOT("__ne__", tp_richcompare, slot_tp_richcompare,
richcmp_ne,
"x.__ne__(y) <==> x!=y"),
TPSLOT("__gt__", tp_richcompare, slot_tp_richcompare,
richcmp_gt,
"x.__gt__(y) <==> x>y"),
TPSLOT("__ge__", tp_richcompare, slot_tp_richcompare,
richcmp_ge,
"x.__ge__(y) <==> x>=y"),

So if you define "__lt__" in your object then the type gets a richcmp
function and your == test implicit in the 'in' search always incurs the
cost of figuring out that "__eq__" is not defined.
Andrew
da***@dalkescientific.com

Jun 15 '06 #8
an*********@gmail.com dettagliò:
Someone can explain me why?
The list's __contains__ method is very simple


[...]
So if you define "__lt__" in your object then the type gets a richcmp
function and your == test implicit in the 'in' search always incurs the
cost of figuring out that "__eq__" is not defined.


Thank you for the detailed explanation! :)

Do you think I should report this as a performance bug, maybe with the
'wishlist' priority, or I should accept the truth and hope for better
luck next time? ;)

Jun 15 '06 #9
Emanuele Aina wrote:
an*********@gmail.com dettagliò:
Someone can explain me why?

The list's __contains__ method is very simple


[...]
So if you define "__lt__" in your object then the type gets a richcmp
function and your == test implicit in the 'in' search always incurs the
cost of figuring out that "__eq__" is not defined.


Thank you for the detailed explanation! :)

Do you think I should report this as a performance bug, maybe with the
'wishlist' priority, or I should accept the truth and hope for better
luck next time? ;)


It certainly wouldn't hurt to report it. But I suspect it's not ever
going to get "fixed". Classes with a __lt__ but no __eq__ really aren't
that common.

STeVe
Jun 16 '06 #10

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

4
by: Lisa S. | last post by:
Can someeone help! This works: SELECT * FROM sometable WHERE 1 IN (1, 2, 3) ; But if I want something like this: s_mylist VARCHAR2(20) := '1, 2, 3'; SELECT * FROM sometable
4
by: Påhl Melin | last post by:
I have some problems using conversion operators in C++/CLI. In my project I have two ref class:es Signal and SignalMask and I have an conversion function in Signal to convert Signal:s to...
43
by: markryde | last post by:
Hello, I saw in some open source projects a use of "!!" in "C" code; for example: in some header file #define event_pending(v) \ (!!(v)->vcpu_info->evtchn_upcall_pending & \...
11
by: fourfires.d | last post by:
Dear All, I am new to c++ and when write below code that try to call copy constructor in "=" operator overloading, it can not compile. Can anyone point out for me the reason? thanks !! ...
4
by: pradeep kaltari | last post by:
Hello friends, I have a doubt in the working of IN operator. I have a schema by name 'test_schema'. If you look at the following query the expected output is "1", but nothing is displayed. SELECT...
2
by: dhtmlkitchen | last post by:
I'm trying to learn something about JScript engine. I have the following example: <script>(function() { poop = "brown"; myFunction = function myFunction() {}
4
by: fran7 | last post by:
Hi, from help in the javascript forum I found the error in some code but need help. This bit of code works perfectly, trouble is I am writing it to a javascript function so the height needs to be in...
5
by: tinnews | last post by:
Is there any way in python to say if string1 in string2: <do something> ignoring the case of string1 and string2? I know I could use:-
5
by: Cirene | last post by:
I seem to remember seeing a solution where a databound dropdownlist (used as a FILTER for a gridview) had the 1st item as "ALL ITEMS". It used a query with a UNION. But I can't find the example....
0
by: Faith0G | last post by:
I am starting a new it consulting business and it's been a while since I setup a new website. Is wordpress still the best web based software for hosting a 5 page website? The webpages will be...
0
isladogs
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 3 Apr 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome former...
0
by: ryjfgjl | last post by:
In our work, we often need to import Excel data into databases (such as MySQL, SQL Server, Oracle) for data analysis and processing. Usually, we use database tools like Navicat or the Excel import...
0
by: Charles Arthur | last post by:
How do i turn on java script on a villaon, callus and itel keypad mobile phone
0
by: aa123db | last post by:
Variable and constants Use var or let for variables and const fror constants. Var foo ='bar'; Let foo ='bar';const baz ='bar'; Functions function $name$ ($parameters$) { } ...
0
by: ryjfgjl | last post by:
If we have dozens or hundreds of excel to import into the database, if we use the excel import function provided by database editors such as navicat, it will be extremely tedious and time-consuming...
0
by: emmanuelkatto | last post by:
Hi All, I am Emmanuel katto from Uganda. I want to ask what challenges you've faced while migrating a website to cloud. Please let me know. Thanks! Emmanuel
1
by: Sonnysonu | last post by:
This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...
0
by: Hystou | last post by:
There are some requirements for setting up RAID: 1. The motherboard and BIOS support RAID configuration. 2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.