I have some code which does a lot of "in" on lists containing objects
with no __eq__ defined.
It all goes fast until I add the __lt__() method: then I have a
slowdown comparable to the one I get using the overridden __eq__, while
the __lt__ method is never called.
Someone can explain me why?
I have here a simple program which shows the behaviour:
*-------------------------------------------------------------------*
#!/usr/bin/env python
import timing
class State(object):
def __init__(self, value):
self.v = value
def generate(self):
return [self.__class__(self.v+1)]
class StateLT(State):
def __lt__ (self, other):
print 'Foo'
return self.v < other.v
class StateEQ(State):
def __eq__ (self, other):
return self.v == other.v
def test(state_cls):
print state_cls
timing.start()
initial = state_cls(0)
l = [initial]
for i in xrange(3000):
successors = l[i].generate()
successors = [s for s in successors if s not in l]
l += successors
timing.finish()
print timing.milli()
if __name__ == '__main__':
test(State)
print ' '
test(StateLT)
print ' '
test(StateEQ)
*-------------------------------------------------------------------*
On my machine the output is:
<class '__main__.State'>
406
<class '__main__.StateLT'>
4982
<class '__main__.StateEQ'>
5395
Here you can see that even with only the __lt__ method it goes 10x
slower, but __lt__ is never called as "Foo" is not printed.
I use the python 2.3.5-8 package on a debian unstable, but even the
python 2.4.3-5 package shows the same behaviour. 9 2170
Le Mercredi 14 Juin 2006 22:57, Emanuele Aina a écrit*: Here you can see that even with only the __lt__ method it goes 10x slower, but __lt__ is never called as "Foo" is not printed.
No, that is not what it shows. The only thing it shows is that in operator is
slow for lists, and as it iterates over the entire list untill it find an
element, it is much slower when it fails a lot.
Use dict in operator instead.
Here is a program to show this in details :
import timing
class State(object):
def __init__(self, value):
self.v = value
class StateEQ(State):
def __eq__ (self, other):
if isinstance(other, State) :
return self.v == other.v
else :
return self.v == other
class StateLT(State) :
def __lt__ (self, other):
if isinstance(other, State) :
return self.v < other.v
else :
return self.v < other
class StateLTEQ(StateEQ, StateLT) : pass
def test(state_cls):
num_elem = 3*10**4
print
print state_cls
l = [state_cls(0)]
for i in xrange(num_elem): l.append(state_cls(i+1))
print
print "in operator at the beginning of list:",
timing.start()
for i in xrange(300): i in l
timing.finish()
print timing.milli()
print "in operator at the end of list:",
timing.start()
for i in xrange(num_elem-300, num_elem): i in l
timing.finish()
print timing.milli()
print
print "converting to dict :",
timing.start()
d = {}.fromkeys(l)
timing.finish()
print timing.milli()
print "in operator for a dict for %s elements:" % (num_elem*2),
timing.start()
for i in xrange(num_elem, num_elem*2): i in d
timing.finish()
print timing.milli()
if __name__ == '__main__':
test(State)
test(StateLT)
test(StateEQ)
test(StateLTEQ)
for which the output is :
<class '__main__.State'>
in operator at the beginning of list: 1983
in operator at the end of list: 2011
converting to dict : 82
in operator for a dict for 60000 elements: 12
<class '__main__.StateLT'>
in operator at the beginning of list: 3866
in operator at the end of list: 3871
converting to dict : 332
in operator for a dict for 60000 elements: 50
<class '__main__.StateEQ'>
in operator at the beginning of list: 173
in operator at the end of list: 28249
converting to dict : 79
in operator for a dict for 60000 elements: 14
<class '__main__.StateLTEQ'>
in operator at the beginning of list: 202
in operator at the end of list: 30472
converting to dict : 68
in operator for a dict for 60000 elements: 50
Every classes that define the __eq__ operator will find quickly the elements
if they are at the beginning of the list.
If they are at the end, the in oprerator is slower in these classes becauseof
the overhead of function calls (in StateLt there is also an overhead due to
internal lookup, IMO).
Converting to dicts and looking for keys is *really* and equally fast in all
cases, it does not depend on success or failure.
--
_____________
Maric Michaud
_____________
Aristote - www.aristote.info
3 place des tapis
69004 Lyon
Tel: +33 426 880 097
Maric Michaud spiegò: Le Mercredi 14 Juin 2006 22:57, Emanuele Aina a écrit : Here you can see that even with only the __lt__ method it goes 10x slower, but __lt__ is never called as "Foo" is not printed. No, that is not what it shows. The only thing it shows is that in operator is slow for lists, and as it iterates over the entire list untill it find an element, it is much slower when it fails a lot.
Yes, I now that "in" is slow for lists and that a dict is preferable,
but what I'm pointing out is that in one case is much slower without a
reason.
In my test, the one without __lt__ runs in ~500ms, while the one with
the __lt__ method runs in ~5000ms.
Just for comparision I've also tested with an overridden __eq__ and it
shows timings similar to the ones in the second test, but this time it
is expected, as "in" has to call __eq__ for every element in the list.
In the first test python uses the native equality (comparing memory
pointer) and it is fast because it is implemented in C in the
interpreter (no python function call).
In the last case python has to call __eq__ for every element, and a
python function call is obviuosly more expensive than the native
comparision.
But the __lt__ case leaves me wondering why it is slow as the __eq__
case, while using the native comparision.
What I asked is what is the difference between the first and the second
case?
Use dict in operator instead.
Yes, I know that a dict is the right datastructure for a fast "in", but
I need to keep track of the ordering, so I cannot use a dict.
Here is a program to show this in details :
import timing
class State(object): def __init__(self, value): self.v = value
class StateEQ(State): def __eq__ (self, other): if isinstance(other, State) : return self.v == other.v else : return self.v == other
class StateLT(State) : def __lt__ (self, other): if isinstance(other, State) : return self.v < other.v else : return self.v < other
class StateLTEQ(StateEQ, StateLT) : pass
def test(state_cls): num_elem = 3*10**4 print print state_cls
l = [state_cls(0)] for i in xrange(num_elem): l.append(state_cls(i+1)) print print "in operator at the beginning of list:", timing.start() for i in xrange(300): i in l timing.finish() print timing.milli() print "in operator at the end of list:", timing.start() for i in xrange(num_elem-300, num_elem): i in l timing.finish() print timing.milli()
Sorry, but this works only when you override __eq__.
Without it it will alway scan the entire list, so you are comparing the
timings of two different things.
print print "converting to dict :", timing.start() d = {}.fromkeys(l) timing.finish() print timing.milli() print "in operator for a dict for %s elements:" % (num_elem*2), timing.start() for i in xrange(num_elem, num_elem*2): i in d timing.finish() print timing.milli()
As I said, that was only a test demo exploiting my observation, in my
real program I need to keep track of the order in which I add items in
the list, so I can't use a dict.
for which the output is :
<class '__main__.State'>
in operator at the beginning of list: 1983 in operator at the end of list: 2011
converting to dict : 82 in operator for a dict for 60000 elements: 12
<class '__main__.StateLT'>
in operator at the beginning of list: 3866 in operator at the end of list: 3871
Here python is using the equality comparator from "int", giving those
fast results.
I've substituted the ints with instances of state_cls:
- for i in xrange(300): i in l
+ for i in xrange(300): state_cls(i) in l
- for i in xrange(num_elem-300, num_elem): i in l
+ for i in xrange(num_elem-300, num_elem): state_cls(i) in l
Running these test, in the case of StateLT, now takes ~10000ms and
~9000ms.
[...]
Every classes that define the __eq__ operator will find quickly the elements if they are at the beginning of the list. If they are at the end, the in oprerator is slower in these classes because of the overhead of function calls (in StateLt there is also an overhead due to
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ internal lookup, IMO).
^^^^^^^^^^^^^^^
Yes, that was my question! :)
But I hoped in a more exaustive answer: why python has to do this
lookup when the __lt__ method is not involved at all?
Le Jeudi 15 Juin 2006 14:14, Emanuele Aina a écrit*: But I hoped in a more exaustive answer: why python has to do this lookup when the __lt__ method is not involved at all?
It is not the case, that's what my program shows :
<class '__main__.StateEQ'>
in operator at the beginning of list: 173
in operator at the end of list: 28249 <- here
converting to dict : 79
in operator for a dict for 60000 elements: 14
<class '__main__.StateLTEQ'>
in operator at the beginning of list: 202
in operator at the end of list: 30472 <- and here
--
_____________
Maric Michaud
_____________
Aristote - www.aristote.info
3 place des tapis
69004 Lyon
Tel: +33 426 880 097
Maric Michaud continuò: But I hoped in a more exaustive answer: why python has to do this lookup when the __lt__ method is not involved at all?
It is not the case, that's what my program shows :
<class '__main__.StateEQ'>
in operator at the beginning of list: 173 in operator at the end of list: 28249 <- here
converting to dict : 79 in operator for a dict for 60000 elements: 14
<class '__main__.StateLTEQ'>
in operator at the beginning of list: 202 in operator at the end of list: 30472 <- and here
It is very obvious that these two have similiar timings, as both call
__eq__.
I asked why the State and StateLT don't give similar results, but
StateLT is noticeably slower.
Emanuele Aina a écrit : Maric Michaud continuò:
But I hoped in a more exaustive answer: why python has to do this lookup when the __lt__ method is not involved at all?
It is not the case, that's what my program shows :
<class '__main__.StateEQ'>
in operator at the beginning of list: 173 in operator at the end of list: 28249 <- here
converting to dict : 79 in operator for a dict for 60000 elements: 14
<class '__main__.StateLTEQ'>
in operator at the beginning of list: 202 in operator at the end of list: 30472 <- and here
It is very obvious that these two have similiar timings, as both call __eq__.
I asked why the State and StateLT don't give similar results, but StateLT is noticeably slower.
Maybe because when python tries to compare 2 elements, checks if __lt__
is defined, and if it is, it checks for __gt__ ( or maybe __lteq__ ) but
since it isn't defined, it fallbacks on __cmp__. So for each comparison,
if you define only __lt__ you have some additional lookups to go through.
Le Jeudi 15 Juin 2006 14:14, Emanuele Aina a écrit*: Every classes that define the __eq__ operator will find quickly the elements if they are at the beginning of the list. If they are at the end, the in oprerator is slower in these classes because of the overhead of function calls (in StateLt there is also an overhead due to
* * * * * * * * * * * * * * * * * ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
internal lookup, IMO).
* ^^^^^^^^^^^^^^^
Yes, that was my question! :)
But I hoped in a more exaustive answer: why python has to do this lookup when the __lt__ method is not involved at all?
Ah, ok sorry, got your point, I didn't notice it in first time.
There is effectively an overhead when the class defines at less one of __lt__,
__gt__, __neq__ ... special attributes (comparison attributes), it seems in
my example that this overhead is a ratio of 1,5x.
It is related to comparison, as the following code show :
import timing
class State(object):
def __init__(self, value):
self.v = value
class StateFoo(State) :
def foo(self) : pass
class StateNew(State) :
def __nonzero__(self) : return None
class StateGT(State) :
def __gt__(self) : return None
class StateGTLT(StateGT) :
def __lt__(self) : return None
def test(cls) :
print cls
inst = cls(5)
timing.start()
for i in xrange(10**6) : i == inst
timing.finish()
print timing.milli()
test(State)
test(StateFoo)
test(StateNew)
test(StateGT)
test(StateGTLT)
for which ouput is :
<class '__main__.State'>
589
<class '__main__.StateFoo'>
596
<class '__main__.StateNew'>
615
<class '__main__.StateGT'>
815
<class '__main__.StateGTLT'>
836
Can't say more about what in the internals of python produce this non
negligeable overhead, but looking at the source code of builtin comparison
(for new style classes as it seems we have not this overhead with old style
classes) should be the fist step...
--
_____________
Maric Michaud
_____________
Aristote - www.aristote.info
3 place des tapis
69004 Lyon
Tel: +33 426 880 097
Emanuele Aina wrote: I have some code which does a lot of "in" on lists containing objects with no __eq__ defined.
It all goes fast until I add the __lt__() method: then I have a slowdown comparable to the one I get using the overridden __eq__, while the __lt__ method is never called.
Someone can explain me why?
The list's __contains__ method is very simple
for (i = 0, cmp = 0 ; cmp == 0 && i < a->ob_size; ++i)
cmp = PyObject_RichCompareBool(el, PyList_GET_ITEM(a,
i),
Py_EQ);
The PyObject_RichCompareBool becomes more complex. The
relevant code occurs after the check for object identity. The next
step is
res = PyObject_RichCompare(v, w, op);
which goes into your case
/* If the types are equal, and not old-style instances, try to
get out cheap (don't bother with coercions etc.). */
if (v->ob_type == w->ob_type && !PyInstance_Check(v)) {
cmpfunc fcmp;
richcmpfunc frich = RICHCOMPARE(v->ob_type);
/* If the type has richcmp, try it first.
try_rich_compare
tries it two-sided, which is not needed since we've
a
single type only. */
if (frich != NULL) {
res = (*frich)(v, w, op);
if (res != Py_NotImplemented)
goto Done;
Py_DECREF(res);
}
/* No richcmp, or this particular richmp not
implemented.
Try 3-way cmp. */
fcmp = v->ob_type->tp_compare;
One new-style class defines '__lt__' while the other does not.
Here's where things get shaky for me. I think new-style classes
are instances of PyInstanceObject (and not PyClassObject). Instances
use 'instance_richcompare' to implement the rich comparison
between two instances. This function does the lookup for '__eq__'
in the two classes.
The tp_richcompare slot is set to instance_richcompare when any
of __lt__, __gt__, __eq_, etc. are defined in a new type. The macro-
based code looks like
TPSLOT("__lt__", tp_richcompare, slot_tp_richcompare,
richcmp_lt,
"x.__lt__(y) <==> x<y"),
TPSLOT("__le__", tp_richcompare, slot_tp_richcompare,
richcmp_le,
"x.__le__(y) <==> x<=y"),
TPSLOT("__eq__", tp_richcompare, slot_tp_richcompare,
richcmp_eq,
"x.__eq__(y) <==> x==y"),
TPSLOT("__ne__", tp_richcompare, slot_tp_richcompare,
richcmp_ne,
"x.__ne__(y) <==> x!=y"),
TPSLOT("__gt__", tp_richcompare, slot_tp_richcompare,
richcmp_gt,
"x.__gt__(y) <==> x>y"),
TPSLOT("__ge__", tp_richcompare, slot_tp_richcompare,
richcmp_ge,
"x.__ge__(y) <==> x>=y"),
So if you define "__lt__" in your object then the type gets a richcmp
function and your == test implicit in the 'in' search always incurs the
cost of figuring out that "__eq__" is not defined.
Andrew da***@dalkescientific.com an*********@gmail.com dettagliò: Someone can explain me why? The list's __contains__ method is very simple
[...]
So if you define "__lt__" in your object then the type gets a richcmp function and your == test implicit in the 'in' search always incurs the cost of figuring out that "__eq__" is not defined.
Thank you for the detailed explanation! :)
Do you think I should report this as a performance bug, maybe with the
'wishlist' priority, or I should accept the truth and hope for better
luck next time? ;)
Emanuele Aina wrote: an*********@gmail.com dettagliò:
Someone can explain me why? The list's __contains__ method is very simple
[...]
So if you define "__lt__" in your object then the type gets a richcmp function and your == test implicit in the 'in' search always incurs the cost of figuring out that "__eq__" is not defined.
Thank you for the detailed explanation! :)
Do you think I should report this as a performance bug, maybe with the 'wishlist' priority, or I should accept the truth and hope for better luck next time? ;)
It certainly wouldn't hurt to report it. But I suspect it's not ever
going to get "fixed". Classes with a __lt__ but no __eq__ really aren't
that common.
STeVe This thread has been closed and replies have been disabled. Please start a new discussion. Similar topics
by: Lisa S. |
last post by:
Can someeone help! This works:
SELECT * FROM sometable
WHERE 1 IN (1, 2, 3) ;
But if I want something like this:
s_mylist VARCHAR2(20) := '1, 2, 3';
SELECT * FROM sometable
|
by: Påhl Melin |
last post by:
I have some problems using conversion operators in C++/CLI. In my
project I have two ref class:es Signal and SignalMask and I have an
conversion function in Signal to convert Signal:s to...
|
by: markryde |
last post by:
Hello,
I saw in some open source projects a use of "!!" in "C" code;
for example:
in some header file
#define event_pending(v) \
(!!(v)->vcpu_info->evtchn_upcall_pending & \...
|
by: fourfires.d |
last post by:
Dear All,
I am new to c++ and when write below code that try to call copy
constructor
in "=" operator overloading, it can not compile. Can anyone point out
for
me the reason? thanks !!
...
|
by: pradeep kaltari |
last post by:
Hello friends,
I have a doubt in the working of IN operator. I have a schema by name 'test_schema'. If you look at the following query the expected output is "1", but nothing is displayed.
SELECT...
|
by: dhtmlkitchen |
last post by:
I'm trying to learn something about JScript engine.
I have the following example:
<script>(function() {
poop = "brown";
myFunction = function myFunction() {}
|
by: fran7 |
last post by:
Hi, from help in the javascript forum I found the error in some code but need help. This bit of code works perfectly, trouble is I am writing it to a javascript function so the height needs to be in...
|
by: tinnews |
last post by:
Is there any way in python to say
if string1 in string2:
<do something>
ignoring the case of string1 and string2?
I know I could use:-
|
by: Cirene |
last post by:
I seem to remember seeing a solution where a databound dropdownlist (used as
a FILTER for a gridview) had the 1st item as "ALL ITEMS". It used a query
with a UNION. But I can't find the example....
|
by: Faith0G |
last post by:
I am starting a new it consulting business and it's been a while since I setup a new website. Is wordpress still the best web based software for hosting a 5 page website? The webpages will be...
|
by: isladogs |
last post by:
The next Access Europe User Group meeting will be on Wednesday 3 Apr 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM).
In this session, we are pleased to welcome former...
|
by: ryjfgjl |
last post by:
In our work, we often need to import Excel data into databases (such as MySQL, SQL Server, Oracle) for data analysis and processing. Usually, we use database tools like Navicat or the Excel import...
|
by: Charles Arthur |
last post by:
How do i turn on java script on a villaon, callus and itel keypad mobile phone
|
by: aa123db |
last post by:
Variable and constants
Use var or let for variables and const fror constants.
Var foo ='bar';
Let foo ='bar';const baz ='bar';
Functions
function $name$ ($parameters$) {
}
...
|
by: ryjfgjl |
last post by:
If we have dozens or hundreds of excel to import into the database, if we use the excel import function provided by database editors such as navicat, it will be extremely tedious and time-consuming...
|
by: emmanuelkatto |
last post by:
Hi All, I am Emmanuel katto from Uganda. I want to ask what challenges you've faced while migrating a website to cloud.
Please let me know.
Thanks!
Emmanuel
|
by: Sonnysonu |
last post by:
This is the data of csv file
1 2 3
1 2 3
1 2 3
1 2 3
2 3
2 3
3
the lengths should be different i have to store the data by column-wise with in the specific length.
suppose the i have to...
|
by: Hystou |
last post by:
There are some requirements for setting up RAID:
1. The motherboard and BIOS support RAID configuration.
2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...
| |