I have rather simple 'Address' object that contains streetname,
number, my own status and x,y coordinates for it. I have two lists
both containing approximately 30000 addresses.
I've defined __eq__ method in my class like this:
def __eq__(self, other):
return self.xcoord == other.xcoord and \
self.ycoord == other.ycoord and \
self.streetname == other.streetname and \
self.streetno == other.streetno
But it turns out to be very, very slow.
Then I setup two lists:
list_external = getexternal()
list_internal = getinternal()
Now I need get all all addresses from 'list_external' that are not in
'list_internal', and mark them as "new".
I did it like this:
for addr in list_external:
if addr not in list_internal:
addr.status = 1 # New address
But in my case running that loop takes about 10 minutes. What I am
doing wrong?
--
Jani Tiainen 7 1213
Jani Tiainen <re*****@gmail.comwrites:
for addr in list_external:
if addr not in list_internal:
addr.status = 1 # New address
But in my case running that loop takes about 10 minutes. What I am
doing wrong?
The nested loop takes time proportional to the product of the number
of elements in the loops. To avoid it, convert the inner loop to a
set, which can be looked up in constant time:
internal = set(list_internal)
for addr in list_external:
if addr in internal:
addr.status = 1
Jani Tiainen wrote:
I have rather simple 'Address' object that contains streetname,
number, my own status and x,y coordinates for it. I have two lists
both containing approximately 30000 addresses.
I've defined __eq__ method in my class like this:
def __eq__(self, other):
return self.xcoord == other.xcoord and \
self.ycoord == other.ycoord and \
self.streetname == other.streetname and \
self.streetno == other.streetno
But it turns out to be very, very slow.
Then I setup two lists:
list_external = getexternal()
list_internal = getinternal()
Now I need get all all addresses from 'list_external' that are not in
'list_internal', and mark them as "new".
I did it like this:
for addr in list_external:
if addr not in list_internal:
addr.status = 1 # New address
But in my case running that loop takes about 10 minutes. What I am
doing wrong?
Even if list_external and list_internal contain the same items you need
about len(list_external)*(len(list_internal)/2), or 45 million comparisons.
You can bring that down to 30000*some_small_factor if you make your
addresses hashable and put the internal ones into a dict or set
def __hash__(self):
return hash((self.xcoord, self.yccord, self.streetname, self.streetno))
def __eq__(self, other):
# as above
Then
set_internal = set(list_internal)
for addr in list_external:
if addr not in set_internal:
addr.status = 1
Note that the attributes relevant for hash and equality must not change
during this process.
Peter
Hrvoje Niksic:
internal = set(list_internal)
....
To do that the original poster may have to define a __hash__ and
__eq__ methods in his/her class.
Bye,
bearophile
Jani Tiainen a écrit :
I have rather simple 'Address' object that contains streetname,
number, my own status and x,y coordinates for it. I have two lists
both containing approximately 30000 addresses.
I've defined __eq__ method in my class like this:
def __eq__(self, other):
return self.xcoord == other.xcoord and \
self.ycoord == other.ycoord and \
self.streetname == other.streetname and \
self.streetno == other.streetno
But it turns out to be very, very slow.
Then I setup two lists:
list_external = getexternal()
list_internal = getinternal()
Now I need get all all addresses from 'list_external' that are not in
'list_internal', and mark them as "new".
I did it like this:
for addr in list_external:
if addr not in list_internal:
addr.status = 1 # New address
But in my case running that loop takes about 10 minutes. What I am
doing wrong?
mmm... not using sets ?
On 23 loka, 15:24, Peter Otten <__pete...@web.dewrote:
Jani Tiainen wrote:
I have rather simple 'Address' object that contains streetname,
number, my own status and x,y coordinates for it. I have two lists
both containing approximately 30000 addresses.
I've defined __eq__ method in my class like this:
* * def __eq__(self, other):
* * * * return self.xcoord == other.xcoord and \
* * * * * * self.ycoord == other.ycoord and \
* * * * * * self.streetname == other.streetname and \
* * * * * * self.streetno == other.streetno
But it turns out to be very, very slow.
Then I setup two lists:
list_external = getexternal()
list_internal = getinternal()
Now I need get all all addresses from 'list_external' that are not in
'list_internal', and mark them as "new".
I did it like this:
for addr in list_external:
* * if addr not in list_internal:
* * * * addr.status = 1 # New address
But in my case running that loop takes about 10 minutes. What I am
doing wrong?
Even if list_external and list_internal contain the same items you need
about len(list_external)*(len(list_internal)/2), or 45 million comparisons.
You can bring that down to 30000*some_small_factor if you make your
addresses hashable and put the internal ones into a dict or set
def __hash__(self):
* *return hash((self.xcoord, self.yccord, self.streetname, self.streetno))
def __eq__(self, other):
* *# as above
Then
set_internal = set(list_internal)
for addr in list_external:
* * if addr not in set_internal:
* * * * addr.status = 1
Note that the attributes relevant for hash and equality must not change
during this process.
Very complete answer, thank you very much.
I tried that hash approach and sets but it seemed to get wrong results
first time and it was all due my hash method.
Now it takes like 2-3 seconds to do all that stuff and result seem to
be correct. Apparently I have lot to learn about Python... :)
--
Jani Tiainen be************@lycos.com writes:
>internal = set(list_internal)
...
To do that the original poster may have to define a __hash__ and
__eq__ methods in his/her class.
You're right. The OP states he implements __eq__, so he also needs a
matching __hash__, such as:
def __hash__(self, other):
return (hash(self.xcoord) ^ hash(self.ycoord) ^
hash(self.streetname) ^ hash(self.streetno))
On Thu, 23 Oct 2008 05:03:26 -0700, Jani Tiainen wrote:
But in my case running that loop takes about 10 minutes. What I am doing
wrong?
Others have already suggested you have a O(N**2) algorithm. Here's an
excellent article that explains more about them: http://www.joelonsoftware.com/articl...000000319.html
The article is biased towards low-level languages like C. In Python the
way to avoid them is to use sets or dicts, or to keep the list sorted and
use the bisect module.
--
Steven This thread has been closed and replies have been disabled. Please start a new discussion. Similar topics
by: Craig Keightley |
last post by:
is it possible to compare acomma separated list aginst another
eg comma list 1 => 1,2,3,4,5
comma list 2 => 3,5
can you check that 3 is in both, and 5 is in both, therfore they match???
the...
|
by: Shay |
last post by:
essentially I am trying to do some counts based on some
assumptions in the recordset. So I get the RS back, put
the values into a variable, move to the next record in the
RS and compare what is in...
|
by: Tim Fountain |
last post by:
We've recently enabled slow query logging on a server and it's proving
interesting seeing which queries are bogging things down. This one is
puzzling me a little:
SELECT articleid, type,...
|
by: diffuser78 |
last post by:
I have just started to learn python. Some said that its slow. Can
somebody pin point the issue.
Thans
|
by: taylorcarr |
last post by:
A Canon printer is a smart device known for being advanced, efficient, and reliable. It is designed for home, office, and hybrid workspace use and can also be used for a variety of purposes. However,...
|
by: Charles Arthur |
last post by:
How do i turn on java script on a villaon, callus and itel keypad mobile phone
|
by: aa123db |
last post by:
Variable and constants
Use var or let for variables and const fror constants.
Var foo ='bar';
Let foo ='bar';const baz ='bar';
Functions
function $name$ ($parameters$) {
}
...
|
by: nemocccc |
last post by:
hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?
|
by: Sonnysonu |
last post by:
This is the data of csv file
1 2 3
1 2 3
1 2 3
1 2 3
2 3
2 3
3
the lengths should be different i have to store the data by column-wise with in the specific length.
suppose the i have to...
|
by: Hystou |
last post by:
There are some requirements for setting up RAID:
1. The motherboard and BIOS support RAID configuration.
2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...
|
by: marktang |
last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However,...
|
by: Hystou |
last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can...
|
by: jinu1996 |
last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven...
| |