
# efficient intersection of lists with rounding

Hi,

I have two lists in which I need to find the common numbers (with the second
element rounded to the nearest integer), and I am wondering if there is a more
efficient way of doing it.
a= [(123,1.3),(123, 2.4),(123,7.8), (123,10.2)]
b= [(123, 0.9), (123, 1.9), (123, 8.0)]
[ (i,round(j)) for i,j in a for l,m in b if (i,round(j)) == (l,round(m))]
[(123, 1.0), (123, 2.0), (123, 8.0)]

This works, but a and b can each be on the order of 30K entries long.

A couple of other bits of info.
- a and b are ordered smallest to largest (could bisect module be used?)
- in the future I will want to round the second number to the nearest 0.25
rather than to a whole number.

Would the sets module be more efficient?

I'm using python 2.3.

Thanks for any ideas.

Regards,

Gordon Williams

Jul 18 '05 #1
> A couple of other bits of info.
- a and b are ordered smallest to largest (could bisect module be used?)
- in the future I will want to round the second number to the nearest 0.25
rather than to a whole number.

Would the sets module be more efficient?

I'm using python 2.3.

I'd go for something that uses the rounded versions of the lists and then
iterates over the first list, letting the second "catch up". Rather than
describe it further, here is the code:

a= [(123,1.3),(123, 2.4),(123,7.8), (123,10.2)]
b= [(123, 0.9), (123, 1.9), (123, 8.0)]
a = [ (i,round(j)) for i,j in a]
b = [ (i,round(j)) for i,j in b]
res = []
pos_b = 0

try:
    for i, pivot in a:
        while b[pos_b][1] < pivot:
            pos_b += 1
        while b[pos_b][1] == pivot:
            res.append(b[pos_b])
            pos_b += 1
except IndexError:
    # If b gets exhausted somewhere
    pass
print res

While it looks more complicated, it certainly is faster, as its complexity
is O(max(len(a), len(b))) where your code was O(len(a) * len(b)).

The speed gain comes of course from the ordering of the elements. And you could
factor the rounding _into_ the loops, but that's more ugly.
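A variant of this merge idea that compares the whole rounded tuple rather than
just the second element might look like this -- a sketch only, assuming both
lists are sorted so that their rounded forms are sorted too (output shown is
for Python 3, where round() returns an int):

```python
def merge_intersect(a, b):
    # Two-pointer walk over two sorted lists of (id, value) pairs,
    # matching on the full (id, round(value)) tuple.
    # Runs in O(len(a) + len(b)).
    ra = [(i, round(j)) for i, j in a]
    rb = [(i, round(j)) for i, j in b]
    res = []
    ia = ib = 0
    while ia < len(ra) and ib < len(rb):
        if ra[ia] == rb[ib]:
            res.append(ra[ia])
            ia += 1
            ib += 1
        elif ra[ia] < rb[ib]:
            ia += 1      # a is behind; advance it
        else:
            ib += 1      # b is behind; advance it
    return res

a = [(123, 1.3), (123, 2.4), (123, 7.8), (123, 10.2)]
b = [(123, 0.9), (123, 1.9), (123, 8.0)]
print(merge_intersect(a, b))  # [(123, 1), (123, 2), (123, 8)]
```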
--
Regards,

Diez B. Roggisch
Jul 18 '05 #2
Gordon Williams wrote:
I have two lists in which I need to find the common numbers (with the second
element rounded to the nearest integer), and I am wondering if there is a more
efficient way of doing it.
a= [(123,1.3),(123, 2.4),(123,7.8), (123,10.2)]
b= [(123, 0.9), (123, 1.9), (123, 8.0)]
[ (i,round(j)) for i,j in a for l,m in b if (i,round(j)) == (l,round(m))]
[(123, 1.0), (123, 2.0), (123, 8.0)]
[snip] Would the sets module be more efficient?

Well, in Python 2.3, I believe sets are implemented in Python while
they're implemented in C in Python 2.4. So probably not, unless you
upgrade. A 2.4 solution with sets:
>>> a = [(123,1.3),(123, 2.4),(123,7.8), (123,10.2)]
>>> b = [(123, 0.9), (123, 1.9), (123, 8.0)]
>>> def roundedj(pairs_iterable):
...     return ((i, round(j)) for i, j in pairs_iterable)
...
>>> set(roundedj(a)).intersection(set(roundedj(b)))
set([(123, 8.0), (123, 2.0), (123, 1.0)])
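For readers on later Pythons, where set is a builtin type, the same approach
needs no import at all; a minimal sketch:

```python
a = [(123, 1.3), (123, 2.4), (123, 7.8), (123, 10.2)]
b = [(123, 0.9), (123, 1.9), (123, 8.0)]

def roundedj(pairs):
    # Lazily yield (id, rounded value) pairs
    return ((i, round(j)) for i, j in pairs)

# set.intersection accepts any iterable, so only one set need be built
common = set(roundedj(a)).intersection(roundedj(b))
print(sorted(common))  # [(123, 1), (123, 2), (123, 8)]
```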

Steve
Jul 18 '05 #3
Steven Bethard wrote:
Well, in Python 2.3, I believe sets are implemented in Python while
they're implemented in C in Python 2.4.

I think the Python 2.3 Sets implementation is still likely to be quicker than
the O(n**2) list comprehension, but there's only one way to find out ;)
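The "one way to find out" is to measure; a rough timeit sketch (the test data
and sizes here are made up purely for illustration):

```python
import random
import timeit

random.seed(0)
# Made-up test data: 300 sorted (id, value) pairs per list
a = sorted((123, random.uniform(0, 200)) for _ in range(300))
b = sorted((123, random.uniform(0, 200)) for _ in range(300))

def quadratic():
    # The original nested list comprehension: O(len(a) * len(b))
    return [(i, round(j)) for i, j in a for l, m in b
            if (i, round(j)) == (l, round(m))]

def with_sets():
    # Set intersection on the rounded pairs: roughly O(len(a) + len(b))
    return set((i, round(j)) for i, j in a) & \
           set((i, round(j)) for i, j in b)

print("nested loops:", timeit.timeit(quadratic, number=10))
print("sets:        ", timeit.timeit(with_sets, number=10))
```

Both functions agree on which rounded pairs are common; the timings show how
quickly the nested version falls behind as the lists grow.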
--
Michael Hoffman
Jul 18 '05 #6
Gordon Williams wrote:
a= [(123,1.3),(123, 2.4),(123,7.8), (123,10.2)]
b= [(123, 0.9), (123, 1.9), (123, 8.0)]
[ (i,round(j)) for i,j in a for l,m in b if (i,round(j)) == (l,round(m))]

d = {}
for (l, m) in b:
    d[l, round(m)] = 1

result = []
for (i, j) in a:
    t = (i, round(j))
    if t in d:
        result.append(t)
- in the future I will want to round the second number to the nearest 0.25
rather than to a whole number.
I would do that by multiplying by 4 and rounding to
an integer to derive the dictionary key. That will
avoid any float-representation problems you might have
by trying to round to a fraction.
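That multiply-by-4 trick can be sketched as follows (the helper name
quarter_key is my own invention):

```python
def quarter_key(x):
    # Scale to quarters and round to an integer, so the dictionary key
    # is an exact int and no float "rounded to 0.25" ever needs to be
    # represented.
    return int(round(x * 4))

# The rounded value itself, if needed for display, is key / 4.0
print(quarter_key(1.3), quarter_key(1.3) / 4.0)    # 5 1.25
print(quarter_key(2.38), quarter_key(2.38) / 4.0)  # 10 2.5
```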
Would the sets module be more efficient?

As another poster said, sets are implemented as dicts
in 2.3, so it comes down to much the same thing. Using
sets might be a bit faster than the above code in 2.4,
but probably not greatly so. By far the biggest
improvement will come from using an O(n) algorithm.

--
Greg Ewing, Computer Science Dept,
University of Canterbury,
Christchurch, New Zealand
http://www.cosc.canterbury.ac.nz/~greg

Jul 18 '05 #8
On Thu, 2004-12-02 at 22:16, Greg Ewing wrote:
Gordon Williams wrote:
>a= [(123,1.3),(123, 2.4),(123,7.8), (123,10.2)]
>b= [(123, 0.9), (123, 1.9), (123, 8.0)]
>[ (i,round(j)) for i,j in a for l,m in b if (i,round(j)) == (l,round(m))]

d = {}
for (l, m) in b:
    d[l, round(m)] = 1

result = []
for (i, j) in a:
    t = (i, round(j))
    if t in d:
        result.append(t)
- in the future I will want to round the second number to the nearest 0.25
rather than to a whole number.

I would do that by multiplying by 4 and rounding to
an integer to derive the dictionary key. That will
avoid any float-representation problems you might have
by trying to round to a fraction.
Would the sets module be more efficient?

As another poster said, sets are implemented as dicts
in 2.3, so it comes down to much the same thing. Using
sets might be a bit faster than the above code in 2.4,
but probably not greatly so. By far the biggest
improvement will come from using an O(n) algorithm.

Of course a low O-factor is important; however, you should avoid confusing
the statement of what you want to do with the statement of how you want to do
it. One of the benefits of a HLL like Python is that you can merely state
*what* you want without worrying about how to compute it.

In the original example above you are computing a set intersection -
Python's set object has an intersection method. Use it. Not only is it
faster than your O(n**2) solution, but it is a good deal clearer.
from sets import Set
set_a = Set([(i, round(j)) for i, j in a])
set_b = Set([(i, round(j)) for i, j in b])
set_a.intersection(set_b)
Set([(123, 2.0), (123, 1.0), (123, 8.0)])

Or you could say:
set_a, set_b = [Set([(i, round(j)) for i, j in s]) for s in (a, b)]
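On Pythons where set became a builtin (2.4 and later), the same two-assignment
variant might read as follows; a sketch, with the Python 3 output shown:

```python
a = [(123, 1.3), (123, 2.4), (123, 7.8), (123, 10.2)]
b = [(123, 0.9), (123, 1.9), (123, 8.0)]

# Build both rounded sets in one statement, then intersect them
set_a, set_b = [set((i, round(j)) for i, j in s) for s in (a, b)]
print(sorted(set_a.intersection(set_b)))  # [(123, 1), (123, 2), (123, 8)]
```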