
# efficient intersection of lists with rounding

Hi, I have two lists in which I need to find the common numbers (with the second element rounded to the nearest integer), and I am wondering if there is a more efficient way of doing it:

```python
>>> a = [(123, 1.3), (123, 2.4), (123, 7.8), (123, 10.2)]
>>> b = [(123, 0.9), (123, 1.9), (123, 8.0)]
>>> [(i, round(j)) for i, j in a for l, m in b if (i, round(j)) == (l, round(m))]
[(123, 1.0), (123, 2.0), (123, 8.0)]
```

This works, but a and b can be on the order of 30K long. A couple of other bits of info:

- a and b are ordered smallest to largest (could the bisect module be used?)
- in the future I will want to round the second number to the closest 0.25 rather than a whole number.

Would the sets module be more efficient? I'm using Python 2.3. Thanks for any ideas.

Regards,
Gordon Williams

Jul 18 '05 #1
17 Replies

> A couple of other bits of info:
> - a and b are ordered smallest to largest (could the bisect module be used?)
> - in the future I will want to round the second number to the closest 0.25 rather than a whole number.
> Would the sets module be more efficient? I'm using Python 2.3.

I'd go for something that uses the rounded versions of the lists and then iterates over the first list, letting the second "catch up". Sorry, I'm too lazy to describe it better, so here is the code:

```python
a = [(123, 1.3), (123, 2.4), (123, 7.8), (123, 10.2)]
b = [(123, 0.9), (123, 1.9), (123, 8.0)]

a = [(i, round(j)) for i, j in a]
b = [(i, round(j)) for i, j in b]

res = []
pos_b = 0
try:
    for i, pivot in a:
        while b[pos_b] < pivot:
            pos_b += 1
        while b[pos_b] == pivot:
            res.append(b[pos_b])
            pos_b += 1
except IndexError:  # if b gets exhausted somewhere
    pass
print res
```

While it looks more complicated, it certainly is faster, as its complexity is O(max(len(a), len(b))), whereas your code was O(len(a) * len(b)) -- so usually more or less quadratic. The speed gain comes, of course, from the order of the elements. And you could factor the rounding _into_ the loops, but that's more ugly.

--
Regards, Diez B. Roggisch

Jul 18 '05 #2
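A small runnable sketch of the same merge idea in current Python (the function name is mine). One subtlety in the posted code: `for i, pivot in a:` leaves `pivot` holding only the rounded value, so `b[pos_b] < pivot` ends up comparing a whole tuple against a number; the sketch below compares complete `(id, rounded)` pairs instead.

```python
# Merge-style intersection of two sorted lists of (id, value) pairs,
# matching on (id, round(value)).  Both inputs must be sorted, as in the
# original post; runs in O(len(a) + len(b)).
def merge_intersect(a, b):
    ra = [(i, round(j)) for i, j in a]
    rb = [(i, round(j)) for i, j in b]
    res = []
    pos = 0
    for key in ra:
        # Let the second list "catch up" to the current key.
        while pos < len(rb) and rb[pos] < key:
            pos += 1
        if pos < len(rb) and rb[pos] == key:
            res.append(key)
    return res

a = [(123, 1.3), (123, 2.4), (123, 7.8), (123, 10.2)]
b = [(123, 0.9), (123, 1.9), (123, 8.0)]
print(merge_intersect(a, b))  # [(123, 1), (123, 2), (123, 8)]
```

(In Python 3, `round()` returns an int, hence `(123, 1)` rather than `(123, 1.0)`.)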

Gordon Williams wrote:
> I have two lists in which I need to find the common numbers (second element rounded to the nearest integer), and I am wondering if there is a more efficient way of doing it.
> [snip]
> Would the sets module be more efficient?

Well, in Python 2.3, I believe sets are implemented in Python, while they're implemented in C in Python 2.4. So probably not, unless you upgrade. A 2.4 solution with sets:

```python
>>> a = [(123, 1.3), (123, 2.4), (123, 7.8), (123, 10.2)]
>>> b = [(123, 0.9), (123, 1.9), (123, 8.0)]
>>> def roundedj(pairs_iterable):
...     return ((i, round(j)) for i, j in pairs_iterable)
...
>>> set(roundedj(a)).intersection(set(roundedj(b)))
set([(123, 8.0), (123, 2.0), (123, 1.0)])
```

Steve

Jul 18 '05 #3



Steven Bethard wrote:
> Well, in Python 2.3, I believe sets are implemented in Python while they're implemented in C in Python 2.4.

I think the Python 2.3 Sets implementation is likely to be quicker than whatever list-manipulation answer you come up with instead. But there's only one way to find out ;)

--
Michael Hoffman

Jul 18 '05 #6


Gordon Williams wrote:
> a = [(123, 1.3), (123, 2.4), (123, 7.8), (123, 10.2)]
> b = [(123, 0.9), (123, 1.9), (123, 8.0)]
> [(i, round(j)) for i, j in a for l, m in b if (i, round(j)) == (l, round(m))]

```python
d = {}
for (l, m) in b:
    d[l, round(m)] = 1

result = []
for (i, j) in a:
    t = (i, round(j))
    if t in d:
        result.append(t)
```

> - in the future I will want to round the second number to the closest 0.25 rather than a whole number.

I would do that by multiplying by 4 and rounding to an integer to derive the dictionary key. That will avoid any float-representation problems you might have by trying to round to a fraction.

> Would the sets module be more efficient?

As another poster said, sets are implemented as dicts in 2.3, so it comes down to much the same thing. Using sets might be a bit faster than the above code in 2.4, but probably not greatly so. By far the biggest improvement will come from using an O(n) algorithm instead of an O(n**2) one.

--
Greg Ewing, Computer Science Dept, University of Canterbury,
Christchurch, New Zealand    http://www.cosc.canterbury.ac.nz/~greg

Jul 18 '05 #8
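A runnable sketch of the multiply-by-4 trick for the quarter-rounding case (function names and example data are mine): scaling to quarters and rounding to an integer keeps the dictionary key an exact int, sidestepping float-representation issues.

```python
# Round the second element to the nearest 0.25 by counting quarters:
# the key stores an integer number of quarters, never a fractional float.
def quarter_key(pair):
    i, j = pair
    return (i, int(round(j * 4)))  # e.g. 1.3 -> 5 quarters (i.e. 1.25)

def intersect_quarters(a, b):
    seen = {}
    for pair in b:
        seen[quarter_key(pair)] = True
    # O(len(a) + len(b)): one dict build plus one membership scan.
    return [pair for pair in a if quarter_key(pair) in seen]

a = [(123, 1.3), (123, 2.4), (123, 7.8)]
b = [(123, 1.2), (123, 2.6), (123, 8.0)]
print(intersect_quarters(a, b))  # [(123, 1.3), (123, 2.4)]
```

Here 1.3 and 1.2 both land on 5 quarters (1.25), and 2.4 and 2.6 both land on 10 quarters (2.5), so those pairs match; 7.8 (31 quarters) and 8.0 (32 quarters) do not.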


On Thu, 2004-12-02 at 22:16, Greg Ewing wrote:
> [dict-based solution snipped]
> By far the biggest improvement will come from using an O(n) algorithm instead of an O(n**2) one.

Of course a low O-factor is important; you should avoid, however, confusing the statement of what you want to do with the statement of how you want to do it. One of the benefits of a HLL like Python is that you can merely state *what* you want without worrying about how to compute it. In the original example above you are computing a set intersection, and Python's set object has an intersection method. Use it. Not only is it faster than your O(n**2) solution, but it is a good deal clearer.

```python
>>> from sets import Set
>>> set_a = Set([(i, round(j)) for i, j in a])
>>> set_b = Set([(i, round(j)) for i, j in b])
>>> set_a.intersection(set_b)
Set([(123, 2.0), (123, 1.0), (123, 8.0)])
```

Or you could say:

```python
>>> set_a, set_b = [Set([(i, round(j)) for i, j in s]) for s in (a, b)]
```

Adam DePrince

Jul 18 '05 #10


Michael Hoffman wrote:
> I think the Python 2.3 Sets implementation is likely to be quicker than whatever list-manipulation answer you come up with instead. But there's only one way to find out ;)

Yeah, almost certainly, since he's looking at lists 30K long. If they were small, you never know, since the list comprehension gets the C-code speedup while sets.Set is Python code:

```
$ python -m timeit -s "a = [(123,1.3),(123,2.4),(123,7.8),(123,10.2)]; b = [(123, 0.9), (123, 1.9), (123, 8.0)]" "[(i,round(j)) for i,j in a for l,m in b if (i,round(j)) == (l,round(m))]"
10000 loops, best of 3: 27.5 usec per loop

$ python -m timeit -s "import sets; a = [(123,1.3),(123,2.4),(123,7.8),(123,10.2)]; b = [(123, 0.9), (123, 1.9), (123, 8.0)]" "sets.Set([(i,round(j)) for i,j in a]).intersection(sets.Set([(i,round(j)) for i,j in b]))"
10000 loops, best of 3: 47.7 usec per loop
```

In the case given, the O(n**2) list comprehension is faster than the O(n) set intersection. Of course, this is not likely to be true with any reasonably sized data. But it's something worth keeping in mind.

Steve

Jul 18 '05 #12


 P: n/a "Gordon Williams" wrote in message news:ma**************************************@pyth on.org... Hi, I have to lists that I need to find the common numbers (2nd rounded to nearest integral) and I am wondering if there is a more efficient way of doing it. a= [(123,1.3),(123,2.4),(123,7.8),(123,10.2)] b= [(123, 0.9), (123, 1.9), (123, 8.0)] [ (i,round(j)) for i,j in a for l,m in b if (i,round(j)) == (l,round(m))] [(123, 1.0), (123, 2.0), (123, 8.0)] This works but a and b can be in the order of 30K long. A couple of other bits of info. - a and b are ordered smallest to largest (could bisect module be used?) - in the future I will want to round the second number of closest 0.25 rather than whole number. Would the sets module be more efficient? Yes: set((x,round(y)) for x,y in a) & set((x,round(y)) for x,y in b) set([(123, 8.0), (123, 2.0), (123, 1.0)]) I'm using python 2.3. from sets import Set as set set([(x,round(y)) for x,y in a]) & set([(x,round(y)) for x,y in b]) set([(123, 8.0), (123, 2.0), (123, 1.0)]) Raymond Hettinger Jul 18 '05 #14

 P: n/a "Gordon Williams" wrote in message news:ma**************************************@pyth on.org... Hi, I have to lists that I need to find the common numbers (2nd rounded to nearest integral) and I am wondering if there is a more efficient way of doing it. a= [(123,1.3),(123,2.4),(123,7.8),(123,10.2)] b= [(123, 0.9), (123, 1.9), (123, 8.0)] [ (i,round(j)) for i,j in a for l,m in b if (i,round(j)) == (l,round(m))] [(123, 1.0), (123, 2.0), (123, 8.0)] This works but a and b can be in the order of 30K long. A couple of other bits of info. - a and b are ordered smallest to largest (could bisect module be used?) - in the future I will want to round the second number of closest 0.25 rather than whole number. Would the sets module be more efficient? Yes: set((x,round(y)) for x,y in a) & set((x,round(y)) for x,y in b) set([(123, 8.0), (123, 2.0), (123, 1.0)]) I'm using python 2.3. from sets import Set as set set([(x,round(y)) for x,y in a]) & set([(x,round(y)) for x,y in b]) set([(123, 8.0), (123, 2.0), (123, 1.0)]) Raymond Hettinger Jul 18 '05 #15

Steven Bethard wrote:
> Yeah, almost certainly, since he's looking at lists 30K long. If they were small, you never know, since the list comprehension gets the C-code speedup while sets.Set is Python code:
> [list comprehension] 10000 loops, best of 3: 27.5 usec per loop
> [Python 2.3 Set] 10000 loops, best of 3: 47.7 usec per loop
> In the case given, the O(n**2) list comprehension is faster than the O(n) set intersection. Of course, this is not likely to be true with any reasonably sized data. But it's something worth keeping in mind.

Of course if you're working with a dataset that small, it probably doesn't really matter which of these implementations you use. The exception would be if this were in an inner loop in the actual program and *were* being run 10000 times or more.

--
Michael Hoffman

Jul 18 '05 #16


> [original question snipped]

Thanks for all your suggestions. I've tried each one with lists 1K, 10K and 30K long and tabulated the results below. Run with profile on W2K, Python 2.3.2, 1 GHz Athlon.

Seconds per call at each list length:

| algorithm | 1K | 10K | 30K |
|-----------|-------|-------|-------|
| t1 | 0.009 | 0.148 | 0.563 |
| t2 | 0.015 | 0.217 | 0.777 |
| t3 | 0.008 | 0.108 | 0.487 |
| t4 | 0.016 | 0.190 | 0.749 |
| t5 | 0.015 | 0.224 | 0.773 |

The non-set algorithms (t1, t3) came out the winners (maybe due to the conversion of the set to a sorted list; I didn't look into it any further).

Regards,
Gordon Williams

```python
from sets import Set
import random

size = 1000
a = [(123, i + random.choice([-.2, -.1, .1, .2])) for i in range(size)]
b = [(123, 1 + i + random.choice([-.2, -.1, .1, .2])) for i in range(size)]

def t1():  # Diez B. Roggisch
    ra = [(i, round(j)) for i, j in a]
    rb = [(i, round(j)) for i, j in b]
    res = []
    pos_b = 0
    try:
        for i, pivot in ra:
            while rb[pos_b] < pivot:
                pos_b += 1
            while rb[pos_b] == pivot:
                res.append(rb[pos_b])
                pos_b += 1
    except IndexError:  # if b gets exhausted somewhere
        pass
    return res

def t2():  # Steven Bethard
    def roundedj(pairs_iterable):
        return [(i, round(j)) for i, j in pairs_iterable]
    l = list(Set(roundedj(a)).intersection(Set(roundedj(b))))
    l.sort()
    return l

def t3():  # Greg Ewing
    d = {}
    for (l, m) in b:
        d[l, round(m)] = 1
    result = []
    for (i, j) in a:
        t = (i, round(j))
        if t in d:
            result.append(t)
    return result

def t4():  # Adam DePrince
    set_a = Set([(i, round(j)) for i, j in a])
    set_b = Set([(i, round(j)) for i, j in b])
    l = list(set_a.intersection(set_b))
    l.sort()
    return l

def t5():  # Raymond Hettinger
    l = list(Set([(x, round(y)) for x, y in a]) & Set([(x, round(y)) for x, y in b]))
    l.sort()
    return l

def test():
    r1 = t1()
    r2 = t2()
    r3 = t3()
    r4 = t4()
    r5 = t5()
```

Jul 18 '05 #18
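For reference, a present-day version of this benchmark can use the stdlib `timeit` module directly rather than the profiler (the builtin `set` replaced `sets.Set` in 2.4; the function names below are mine, standing in for two of the t1..t5 candidates above):

```python
import random
import timeit

size = 1000
a = sorted((123, i + random.choice([-.2, -.1, .1, .2])) for i in range(size))
b = sorted((123, 1 + i + random.choice([-.2, -.1, .1, .2])) for i in range(size))

def t_set():
    # builtin-set intersection (the sets.Set approach, modern syntax)
    return {(x, round(y)) for x, y in a} & {(x, round(y)) for x, y in b}

def t_dict():
    # membership-test approach: build keys from b, scan a once
    keys = {(l, round(m)) for l, m in b}
    return [(i, round(j)) for i, j in a if (i, round(j)) in keys]

for fn in (t_set, t_dict):
    secs = timeit.timeit(fn, number=200)
    print("%-7s %.4f s per 200 calls" % (fn.__name__, secs))
```

Since every offset here is under 0.5 in magnitude, rounding maps a's values onto 0..999 and b's onto 1..1000, so both functions find the same 999 common keys regardless of the random draws; only the timings vary.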

### This discussion thread is closed

Replies have been disabled for this discussion. 