473,583 Members | 2,875 Online
Bytes | Software Development & Data Engineering Community
+ Post

Home Posts Topics Members FAQ

efficient intersection of lists with rounding

Hi,

I have to lists that I need to find the common numbers (2nd rounded to
nearest integral) and I am wondering if there is a more efficient way of
doing it.
a= [(123,1.3),(123, 2.4),(123,7.8), (123,10.2)]
b= [(123, 0.9), (123, 1.9), (123, 8.0)]
[ (i,round(j)) for i,j in a for l,m in b if (i,round(j)) == (l,round(m))]
[(123, 1.0), (123, 2.0), (123, 8.0)]

This works but a and b can be in the order of 30K long.

A couple of other bits of info.
- a and b are ordered smallest to largest (could bisect module be used?)
- in the future I will want to round the second number of closest 0.25
rather than whole number.

Would the sets module be more efficient?

I'm using python 2.3.

Thanks for any ideas.

Regards,

Gordon Williams

Jul 18 '05 #1
17 2765
> A couple of other bits of info.
- a and b are ordered smallest to largest (could bisect module be used?)
- in the future I will want to round the second number of closest 0.25
rather than whole number.

Would the sets module be more efficient?

I'm using python 2.3.


I'd go for something that uses the rounded versions of the lists and then
iterates the first list and lets the second "cach up". Sorry, I'm to lazy
to desribe it better, so here is the code:

a= [(123,1.3),(123, 2.4),(123,7.8), (123,10.2)]
b= [(123, 0.9), (123, 1.9), (123, 8.0)]
a = [ (i,round(j)) for i,j in a]
b = [ (i,round(j)) for i,j in b]
res = []
pos_b = 0

try:
for i, pivot in a:
while b[pos_b][1] < pivot:
pos_b += 1
while b[pos_b][1] == pivot:
res.append(b[pos_b])
pos_b += 1
except IndexError:
# If b gets exhausted somewhere
pass
print res

While it looks more complicated, it certainly is faster, as its complexity
is in O(max(len(a), len(b))) where your code was O(len(a) * len(b)) - so
usually more or less quadratic.

The speed gain comes of course from the order of the elements. And you could
factor the rounding _into_ the loops, but thats more ugly.
--
Regards,

Diez B. Roggisch
Jul 18 '05 #2
Gordon Williams wrote:
I have to lists that I need to find the common numbers (2nd rounded to
nearest integral) and I am wondering if there is a more efficient way of
doing it.
a= [(123,1.3),(123, 2.4),(123,7.8), (123,10.2)]
b= [(123, 0.9), (123, 1.9), (123, 8.0)]
[ (i,round(j)) for i,j in a for l,m in b if (i,round(j)) ==
(l,round(m))]
[(123, 1.0), (123, 2.0), (123, 8.0)]
[snip] Would the sets module be more efficient?


Well, in Python 2.3, I believe sets are implemented in Python while
they're implemented in C in Python 2.4. So probably not, unless you
upgrade. A 2.4 solution with sets:
a = [(123,1.3),(123, 2.4),(123,7.8), (123,10.2)]
b = [(123, 0.9), (123, 1.9), (123, 8.0)]
def roundedj(pairs_ iterable): .... return ((i, round(j)) for i, j in pairs_iterable)
.... set(roundedj(a) ).intersection( set(roundedj(b) ))

set([(123, 8.0), (123, 2.0), (123, 1.0)])

Steve
Jul 18 '05 #3
> A couple of other bits of info.
- a and b are ordered smallest to largest (could bisect module be used?)
- in the future I will want to round the second number of closest 0.25
rather than whole number.

Would the sets module be more efficient?

I'm using python 2.3.


I'd go for something that uses the rounded versions of the lists and then
iterates the first list and lets the second "cach up". Sorry, I'm to lazy
to desribe it better, so here is the code:

a= [(123,1.3),(123, 2.4),(123,7.8), (123,10.2)]
b= [(123, 0.9), (123, 1.9), (123, 8.0)]
a = [ (i,round(j)) for i,j in a]
b = [ (i,round(j)) for i,j in b]
res = []
pos_b = 0

try:
for i, pivot in a:
while b[pos_b][1] < pivot:
pos_b += 1
while b[pos_b][1] == pivot:
res.append(b[pos_b])
pos_b += 1
except IndexError:
# If b gets exhausted somewhere
pass
print res

While it looks more complicated, it certainly is faster, as its complexity
is in O(max(len(a), len(b))) where your code was O(len(a) * len(b)) - so
usually more or less quadratic.

The speed gain comes of course from the order of the elements. And you could
factor the rounding _into_ the loops, but thats more ugly.
--
Regards,

Diez B. Roggisch
Jul 18 '05 #4
Gordon Williams wrote:
I have to lists that I need to find the common numbers (2nd rounded to
nearest integral) and I am wondering if there is a more efficient way of
doing it.
a= [(123,1.3),(123, 2.4),(123,7.8), (123,10.2)]
b= [(123, 0.9), (123, 1.9), (123, 8.0)]
[ (i,round(j)) for i,j in a for l,m in b if (i,round(j)) ==
(l,round(m))]
[(123, 1.0), (123, 2.0), (123, 8.0)]
[snip] Would the sets module be more efficient?


Well, in Python 2.3, I believe sets are implemented in Python while
they're implemented in C in Python 2.4. So probably not, unless you
upgrade. A 2.4 solution with sets:
a = [(123,1.3),(123, 2.4),(123,7.8), (123,10.2)]
b = [(123, 0.9), (123, 1.9), (123, 8.0)]
def roundedj(pairs_ iterable): .... return ((i, round(j)) for i, j in pairs_iterable)
.... set(roundedj(a) ).intersection( set(roundedj(b) ))

set([(123, 8.0), (123, 2.0), (123, 1.0)])

Steve
Jul 18 '05 #5
Steven Bethard wrote:
Well, in Python 2.3, I believe sets are implemented in Python while
they're implemented in C in Python 2.4.


I think the Python 2.3 Sets implementation is likely to be quicker than
whatever list-manipulation answer you come up with instead. But there's
only one way to find out ;)
--
Michael Hoffman
Jul 18 '05 #6
Steven Bethard wrote:
Well, in Python 2.3, I believe sets are implemented in Python while
they're implemented in C in Python 2.4.


I think the Python 2.3 Sets implementation is likely to be quicker than
whatever list-manipulation answer you come up with instead. But there's
only one way to find out ;)
--
Michael Hoffman
Jul 18 '05 #7
Gordon Williams wrote:
a= [(123,1.3),(123, 2.4),(123,7.8), (123,10.2)]
b= [(123, 0.9), (123, 1.9), (123, 8.0)]
[ (i,round(j)) for i,j in a for l,m in b if (i,round(j)) ==
(l,round(m))]


d = {}
for (l, m) in b:
d[l, round(m)] = 1

result = []
for (i, j) in a:
t = (i, round(j))
if t in d:
result.append(t )
- in the future I will want to round the second number of closest 0.25
rather than whole number.
I would do that by multiplying by 4 and rounding to
an integer to derive the dictionary key. That will
avoid any float-representation problems you might have
by trying to round to a fraction.
Would the sets module be more efficient?


As another poster said, sets are implemented as dicts
in 2.3, so it comes down to much the same thing. Using
sets might be a bit faster than the above code in 2.4,
but probably not greatly so. By far the biggest
improvement will come from using an O(n) algorithm
instead of an O(n**2) one.

--
Greg Ewing, Computer Science Dept,
University of Canterbury,
Christchurch, New Zealand
http://www.cosc.canterbury.ac.nz/~greg

Jul 18 '05 #8
Gordon Williams wrote:
a= [(123,1.3),(123, 2.4),(123,7.8), (123,10.2)]
b= [(123, 0.9), (123, 1.9), (123, 8.0)]
[ (i,round(j)) for i,j in a for l,m in b if (i,round(j)) ==
(l,round(m))]


d = {}
for (l, m) in b:
d[l, round(m)] = 1

result = []
for (i, j) in a:
t = (i, round(j))
if t in d:
result.append(t )
- in the future I will want to round the second number of closest 0.25
rather than whole number.
I would do that by multiplying by 4 and rounding to
an integer to derive the dictionary key. That will
avoid any float-representation problems you might have
by trying to round to a fraction.
Would the sets module be more efficient?


As another poster said, sets are implemented as dicts
in 2.3, so it comes down to much the same thing. Using
sets might be a bit faster than the above code in 2.4,
but probably not greatly so. By far the biggest
improvement will come from using an O(n) algorithm
instead of an O(n**2) one.

--
Greg Ewing, Computer Science Dept,
University of Canterbury,
Christchurch, New Zealand
http://www.cosc.canterbury.ac.nz/~greg

Jul 18 '05 #9
On Thu, 2004-12-02 at 22:16, Greg Ewing wrote:
Gordon Williams wrote:
>a= [(123,1.3),(123, 2.4),(123,7.8), (123,10.2)]
>b= [(123, 0.9), (123, 1.9), (123, 8.0)]
>[ (i,round(j)) for i,j in a for l,m in b if (i,round(j)) ==

(l,round(m))]


d = {}
for (l, m) in b:
d[l, round(m)] = 1

result = []
for (i, j) in a:
t = (i, round(j))
if t in d:
result.append(t )
- in the future I will want to round the second number of closest 0.25
rather than whole number.


I would do that by multiplying by 4 and rounding to
an integer to derive the dictionary key. That will
avoid any float-representation problems you might have
by trying to round to a fraction.
Would the sets module be more efficient?


As another poster said, sets are implemented as dicts
in 2.3, so it comes down to much the same thing. Using
sets might be a bit faster than the above code in 2.4,
but probably not greatly so. By far the biggest
improvement will come from using an O(n) algorithm
instead of an O(n**2) one.


Of course a low O-factor is important; you should avoid however
confusing the statement of what you want to do with the statement of how
you want to do it. One of the benefits of a HLL like Python is you can
merely state *what* you want without worrying about how to compute it.

In the original example above you are computing a set intersection -
python's set object has an intersection method. Use it. Not only is it
faster than your O**2 solution, but it is a good deal clearer.
from sets import Set
set_a = Set( [(i,round(j)) for i,j in a] )
set_b = Set( [(i,round(j)) for i,j in b] )
set_a.intersect ion( set_b ) Set([(123, 2.0), (123, 1.0), (123, 8.0)])

Or you could say ...
set_a, set_b = [[Set((i,round(j) )) for i,j in s] for s in (a,b )]


Adam DePrince
Jul 18 '05 #10

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

3
80963
by: Mickel Grönroos | last post by:
Hi! Are there any standard list methods for getting the intersection and difference of two lists? (The union is easy ("list1.extend(list2)"), unless you want it to contain unique values.) Here is what I would like to have: list1 = list2 =
6
2648
by: Narendra C. Tulpule | last post by:
Hi, if you know the Python internals, here is a newbie question for you. If I have a list with 100 elements, each element being a long string, is it more efficient to maintain it as a dictionary (with a key = a string from the list and value = None) for the purpose of insertion and removal? Basically, if Python really implements lists as...
0
1105
by: Gordon Williams | last post by:
Hi, I have to lists that I need to find the common numbers (2nd rounded to nearest integral) and I am wondering if there is a more efficient way of doing it. >>> a= >>> b= >>> (l,round(m))]
7
2205
by: les_ander | last post by:
Hi, I have 2 lists of tuples that look like: E1= and E2=. In this tuple, the ordering does not matter, i.e. (u,v) is the same as (v,u). What I want to do is the following: given 2 list of tuples, E1 and E2, I want to create another list with tuples that are common to both. So in the above example I would like
2
2521
by: James Stroud | last post by:
Hello All, I find myself in this situation from time to time: I want to compare two lists of arbitrary objects and (1) find those unique to the first list, (2) find those unique to the second list, (3) find those that overlap. But here is the catch: comparison is not straight-forward. For example, I will want to compare 2 objects based on a...
6
17570
by: kimos | last post by:
hi all, how to calculate the intersection of 2 rectangle a rectangle is the following: Rectangle makeRectangle (Point lowerLeft, Point upperRight) { Rectangle r;
3
5494
by: ryu | last post by:
Hi, May I know how to do an intersection of sets using C#? Where the number of sets will only be known during runtime. Many Thanks
2
2308
by: mkppk | last post by:
I have kind of strange change I'd like to make to the sets.Set() intersection() method.. Normally, intersection would return items in both s1 and s2 like with something like this: s1.intersection(s2) I want the item matching to be a bit "looser".. that is, items in s2 that match to just the beginning of items in s1 would be included in...
11
8553
by: Prateek | last post by:
I have 3 variable length lists of sets. I need to find the common elements in each list (across sets) really really quickly. Here is some sample code: # Doesn't make sense to union the sets - we're going to do intersections later anyway l1 = reduce(operator.add, list(x) for x in l1) l2 = reduce(operator.add, list(x) for x in l2) l3 =...
0
7888
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However, people are often confused as to whether an ONU can Work As a Router. In this blog post, we’ll explore What is ONU, What Is Router, ONU & Router’s main...
0
7811
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can effortlessly switch the default language on Windows 10 without reinstalling. I'll walk you through it. First, let's disable language...
0
8314
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven tapestry of website design and digital marketing. It's not merely about having a website; it's about crafting an immersive digital experience that...
1
7922
by: Hystou | last post by:
Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows Update option using the Control Panel or Settings app; it automatically checks for updates and installs any it finds, whether you like it or not. For...
1
5689
isladogs
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome a new presenter, Adolph Dupré who will be discussing some powerful techniques for using class modules. He will explain when you may want to use classes...
0
5366
by: conductexam | last post by:
I have .net C# application in which I am extracting data from word file and save it in database particularly. To store word all data as it is I am converting the whole word file firstly in HTML and then checking html paragraph one by one. At the time of converting from word file to html my equations which are in the word document file was convert...
0
3836
by: adsilva | last post by:
A Windows Forms form does not have the event Unload, like VB6. What one acts like?
1
1416
muto222
by: muto222 | last post by:
How can i add a mobile payment intergratation into php mysql website.
0
1147
bsmnconsultancy
by: bsmnconsultancy | last post by:
In today's digital era, a well-designed website is crucial for businesses looking to succeed. Whether you're a small business owner or a large corporation in Toronto, having a strong online presence can significantly impact your brand's success. BSMN Consultancy, a leader in Website Development in Toronto offers valuable insights into creating...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.