Hi,
I have to lists that I need to find the common numbers (2nd rounded to
nearest integral) and I am wondering if there is a more efficient way of
doing it. a= [(123,1.3),(123,2.4),(123,7.8),(123,10.2)] b= [(123, 0.9), (123, 1.9), (123, 8.0)] [ (i,round(j)) for i,j in a for l,m in b if (i,round(j)) ==
(l,round(m))]
[(123, 1.0), (123, 2.0), (123, 8.0)]
This works but a and b can be in the order of 30K long.
A couple of other bits of info.
- a and b are ordered smallest to largest (could bisect module be used?)
- in the future I will want to round the second number of closest 0.25
rather than whole number.
Would the sets module be more efficient?
I'm using python 2.3.
Thanks for any ideas.
Regards,
Gordon Williams 17 2737
> A couple of other bits of info. - a and b are ordered smallest to largest (could bisect module be used?) - in the future I will want to round the second number of closest 0.25 rather than whole number.
Would the sets module be more efficient?
I'm using python 2.3.
I'd go for something that uses the rounded versions of the lists and then
iterates the first list and lets the second "cach up". Sorry, I'm to lazy
to desribe it better, so here is the code:
a= [(123,1.3),(123,2.4),(123,7.8),(123,10.2)]
b= [(123, 0.9), (123, 1.9), (123, 8.0)]
a = [ (i,round(j)) for i,j in a]
b = [ (i,round(j)) for i,j in b]
res = []
pos_b = 0
try:
for i, pivot in a:
while b[pos_b][1] < pivot:
pos_b += 1
while b[pos_b][1] == pivot:
res.append(b[pos_b])
pos_b += 1
except IndexError:
# If b gets exhausted somewhere
pass
print res
While it looks more complicated, it certainly is faster, as its complexity
is in O(max(len(a), len(b))) where your code was O(len(a) * len(b)) - so
usually more or less quadratic.
The speed gain comes of course from the order of the elements. And you could
factor the rounding _into_ the loops, but thats more ugly.
--
Regards,
Diez B. Roggisch
Gordon Williams wrote: I have to lists that I need to find the common numbers (2nd rounded to nearest integral) and I am wondering if there is a more efficient way of doing it.
a= [(123,1.3),(123,2.4),(123,7.8),(123,10.2)] b= [(123, 0.9), (123, 1.9), (123, 8.0)] [ (i,round(j)) for i,j in a for l,m in b if (i,round(j)) == (l,round(m))] [(123, 1.0), (123, 2.0), (123, 8.0)]
[snip] Would the sets module be more efficient?
Well, in Python 2.3, I believe sets are implemented in Python while
they're implemented in C in Python 2.4. So probably not, unless you
upgrade. A 2.4 solution with sets: a = [(123,1.3),(123,2.4),(123,7.8),(123,10.2)] b = [(123, 0.9), (123, 1.9), (123, 8.0)] def roundedj(pairs_iterable):
.... return ((i, round(j)) for i, j in pairs_iterable)
.... set(roundedj(a)).intersection(set(roundedj(b)))
set([(123, 8.0), (123, 2.0), (123, 1.0)])
Steve
> A couple of other bits of info. - a and b are ordered smallest to largest (could bisect module be used?) - in the future I will want to round the second number of closest 0.25 rather than whole number.
Would the sets module be more efficient?
I'm using python 2.3.
I'd go for something that uses the rounded versions of the lists and then
iterates the first list and lets the second "cach up". Sorry, I'm to lazy
to desribe it better, so here is the code:
a= [(123,1.3),(123,2.4),(123,7.8),(123,10.2)]
b= [(123, 0.9), (123, 1.9), (123, 8.0)]
a = [ (i,round(j)) for i,j in a]
b = [ (i,round(j)) for i,j in b]
res = []
pos_b = 0
try:
for i, pivot in a:
while b[pos_b][1] < pivot:
pos_b += 1
while b[pos_b][1] == pivot:
res.append(b[pos_b])
pos_b += 1
except IndexError:
# If b gets exhausted somewhere
pass
print res
While it looks more complicated, it certainly is faster, as its complexity
is in O(max(len(a), len(b))) where your code was O(len(a) * len(b)) - so
usually more or less quadratic.
The speed gain comes of course from the order of the elements. And you could
factor the rounding _into_ the loops, but thats more ugly.
--
Regards,
Diez B. Roggisch
Gordon Williams wrote: I have to lists that I need to find the common numbers (2nd rounded to nearest integral) and I am wondering if there is a more efficient way of doing it.
a= [(123,1.3),(123,2.4),(123,7.8),(123,10.2)] b= [(123, 0.9), (123, 1.9), (123, 8.0)] [ (i,round(j)) for i,j in a for l,m in b if (i,round(j)) == (l,round(m))] [(123, 1.0), (123, 2.0), (123, 8.0)]
[snip] Would the sets module be more efficient?
Well, in Python 2.3, I believe sets are implemented in Python while
they're implemented in C in Python 2.4. So probably not, unless you
upgrade. A 2.4 solution with sets: a = [(123,1.3),(123,2.4),(123,7.8),(123,10.2)] b = [(123, 0.9), (123, 1.9), (123, 8.0)] def roundedj(pairs_iterable):
.... return ((i, round(j)) for i, j in pairs_iterable)
.... set(roundedj(a)).intersection(set(roundedj(b)))
set([(123, 8.0), (123, 2.0), (123, 1.0)])
Steve
Steven Bethard wrote: Well, in Python 2.3, I believe sets are implemented in Python while they're implemented in C in Python 2.4.
I think the Python 2.3 Sets implementation is likely to be quicker than
whatever list-manipulation answer you come up with instead. But there's
only one way to find out ;)
--
Michael Hoffman
Steven Bethard wrote: Well, in Python 2.3, I believe sets are implemented in Python while they're implemented in C in Python 2.4.
I think the Python 2.3 Sets implementation is likely to be quicker than
whatever list-manipulation answer you come up with instead. But there's
only one way to find out ;)
--
Michael Hoffman
Gordon Williams wrote: a= [(123,1.3),(123,2.4),(123,7.8),(123,10.2)] b= [(123, 0.9), (123, 1.9), (123, 8.0)] [ (i,round(j)) for i,j in a for l,m in b if (i,round(j)) == (l,round(m))]
d = {}
for (l, m) in b:
d[l, round(m)] = 1
result = []
for (i, j) in a:
t = (i, round(j))
if t in d:
result.append(t)
- in the future I will want to round the second number of closest 0.25 rather than whole number.
I would do that by multiplying by 4 and rounding to
an integer to derive the dictionary key. That will
avoid any float-representation problems you might have
by trying to round to a fraction.
Would the sets module be more efficient?
As another poster said, sets are implemented as dicts
in 2.3, so it comes down to much the same thing. Using
sets might be a bit faster than the above code in 2.4,
but probably not greatly so. By far the biggest
improvement will come from using an O(n) algorithm
instead of an O(n**2) one.
--
Greg Ewing, Computer Science Dept,
University of Canterbury,
Christchurch, New Zealand http://www.cosc.canterbury.ac.nz/~greg
Gordon Williams wrote: a= [(123,1.3),(123,2.4),(123,7.8),(123,10.2)] b= [(123, 0.9), (123, 1.9), (123, 8.0)] [ (i,round(j)) for i,j in a for l,m in b if (i,round(j)) == (l,round(m))]
d = {}
for (l, m) in b:
d[l, round(m)] = 1
result = []
for (i, j) in a:
t = (i, round(j))
if t in d:
result.append(t)
- in the future I will want to round the second number of closest 0.25 rather than whole number.
I would do that by multiplying by 4 and rounding to
an integer to derive the dictionary key. That will
avoid any float-representation problems you might have
by trying to round to a fraction.
Would the sets module be more efficient?
As another poster said, sets are implemented as dicts
in 2.3, so it comes down to much the same thing. Using
sets might be a bit faster than the above code in 2.4,
but probably not greatly so. By far the biggest
improvement will come from using an O(n) algorithm
instead of an O(n**2) one.
--
Greg Ewing, Computer Science Dept,
University of Canterbury,
Christchurch, New Zealand http://www.cosc.canterbury.ac.nz/~greg
On Thu, 2004-12-02 at 22:16, Greg Ewing wrote: Gordon Williams wrote:>a= [(123,1.3),(123,2.4),(123,7.8),(123,10.2)] >b= [(123, 0.9), (123, 1.9), (123, 8.0)] >[ (i,round(j)) for i,j in a for l,m in b if (i,round(j)) == (l,round(m))]
d = {} for (l, m) in b: d[l, round(m)] = 1
result = [] for (i, j) in a: t = (i, round(j)) if t in d: result.append(t)
- in the future I will want to round the second number of closest 0.25 rather than whole number.
I would do that by multiplying by 4 and rounding to an integer to derive the dictionary key. That will avoid any float-representation problems you might have by trying to round to a fraction.
Would the sets module be more efficient?
As another poster said, sets are implemented as dicts in 2.3, so it comes down to much the same thing. Using sets might be a bit faster than the above code in 2.4, but probably not greatly so. By far the biggest improvement will come from using an O(n) algorithm instead of an O(n**2) one.
Of course a low O-factor is important; you should avoid however
confusing the statement of what you want to do with the statement of how
you want to do it. One of the benefits of a HLL like Python is you can
merely state *what* you want without worrying about how to compute it.
In the original example above you are computing a set intersection -
python's set object has an intersection method. Use it. Not only is it
faster than your O**2 solution, but it is a good deal clearer. from sets import Set set_a = Set( [(i,round(j)) for i,j in a] ) set_b = Set( [(i,round(j)) for i,j in b] ) set_a.intersection( set_b )
Set([(123, 2.0), (123, 1.0), (123, 8.0)])
Or you could say ...
set_a, set_b = [[Set((i,round(j))) for i,j in s] for s in (a,b )]
Adam DePrince
On Thu, 2004-12-02 at 22:16, Greg Ewing wrote: Gordon Williams wrote:>a= [(123,1.3),(123,2.4),(123,7.8),(123,10.2)] >b= [(123, 0.9), (123, 1.9), (123, 8.0)] >[ (i,round(j)) for i,j in a for l,m in b if (i,round(j)) == (l,round(m))]
d = {} for (l, m) in b: d[l, round(m)] = 1
result = [] for (i, j) in a: t = (i, round(j)) if t in d: result.append(t)
- in the future I will want to round the second number of closest 0.25 rather than whole number.
I would do that by multiplying by 4 and rounding to an integer to derive the dictionary key. That will avoid any float-representation problems you might have by trying to round to a fraction.
Would the sets module be more efficient?
As another poster said, sets are implemented as dicts in 2.3, so it comes down to much the same thing. Using sets might be a bit faster than the above code in 2.4, but probably not greatly so. By far the biggest improvement will come from using an O(n) algorithm instead of an O(n**2) one.
Of course a low O-factor is important; you should avoid however
confusing the statement of what you want to do with the statement of how
you want to do it. One of the benefits of a HLL like Python is you can
merely state *what* you want without worrying about how to compute it.
In the original example above you are computing a set intersection -
python's set object has an intersection method. Use it. Not only is it
faster than your O**2 solution, but it is a good deal clearer. from sets import Set set_a = Set( [(i,round(j)) for i,j in a] ) set_b = Set( [(i,round(j)) for i,j in b] ) set_a.intersection( set_b )
Set([(123, 2.0), (123, 1.0), (123, 8.0)])
Or you could say ...
set_a, set_b = [[Set((i,round(j))) for i,j in s] for s in (a,b )]
Adam DePrince
Michael Hoffman wrote: Steven Bethard wrote:
Well, in Python 2.3, I believe sets are implemented in Python while they're implemented in C in Python 2.4. I think the Python 2.3 Sets implementation is likely to be quicker than whatever list-manipulation answer you come up with instead. But there's only one way to find out ;)
Yeah, almost certainly since he's looking at lists 3K long. If they
were small, you never know since the list comprehension gets the C-code
speedup, while sets.Set is Python code:
python -m timeit -s "a = [(123,1.3),(123,2.4),(123,7.8),(123,10.2)];
b = [(123, 0.9), (123, 1.9), (123, 8.0)]" "[ (i,round(j)) for i,j in a
for l,m in b if (i,round(j)) == (l,round(m))]"
10000 loops, best of 3: 27.5 usec per loop
python -m timeit -s "import sets; a =
[(123,1.3),(123,2.4),(123,7.8),(123,10.2)]; b = [(123, 0.9), (123, 1.9
), (123, 8.0)]" "sets.Set([(i,round(j)) for i,j in
a]).intersection(sets.Set([(i, round(j)) for i, j in b]))"
10000 loops, best of 3: 47.7 usec per loop
In the case given, the O(n**2) list comprehension is faster than the
O(n) set intersection. Of course, this is not likely to be true with
any reasonable sized data. But it's something worth keeping in mind.
Steve
Michael Hoffman wrote: Steven Bethard wrote:
Well, in Python 2.3, I believe sets are implemented in Python while they're implemented in C in Python 2.4. I think the Python 2.3 Sets implementation is likely to be quicker than whatever list-manipulation answer you come up with instead. But there's only one way to find out ;)
Yeah, almost certainly since he's looking at lists 3K long. If they
were small, you never know since the list comprehension gets the C-code
speedup, while sets.Set is Python code:
python -m timeit -s "a = [(123,1.3),(123,2.4),(123,7.8),(123,10.2)];
b = [(123, 0.9), (123, 1.9), (123, 8.0)]" "[ (i,round(j)) for i,j in a
for l,m in b if (i,round(j)) == (l,round(m))]"
10000 loops, best of 3: 27.5 usec per loop
python -m timeit -s "import sets; a =
[(123,1.3),(123,2.4),(123,7.8),(123,10.2)]; b = [(123, 0.9), (123, 1.9
), (123, 8.0)]" "sets.Set([(i,round(j)) for i,j in
a]).intersection(sets.Set([(i, round(j)) for i, j in b]))"
10000 loops, best of 3: 47.7 usec per loop
In the case given, the O(n**2) list comprehension is faster than the
O(n) set intersection. Of course, this is not likely to be true with
any reasonable sized data. But it's something worth keeping in mind.
Steve
"Gordon Williams" <g_****@cyberus.ca> wrote in message
news:ma**************************************@pyth on.org... Hi,
I have to lists that I need to find the common numbers (2nd rounded to nearest integral) and I am wondering if there is a more efficient way of doing it.
a= [(123,1.3),(123,2.4),(123,7.8),(123,10.2)] b= [(123, 0.9), (123, 1.9), (123, 8.0)] [ (i,round(j)) for i,j in a for l,m in b if (i,round(j)) == (l,round(m))] [(123, 1.0), (123, 2.0), (123, 8.0)] This works but a and b can be in the order of 30K long.
A couple of other bits of info. - a and b are ordered smallest to largest (could bisect module be used?) - in the future I will want to round the second number of closest 0.25 rather than whole number.
Would the sets module be more efficient?
Yes: set((x,round(y)) for x,y in a) & set((x,round(y)) for x,y in b)
set([(123, 8.0), (123, 2.0), (123, 1.0)])
I'm using python 2.3.
from sets import Set as set set([(x,round(y)) for x,y in a]) & set([(x,round(y)) for x,y in b])
set([(123, 8.0), (123, 2.0), (123, 1.0)])
Raymond Hettinger
"Gordon Williams" <g_****@cyberus.ca> wrote in message
news:ma**************************************@pyth on.org... Hi,
I have to lists that I need to find the common numbers (2nd rounded to nearest integral) and I am wondering if there is a more efficient way of doing it.
a= [(123,1.3),(123,2.4),(123,7.8),(123,10.2)] b= [(123, 0.9), (123, 1.9), (123, 8.0)] [ (i,round(j)) for i,j in a for l,m in b if (i,round(j)) == (l,round(m))] [(123, 1.0), (123, 2.0), (123, 8.0)] This works but a and b can be in the order of 30K long.
A couple of other bits of info. - a and b are ordered smallest to largest (could bisect module be used?) - in the future I will want to round the second number of closest 0.25 rather than whole number.
Would the sets module be more efficient?
Yes: set((x,round(y)) for x,y in a) & set((x,round(y)) for x,y in b)
set([(123, 8.0), (123, 2.0), (123, 1.0)])
I'm using python 2.3.
from sets import Set as set set([(x,round(y)) for x,y in a]) & set([(x,round(y)) for x,y in b])
set([(123, 8.0), (123, 2.0), (123, 1.0)])
Raymond Hettinger
Steven Bethard wrote: Yeah, almost certainly since he's looking at lists 3K long. If they were small, you never know since the list comprehension gets the C-code speedup, while sets.Set is Python code:
[list comprehension] 10000 loops, best of 3: 27.5 usec per loop
[Python 2.3 Set] 10000 loops, best of 3: 47.7 usec per loop
In the case given, the O(n**2) list comprehension is faster than the O(n) set intersection. Of course, this is not likely to be true with any reasonable sized data. But it's something worth keeping in mind.
Of course if you're working with a dataset that small, it probably
doesn't really matter which of these implementations you use.
The exception would be if this were in an inner loop in the actual
program and *were* being run 10000 times or more.
--
Michael Hoffman
Steven Bethard wrote: Yeah, almost certainly since he's looking at lists 3K long. If they were small, you never know since the list comprehension gets the C-code speedup, while sets.Set is Python code:
[list comprehension] 10000 loops, best of 3: 27.5 usec per loop
[Python 2.3 Set] 10000 loops, best of 3: 47.7 usec per loop
In the case given, the O(n**2) list comprehension is faster than the O(n) set intersection. Of course, this is not likely to be true with any reasonable sized data. But it's something worth keeping in mind.
Of course if you're working with a dataset that small, it probably
doesn't really matter which of these implementations you use.
The exception would be if this were in an inner loop in the actual
program and *were* being run 10000 times or more.
--
Michael Hoffman Hi,
I have to lists that I need to find the common numbers (2nd rounded to nearest integral) and I am wondering if there is a more efficient way of doing it.
a= [(123,1.3),(123,2.4),(123,7.8),(123,10.2)] b= [(123, 0.9), (123, 1.9), (123, 8.0)] [ (i,round(j)) for i,j in a for l,m in b if (i,round(j)) ==
(l,round(m))] [(123, 1.0), (123, 2.0), (123, 8.0)]
Thanks for all your suggestions. I've tried each one with lists of 1K, 10K
and 30K long and tabulated the results below. Run with profile on W2K,
python 2.3.2, 1GHz Athlon.
1K, 10K and 30K long (seconds per call)
t1= 0.009, 0.148, 0.563
t2= 0.015, 0.217, 0.777
t3= 0.008, 0.108, 0.487
t4= 0.016, 0.190, 0.749
t5= 0.015, 0.224, 0.773
The non-set algorithims (t1,t3) came out the winners (maybe due to the
conversion of the set to a sorted list. I didn't look into it any farther.)
Regards,
Gordon Williams
--------
from sets import Set
import random
size = 1000
a = [(123,i+random.choice([-.2,-.1,.1,.2])) for i in range(size)]
b = [(123, 1+i+random.choice([-.2,-.1,.1,.2])) for i in range(size)]
def t1():
#Diez B. Roggisch <de*********@web.de>
ra = [ (i,round(j)) for i,j in a]
rb = [ (i,round(j)) for i,j in b]
res = []
pos_b = 0
try:
for i, pivot in ra:
while rb[pos_b][1] < pivot:
pos_b += 1
while rb[pos_b][1] == pivot:
res.append(rb[pos_b])
pos_b += 1
except IndexError:
# If b gets exhausted somewhere
pass
return res
def t2():
#Steven Bethard <st************@gmail.com>
def roundedj(pairs_iterable):
return [(i, round(j)) for i, j in pairs_iterable]
l=list(Set(roundedj(a)).intersection(Set(roundedj( b))))
l.sort()
return l
def t3():
#Greg Ewing <gr**@cosc.canterbury.ac.nz>
d = {}
for (l, m) in b:
d[l, round(m)] = 1
result = []
for (i, j) in a:
t = (i, round(j))
if t in d:
result.append(t)
return result
def t4():
#Adam DePrince <ad**@cognitcorp.com>
set_a = Set( [(i,round(j)) for i,j in a] )
set_b = Set( [(i,round(j)) for i,j in b] )
l= list(set_a.intersection( set_b ))
l.sort()
return l
def t5():
#Raymond Hettinger <vz******@verizon.net>
l= list(Set([(x,round(y)) for x,y in a]) & Set([(x,round(y)) for x,y in
b]))
l.sort()
return l
def test():
r1= t1()
r2= t2()
r3= t3()
r4= t4()
r5= t5() This thread has been closed and replies have been disabled. Please start a new discussion. Similar topics
by: Mickel Grönroos |
last post by:
Hi!
Are there any standard list methods for getting the intersection and
difference of two lists? (The union is easy ("list1.extend(list2)"),
unless you want it to contain unique values.)
...
|
by: Narendra C. Tulpule |
last post by:
Hi,
if you know the Python internals, here is a newbie question for you.
If I have a list with 100 elements, each element being a long string,
is it more efficient to maintain it as a dictionary...
|
by: Gordon Williams |
last post by:
Hi,
I have to lists that I need to find the common numbers (2nd rounded to
nearest integral) and I am wondering if there is a more efficient way of
doing it.
>>> a=
>>> b=
>>> ...
|
by: les_ander |
last post by:
Hi,
I have 2 lists of tuples that look like:
E1= and
E2=.
In this tuple, the ordering does not
matter, i.e. (u,v) is the same as (v,u).
What I want to do is the following:
given 2 list of...
|
by: James Stroud |
last post by:
Hello All,
I find myself in this situation from time to time: I want to compare two lists
of arbitrary objects and (1) find those unique to the first list, (2) find
those unique to the second...
|
by: kimos |
last post by:
hi all,
how to calculate the intersection of 2 rectangle
a rectangle is the following:
Rectangle makeRectangle (Point lowerLeft, Point upperRight) {
Rectangle r;
|
by: ryu |
last post by:
Hi, May I know how to do an intersection of sets using C#? Where the number
of sets will only be known during runtime. Many Thanks
|
by: mkppk |
last post by:
I have kind of strange change I'd like to make to the sets.Set()
intersection() method..
Normally, intersection would return items in both s1 and s2 like with
something like this: ...
|
by: Prateek |
last post by:
I have 3 variable length lists of sets. I need to find the common
elements in each list (across sets) really really quickly.
Here is some sample code:
# Doesn't make sense to union the sets -...
|
by: Charles Arthur |
last post by:
How do i turn on java script on a villaon, callus and itel keypad mobile phone
|
by: ryjfgjl |
last post by:
If we have dozens or hundreds of excel to import into the database, if we use the excel import function provided by database editors such as navicat, it will be extremely tedious and time-consuming...
|
by: emmanuelkatto |
last post by:
Hi All, I am Emmanuel katto from Uganda. I want to ask what challenges you've faced while migrating a website to cloud.
Please let me know.
Thanks!
Emmanuel
|
by: BarryA |
last post by:
What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...
|
by: nemocccc |
last post by:
hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?
|
by: marktang |
last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However,...
|
by: Hystou |
last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can...
|
by: jinu1996 |
last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven...
|
by: Hystou |
last post by:
Overview:
Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows...
| |