# comparing two lists and returning "position"

Hi there, I have two lists. For simplicity's sake, let's say they are:

l1 = [ 'abc', 'ghi', 'mno' ]

l2 = [ 'abc', 'def', 'ghi', 'jkl', 'mno', 'pqr' ]

what I need to do is compare l1 against l2 and return the "position"
of where each object in l1 is in l2

ie: pos = 0, 2, 4

Jun 22 '07 #1
On 2007-06-22, hiro <Nu****@gmail.com> wrote:
> Hi there, I have two lists. For simplicity's sake, let's say they are:
>
> l1 = [ 'abc', 'ghi', 'mno' ]
>
> l2 = [ 'abc', 'def', 'ghi', 'jkl', 'mno', 'pqr' ]
>
> what I need to do is compare l1 against l2 and return the "position"
> of where each object in l1 is in l2
>
> ie: pos = 0, 2, 4

Come, come! You can try harder than that.

--
Neil Cerutti
Jun 22 '07 #2
hiro <Nu****@gmail.com> writes:
> what I need to do is compare l1 against l2 and return the "position"
> of where each object in l1 is in l2
>
> ie: pos = 0, 2, 4

from itertools import izip, count
pos = map(dict(izip(l2, count())).__getitem__, l1)

Heh heh heh.
Jun 22 '07 #3
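
For readers puzzling over the one-liner above: it appears to build a dictionary that maps each element of l2 to its position, then looks up each element of l1 in that dictionary. Below is a minimal sketch of the same idea spelled out in two steps. It is written for Python 3 (an assumption; the thread uses Python 2, where izip plays the role of zip), and the name index_of is made up for illustration.

```python
from itertools import count

l1 = ['abc', 'ghi', 'mno']
l2 = ['abc', 'def', 'ghi', 'jkl', 'mno', 'pqr']

# Pair each element of l2 with its position: ('abc', 0), ('def', 1), ...
index_of = dict(zip(l2, count()))

# Look up every element of l1 in that mapping.
pos = list(map(index_of.__getitem__, l1))
print(pos)  # [0, 2, 4]
```

One difference worth keeping in mind: if l2 contained duplicates, the dictionary would keep the last position of each value, whereas l2.index() returns the first.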
Paul Rubin wrote:
> from itertools import izip, count
> pos = map(dict(izip(l2, count())).__getitem__, l1)

or probably less efficiently ...

>>> l1 = [ 'abc', 'ghi', 'mno' ]
>>> l2 = [ 'abc', 'def', 'ghi', 'jkl', 'mno', 'pqr' ]
>>> pos = [ l2.index(i) for i in l1 ]
>>> print pos
[0, 2, 4]

Charles
Jun 22 '07 #4
On Jun 22, 1:46 am, Charles Sanders <C.delete_this.Sand...@BoM.GOv.AU>
wrote:
> Paul Rubin wrote:
> > from itertools import izip, count
> > pos = map(dict(izip(l2, count())).__getitem__, l1)
>
> or probably less efficiently ...
>
> >>> l1 = [ 'abc', 'ghi', 'mno' ]
> >>> l2 = [ 'abc', 'def', 'ghi', 'jkl', 'mno', 'pqr' ]
> >>> pos = [ l2.index(i) for i in l1 ]
> >>> print pos
> [0, 2, 4]
>
> Charles

Hey guys, thanks for the feedback and the suggestions.
Charles, I got your implementation to work, so many thanks for this.

This is what I had so far:

for spam in l1:
    for eggs in l2:
        if spam == eggs:
            print "kaka", spam, eggs

So it's almost working, I just need the index. I'll
continue playing with the nested loop approach for a bit more.

Thanks once again guys

Jun 22 '07 #5
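
One way the nested-loop version above could also report the index is to wrap the inner loop in enumerate(). This is only a sketch under the thread's example data, written for Python 3 (an assumption; the thread uses Python 2 print statements).

```python
l1 = ['abc', 'ghi', 'mno']
l2 = ['abc', 'def', 'ghi', 'jkl', 'mno', 'pqr']

pos = []
for spam in l1:
    # enumerate() yields (index, element) pairs for l2
    for i, eggs in enumerate(l2):
        if spam == eggs:
            pos.append(i)
            break  # stop at the first match, as list.index() does
print(pos)  # [0, 2, 4]
```

Note that an element of l1 with no match in l2 is silently skipped here, while l2.index() would raise ValueError.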
On Fri, 22 Jun 2007 03:11:16 +0000, hiro wrote:
> Hi there, I have two lists. For simplicity's sake, let's say they are:
>
> l1 = [ 'abc', 'ghi', 'mno' ]
>
> l2 = [ 'abc', 'def', 'ghi', 'jkl', 'mno', 'pqr' ]
>
> what I need to do is compare l1 against l2 and return the "position" of
> where each object in l1 is in l2
>
> ie: pos = 0, 2, 4

Thanks for sharing. Did you have a question, or did you just want to tell
us what you were doing?
My pleasure.
--
Steven.
Jun 22 '07 #6
On Jun 22, 2:16 am, hiro <Nun...@gmail.com> wrote:
> On Jun 22, 1:46 am, Charles Sanders <C.delete_this.Sand...@BoM.GOv.AU>
> wrote:
> > Paul Rubin wrote:
> > > from itertools import izip, count
> > > pos = map(dict(izip(l2, count())).__getitem__, l1)
> > or probably less efficiently ...
> > >>> l1 = [ 'abc', 'ghi', 'mno' ]
> > >>> l2 = [ 'abc', 'def', 'ghi', 'jkl', 'mno', 'pqr' ]
> > >>> pos = [ l2.index(i) for i in l1 ]
> > >>> print pos
> > [0, 2, 4]
> > Charles
>
> Hey guys, thanks for the feedback and the suggestions.
> Charles, I got your implementation to work, so many thanks for this.
>
> This is what I had so far:
>
> for spam in l1:
>     for eggs in l2:
>         if spam == eggs:
>             print "kaka", spam, eggs
>
> So it's almost working, I just need the index. I'll
> continue playing with the nested loop approach for a bit more.
>
> Thanks once again guys

Hi once again, Charles. I have tried your approach on my data set l2
and it keeps crashing on me.
Bear in mind that I have a little over 10 million objects in my list
(l2) and l1 contains around 4 thousand
objects (I have enough RAM in my computer, so memory is not a
problem).

python 2.5 (r25:51908, Sep 19 2006, 09:52:17) [MSC v.1310 32 bit
(Intel)] on win32

The error is: ValueError: list.index(x): x not in list

when using Charles's

pos = [ l2.index(i) for i in l1 ]
print pos

Does anybody know if I have too many data points? The nested for
loop approach seems to be working (I still have to get the index "position"
returned, though).
Charles's approach works fine with less data.

Cheers, -d

Jun 22 '07 #7
> Hi once again, Charles. I have tried your approach on my data set l2
> and it keeps crashing on me.
> Bear in mind that I have a little over 10 million objects in my list
> (l2) and l1 contains around 4 thousand
> objects (I have enough RAM in my computer, so memory is not a
> problem).
>
> python 2.5 (r25:51908, Sep 19 2006, 09:52:17) [MSC v.1310 32 bit
> (Intel)] on win32
>
> The error is: ValueError: list.index(x): x not in list

So you are saying you get this error with the value of `x` actually in the
list!? Somehow hard to believe.

Ciao,
Marc 'BlackJack' Rintsch
Jun 22 '07 #8
On Jun 22, 1:56 pm, Marc 'BlackJack' Rintsch <bj_...@gmx.net> wrote:
> > Hi once again, Charles. I have tried your approach on my data set l2
> > and it keeps crashing on me.
> > Bear in mind that I have a little over 10 million objects in my list
> > (l2) and l1 contains around 4 thousand
> > objects (I have enough RAM in my computer, so memory is not a
> > problem).
> > python 2.5 (r25:51908, Sep 19 2006, 09:52:17) [MSC v.1310 32 bit
> > (Intel)] on win32
> > The error is: ValueError: list.index(x): x not in list
>
> So you are saying you get this error with the value of `x` actually in the
> list!? Somehow hard to believe.
>
> Ciao,
> Marc 'BlackJack' Rintsch

Yes, I do.

Jun 22 '07 #9
On Jun 22, 1:58 pm, hiro <Nun...@gmail.com> wrote:
> On Jun 22, 1:56 pm, Marc 'BlackJack' Rintsch <bj_...@gmx.net> wrote:
> > > Hi once again, Charles. I have tried your approach on my data set l2
> > > and it keeps crashing on me.
> > > Bear in mind that I have a little over 10 million objects in my list
> > > (l2) and l1 contains around 4 thousand
> > > objects (I have enough RAM in my computer, so memory is not a
> > > problem).
> > > python 2.5 (r25:51908, Sep 19 2006, 09:52:17) [MSC v.1310 32 bit
> > > (Intel)] on win32
> > > The error is: ValueError: list.index(x): x not in list
> > So you are saying you get this error with the value of `x` actually in the
> > list!? Somehow hard to believe.
> > Ciao,
> > Marc 'BlackJack' Rintsch
>
> Yes, I do.

I double- and triple-checked my data already (even doing a search by hand
using vim) and the data is fine. Still looking into it though.

Jun 22 '07 #10
On Jun 22, 2:00 pm, hiro <Nun...@gmail.com> wrote:
> On Jun 22, 1:58 pm, hiro <Nun...@gmail.com> wrote:
> > On Jun 22, 1:56 pm, Marc 'BlackJack' Rintsch <bj_...@gmx.net> wrote:
> > > > Hi once again, Charles. I have tried your approach on my data set l2
> > > > and it keeps crashing on me.
> > > > Bear in mind that I have a little over 10 million objects in my list
> > > > (l2) and l1 contains around 4 thousand
> > > > objects (I have enough RAM in my computer, so memory is not a
> > > > problem).
> > > > python 2.5 (r25:51908, Sep 19 2006, 09:52:17) [MSC v.1310 32 bit
> > > > (Intel)] on win32
> > > > The error is: ValueError: list.index(x): x not in list
> > > So you are saying you get this error with the value of `x` actually in the
> > > list!? Somehow hard to believe.
> > > Ciao,
> > > Marc 'BlackJack' Rintsch
> > Yes, I do.
>
> I double- and triple-checked my data already (even doing a search by hand
> using vim) and the data is fine. Still looking into it though.

Hahaha, OK, found out what was wrong. In the function computing
the data for l1, an extra space was being put in.

ie:

l1 = [ 'abc ', 'ghi ', 'mno ' ]

and I didn't strip it properly after splitting it. Silly me.

Well, live and learn. Thanks guys.

Cheers, -h

Jun 22 '07 #11
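
A small sketch of how this kind of trailing-space problem might be spotted and fixed. The input string raw and the comma separator are hypothetical (the thread never shows the real data or the splitting code), and the example is written for Python 3. The idea is that repr() makes stray whitespace visible, and str.strip() removes it after splitting.

```python
l2 = ['abc', 'def', 'ghi', 'jkl', 'mno', 'pqr']

# Hypothetical raw input; the thread does not show the real data format.
raw = 'abc ,ghi ,mno '

# Splitting alone keeps the trailing spaces.
l1_bad = raw.split(',')

# Stripping each piece after the split removes them.
l1 = [item.strip() for item in raw.split(',')]

# repr() shows the quotes, so 'abc ' is easy to tell apart from 'abc'.
l2_set = set(l2)
missing = [repr(item) for item in l1_bad if item not in l2_set]
print(missing)

pos = [l2.index(item) for item in l1]
print(pos)  # [0, 2, 4]
```

Printing the repr() of whichever item triggers the ValueError is often quicker than searching the data by hand, since an editor search for abc also matches the version with a trailing space.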
hiro wrote:
> Bear in mind that I have a little over 10 million objects in my list
> (l2) and l1 contains around 4 thousand
> objects (I have enough RAM in my computer, so memory is not a
> problem).

Glad to see you solved the problem with the trailing space.

Just one minor point, I did say

> or probably less efficiently ...

As far as I know, my suggestion's running time is
proportional to len(l1)*len(l2), which gets quite
big for your case where l1 and l2 are large lists.

If I understand how Python dictionaries work, Paul Rubin's
suggestion

from itertools import izip, count
pos = map(dict(izip(l2, count())).__getitem__, l1)

or the (I think) approximately equivalent

from itertools import izip, count
d = dict(izip(l2,count()))
pos = [ d[i] for i in l1 ]

or the more memory intensive

d = dict(zip(l2,range(len(l2))))
pos = [ d[i] for i in l1 ]

should all take running time proportional to
(len(l1)+len(l2))*log(len(l2))

For len(l1)=4,000 and len(l2)=10,000,000
Paul's suggestion is likely to take
about 1/100th of the time to run, ie
be about 100 times as fast. I was trying
to point out a somewhat clearer and simpler
(but slower) alternative.

Charles
Jun 25 '07 #12
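
To put rough numbers on the estimate above, the two formulas from the post can simply be evaluated for the stated sizes. This is a back-of-the-envelope sketch that takes the formulas at face value; the base-2 logarithm and the variable names are assumptions, and the results are operation counts, not measured timings.

```python
import math

n1 = 4000        # len(l1), as stated in the thread
n2 = 10000000    # len(l2), as stated in the thread

# Worst case for the list.index() approach: one scan of l2 per element of l1.
scan_cost = n1 * n2

# The post's estimate for the dictionary approach (log base 2 is assumed).
dict_cost_estimate = (n1 + n2) * math.log2(n2)

ratio = scan_cost / dict_cost_estimate
print('scan: %.2e, dict estimate: %.2e, ratio: %.0f'
      % (scan_cost, dict_cost_estimate, ratio))
```

Under these assumptions the ratio comes out at a bit under 200, the same order of magnitude as the "about 100 times" in the post.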
Charles Sanders <C.*******************@BoM.GOv.AU> writes:
> from itertools import izip, count
> d = dict(izip(l2,count()))
> pos = [ d[i] for i in l1 ]
>
> or the more memory intensive
>
> d = dict(zip(l2,range(len(l2))))
> pos = [ d[i] for i in l1 ]

If you're itertools-phobic you could alternatively write

d = dict((x,i) for i,x in enumerate(l2))
pos = [ d[i] for i in l1 ]

Dict access and update are supposed to take approximately constant time,
btw. Dicts are implemented as hash tables.
Jun 25 '07 #13
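
For anyone who wants to see the difference between the two approaches, here is a rough timing sketch. It is written for Python 3 (an assumption; the thread uses Python 2) and uses synthetic, scaled-down data, since the real lists are not shown. Absolute timings will vary by machine, so no expected output is given.

```python
import random
import time

# Synthetic, scaled-down data (the thread's real lists are not shown).
l2 = ['item%d' % n for n in range(100000)]
random.seed(0)
l1 = random.sample(l2, 200)

start = time.perf_counter()
pos_index = [l2.index(x) for x in l1]      # scans l2 once per lookup
t_index = time.perf_counter() - start

start = time.perf_counter()
d = {x: i for i, x in enumerate(l2)}       # one pass over l2 to build the dict
pos_dict = [d[x] for x in l1]              # one hash lookup per element of l1
t_dict = time.perf_counter() - start

print('list.index: %.4f s, dict: %.4f s' % (t_index, t_dict))
```

Both approaches return the same positions here because every element of l2 is unique. With duplicates in l2, the dictionary would hold the last position of each value rather than the first.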
