# comparing two lists and returning "position"

Hi there, I have two lists. For simplicity's sake, let's say they are:

l1 = [ 'abc', 'ghi', 'mno' ]

l2 = [ 'abc', 'def', 'ghi', 'jkl', 'mno', 'pqr' ]

What I need to do is compare l1 against l2 and return the "position"
of each object of l1 within l2,

i.e.: pos = 0, 2, 4

Come, come! You can try harder than that.

from itertools import izip, count
pos = map(dict(izip(l2, count())).__getitem__, l1)

Heh heh heh.

Paul Rubin wrote:
> from itertools import izip, count
> pos = map(dict(izip(l2, count())).__getitem__, l1)

or probably less efficiently ...

>>> l1 = [ 'abc', 'ghi', 'mno' ]
>>> l2 = [ 'abc', 'def', 'ghi', 'jkl', 'mno', 'pqr']
>>> pos = [ l2.index(i) for i in l1 ]
>>> print pos
[0, 2, 4]

Charles

Hey guys, thanks for the feedback and the suggestions.
Charles, I got your implementation to work, so many thanks for this.

This is what I had so far:

for spam in l1:
    for eggs in l2:
        if spam == eggs:
            print "kaka", spam, eggs

So it's almost working; I just need the index. I'll
continue playing with the nested loop approach for a bit more.

Thanks once again guys
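
For the nested-loop approach above, the missing index can come from `enumerate`. A minimal sketch in Python 3 syntax (the thread itself uses Python 2), assuming only the first match in l2 is wanted; the function name `positions_nested` is made up for illustration:

```python
def positions_nested(l1, l2):
    """Return the index in l2 of each item of l1, using plain nested loops."""
    pos = []
    for spam in l1:
        for i, eggs in enumerate(l2):
            if spam == eggs:
                pos.append(i)
                break  # stop at the first match, as list.index would
    return pos


l1 = ['abc', 'ghi', 'mno']
l2 = ['abc', 'def', 'ghi', 'jkl', 'mno', 'pqr']
print(positions_nested(l1, l2))  # [0, 2, 4]
```

Unlike `l2.index(i)`, this version silently skips items of l1 that are not in l2 instead of raising an error.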

Hi once again. Charles, I have tried your approach on my data set l2
and it keeps crashing on me.
Bear in mind that I have a little over 10 million objects in my list
(l2), and l1 contains around 4 thousand
objects (I have enough RAM in my computer, so memory is not a
problem).

python 2.5 (r25:51908, Sep 19 2006, 09:52:17) [MSC v.1310 32 bit
(Intel)] on win32

error is : ValueError: list.index(x): x not in list

when using Charles's
pos = [ l2.index(i) for i in l1 ]
print pos

Does anybody know if I have too many data points? The nested for
loop approach seems to be working (I still have to get the index "position"
returned, though).
Charles's approach works fine with less data.

Cheers, -d

> error is : ValueError: list.index(x): x not in list

So you are saying you get this error with the value of `x` actually in the
list!? Somehow hard to believe.

Ciao,
Marc 'BlackJack' Rintsch
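
One way to check that claim is to list exactly which items of l1 are absent from l2 and print them with `repr`, which shows characters that are easy to miss by eye. A rough sketch in Python 3 syntax; the helper name `report_missing` and the sample data are hypothetical:

```python
def report_missing(l1, l2):
    """Return the items of l1 that do not occur in l2."""
    known = set(l2)  # one pass over l2, then fast membership tests
    return [item for item in l1 if item not in known]


l1 = ['abc', 'ghi\t', 'mno']  # hypothetical data: one item carries a stray tab
l2 = ['abc', 'def', 'ghi', 'jkl', 'mno', 'pqr']
for item in report_missing(l1, l2):
    print(repr(item))  # repr shows the item with its escape sequences
```

An empty result would mean every item really is in l2 and the problem lies elsewhere.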

Yes, I do.

I double- and triple-checked my data already (even doing a search by hand
using vim) and the data is fine. Still looking into it, though.

Hahaha, OK, found out what was wrong: in the function computing
the data for l1, an extra space was being put in.

ie:

l1 = [ 'abc ', 'ghi ', 'mno ' ]

and I didn't strip it properly after splitting it. Silly me.

Well, live and learn. Thanks, guys.

Cheers, -h
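
A small sketch of the fix described above, in Python 3 syntax. The thread does not show how l1 was actually built, so the raw string and the comma delimiter here are assumptions made purely for illustration:

```python
# Hypothetical raw input; the real input format is not given in the thread.
raw = 'abc ,ghi ,mno '

# Strip each field after splitting so stray spaces do not end up in l1.
l1 = [field.strip() for field in raw.split(',')]

l2 = ['abc', 'def', 'ghi', 'jkl', 'mno', 'pqr']
pos = [l2.index(item) for item in l1]

print(l1)   # ['abc', 'ghi', 'mno']
print(pos)  # [0, 2, 4]
```

Without the `strip()` call, `l2.index('abc ')` raises the same `ValueError` reported earlier.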

hiro wrote:
> bear in mind that I have a little over 10 million objects in my list
> (l2) and l1 contains around 4 thousand
> objects.. (i have enough ram in my computer so memory is not a
> problem)

Glad to see you solved the problem with the trailing space.

Just one minor point, I did say

> or probably less efficiently ...

As far as I know, my suggestion's running time is
proportional to len(l1)*len(l2), which gets quite
big for your case where l1 and l2 are large lists.

If I understand how python dictionaries work, Paul Rubin's
suggestion
from itertools import izip, count
pos = map(dict(izip(l2, count())).__getitem__, l1)
or the (I think) approximately equivalent

from itertools import izip, count
d = dict(izip(l2,count()))
pos = [ d[i] for i in l1 ]

or the more memory intensive

d = dict(zip(l2,range(len(l2))))
pos = [ d[i] for i in l1 ]

should all take running time proportional to
(len(l1)+len(l2))*log(len(l2))

For len(l1)=4,000 and len(l2)=10,000,000
Paul's suggestion is likely to take
about 1/100th of the time to run, ie
be about 100 times as fast. I was trying
to point out a somewhat clearer and simpler
(but slower) alternative.

Charles
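
To see the difference between the two approaches on a small scale, here is a rough sketch in Python 3 syntax (where `zip` and `enumerate` replace `izip` and `count`). The list sizes are scaled down, and the helper names `with_index` and `with_dict` are made up here. Timings depend on the machine, so no figures are shown:

```python
import timeit

l2 = [str(n) for n in range(10000)]  # stand-in for the large list
l1 = l2[::50]                        # 200 items, all present in l2


def with_index(l1, l2):
    # Scans l2 from the start once for every item of l1.
    return [l2.index(item) for item in l1]


def with_dict(l1, l2):
    # One pass over l2 to build the lookup table, then one lookup per item.
    d = {x: i for i, x in enumerate(l2)}
    return [d[item] for item in l1]


# Both approaches give the same positions when l2 has no duplicates.
assert with_index(l1, l2) == with_dict(l1, l2)

print('index:', timeit.timeit(lambda: with_index(l1, l2), number=3))
print('dict: ', timeit.timeit(lambda: with_dict(l1, l2), number=3))
```

If l2 contained duplicates, the two would differ: `list.index` returns the first position, while the dict built this way keeps the last.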

Charles Sanders <C.*******************@BoM.GOv.AU> writes:
> from itertools import izip, count
> d = dict(izip(l2,count()))
> pos = [ d[i] for i in l1 ]
>
> or the more memory intensive
>
> d = dict(zip(l2,range(len(l2))))
> pos = [ d[i] for i in l1 ]

If you're itertools-phobic you could alternatively write

d = dict((x,i) for i,x in enumerate(l2))
pos = [ d[i] for i in l1 ]

Dict access and update are supposed to take approximately constant time,
btw. Dicts are implemented as hash tables.
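
A variation on the dict approach, sketched in Python 3 syntax, that also copes with the two cases raised in this thread: items of l1 that are missing from l2, and (as an added assumption) duplicates in l2. The function name `positions` and the choice of `None` for missing items are illustrative, not something proposed in the thread:

```python
def positions(l1, l2):
    """Map each item of l1 to its first index in l2, or None if absent."""
    d = {}
    for i, x in enumerate(l2):
        d.setdefault(x, i)  # keep the first index if l2 has duplicates
    return [d.get(item) for item in l1]


l2 = ['abc', 'def', 'ghi', 'jkl', 'mno', 'pqr', 'abc']
print(positions(['abc', 'ghi', 'mno', 'xyz'], l2))  # [0, 2, 4, None]
```

Returning `None` avoids the `ValueError` or `KeyError` on a missing item, at the cost of having to check for it afterwards.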
Jun 25 '07 #13

