list comprehention

Mathijs

Hi,

Python beginner here and very much enjoying it. I'm looking for a
pythonic way to find how many listmembers are also present in a reference
list. Don't count duplicates (eg. if you already found a matching member
in the ref list, you can't use the ref member anymore).

Example1:
ref=[2, 2, 4, 1, 1]
list=[2, 3, 4, 5, 3]
solution: 2

Example2:
ref=[2, 2, 4, 1, 1]
list=[2, 2, 5, 2, 4]
solution: 3 (note that only the first two 2's count, the third 2 in the
list should not be counted)

Any suggestions or comments?

Thanks.
M.
#my failing effort:
sum([min(r.count(n)-l[:i].count(n),l.count(n)) for i,n in enumerate(l)])

#test lists
import random
#reference list
r=[random.randint(1,5) for n in range(5)]
#list
l=[random.randint(1,5) for n in range(5)]

Jan 19 '06 #1

Subscribe Reply

1497

Tim Chase

> Python beginner here and very much enjoying it. I'm looking

for a pythonic way to find how many listmembers are also
present in a reference list. Don't count duplicates (eg. if
you already found a matching member in the ref list, you can't
use the ref member anymore).

Example1:
ref=[2, 2, 4, 1, 1]
list=[2, 3, 4, 5, 3]
solution: 2

Example2:
ref=[2, 2, 4, 1, 1]
list=[2, 2, 5, 2, 4]
solution: 3 (note that only the first two 2's count, the third
2 in the list should not be counted)

It sounds like you're looking for "set" operations: (using "ell"
for clarity)

from sets import Set
a = [2,2,4,1,1]
b = [2,3,4,5,3]
setA = Set(a)
setB = Set(b)
results = setA.intersection(setB)
results Set([2,4]) intersection = [x for x in results]
intersection

[2,4]
I'm a tad confused by the help, as it sounds like sets are
supposed to be first-class citizens, but in ver2.3.5 that I'm
running here (or rather "there", on a friend's box), I have to
"import sets" which I didn't see mentioned in the reference manual.

-one of many tims on the list
tim = Set(["bald", "vegetarian", "loving husband"])

:)

Jan 19 '06 #2

markscala

def reference(alist1,alist2):
counter = 0
for x in lis:
if x in ref:
ref.pop(ref.index(x))
counter += 1
return counter

this works I think for your examples, but you should check it against
them and other cases.
good luck

Jan 19 '06 #3

markscala

revision of previous:

def reference(reflist,alist2):
counter = 0
for x in alist2:
if x in reflist:
reflist.pop(reflist.index(x))
counter += 1
return counter

Jan 19 '06 #4

markscala

another approach:

ref = [2,2,4,1,1]
lis = [2,2,5,2,4]

len([ref.pop(ref.index(x)) for x in lis if x in ref])

Jan 19 '06 #5

bonono

Tim Chase wrote:

Python beginner here and very much enjoying it. I'm looking
> for a pythonic way to find how many listmembers are also
> present in a reference list. Don't count duplicates (eg. if
> you already found a matching member in the ref list, you can't
> use the ref member anymore).
>
> Example1:
> ref=[2, 2, 4, 1, 1]
> list=[2, 3, 4, 5, 3]
> solution: 2
>
> Example2:
> ref=[2, 2, 4, 1, 1]
> list=[2, 2, 5, 2, 4]
> solution: 3 (note that only the first two 2's count, the third
> 2 in the list should not be counted)

It sounds like you're looking for "set" operations: (using "ell"
for clarity)
>>> from sets import Set
>>> a = [2,2,4,1,1]
>>> b = [2,3,4,5,3]
>>> setA = Set(a)
>>> setB = Set(b)
>>> results = setA.intersection(setB)
>>> results Set([2,4]) >>> intersection = [x for x in results]
>>> intersection

[2,4]

won't set remove duplicates which he wants to preserve ? He is not just
looking for the 'values' that is in common, but the occurence as well,
if I understand the requirement correctly.

I would just build a dict with the value as the key and occurence as
the value then loop the list and lookup.

Jan 19 '06 #6

Peter Otten

Mathijs wrote:

Python beginner here and very much enjoying it. I'm looking for a
pythonic way to find how many listmembers are also present in a reference
list. Don't count duplicates (eg. if you already found a matching member
in the ref list, you can't use the ref member anymore).

Example1:
ref=[2, 2, 4, 1, 1]
list=[2, 3, 4, 5, 3]
solution: 2

Example2:
ref=[2, 2, 4, 1, 1]
list=[2, 2, 5, 2, 4]
solution: 3 (note that only the first two 2's count, the third 2 in the
list should not be counted)

Any suggestions or comments?

sum(min(list.count(n), ref.count(n)) for n in set(ref))

Is that it?

Peter

Jan 19 '06 #7

Peter Otten

Tim Chase wrote:

I'm a tad confused by the help, as it sounds like sets are
supposed to be first-class citizens, but in ver2.3.5 that I'm
running here (or rather "there", on a friend's box), I have to
"import sets" which I didn't see mentioned in the reference manual.

set and frozenset are a builtins starting with Python 2.4.

http://www.python.org/2.4/highlights.html

"""
New or upgraded built-ins

built-in sets - the sets module, introduced in 2.3, has now been implemented
in C, and the set and frozenset types are available as built-in types (PEP
218)
"""

Peter

Jan 19 '06 #8

Matt

Tim Chase wrote:
<snip>

I'm a tad confused by the help, as it sounds like sets are
supposed to be first-class citizens, but in ver2.3.5 that I'm
running here (or rather "there", on a friend's box), I have to
"import sets" which I didn't see mentioned in the reference manual.

-one of many tims on the list
tim = Set(["bald", "vegetarian", "loving husband"])

:)

The builtin "set" was added in 2.4, I believe (note lowercase "s"):

http://docs.python.org/whatsnew/node2.html

print "vegetarian".replace("etari","")
:-)

Jan 19 '06 #9

Tim Chase

>> >Python beginner here and very much enjoying it. I'm looking

> for a pythonic way to find how many listmembers are also
> present in a reference list. Don't count duplicates (eg. if

[snipped]
won't set remove duplicates which he wants to preserve ?

My reading was that the solution shouldn't count duplicates
("Don't count duplicates"). However, once you mentioned it, and
I saw other folks' responses that looked very diff. from my own,
I re-read the OP's comments and found that I missed this key bit:

"note that only the first *two* 2's count, the third 2
in the list should not be counted"
(*emphasis* mine)

and that the desired result was a count (though a len(Set())
would return the right count if the OP wanted the true
intersection, but that's beside the point).

My error. Sorry, ladies and gentlemen :)

I'm partial to the elegance of markscala's suggestion of:

len([ref.pop(ref.index(x)) for x in lis if x in ref])

though it might need to come with a caveat that it doesn't leave
"ref" in the same state is it was originally, so it should be
copied and then manipulated thusly:

r = ref[:]
len([r.pop(r.index(x)) for x in (lis if x in r])

which would then leave ref undisturbed. As tested:

import random
ref = [random.randint(1,5) for n in range(5)]
for x in range(1,10):
lis = [random.randint(1,5) for n in range(5)]
r = ref[:]
print repr((r,lis))
print len([r.pop(r.index(x)) for x in lis if x in r])

seems to give the results the OP describes.

-tim

Jan 19 '06 #10

Paddy

Hi,
I liked the twist at the end when you state that only the first two 2's
count. It reminded me
of my maths O'level revision where you always had to read the question
thoroughly.

Here is what I came up with:

ref [2, 2, 4, 1, 1] lst [2, 2, 5, 2, 4] tmp = [ [val]*min(lst.count(val), ref.count(val)) for val in set(ref)]
tmp [[], [2, 2], [4]] answer = [x for y in tmp for x in y]
answer [2, 2, 4]
I took a lot from Peter Ottens reply to generate tmp then flattened the
inner lists.

After a bit more thought, the intermediate calculation of tmp can be
removed with a
little loss in clarity though, to give:
answer = [ val for val in set(ref) for x in range(min(lst.count(val), ref.count(val)))]
answer

[2, 2, 4]

- Cheers, Paddy.

Jan 19 '06 #11

Duncan Booth

Mathijs wrote:

Python beginner here and very much enjoying it. I'm looking for a
pythonic way to find how many listmembers are also present in a
reference list. Don't count duplicates (eg. if you already found a
matching member in the ref list, you can't use the ref member
anymore).

Example1:
ref=[2, 2, 4, 1, 1]
list=[2, 3, 4, 5, 3]
solution: 2

Example2:
ref=[2, 2, 4, 1, 1]
list=[2, 2, 5, 2, 4]
solution: 3 (note that only the first two 2's count, the third 2 in
the list should not be counted)

Here's the way I would do it:

def occurrences(it): res = {}
for item in it:
if item in res:
res[item] += 1
else:
res[item] = 1
return res
ref=[2, 2, 4, 1, 1]
lst=[2, 2, 5, 2, 4]
oref = occurrences(ref)
sum(min(v,oref.get(k,0)) for (k,v) in occurrences(lst).iteritems()) 3 lst=[2, 3, 4, 5, 3]
sum(min(v,oref.get(k,0)) for (k,v) in occurrences(lst).iteritems()) 2

Or in other words, define a function to return a dictionary containing
a count of the number of occurrences of each element in the list (this
assumes that the list elements are hashable). Then you just add up the
values in the test list making sure each count is limited to no higher than
the reference count.

Jan 20 '06 #12

Bryan Olson

Duncan Booth wrote:

Here's the way I would do it:
def occurrences(it):

res = {}
for item in it:
if item in res:
res[item] += 1
else:
res[item] = 1
return res

I slightly prefer:

def occurrences(it):
res = {}
res[item] = res.get(item, 0) + 1
return res
[...] Or in other words, define a function to return a dictionary containing
a count of the number of occurrences of each element in the list (this
assumes that the list elements are hashable). Then you just add up the
values in the test list making sure each count is limited to no higher than
the reference count.

Resulting in a linear-time average case, where the posted
list-comprehension-based solutions are quadratic. The title
of the thread is unfortunate.

The generalized problem is multiset (AKA "bag") intersection:

http://en.wikipedia.org/wiki/Bag_(mathematics)
--
--Bryan

Jan 20 '06 #13

Mathijs

Op 19 jan 2006 vond "ma*******@gmail.com" :

another approach:

ref = [2,2,4,1,1]
lis = [2,2,5,2,4]

len([ref.pop(ref.index(x)) for x in lis if x in ref])

This is the type of solution I was hoping to see: one-liners, with no use
of local variables. As Tim Chase already wrote, it has only one less
elegant side: it alters the original ref list.

Thanks for your suggestion.

Jan 23 '06 #14

Mathijs

Op 19 jan 2006 vond Peter Otten <__*******@web.de> :

sum(min(list.count(n), ref.count(n)) for n in set(ref))

Is that it?

Seems like this is it! Thanks.

Jan 23 '06 #15

Mathijs

Op 19 jan 2006 vond "Paddy" <pa*******@netscape.net>:

answer = [ val for val in set(ref) for x in
range(min(lst.count(val), ref.count(val)))] answer

[2, 2, 4]

I don't think it's correct. Your algoritm with the ref and lst below gives
3 as answer. The answer should have been 2 (1,3).

ref=[3, 3, 1, 1, 3]
lst=[5, 1, 4, 5, 3]

Jan 23 '06 #16

Mathijs

Op 20 jan 2006 vond Duncan Booth <du**********@invalid.invalid>:

Or in other words, define a function to return a dictionary containing
a count of the number of occurrences of each element in the list (this
assumes that the list elements are hashable). Then you just add up the
values in the test list making sure each count is limited to no higher
than the reference count.

Thanks. Though I don't know much about python (yet), this is more or less
the way I'de do it the language I'm more comfortable with (object pascal),
and I wouldn't label this as a pythonic solution. I could be wrong,
though:)

Jan 23 '06 #17

Duncan Booth

Mathijs wrote:

Op 20 jan 2006 vond Duncan Booth <du**********@invalid.invalid>:
Or in other words, define a function to return a dictionary
containing a count of the number of occurrences of each element in
the list (this assumes that the list elements are hashable). Then you
just add up the values in the test list making sure each count is
limited to no higher than the reference count.

Thanks. Though I don't know much about python (yet), this is more or
less the way I'de do it the language I'm more comfortable with (object
pascal), and I wouldn't label this as a pythonic solution. I could be
wrong, though:)

I'm curious what you think isn't pythonic about this solution?

Jan 23 '06 #18

Peter Otten

Mathijs wrote:

Op 20 jan 2006 vond Duncan Booth <du**********@invalid.invalid>:
Or in other words, define a function to return a dictionary containing
a count of the number of occurrences of each element in the list (this
assumes that the list elements are hashable). Then you just add up the
values in the test list making sure each count is limited to no higher
than the reference count.

Thanks. Though I don't know much about python (yet), this is more or less
the way I'de do it the language I'm more comfortable with (object pascal),
and I wouldn't label this as a pythonic solution. I could be wrong,
though:)

You *are* wrong. Also, as the lists grow, Duncan's approach scales *much*
better than, e. g., mine.

If picking a better algorithm were unpythonic there would not be much value
in striving for pythonic solutions.

Peter

Jan 23 '06 #19

Steven D'Aprano

On Mon, 23 Jan 2006 17:41:55 +0000, Mathijs wrote:

Op 19 jan 2006 vond "ma*******@gmail.com" :
another approach:

ref = [2,2,4,1,1]
lis = [2,2,5,2,4]

len([ref.pop(ref.index(x)) for x in lis if x in ref])

This is the type of solution I was hoping to see: one-liners, with no use
of local variables.

Because you like unreadable, incomprehensible, unmaintainable code?

*wink*

--
Steven.

Jan 23 '06 #20

Patrick Maupin

Duncan Booth showed how to solve a problem posed by Mathijs. This is
very similar to Duncan's solution, except I (ab)use setdefault on a
regular basis...

def occurrences(t): .... res = {}
.... for item in t:
.... res.setdefault(item,[0])[0] += 1
.... return res
.... ref = [2,2,4,1,1]
lst = [2,2,5,2,4]
oref = occurrences(ref)
sum(min(v[0],oref.get(k,[0])[0]) for (k,v) in occurrences(lst).iteritems()) 3
Actually, this brings up an interesting point:
{}.setdefault('foo',0) += 1

SyntaxError: can't assign to function call

I understand why this doesn't work, but it would sure be nice if it
did. I think somebody was mentioning "mutable ints" at one point,
which is basically what I abuse [0] to provide. If I were doing a lot
of this in one place, I would code a mutable integer class, and then
the rest of the code would get simpler.

Regards,
Pat

Jan 23 '06 #21

Bryan Olson

Bryan Olson wrote:

Duncan Booth wrote:
Here's the way I would do it:
> def occurrences(it):

res = {}
for item in it:
if item in res:
res[item] += 1
else:
res[item] = 1
return res

I slightly prefer:

def occurrences(it):
res = {}
res[item] = res.get(item, 0) + 1
return res

Arg! In illustrating a trivial point, I dropped something essential.
This version has almost three minutes of testing behind it:

def occurrences(it):
res = {}
for item in it:
res[item] = res.get(item, 0) + 1
return res
--
--Bryan

Jan 24 '06 #22

Morten Vold

On 23/01/2006 18:41, Mathijs uttered:

len([ref.pop(ref.index(x)) for x in lis if x in ref])
This is the type of solution I was hoping to see: one-liners, with no
use of local variables. As Tim Chase already wrote, it has only one
less elegant side: it alters the original ref list.

Thanks for your suggestion.

May I suggest another one-liner:

len(set(ref).intersection(lis))

I have no idea how this scales/performs compared to the other
suggestions you've received, but I find it immediately comprehensible.
-- Morten

Jan 24 '06 #23

Peter Otten

[Mathijs]

Example2:
ref=[2, 2, 4, 1, 1]
list=[2, 2, 5, 2, 4]
solution: 3 (note that only the first two 2's count, the third 2 in the
list should not be counted)
[Morten Vold]
May I suggest another one-liner:

len(set(ref).intersection(lis))

I have no idea how this scales/performs compared to the other
suggestions you've received, but I find it immediately comprehensible.

len(set([2, 2, 4, 1, 1]).intersection([2, 2, 5, 2, 4]))

2

Close, but no cigar.

Peter

Jan 24 '06 #24

Morten Vold

On 24/01/2006 09:46, Peter Otten uttered:

len(set([2, 2, 4, 1, 1]).intersection([2, 2, 5, 2, 4]))

2
Close, but no cigar.

I need more coffee! (note to self: Always read entire problem set first)

-- Morten

Jan 24 '06 #25

Duncan Booth

Patrick Maupin wrote:

def occurrences(t):
... res = {}
... for item in t:
... res.setdefault(item,[0])[0] += 1
... return res

.... I think somebody was mentioning "mutable ints" at one point,
which is basically what I abuse [0] to provide. If I were doing a lot
of this in one place, I would code a mutable integer class, and then
the rest of the code would get simpler.

Or you could use the facilities built in to Python: use the get method as
Bryan did, and the code gets simpler without using mutable integers.

I prefer writing an 'if' statement here, Bryan prefers 'get', that's just a
choice of style. But 'setdefault' here, that has no style.

Jan 24 '06 #26

Patrick Maupin

Duncan Booth wrote:

I prefer writing an 'if' statement here, Bryan prefers 'get', that's just a
choice of style. But 'setdefault' here, that has no style.

Well, I'm often told I have no style, and I _did_ admit that it's an
abuse of setdefault. However, I often use setdefault to populate
nested dictionaries, or dictionaries of sets or lists, e.g.:

for x, y in somebiglist:
bigdict.setdefault(x,set()).add(y) # Strips duplicates
for x, y in somebiglist:
bigdict.setdefault(x,[]).append(y) # Preserves duplicates

To my mind, this latter is so much cleaner and clearer than any of the
alternatives that it just isn't funny:

for x, y in somebiglist:
if x in bigdict:
bigdict[x].append(y)
else:
bigdict[x] = [y]

for x, y in somebiglist:
if x not in bigdict:
bigdict[x] = []
bigdict[x].append(y)

for x, y in somebiglist:
try:
bigdict[x].append(y)
except KeyError:
bigdict[x] = [y]

etc.

Since I use setdefault in this manner quite often, I am very
comfortable with it. On a single line I know what I am creating (list,
dict, set, etc.), what the default value is, and how it is being
modified.

Because I am NOT a believer in the Perl idiom of TMTOWTDI, and because,
TO ME, setdefault screams "This line is creating and subsequently
modifying a dictionary entry", setdefault is absolutely my first choice
for this particular task.

Having said that, my enthusiasm for setdefault (for the given problem!)
IS tempered somewhat by having to abuse a list as a mutable integer.
That is the only reason that I concede that my solution is an abuse of
setdefault, but the setdefault idiom is (to me) so clear and compelling
that I view it as a toss-up whether to use a single line setdefault or
an if/else in this case.

And, as I mentioned earlier, if I had to do this same thing LOTS of
times in a program, I most likely would code a class (perhaps not as
fancy as a true mutable int, but certainly something with an increment
method), because, IMO, it would be a lot cleaner (hmmm, maybe there's
no style points in that) than tons and tons of if/else statements.

Regards,
Pat

Jan 24 '06 #27

Duncan Booth

Patrick Maupin wrote:

Duncan Booth wrote:
I prefer writing an 'if' statement here, Bryan prefers 'get', that's
just a choice of style. But 'setdefault' here, that has no style.

Well, I'm often told I have no style, and I _did_ admit that it's an
abuse of setdefault. However, I often use setdefault to populate
nested dictionaries, or dictionaries of sets or lists, e.g.:

for x, y in somebiglist:
bigdict.setdefault(x,set()).add(y) # Strips duplicates
for x, y in somebiglist:
bigdict.setdefault(x,[]).append(y) # Preserves duplicates

To my mind, this latter is so much cleaner and clearer than any of the
alternatives that it just isn't funny:

Yes, but storing a mutable is a not unreasonable use of setdefault. What I
objected to was storing an immutable just to overwrite it immediately.

Also, while I agree it is shorter, I'm not convinced that it is much
cleaner, and it is likely to be slower if there are a lot of duplicates,
possibly a lot slower if the mutable object has much overhead on its
construction.

Jan 24 '06 #28

Paddy

Umm,
My answer is not correct, but for a different reason; it seems you need
the length of my
previous answer, thus:

PythonWin 2.4 (#60, Feb 9 2005, 19:03:27) [MSC v.1310 32 bit (Intel)]
on win32.
Portions Copyright 1994-2004 Mark Hammond (mh******@skippinet.com.au) -
see 'Help/About PythonWin' for further copyright information.

ref = [3, 3, 1, 1, 3]
lst=[5, 1, 4, 5, 3]
answer = len([ val for val in set(ref) for x in range(min(lst.count(val), ref.count(val)))])
answer 2

- Paddy.

Jan 24 '06 #29

Mathijs

23 jan 2006 ta (Steven D'Aprano) shuo le:

This is the type of solution I was hoping to see: one-liners, with no
use of local variables.
Because you like unreadable, incomprehensible, unmaintainable code?

For practical use: no! But I'm just learning python and to understand
sets/lists/dicts and it's functions, this seemed like good practise:) and
besides that: it's fun! A solution with local vaiables, if-statemenst and a
for loop wouldn't be much different from what I know with my current
programming skills.
*wink*

Yeah, yeah, I know, but I felt a response was on its place. :)

Jan 25 '06 #30

list comprehention

Similar topics