filter()ing a dict

Robin Cull

Imagine I have a dict looking something like this:

myDict = {"key 1": ["value 1", "value 2", "value 3", "value 4"], "key
2": ["value 1", "value 2"], "key 3": ["value2", "value 3", "value 4"],
"key 4": ["value 1", "value 2", "value 3", "value 4"]}

That is, a set of keys which have a variable length list of associated
values after them. What I want to do is filter out a subset of this
dict to produce another dict that satisfies a set of criteria (in this
case whether it contains all four values) to end up with something
like this:

{"key 1": ["value 1", "value 2", "value 3", "value 4"], "key 4":
["value 1", "value 2", "value 3", "value 4"]}

I've tried using the filter() function so I wrote a small function
that returned True or False depending on whether the length of the
list was what I wanted e.g.

def f(item):
if len(item) == 4:
return True
else:
return False

I then called filter(f, myDict).

I was expecting to be returned a dict which was the subset of myDict
that satisfied f.

This did not happen; the item passed to f was each key of the dict,
not each value list. The call to filter returned an empty list (as
obviously none of the keys satisfied the criteria). If anything I
would have expected an empty dict back as according to the docs:

""filter(function, sequence)" returns a sequence (of the same
type, if possible) consisting of those items from the sequence for
which function(item) is true."

The important bit here is "of the same type, if possible" which I
understood to mean if passed a dict it would return a dict.

I've tried calling filter(f, myDict.values()) which works and I get a
list like this returned:

[["value 1", "value 2", "value 3", "value 4"], ["value 1", "value 2",
"value 3", "value 4"]]

Sort of what I want but I've lost the associativity with the key which
was kind of the point in using a dict in the first place.

I'm guessing that filter() does not work on a dict in the way I think
it should.

I can write something to get around the behaviour in this particular
case (although suggestions would be welcome).

My question is, does anybody else think that when calling filter(f,
dict), it should return a dict of the keys and values of the former
dict where the values satisfy the conditions defined by f()?

Are there reasons for it not to work this way? I must admit I've not
looked into this carefully, just in the context of my script it'd be
useful to.

What are the side effects of making filter() work this way?

Thanks all.

Regards,

Robin

Jul 18 '05 #1

Subscribe Reply

12220

John Roth

According to the doc, filter() processes sequences,
iterators and objects that support the iterator protocol.
A dict is none of those things, unless it supports the iterator
protocol, which is not mentioned in the documentation
as of 2.2.3. Possibly it does in 2.3.0.

If you want to use filter() to process a dict, use the .items()
method. That will get you a list of tuples, where each tuple
contains the key and value. .iteritems() will also work,
of course.

What is returned will be a list of tuples, which you can
use to build a dict.

I'm surprised that what you tried worked at all. The
2.2.3 documentation on dictionaries doesn't suggest that
a dictionary supports the iteration protocol, or what it
returns in that context.

John Roth

"Robin Cull" <ro********@pace.co.uk> wrote in message
news:16************************@posting.google.com ...

Imagine I have a dict looking something like this:

myDict = {"key 1": ["value 1", "value 2", "value 3", "value 4"], "key
2": ["value 1", "value 2"], "key 3": ["value2", "value 3", "value 4"],
"key 4": ["value 1", "value 2", "value 3", "value 4"]}

That is, a set of keys which have a variable length list of associated
values after them. What I want to do is filter out a subset of this
dict to produce another dict that satisfies a set of criteria (in this
case whether it contains all four values) to end up with something
like this:

{"key 1": ["value 1", "value 2", "value 3", "value 4"], "key 4":
["value 1", "value 2", "value 3", "value 4"]}

I've tried using the filter() function so I wrote a small function
that returned True or False depending on whether the length of the
list was what I wanted e.g.

def f(item):
if len(item) == 4:
return True
else:
return False

I then called filter(f, myDict).

I was expecting to be returned a dict which was the subset of myDict
that satisfied f.

This did not happen; the item passed to f was each key of the dict,
not each value list. The call to filter returned an empty list (as
obviously none of the keys satisfied the criteria). If anything I
would have expected an empty dict back as according to the docs:

""filter(function, sequence)" returns a sequence (of the same
type, if possible) consisting of those items from the sequence for
which function(item) is true."

The important bit here is "of the same type, if possible" which I
understood to mean if passed a dict it would return a dict.

I've tried calling filter(f, myDict.values()) which works and I get a
list like this returned:

[["value 1", "value 2", "value 3", "value 4"], ["value 1", "value 2",
"value 3", "value 4"]]

Sort of what I want but I've lost the associativity with the key which
was kind of the point in using a dict in the first place.

I'm guessing that filter() does not work on a dict in the way I think
it should.

I can write something to get around the behaviour in this particular
case (although suggestions would be welcome).

My question is, does anybody else think that when calling filter(f,
dict), it should return a dict of the keys and values of the former
dict where the values satisfy the conditions defined by f()?

Are there reasons for it not to work this way? I must admit I've not
looked into this carefully, just in the context of my script it'd be
useful to.

What are the side effects of making filter() work this way?

Thanks all.

Regards,

Robin

Jul 18 '05 #2

Duncan Booth

"John Roth" <ne********@jhrothjr.com> wrote in
news:vj***********@news.supernews.com:

According to the doc, filter() processes sequences,
iterators and objects that support the iterator protocol.
A dict is none of those things, unless it supports the iterator
protocol, which is not mentioned in the documentation
as of 2.2.3. Possibly it does in 2.3.0.
It looks to me like there is a hole in the Python documentation. The
"what's new in Python 2.2" said:
Iterator support has been added to some of Python's basic types.
Calling iter() on a dictionary will return an iterator which loops
over its keys:

However, the library reference documentation for mapping types doesn't seem
to mention that dictionaries support the iterator protocol directly,
although it does mention the explicit ways to get iterators out of
dictionaries (such as iteritems()).

--
Duncan Booth du****@rcp.co.uk
int month(char *p){return(124864/((p[0]+p[1]-p[2]&0x1f)+1)%12)["\5\x8\3"
"\6\7\xb\1\x9\xa\2\0\4"];} // Who said my code was obscure?

Jul 18 '05 #3

Alex Martelli

Robin Cull wrote:

Imagine I have a dict looking something like this:

myDict = {"key 1": ["value 1", "value 2", "value 3", "value 4"], "key
2": ["value 1", "value 2"], "key 3": ["value2", "value 3", "value 4"],
"key 4": ["value 1", "value 2", "value 3", "value 4"]}

That is, a set of keys which have a variable length list of associated
values after them. What I want to do is filter out a subset of this
dict to produce another dict that satisfies a set of criteria (in this
case whether it contains all four values) to end up with something
like this:

{"key 1": ["value 1", "value 2", "value 3", "value 4"], "key 4":
["value 1", "value 2", "value 3", "value 4"]}

I've tried using the filter() function so I wrote a small function
that returned True or False depending on whether the length of the
list was what I wanted e.g.

def f(item):
if len(item) == 4:
return True
else:
return False
This function expects to be passed the value (not the item).
I then called filter(f, myDict).
While this passes the keys (not the items, but not the values either).
I was expecting to be returned a dict which was the subset of myDict
that satisfied f.

This did not happen; the item passed to f was each key of the dict,
not each value list. The call to filter returned an empty list (as
obviously none of the keys satisfied the criteria). If anything I
would have expected an empty dict back as according to the docs:

""filter(function, sequence)" returns a sequence (of the same
type, if possible) consisting of those items from the sequence for
which function(item) is true."

The important bit here is "of the same type, if possible" which I
understood to mean if passed a dict it would return a dict.
Please check the current docs at
http://www.python.org/doc/current/li...-in-funcs.html
and let us know if you consider them clearer now. Specifically,
the sentence:
"""
If <i>list</i> is a string or a tuple, the result also has that type;
otherwise it is always a list.
"""
should help (despite the unfortunate name chosen for the argument).

I've tried calling filter(f, myDict.values()) which works and I get a
list like this returned:

[["value 1", "value 2", "value 3", "value 4"], ["value 1", "value 2",
"value 3", "value 4"]]

Sort of what I want but I've lost the associativity with the key which
was kind of the point in using a dict in the first place.
Exactly. It's impossible to reconstruct the right dict from this one.
I'm guessing that filter() does not work on a dict in the way I think
it should.
Doesn't and never will.
I can write something to get around the behaviour in this particular
case (although suggestions would be welcome).
You need to filter on items, which are key-value pairs; therefore:

def f(item):
return len(item[1]) == 4

the value is the 1-th element of the 'item' pair (tuple) while the
key is the 0-th element. I've also simplified the logical structure,
since the == already returns True or False just as you wish, there
is no reason to keep an if/else (though it's just redundant, not
directly harmful).

Then, you need to provide to filter the items, explicitly:

filter(f, myDict.items())

this gives you a list of items (pairs of key-value). Finally you
can pass that to dict to obtain what you want:

dict(filter(f, myDict.items()))

You could equivalently use myDict.iteritems() [may be faster, but
I doubt that matters for a reasonably small dictionary anyway].

Note that filter, these days, is often shunned in favour of a
list-comprehension, which saves the need to write a function as
the predicate for the filtering; with a list comprehension you
might write it all on one line as

dict([ (key,myDict[key]) for key in myDict if len(myDict[key])==4 ])

or, if you prefer:

dict([ (key,value) for key,value in myDict.items() if len(value)==4 ])

and other equivalent ways yet.

My question is, does anybody else think that when calling filter(f,
dict), it should return a dict of the keys and values of the former
dict where the values satisfy the conditions defined by f()?
According to "Ugol's Law", on Usenet, if you ever ask whether you
are the only one who "X" (where "X" is "thinks this", "does that",
"likes the other", etc, etc), the answer is invariably "no".

However, the Ugolian certainty that there must be another person
who shares your opinion in this matter isn't all that relevant.
More interesting is the next question you ask:
Are there reasons for it not to work this way? I must admit I've not
looked into this carefully, just in the context of my script it'd be
useful to.
'filter' unfortunately carries around the complication (for backwards
compatibility) of returning a string when passed a string, a tuple
when passed a tuple, a list in all other cases. It would obviously
be much simpler if it always returned a list. You, on the other hand,
would want to make it even _more_ complicated -- and for reasons that
escape me, but perhaps connected at looking just at the context of
this one script, would want the filtering function to be passed just
the values, rather than (as happens now if you pass filter a dict)
just the keys. Since you CAN recover the value from the key (if you
have access to the dictionary, e.g. as a global variable) but not
vice-versa, I think it's undisputable that such a hypothetical change
(hypothetical because it would break backwards compatibility, and
thus it's mathematically certain that it will not happen) would be
worse than the current situation. Passing a (key,value) pair, i.e
a whole item, would on the other hand not be just as unambiguously
bad -- it would "just" be a complication whose cost (complicating
things) needs to be weighed against its advantage (making it a tad
handier to perform dictionary-filtering in certain cases) - again
hypothetically speaking, of course.

What are the side effects of making filter() work this way?

Presumably that filter then would need to know about one more
special case (besides the two unfortunate, historical ones of
strings and tuples) rather than just being able to iterate on
its second argument. The incompatibility of the filtering function
needing to be passed the whole item [ (key,value) pair ] is the
definite reason this will never happen; whether it should have
been so in the first place is therefore moot (I opine filter's
very existence is unjustified in today's Python, except for
backwards compatibility, but others do differ, of course).

Python 3.0's main thrust (when it finally happens) will be to
*SIMPLIFY* Python by removing a lot of currently existing
basically redundant ways to do things -- the BDFL has said so
at this year's Python conferences. Thus, I expect the filter
function to go away, rather than gaining bells and whistles.

If you can make substantial use cases for the "filtering a
dictionary" functionality, for very large dictionaries where
dict([ (k,d[k]) for k in d if len(d[k])==4 ]) has intolerable
overhead, then one addition that _might_ have some (small)
chance of making it into future releases might be to add to
the dict class a filter method (taking just the predicate as
its argument). I'm still rather dubious that this would
actually stand any real chance, but can't be certain of that.
Alex

Jul 18 '05 #4

Sean Ross

d = {"key 1": ["value 1", "value 2", "value 3", "value 4"],
"key 2": ["value 1", "value 2"],
"key 3": ["value2", "value 3", "value 4"],
"key 4": ["value 1", "value 2", "value 3", "value 4"]}

# using list comprehension
fd = dict([(k,v) for k,v in d.items() if len(v)==4])
print fd

# using filter
value = 1
fd = dict(filter(lambda item:len(item[value])==4, d.items()))
print fd

Jul 18 '05 #5

Aahz

In article <vj***********@news.supernews.com>,
John Roth <ne********@jhrothjr.com> wrote:

I'm surprised that what you tried worked at all. The 2.2.3
documentation on dictionaries doesn't suggest that a dictionary
supports the iteration protocol, or what it returns in that context.

Check the What's New doc on iterators.
--
Aahz (aa**@pythoncraft.com) <*> http://www.pythoncraft.com/

This is Python. We don't care much about theory, except where it intersects
with useful practice. --Aahz

Jul 18 '05 #6

John Hunter

>>>>> "Robin" == Robin Cull <ro********@pace.co.uk> writes:

Robin> My question is, does anybody else think that when calling
Robin> filter(f, dict), it should return a dict of the keys and
Robin> values of the former dict where the values satisfy the
Robin> conditions defined by f()?

The problem is not with filter. When you do

d = {'k1':1, 'k2':2}
for x in d: print x

It prints the keys. That is: by default, iterating over a dictionary
iterates over the keys. So when you filter a dict, it iterates over
the dictionary which means iterating over the keys. SO filter and
dict are behaving as advertised. As you noted, you can call
dictionary functions to iterate over the keys, values, or key, value
pairs.

It is easy to do what you want with list comprehensions

myDict = {"key 1": ["value 1", "value 2", "value 3", "value 4"],
"key 2": ["value 1", "value 2"],
"key 3": ["value2", "value 3", "value 4"],
"key 4": ["value 1", "value 2", "value 3", "value 4"]}
len4Dict = dict([ (k,v) for k,v in myDict.items() if len(v)==4])
print len4Dict
Cheers,
John Hunter

Jul 18 '05 #7

Carl Banks

Robin Cull wrote:

My question is, does anybody else think that when calling filter(f,
dict), it should return a dict of the keys and values of the former
dict where the values satisfy the conditions defined by f()?

Why shouldn't filter apply to the keys? You want it to apply to
values, but don't you think someone else might want it for the keys?
Who's to say what it should be? Better to just leave it take care of
sequences.
--
CARL BANKS
"You don't run Microsoft Windows. Microsoft Windows runs you."

Jul 18 '05 #8

Robin Cull

ro********@pace.co.uk (Robin Cull) wrote in message news:<16************************@posting.google.co m>...
<SNIP>

I can write something to get around the behaviour in this particular
case (although suggestions would be welcome).

<SNIP>

Thanks all for your explanations and examples. I quite like the list
comprehension methods. I didn't comprehend (pun intended) what list
comprehensions were until I saw these examples so I've learnt
something new! :)

For my particular script, which is likely to be used by pythonistas
even less experienced than me, I decided to solve the problem this
way:

def filterDict(testFunction, dictToFilter):
newDict = {}
for k, v in dictToFilter.items():
if testFunction(v):
newDict[k] = v
return newDict

Seems to work for what I want it to do and since it's a standalone
defined function which mimics filter() should be nice and easy to
understand for people who have to look at it later.

I appreciate all your replies.

Regards,

Robin

Jul 18 '05 #9

Sean Ross

"Robin Cull" <ro********@pace.co.uk> wrote in message
news:16**************************@posting.google.c om...

ro********@pace.co.uk (Robin Cull) wrote in message news:<16************************@posting.google.co m>... For my particular script, which is likely to be used by pythonistas
even less experienced than me, I decided to solve the problem this
way:

def filterDict(testFunction, dictToFilter):
newDict = {}
for k, v in dictToFilter.items():
if testFunction(v):
newDict[k] = v
return newDict

Seems to work for what I want it to do and since it's a standalone
defined function which mimics filter() should be nice and easy to
understand for people who have to look at it later.

I actually find the following version atleast as easy to understand as the
version above, but I perhaps I lack perspective.

def filterdict(predicate, mapping):
"predicate() must take one (key,value) tuple as argument and return a
boolean"
return dict(filter(predicate, mapping.iteritems()))

# useage:
fd = filterdict(lambda item: len(item[1])==4, d)

Jul 18 '05 #10

Similar topics

181

8601

map/filter/reduce/lambda opinions and background unscientificmini-survey

by: Tom Anderson | last post by:

Comrades, During our current discussion of the fate of functional constructs in python, someone brought up Guido's bull on the matter: http://www.artima.com/weblogs/viewpost.jsp?thread=98196 ...

Python

1770

DOM sub trees whilst SAX'ing in perl?

by: bugbear | last post by:

I need to process some XML files that are rather large. However their structure may usefully be expressed as <ELEMENT FILE (RECORD)+> .. .. .. Each record is a few Kb. The files are many...

.NET Framework

4412

xmltextreader filter results based on attribute value

by: soupaman | last post by:

Im trying to output some filtered xml using the xmlTextReader I know the code and commenting needs cleaned up and eventually plan to add the values to a dataset. currently what this is doing is...

.NET Framework

1645

filter on form with VBA

by: tod4 | last post by:

Hi, Im using such procedure on my form to filtering dates to raport: Private Sub wyswietlraport Click() Dim i As Integer Dim Indexgorny As Integer Dim warunek As String Dim klienci() As...

Microsoft Access / VBA

3002

dict.reserve and other tricks

by: bearophileHUGS | last post by:

I have started doing practice creating C extensions for CPython, so here are two ideas I have had, possibly useless. If you keep adding elements to a CPython dict/set, it periodically rebuilds...

Python

6600

How to test if one dict is subset of another?

by: Jay Tee | last post by:

Hi, I have some code that does, essentially, the following: - gather information on tens of thousands of items (in this case, jobs running on a compute cluster) - store the information as a...

Python

2302

How to build Hierarchies of dict's? (Prototypes in Python?)

by: Charles D Hixson | last post by:

I'm sure I've read before about how to construct prototypes in Python, but I haven't been able to track it down (or figure it out). What I basically want is a kind of class that has both class...

Python

3317

Quick Filter Dictionary

by: abcd | last post by:

Hi, I have a dictionary which may contain various keys/values, however, it will always contain 'name' and 'age' keys. This dictionary is kept inside a class, such as.... class Person: def...

Python

4979

Clustering the keys of a dict according to its values

by: Florian Brucker | last post by:

Hi everybody! Given a dictionary, I want to create a clustered version of it, collecting keys that have the same value: {1:, 2:, 3:} That is, generate a new dict which holds for each value...

Python

7148

Changing the language in Windows 10

by: Hystou | last post by:

Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can...

Windows Server

7367

Problem With Comparison Operator <=> in G++

by: Oralloy | last post by:

Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers,...

C / C++

7430

Maximizing Business Potential: The Nexus of Website Design and Digital Marketing

by: jinu1996 | last post by:

In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven...

Online Marketing

5673

AI Job Threat for Devs

by: agi2029 | last post by:

Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing,...

Career Advice

5072

Access Europe - Using VBA to create a class based on a table - Wed 1 May

by: isladogs | last post by:

The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome a new...

Microsoft Access / VBA

4743

Couldn’t get equations in html when convert word .docx file to html file in C#.

by: conductexam | last post by:

I have .net C# application in which I am extracting data from word file and save it in database particularly. To store word all data as it is I am converting the whole word file firstly in HTML and...

C# / C Sharp

1581

transfer the data from one system to another through ip address

by: 6302768590 | last post by:

Hai team i want code for transfer the data from one system to another through IP address by using C# our system has to for every 5mins then we have to update the data what the data is updated ...

C# / C Sharp

790

How to add payments to a PHP MySQL app.

by: muto222 | last post by:

How can i add a mobile payment intergratation into php mysql website.

PHP

451

Comprehensive Guide to Website Development in Toronto: Expert Insights from BSMN Consultancy

by: bsmnconsultancy | last post by:

In today's digital era, a well-designed website is crucial for businesses looking to succeed. Whether you're a small business owner or a large corporation in Toronto, having a strong online presence...

General