Iterating across a filtered list
https://bytes.com/topic/python/answers/616217-iterating-across-filtered-list

"Drew" <ol*****@gmail.com> writes:

> I'm currently writing a toy program as I learn python that acts as a
> simple address book. I've run across a situation in my search function
> where I want to iterate across a filtered list. My code is working
> just fine, but I'm wondering if this is the most "elegant" way to do
> this. Essentially, I'm searching the dict self.contacts for a key that
> matches the pattern entered by the user. If so, I print the value
> associated with that key. A pastie to the method is below, any
> help/advice is appreciated:

If I can decipher your Ruby example (I don't know Ruby), I think you
want:

    for name, contact in contacts.iteritems():
        if re.search('search', name):
            print contact

If you just want to filter the dictionary inside an expression, you
can use a generator expression:

    d = ((name, contact) for (name, contact) in contacts.iteritems()
         if re.search('search', name))

    # d yields (name, contact) pairs, so format each contact before joining
    print '\n'.join(str(contact) for name, contact in d)  # one contact per line

Note that d is an iterator, which means it mutates when you step
through it.
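To make the "mutates when you step through it" remark concrete, here is a minimal sketch in Python 3 syntax (so `items()` and the `print` function rather than the `iteritems()` and `print` statement used in the thread). The contents of `contacts` are invented for illustration; the thread never shows the real data.

```python
import re

# Hypothetical sample data, not taken from the thread.
contacts = {
    "Alice": "alice@example.com",
    "Bob": "bob@example.com",
    "Alina": "alina@example.com",
}

# A generator expression is lazy and can be walked only once.
matches = (contact for name, contact in contacts.items()
           if re.search("Ali", name))

first_pass = list(matches)   # consumes every item the generator produces
second_pass = list(matches)  # the generator is already exhausted

print(first_pass)   # the two matching contacts
print(second_pass)  # an empty list
```

If the filtered result is needed more than once, building a list instead of a generator is probably the simpler choice.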

Mar 13 '07 #2

Drew

On Mar 13, 2:42 pm, Paul Rubin <http://phr...@NOSPAM.invalid> wrote:

> If I can decipher your Ruby example (I don't know Ruby), I think you
> want:
>
>     for name, contact in contacts.iteritems():
>         if re.search('search', name):
>             print contact
>
> If you just want to filter the dictionary inside an expression, you
> can use a generator expression:
>
>     d = ((name, contact) for (name, contact) in contacts.iteritems()
>          if re.search('search', name))
>
>     # d yields (name, contact) pairs, so format each contact before joining
>     print '\n'.join(str(contact) for name, contact in d)  # one contact per line
>
> Note that d is an iterator, which means it mutates when you step
> through it.

Paul -

You're exactly on the mark. I guess I was just wondering if your first
example (that is, breaking the if statement away from the iteration)
was preferred rather than initially filtering and then iterating.
However, your examples make a lot of sense and are quite helpful.

Thanks,
Drew

Mar 13 '07 #3

On Mar 13, 6:04 pm, "Drew" <olso...@gmail.com> wrote:

All -

Hi!

[snip]

http://pastie.caboo.se/46647

There is no need for such a convoluted list comprehension as you
iterate over it immediately! It is clearer to put the filtering logic
in the for loop. Moreover you recalculate the regexp for each element
of the list. Instead I would do something like this:

    def find(self, search_str, flags=re.IGNORECASE):
        print "Contact(s) found:"
        # compile the pattern once, outside the loop
        search = re.compile(search_str, flags).search
        for name, contact in self.contacts.items():
            if search(name):
                print contact
        print

Although I would rather have one function that returns the list of all
found contacts:

    def find(self, search_str, flags=re.IGNORECASE):
        search = re.compile(search_str, flags).search
        for name, contact in self.contacts.items():
            if search(name):
                yield contact

And then another one that prints it.
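A rough sketch of that split, in Python 3 syntax: one method that yields matches and another that prints them. The `AddressBook` class, the `print_found` name and the sample contacts are assumptions made for illustration; only `find` and `self.contacts` come from the thread.

```python
import re


class AddressBook:
    """Hypothetical container; the poster's real class is not shown."""

    def __init__(self, contacts):
        self.contacts = contacts

    def find(self, search_str, flags=re.IGNORECASE):
        # Compile once, then lazily yield each matching contact.
        search = re.compile(search_str, flags).search
        for name, contact in self.contacts.items():
            if search(name):
                yield contact

    def print_found(self, search_str):
        # Presentation lives here, so find() stays reusable.
        print("Contact(s) found:")
        for contact in self.find(search_str):
            print(contact)


book = AddressBook({
    "Drew": "drew@example.com",
    "Andrew": "andrew@example.com",
    "Paul": "paul@example.com",
})
found = list(book.find("drew"))
book.print_found("drew")
```

Keeping the search separate from the printing means the same `find` could feed a test, a count, or a different output format without change.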

> Side note: I'm learning python after ruby experience. In ruby I would
> do something like:
>
> contacts.find_all{|name,contact| name =~ /search/}.each{|name,contact| puts contact}

And that's why you're right to learn Python ;)

HTH

--
Arnaud

Mar 13 '07 #4

"Drew" <ol*****@gmail.com> writes:

> You're exactly on the mark. I guess I was just wondering if your first
> example (that is, breaking the if statement away from the iteration)
> was preferred rather than initially filtering and then iterating.

I think the multiple statement version is more in Python tradition.
Python is historically an imperative, procedural language with some OO
features. Iterators like that are a new Python feature and they have
some annoying characteristics, like the way they mutate when you touch
them. It's usually safest to create and consume them in the same
place, e.g. creating some sequence and passing it through map, filter, etc.

Mar 13 '07 #5

"Arnaud Delobelle" <ar*****@googlemail.com> writes:

> in the for loop. Moreover you recalculate the regexp for each element
> of the list.

The re library caches the compiled regexp, I think.

Mar 13 '07 #6

Bruno Desthuilliers

Paul Rubin wrote:

> "Drew" <ol*****@gmail.com> writes:
>
>> You're exactly on the mark. I guess I was just wondering if your first
>> example (that is, breaking the if statement away from the iteration)
>> was preferred rather than initially filtering and then iterating.
>
> I think the multiple statement version is more in Python tradition.

I don't know if I qualify as a Python traditionalist, but I've been using
Python since the 1.5.2 days, and I usually favor list comps or generator
expressions over old-style loops for this kind of operation.

> Python is historically an imperative, procedural language with some OO
> features.

Python has had functions as first class objects, (quite limited)
anonymous functions, and map(), filter() and reduce() as builtin funcs
since at least 1.5.2 (quite some years ago).

> Iterators like that are a new Python feature

List comps are not that new (2.0 or 2.1?):

    print "\n".join([contact for name, contact in contacts.items()
                     if search.match(name)])

> and they have
> some annoying characteristics, like the way they mutate when you touch
> them.

While sequences are iterables, not all iterables are sequences. Know
what you use, and you'll be fine.

> It's usually safest to create and consume them in the same
> place, e.g. creating some sequence and passing it through map, filter, etc.

Safest? Why so?

Mar 13 '07 #7

On Mar 13, 7:36 pm, Paul Rubin <http://phr...@NOSPAM.invalid> wrote:

> "Arnaud Delobelle" <arno...@googlemail.com> writes:
>> in the for loop. Moreover you recalculate the regexp for each element
>> of the list.
>
> The re library caches the compiled regexp, I think.

That would surprise me.
How can re.search know that string.lower(search) is the same each
time? Or else there is something that I misunderstand.

Moreover:

In [49]: from timeit import Timer
In [50]: Timer('for i in range(1000): search("abcdefghijk")', 'import re; search=re.compile("ijk").search').timeit(100)
Out[50]: 0.36964607238769531

In [51]: Timer('for i in range(1000): re.search("ijk", "abcdefghijk")', 'import re; search=re.compile("ijk").search').timeit(100)
Out[51]: 1.4777300357818604
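For anyone wanting to repeat the measurement on a current interpreter, a comparable sketch in Python 3 might look like the following. The iteration count is an arbitrary choice, and the absolute numbers will differ from the ones above depending on machine and Python version, so no expected timings are given.

```python
import re
from timeit import timeit

text = "abcdefghijk"

# Bound method of a pattern compiled once, as in the first timing above.
search = re.compile("ijk").search

# Each call goes straight to the compiled pattern.
precompiled = timeit(lambda: search(text), number=100_000)

# Each call goes through the re module, which has to find the compiled
# pattern for "ijk" again before searching.
module_level = timeit(lambda: re.search("ijk", text), number=100_000)

print(f"precompiled: {precompiled:.3f}s  re.search: {module_level:.3f}s")
```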

--
Arnaud

Mar 13 '07 #8

Bruno Desthuilliers <bd*****************@free.quelquepart.fr> writes:

> I don't know if I qualify as a Python traditionalist, but I've been using
> Python since the 1.5.2 days, and I usually favor list comps or
> generator expressions over old-style loops for this kind of
> operation.

I like genexps when they're nested inside other expressions so they're
consumed as part of the evaluation of the outer expression. They're a
bit scary when the genexp-created iterator is saved in a variable.

Listcomps are different, they allocate storage for the entire list, so
they're just syntax sugar for a loop. They have an annoying
misfeature of their

> Python has had functions as first class objects and
> (quite-limited-but) anonymous functions, map(), filter() and reduce()
> as builtin funcs at least since 1.5.2 (quite some years ago).

True, though no iterators so you couldn't easily use those functions
on lazily-evaluated streams like you can now.

>> Iterators like that are a new Python feature
>
> List comps are not that new (2.0 or 2.1 ?):
>
>     print "\n".join([contact for name, contact in contacts.items()
>                      if search.match(name)])

Well you could do it that way but it allocates the entire filtered
list in memory. In this example "\n".join() also builds up a string
in memory, but you could do something different, like run the sequence
through another filter or print out one element at a time, in which
case lazy evaluation can be important (imagine that contacts.iteritems
chugs through a billion row table in an SQL database).
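A small sketch of the lazy case described here, in Python 3 syntax. The `rows` generator is a made-up stand-in for a large data source such as a database cursor, and the pattern is arbitrary; neither comes from the thread.

```python
import re
from itertools import islice


def rows():
    """Hypothetical large source of (name, contact) pairs."""
    for i in range(1_000_000):
        yield f"user{i}", f"user{i}@example.com"


search = re.compile("99$").search

# Nothing is read from rows() until something consumes the pipeline,
# and no intermediate list of matches is ever built.
matching = (contact for name, contact in rows() if search(name))

# Take only the first three matches; the rest of the source is never touched.
first_three = list(islice(matching, 3))
print(first_three)
```

With a list comprehension in place of the generator expression, all one million rows would be scanned and every match stored before the first one could be used.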

>> It's usually safest to create and consume them in the same
>> place, e.g. creating some sequence and passing it through map, filter, etc.
>
> Safest ? Why so ?

Just that things can get confusing if you're consuming the iterator in
more than one place. It can get to be like those old languages where
you had to do your own storage management ;-).

Mar 13 '07 #9

On Mar 13, 8:53 pm, Bruno Desthuilliers
<bdesth.quelquech...@free.quelquepart.fr> wrote:

> Paul Rubin wrote:

[snip]

>> Iterators like that are a new Python feature
>
> List comps are not that new (2.0 or 2.1 ?):
>
>     print "\n".join([contact for name, contact in contacts.items()
>                      if search.match(name)])

You can write this, but:
* it is difficult to argue that it is more readable than Paul's (or
my) 'imperative' version;
* it has no obvious performance benefit, in fact it creates a list
unnecessarily (I know you could use a generator with recent python).

>> and they have
>> some annoying characteristics, like the way they mutate when you touch
>> them.
>
> While sequences are iterables, not all iterables are sequences. Know
> what you use, and you'll be fine.

...And know when to use for statements :)

--
Arnaud

Mar 13 '07 #10

On Tue, 13 Mar 2007 15:04:50 -0300, Drew <ol*****@gmail.com> wrote:

> I'm currently writing a toy program as I learn python that acts as a
> simple address book. I've run across a situation in my search function
> where I want to iterate across a filtered list. My code is working
> just fine, but I'm wondering if this is the most "elegant" way to do
> this. Essentially, I'm searching the dict self.contacts for a key that
> matches the pattern entered by the user. If so, I print the value
> associated with that key. A pastie to the method is below, any
> help/advice is appreciated:
>
> http://pastie.caboo.se/46647
>
> Side note: I'm learning python after ruby experience. In ruby I would
> do something like:
>
> contacts.find_all{|name,contact| name =~ /search/}.each{|name,contact| puts contact}

Just a few changes:

    def find(self, search):
        search_re = re.compile(search, re.IGNORECASE)
        for result in [self.contacts[name] for name in self.contacts
                       if search_re.match(name)]:
            print result

- you can iterate directly over a dictionary's keys using: for key in dict
- you can compile a regexp to re-use it in all loops; using re.IGNORECASE,
  you don't need to explicitly convert everything to lowercase before comparing
- if all you want to do is to print the results, you can even avoid the
  for loop:

    print '\n'.join('%s' % self.contacts[name] for name in self.contacts
                    if search_re.match(name))
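One detail that may matter when choosing between the snippets in this thread: a compiled pattern's `match` only succeeds at the start of the string, while `search` (used in the earlier replies) finds the pattern anywhere. A minimal Python 3 sketch, with invented names as sample data:

```python
import re

pattern = re.compile("drew", re.IGNORECASE)

# Hypothetical contact names, not taken from the thread.
names = ["Drew", "Andrew", "Paul"]

# match() anchors at the beginning of each name.
anchored = [name for name in names if pattern.match(name)]

# search() accepts a hit at any position.
anywhere = [name for name in names if pattern.search(name)]

print(anchored)
print(anywhere)
```

Which behaviour is wanted depends on how the address book search is meant to work, which the thread does not specify.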

--
Gabriel Genellina

Mar 13 '07 #11

On Mar 13, 8:59 pm, "Gabriel Genellina" <gagsl-...@yahoo.com.ar>
wrote:
[snip]

>     def find(self, search):
>         search_re = re.compile(search, re.IGNORECASE)
>         for result in [self.contacts[name] for name in self.contacts
>                        if search_re.match(name)]:
>             print result

I do not see how

    for y in [f(x) for x in L if g(x)]:
        do stuff with y

can be preferable to

    for x in L:
        if g(x):
            do stuff with f(x)

What can be the benefit of creating a list by comprehension for the
sole purpose of iterating over it?

--
Arnaud

Mar 13 '07 #12

On Tue, 13 Mar 2007 17:19:53 -0300, Arnaud Delobelle
<ar*****@googlemail.com> wrote:

> On Mar 13, 7:36 pm, Paul Rubin <http://phr...@NOSPAM.invalid> wrote:
>
>> The re library caches the compiled regexp, I think.
>
> That would surprise me.
> How can re.search know that string.lower(search) is the same each
> time? Or else there is something that I misunderstand.

It does.

py> import re
py> x = re.compile("ijk")
py> y = re.compile("ijk")
py> x is y
True

Both, separate calls, returned identical results. You can show the cache:

py> re._cache
{(<type 'str'>, '%(?:\$(?P<key>.*?)\$)?(?P<modifiers>[-#0-9 +*.hlL]*?)[eEfFgGdiouxXcrs%]', 0): <_sre.SRE_Pattern object at 0x00A786A0>,
 (<type 'str'>, 'ijk', 0): <_sre.SRE_Pattern object at 0x00ABB338>}
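The same check can be tried on a current interpreter. The sketch below assumes CPython 3, where `re.compile` consults an internal cache keyed on the pattern and flags; that cache is an implementation detail with a bounded size, so the identity result should be treated as an observation rather than a guarantee.

```python
import re

x = re.compile("ijk")
y = re.compile("ijk")

# On CPython the second call is served from the module's internal cache,
# so both names refer to the same pattern object.
same_object = x is y

# Different flags give a different cache key, hence a different object.
z = re.compile("ijk", re.IGNORECASE)
different_flags = z is not x

print(same_object, different_flags)
```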

--
Gabriel Genellina

Mar 13 '07 #13

On Tue, 13 Mar 2007 18:16:32 -0300, Arnaud Delobelle
<ar*****@googlemail.com> wrote:

> On Mar 13, 8:59 pm, "Gabriel Genellina" <gagsl-...@yahoo.com.ar>
> wrote:
> [snip]
>>     def find(self, search):
>>         search_re = re.compile(search, re.IGNORECASE)
>>         for result in [self.contacts[name] for name in self.contacts
>>                        if search_re.match(name)]:
>>             print result
>
> I do not see how
>
>     for y in [f(x) for x in L if g(x)]:
>         do stuff with y
>
> can be preferable to
>
>     for x in L:
>         if g(x):
>             do stuff with f(x)
>
> What can be the benefit of creating a list by comprehension for the
> sole purpose of iterating over it?

No benefit...

--
Gabriel Genellina

Mar 13 '07 #14

On Mar 13, 9:31 pm, "Gabriel Genellina" <gagsl-...@yahoo.com.ar>
wrote:

> On Tue, 13 Mar 2007 17:19:53 -0300, Arnaud Delobelle
> <arno...@googlemail.com> wrote:
>
>> On Mar 13, 7:36 pm, Paul Rubin <http://phr...@NOSPAM.invalid> wrote:
>>
>>> The re library caches the compiled regexp, I think.
>>
>> That would surprise me.
>> How can re.search know that string.lower(search) is the same each
>> time? Or else there is something that I misunderstand.
>
> It does.
>
> py> import re
> py> x = re.compile("ijk")
> py> y = re.compile("ijk")
> py> x is y
> True
>
> Both, separate calls, returned identical results. You can show the cache:

OK I didn't realise this. But even so each time there is the cost of
looking up the regexp string in the cache dictionary.

--
Arnaud

Mar 13 '07 #15

Łukasz Ligowski

Hi,

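On Tuesday 13 of March 2007 22:16:32 Arnaud Delobelle wrote:

>     for x in L:
>         if g(x):
>             do stuff with f(x)

    # ifilter keeps the items for which g(x) is true;
    # ifilterfalse would keep the ones for which it is false
    for x in itertools.ifilter(g, L):
        do stuff with f(x)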

Maybe this would be even better?
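For readers on Python 3, where `itertools.ifilter` became the builtin `filter` and `itertools.ifilterfalse` became `itertools.filterfalse`, here is a small sketch of the two directions. The definitions of `g`, `f` and `L` are placeholders chosen for illustration, since the thread leaves them abstract.

```python
from itertools import filterfalse


def g(x):
    """Placeholder predicate: keep even numbers."""
    return x % 2 == 0


def f(x):
    """Placeholder transformation."""
    return x * x


L = [1, 2, 3, 4, 5, 6]

# filter() yields the items for which g(x) is true.
kept = [f(x) for x in filter(g, L)]

# filterfalse() yields the complement: items for which g(x) is false.
dropped = [f(x) for x in filterfalse(g, L)]

print(kept)
print(dropped)
```

To reproduce the `if g(x):` loop quoted above, the `filter` form is the one that applies.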

L

Mar 13 '07 #16