By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
457,714 Members | 1,318 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 457,714 IT Pros & Developers. It's quick & easy.

matching objects by a tuple field criterion

P: n/a
i have a large collection of python objects, each of which contains an
integer 6-tuple as part of its data payload. what i need to be able to
do is select only those objects which meet a simple tuple element
wildcard matching criterion. e.g. given the following python objects:

object A includes tuple (1,2,3,4,5,6)
object B includes tuple (1,4,4,4,11,1)
object C includes tuple (1,3,9,1,1,1)

all tuples are unique. for what it's worth, the values in each field
are independent of the other fields and range from 1 to 14. although
'large', my collection is sparse cf. the 14^6 possible tuples.

i want to search on *one only* tuple field/value. if my search
criterion is (*,*,*,4,*,*), then i want to find object A and object B.
if (1,*,*,*,*,*), i want to find objects A, B, and C, etc. i will only
ever specify an integer match for one tuple field.

i can think of some naive approaches, but is there an elegant way to
do this?

Jun 10 '07 #1
Share this Question
Share on Google+
9 Replies


P: n/a
On Sun, 10 Jun 2007 03:58:44 -0700, bullockbefriending bard wrote:
i have a large collection of python objects, each of which contains an
integer 6-tuple as part of its data payload. what i need to be able to
do is select only those objects which meet a simple tuple element
wildcard matching criterion. e.g. given the following python objects:

object A includes tuple (1,2,3,4,5,6)
object B includes tuple (1,4,4,4,11,1)
object C includes tuple (1,3,9,1,1,1)

all tuples are unique. for what it's worth, the values in each field
are independent of the other fields and range from 1 to 14. although
'large', my collection is sparse cf. the 14^6 possible tuples.

i want to search on *one only* tuple field/value. if my search
criterion is (*,*,*,4,*,*), then i want to find object A and object B.
if (1,*,*,*,*,*), i want to find objects A, B, and C, etc. i will only
ever specify an integer match for one tuple field.

i can think of some naive approaches, but is there an elegant way to
do this?
Instead of passing a wild-card tuple like (*,*,*,4,*,*) simply pass the
integer you want to match and the position you want to match it in.

As a generator:
def matcher(list_of_objects, what, where):
for obj in list_of_objects:
if obj.data[where] == what:
yield obj
As a generator expression:

(obj for obj in list_of_objects if obj.data[what] == where)

--
Steven.

Jun 10 '07 #2

P: n/a
bullockbefriending bard schrieb:
i have a large collection of python objects, each of which contains an
integer 6-tuple as part of its data payload. what i need to be able to
do is select only those objects which meet a simple tuple element
wildcard matching criterion. e.g. given the following python objects:

object A includes tuple (1,2,3,4,5,6)
object B includes tuple (1,4,4,4,11,1)
object C includes tuple (1,3,9,1,1,1)

all tuples are unique. for what it's worth, the values in each field
are independent of the other fields and range from 1 to 14. although
'large', my collection is sparse cf. the 14^6 possible tuples.

i want to search on *one only* tuple field/value. if my search
criterion is (*,*,*,4,*,*), then i want to find object A and object B.
if (1,*,*,*,*,*), i want to find objects A, B, and C, etc. i will only
ever specify an integer match for one tuple field.

i can think of some naive approaches, but is there an elegant way to
do this?
Depends on what you find elegant. Are the criteria runtime-specified,
and is anything other than the * allowerd, e.g. [1,3,8]? IMHO the best
thing is to create a filter-function that you then use to... filter :)

Like this:

def create_filter_predicate(criteria):
"""
criteria is an iterable containing either
an '*' or a list of comma-separated
integers
"""
sub_preds = []
for i, sub_crit in enumerate(criteria):
if sub_crit == "*":
continue
matching_set = set(int(n) for n in sub_crit.split(","))
sub_preds.append((i, matching_set))
def predicate(o):
t = o.my_tuple
for i, ms in sub_preds:
if not t[i] in ms:
return False
return True
return predicate

Diez
Jun 10 '07 #3

P: n/a
Diez B. Roggisch schrieb:
bullockbefriending bard schrieb:
>i have a large collection of python objects, each of which contains an
integer 6-tuple as part of its data payload. what i need to be able to
do is select only those objects which meet a simple tuple element
wildcard matching criterion. e.g. given the following python objects:

object A includes tuple (1,2,3,4,5,6)
object B includes tuple (1,4,4,4,11,1)
object C includes tuple (1,3,9,1,1,1)

all tuples are unique. for what it's worth, the values in each field
are independent of the other fields and range from 1 to 14. although
'large', my collection is sparse cf. the 14^6 possible tuples.

i want to search on *one only* tuple field/value. if my search
criterion is (*,*,*,4,*,*), then i want to find object A and object B.
if (1,*,*,*,*,*), i want to find objects A, B, and C, etc. i will only
ever specify an integer match for one tuple field.

i can think of some naive approaches, but is there an elegant way to
do this?

Depends on what you find elegant. Are the criteria runtime-specified,
and is anything other than the * allowerd, e.g. [1,3,8]? IMHO the best
thing is to create a filter-function that you then use to... filter :)

Like this:

def create_filter_predicate(criteria):
"""
criteria is an iterable containing either
an '*' or a list of comma-separated
integers
"""
sub_preds = []
for i, sub_crit in enumerate(criteria):
if sub_crit == "*":
continue
matching_set = set(int(n) for n in sub_crit.split(","))
sub_preds.append((i, matching_set))
def predicate(o):
t = o.my_tuple
for i, ms in sub_preds:
if not t[i] in ms:
return False
return True
return predicate
Obviously the docs are faulty - the criteria looks like this:

['*', '1,2,3']

But I think you can get the gist of it - regardless on how the actual
criteria are entered.

Diez
Jun 10 '07 #4

P: n/a
Instead of passing a wild-card tuple like (*,*,*,4,*,*) simply pass the
integer you want to match and the position you want to match it in.
for sure. that was more for expository purpose rather than how i was
planning to go about it.

As a generator expression:

(obj for obj in list_of_objects if obj.data[what] == where)
above or equivalent list comprehension was what i had in mind as far
as linear search goes. and scanning the list like this will most
likely be 'good enough' performance-wise. however, partly just out of
curiosity, i was wondering if there is some kind of data structure
which might let me find all the matches a bit faster.

Jun 10 '07 #5

P: n/a
bullockbefriending bard schrieb:
>Instead of passing a wild-card tuple like (*,*,*,4,*,*) simply pass the
integer you want to match and the position you want to match it in.

for sure. that was more for expository purpose rather than how i was
planning to go about it.

>As a generator expression:

(obj for obj in list_of_objects if obj.data[what] == where)

above or equivalent list comprehension was what i had in mind as far
as linear search goes. and scanning the list like this will most
likely be 'good enough' performance-wise. however, partly just out of
curiosity, i was wondering if there is some kind of data structure
which might let me find all the matches a bit faster.
You can of course create a tree from the tuples, where the first level
of nodes corresponds to the first attribute of the tuple and so forth.

There are certainly cases where the speedup is tremendous - think of a
single integer in the first criteria - but then the overall performance
depends on the real-live queries. If lot's of wildcards are used, you
might end up slower if the tree-walk takes more time than the
C-implemented list-iteration will cost you.

Diez
Jun 10 '07 #6

P: n/a
On Jun 10, 8:58 pm, bullockbefriending bard <kinch1...@gmail.com>
wrote:
i have a large collection of python objects, each of which contains an
integer 6-tuple as part of its data payload. what i need to be able to
do is select only those objects which meet a simple tuple element
wildcard matching criterion. e.g. given the following python objects:

object A includes tuple (1,2,3,4,5,6)
object B includes tuple (1,4,4,4,11,1)
object C includes tuple (1,3,9,1,1,1)

all tuples are unique. for what it's worth, the values in each field
are independent of the other fields and range from 1 to 14. although
'large', my collection is sparse cf. the 14^6 possible tuples.

i want to search on *one only* tuple field/value. if my search
criterion is (*,*,*,4,*,*), then i want to find object A and object B.
if (1,*,*,*,*,*), i want to find objects A, B, and C, etc. i will only
ever specify an integer match for one tuple field.

i can think of some naive approaches, but is there an elegant way to
do this?
You are going to have to tell us how many is in a "large" collection,
how often you will do this query, how much memory you have to spare,
how fast you want the answer back, whether you want the answer as a
generator, or in a list/tuple/set/whatever -- or you are going to have
to do a fair bit of prototyping and benchmarking.

Steven has given you one end of the spectrum: a naive serial scan. Now
I'll give you another end: a naive pre-build all possible answers
method:

[quite untested]
# build 14x6 array of empty sets
answer = [[set() for what in range(14)] for slot in range(6)]
# populate the sucker
for obj in obj_list:
for slot in range(6):
answer[slot][obj.data[slot]-1].add(obj)

Later, the answer to 'Which objects have the value 11 in slot 3?' is
answer[3][11-1]

HTH,
John

Jun 10 '07 #7

P: n/a
quite so, i rephrased docstring to be:

"""criteria is an iterable containing either '*' instances or strings
of comma-separated integers. e.g. ['*','1,2,3', '11,12']"""

thanks very much for the idea! upon further reflection, this seems to
be a more elegant solution for my case than the ad-hoc generator or
list comprehension approach. this is because i *do* have to filter
data based on multiple single field criteria and i know all of these
criteria at the same time - so it makes a lot of sense to do as you
have done and then only do one filter operation to pull out all the
objects i am interested in.

Jun 10 '07 #8

P: n/a
There are certainly cases where the speedup is tremendous - think of a
single integer in the first criteria - but then the overall performance
depends on the real-live queries. If lot's of wildcards are used, you
might end up slower if the tree-walk takes more time than the
C-implemented list-iteration will cost you.
hopefully, won't need to do this then. i'll test the filter approach
tonight. generally speaking, going to be mostly wildcards most of the
time.

Jun 10 '07 #9

P: n/a
On Jun 10, 10:32 pm, bullockbefriending bard <kinch1...@gmail.com>
wrote:
quite so, i rephrased docstring to be:

"""criteria is an iterable containing either '*' instances or strings
of comma-separated integers. e.g. ['*','1,2,3', '11,12']"""

thanks very much for the idea! upon further reflection, this seems to
be a more elegant solution for my case than the ad-hoc generator or
list comprehension approach. this is because i *do* have to filter
data based on multiple single field criteria and i know all of these
criteria at the same time - so it makes a lot of sense to do as you
have done and then only do one filter operation to pull out all the
objects i am interested in.
"""i want to search on *one only* tuple field/value."""
"""i will only ever specify an integer match for one tuple field."""
"""the cheque's in the mail"""
"""of course i'll still love you in the morning"""

So augment my pre-built approach by intersection (or union? your
wording is somewhat vague) of the selected answer sets.
Jun 10 '07 #10

This discussion thread is closed

Replies have been disabled for this discussion.