432,549 Members | 1,717 Online + Ask a Question
Need help? Post your question and get tips & solutions from a community of 432,549 IT Pros & Developers. It's quick & easy.

# An efficient, pythonic way to calculate result sets

 P: n/a Hello everyone, I've got a little issue, both programming and performance-wise. I have a set, containing objects that refer to other sets. For example, in a simple notation: (, ) (or in a more object-like display: set(obj1.choices=set(a, b, c) ). There may be obj1..objN objects in the outer set, and the amount of items in the choices sets could also reach a high number. The objects in the choices sets might overlap. Now I want to have all possible combinations, like this: (a, d), (b, d), (c, d), (a, e), (b, e), (c, e). However, I've got a few catches that make an already (icky) recursive function worse to use. First of all, I want to be able to only choose things so that the outer 'result sets' have the same length. For example, if you'd have (, ), you might pick (a, a) with a simple algorythm, the basic behaviour of sets reducing it to (a) and thus having an improper length. I could add yet another loop after calculating everything to throw out any result sets with the improper length, but that would be highly inefficient. Second, I'd hope to be able to say that objX should be assumed to have made the choice z. In the first example I mentioned, if I said that 'obj1 == a', the only result sets that would come out would be (a, d) and (a, e). I've been toying with this problem for a while, but I've found out it quickly gets slow, so I hope some people here could find a way to write code like this that is efficient (and hopefully not rely on recursion and 'fix up' loops like I've got mockups with right now). Thank you for any suggestions you can offer. Oct 25 '07 #1
6 Replies

 P: n/a On Oct 25, 10:31 am, happyhon...@gmail.com wrote: Hello everyone, I've got a little issue, both programming and performance-wise. I have a set, containing objects that refer to other sets. For example, in a simple notation: (, ) (or in a more object-like display: set(obj1.choices=set(a, b, c) ). There may be obj1..objN objects in the outer set, and the amount of items in the choices sets could also reach a high number. The objects in the choices sets might overlap. Now I want to have all possible combinations, like this: (a, d), (b, d), (c, d), (a, e), (b, e), (c, e). However, I've got a few catches that make an already (icky) recursive function worse to use. First of all, I want to be able to only choose things so that the outer 'result sets' have the same length. For example, if you'd have (, ), you might pick (a, a) with a simple algorythm, the basic behaviour of sets reducing it to (a) and thus having an improper length. I could add yet another loop after calculating everything to throw out any result sets with the improper length, but that would be highly inefficient. Does this do what you want? result_set = set([]) seta = set(['a','b','c','d','e']) setb = set(['a','c','f','g','h']) for i in seta: temp1 = setb.difference(set([i])) for j in temp1: result_set.add(tuple(set([i,j]))) for i in result_set: print i I figure there should be 4+5+3+5+5 results. No ('a'), no ('c'). Has ('a','c') but not ('c','a'). ## ('c', 'g') ## ('a', 'd') ## ('h', 'e') ## ('a', 'b') ## ('c', 'f') ## ('e', 'g') ## ('c', 'b') ## ('d', 'f') ## ('a', 'g') ## ('a', 'h') ## ('c', 'e') ## ('e', 'f') ## ('d', 'g') ## ('h', 'b') ## ('a', 'f') ## ('b', 'f') ## ('c', 'd') ## ('h', 'c') ## ('a', 'c') ## ('b', 'g') ## ('a', 'e') ## ('h', 'd') > Second, I'd hope to be able to say that objX should be assumed to have made the choice z. In the first example I mentioned, if I said that 'obj1 == a', the only result sets that would come out would be (a, d) and (a, e). Like this? result_set = set([]) seta = set(['a','b','c','d','e']) setb = set(['a','c','f','g','h']) target = 'a' for i in seta: temp1 = setb.difference(set([i])) for j in temp1: temp2 = set([i,j]) if target in temp2: result_set.add(tuple(temp2)) for i in result_set: print i ## ('a', 'f') ## ('a', 'g') ## ('a', 'd') ## ('a', 'e') ## ('a', 'h') ## ('a', 'b') ## ('a', 'c') > I've been toying with this problem for a while, but I've found out it quickly gets slow, so I hope some people here could find a way to write code like this that is efficient (and hopefully not rely on recursion and 'fix up' loops like I've got mockups with right now). Thank you for any suggestions you can offer. Oct 25 '07 #2

 P: n/a On Oct 25, 8:44 pm, "mensana...@aol.com" , ) (or in a more object-like display: set(obj1.choices=set(a, b, c) ). There may be obj1..objN objects in the outer set, and the amount of items in the choices sets could also reach a high number. The objects in the choices sets might overlap. Now I want to have all possible combinations, like this: (a, d), (b, d), (c, d), (a, e), (b, e), (c, e). However, I've got a few catches that make an already (icky) recursive function worse to use. First of all, I want to be able to only choose things so that the outer 'result sets' have the same length. For example, if you'd have (, ), you might pick (a, a) with a simple algorythm, the basic behaviour of sets reducing it to (a) and thus having an improper length. I could add yet another loop after calculating everything to throw out any result sets with the improper length, but that would be highly inefficient. Does this do what you want? result_set = set([]) seta = set(['a','b','c','d','e']) setb = set(['a','c','f','g','h']) for i in seta: temp1 = setb.difference(set([i])) for j in temp1: result_set.add(tuple(set([i,j]))) for i in result_set: print i I figure there should be 4+5+3+5+5 results. No ('a'), no ('c'). Has ('a','c') but not ('c','a'). ## ('c', 'g') ## ('a', 'd') ## ('h', 'e') ## ('a', 'b') ## ('c', 'f') ## ('e', 'g') ## ('c', 'b') ## ('d', 'f') ## ('a', 'g') ## ('a', 'h') ## ('c', 'e') ## ('e', 'f') ## ('d', 'g') ## ('h', 'b') ## ('a', 'f') ## ('b', 'f') ## ('c', 'd') ## ('h', 'c') ## ('a', 'c') ## ('b', 'g') ## ('a', 'e') ## ('h', 'd') Second, I'd hope to be able to say that objX should be assumed to have made the choice z. In the first example I mentioned, if I said that 'obj1 == a', the only result sets that would come out would be (a, d) and (a, e). Like this? result_set = set([]) seta = set(['a','b','c','d','e']) setb = set(['a','c','f','g','h']) target = 'a' for i in seta: temp1 = setb.difference(set([i])) for j in temp1: temp2 = set([i,j]) if target in temp2: result_set.add(tuple(temp2)) for i in result_set: print i ## ('a', 'f') ## ('a', 'g') ## ('a', 'd') ## ('a', 'e') ## ('a', 'h') ## ('a', 'b') ## ('a', 'c') I've been toying with this problem for a while, but I've found out it quickly gets slow, so I hope some people here could find a way to write code like this that is efficient (and hopefully not rely on recursion and 'fix up' loops like I've got mockups with right now). Thank you for any suggestions you can offer. Almost, except for the one big issue that is missing: scalability. I don't have a seta and setb, but rather set1..setN, with N varying while the script is running. That's pretty much where the pain comes in. I have some very (butt-ugly) code that works right now, but it's more a hack than pretty (or efficient) code. Thank you for your effort though, since your usage of set.difference() gave me an idea on another issue I was wrestling with. Oct 25 '07 #3

 P: n/a On Oct 25, 8:31 am, happyhon...@gmail.com wrote: I've got a little issue, both programming and performance-wise. I have a set, containing objects that refer to other sets. For example, in a simple notation: (, ) (or in a more object-like display: set(obj1.choices=set(a, b, c) ). There may be obj1..objN objects in the outer set, and the amount of items in the choices sets could also reach a high number. The objects in the choices sets might overlap. Now I want to have all possible combinations, like this: (a, d), (b, d), (c, d), (a, e), (b, e), (c, e). However, I've got a few catches that make an already (icky) recursive function worse to use. First of all, I want to be able to only choose things so that the outer 'result sets' have the same length. For example, if you'd have (, ), you might pick (a, a) with a simple algorythm, the basic behaviour of sets reducing it to (a) and thus having an improper length. I could add yet another loop after calculating everything to throw out any result sets with the improper length, but that would be highly inefficient. Second, I'd hope to be able to say that objX should be assumed to have made the choice z. In the first example I mentioned, if I said that 'obj1 == a', the only result sets that would come out would be (a, d) and (a, e). def cross_nodups(*args): 'Cross product after eliminating repeat elements, keeping constant size' ans = [[]] for arg in args: ans = [x+[y] for x in ans for y in arg if y not in x] return ans def choose_first(obj1, *args): 'Assume a choice of a first object' return cross_nodups(obj1, *args[1:]) >>print cross_nodups('ab', 'acd', 'fg') [['a', 'c', 'f'], ['a', 'c', 'g'], ['a', 'd', 'f'], ['a', 'd', 'g'], ['b', 'a', 'f'], ['b', 'a', 'g'], ['b', 'c', 'f'], ['b', 'c', 'g'], ['b', 'd', 'f'], ['b', 'd', 'g']] >>print choose_first('a', s1,s2,s3) [['a', 'c', 'f'], ['a', 'c', 'g'], ['a', 'd', 'f'], ['a', 'd', 'g']] (and hopefully not rely on recursion Easy exercise of transforming recursion to iteration left to the reader. Raymond Oct 26 '07 #4

 P: n/a Easy exercise of transforming recursion to iteration left to the reader. Ack! That part was already done. Raymond Oct 26 '07 #5

 P: n/a On Oct 26, 2:33 am, Raymond Hettinger , ) (or in a more object-like display: set(obj1.choices=set(a, b, c) ). There may be obj1..objN objects in the outer set, and the amount of items in the choices sets could also reach a high number. The objects in the choices sets might overlap. Now I want to have all possible combinations, like this: (a, d), (b, d), (c, d), (a, e), (b, e), (c, e). However, I've got a few catches that make an already (icky) recursive function worse to use. First of all, I want to be able to only choose things so that the outer 'result sets' have the same length. For example, if you'd have (, ), you might pick (a, a) with a simple algorythm, the basic behaviour of sets reducing it to (a) and thus having an improper length. I could add yet another loop after calculating everything to throw out any result sets with the improper length, but that would be highly inefficient. Second, I'd hope to be able to say that objX should be assumed to have made the choice z. In the first example I mentioned, if I said that 'obj1 == a', the only result sets that would come out would be (a, d) and (a, e). def cross_nodups(*args): 'Cross product after eliminating repeat elements, keeping constant size' ans = [[]] for arg in args: ans = [x+[y] for x in ans for y in arg if y not in x] return ans def choose_first(obj1, *args): 'Assume a choice of a first object' return cross_nodups(obj1, *args[1:]) >print cross_nodups('ab', 'acd', 'fg') [['a', 'c', 'f'], ['a', 'c', 'g'], ['a', 'd', 'f'], ['a', 'd', 'g'], ['b', 'a', 'f'], ['b', 'a', 'g'], ['b', 'c', 'f'], ['b', 'c', 'g'], ['b', 'd', 'f'], ['b', 'd', 'g']] >print choose_first('a', s1,s2,s3) [['a', 'c', 'f'], ['a', 'c', 'g'], ['a', 'd', 'f'], ['a', 'd', 'g']] (and hopefully not rely on recursion Easy exercise of transforming recursion to iteration left to the reader. Raymond Wonderful, thank you Raymond! I only had one problem with it still (damn I'm picky) and despite prodding around in that generator expression which absolutely defies my logic, I've been able to get it working. In my situation, the choices are always sets rather than strings. While it shouldn't make a difference (both are iterable), my sets contain strings.. which end up being cut in pieces and matched seperately with everything. I've solved this as follows: def cross_nodups(*args): 'Cross product after eliminating repeat elements, keeping constant size' ans = [[]] for arg in args: ans = [x+[y] for x in ans for y in arg if y not in x] return ans def choose_first(obj1, *args): 'Assume a choice of a first object' return cross_nodups(frozenset((obj1,)), *args[1:]) Gives results: >>choose_first("H", set(["H", "R"]), set(["H", "R", "K", "S"]), set(["H", "R", "K", "S"])) set([frozenset(['H', 'S', 'R']), frozenset(['H', 'K', 'R']), frozenset(['H', 'K', 'S'])]) >>choose_first("HA", set(["HA", "RA"]), set(["HA", "RA", "KA", "SA"]), set(["HA", "RA", "KA", "SA"])) set([frozenset(['KA', 'HA', 'RA']), frozenset(['SA', 'HA', 'RA']), frozenset(['KA', 'HA', 'SA'])]) rather than >>choose_first("HA", set(["HA", "RA"]), set(["HA", "RA", "KA", "SA"]), set(["HA", "RA", "KA", "SA"])) set([frozenset(['A', 'SA', 'HA']), frozenset(['H', 'KA', 'SA']), frozenset(['H', 'KA', 'RA']), frozenset(['A', 'KA', 'HA']), frozenset(['H', 'HA', 'RA']), frozenset(['H', 'SA', 'HA']), frozenset(['A', 'SA', 'RA']), frozenset(['A', 'KA', 'SA']), frozenset(['A', 'HA', 'RA']), frozenset(['H', 'KA', 'HA']), frozenset(['H', 'SA', 'RA']), frozenset(['A', 'KA', 'RA'])]) Now, this works great! Although I am still torn apart on a possible optimization (that I'm not sure is an optimization yet): - should I use the original cross_nodups() function you thought up and convert all of its contents to a proper set of sets (a secondary loop afterwards). This seems more efficient since I would not be doing a lot of frozenset() and list() conversions while looping. - however, the list() and frozenset() construct should reduce looping quite a bit and essentially prevent a lot of iterations that would be created due to the doubles in the original functions output. Of course that depends on the contents, so for simplicities sake, say that the algorythm is to run with 5-10 choices to make, each choice being out of averaged 6 items that may or may not conflict with other items. Again, thank you very much Raymond, your snippet makes my monstrosity quite ready for /dev/null. :) Oct 26 '07 #6

 P: n/a def cross_nodups(*args): 'Cross product after eliminating repeat elements, keeping constant size' ans = [[]] for arg in args: ans = [x+[y] for x in ans for y in arg if y not in x] return ans def choose_first(obj1, *args): 'Assume a choice of a first object' return cross_nodups(frozenset((obj1,)), *args[1:]) Oops, crap, I pasted the unchanged cross_nodups() you wrote. My adjusted variety: def cross_nodups(*args): 'Cross product after eliminating repeat elements, keeping constant size' ans = [[]] for arg in args: ans = [frozenset(list(x)+[y]) for x in ans for y in arg if y not in x] return set(ans) Anyhow, thank you! :) Oct 26 '07 #7

### This discussion thread is closed

Replies have been disabled for this discussion. 