An efficient, pythonic way to calculate result sets

happyhondje

Hello everyone,

I've got a little issue, both programming and performance-wise. I have
a set, containing objects that refer to other sets. For example, in a
simple notation: (<a, b, c>, <d, e>) (or in a more object-like
display: set(obj1.choices=set(a, b, c) ). There may be obj1..objN
objects in the outer set, and the amount of items in the choices sets
could also reach a high number. The objects in the choices sets might
overlap.

Now I want to have all possible combinations, like this: (a, d), (b,
d), (c, d), (a, e), (b, e), (c, e).

However, I've got a few catches that make an already (icky) recursive
function worse to use.

First of all, I want to be able to only choose things so that the
outer 'result sets' have the same length. For example, if you'd have
(<a, b>, <a, c>), you might pick (a, a) with a simple algorythm, the
basic behaviour of sets reducing it to (a) and thus having an improper
length. I could add yet another loop after calculating everything to
throw out any result sets with the improper length, but that would be
highly inefficient.

Second, I'd hope to be able to say that objX should be assumed to have
made the choice z. In the first example I mentioned, if I said that
'obj1 == a', the only result sets that would come out would be (a, d)
and (a, e).

I've been toying with this problem for a while, but I've found out it
quickly gets slow, so I hope some people here could find a way to
write code like this that is efficient (and hopefully not rely on
recursion and 'fix up' loops like I've got mockups with right now).

Thank you for any suggestions you can offer.

Oct 25 '07 #1

Subscribe Post Reply

1354

mensanator

On Oct 25, 10:31 am, happyhon...@gmail.com wrote:

Hello everyone,

I've got a little issue, both programming and performance-wise. I have
a set, containing objects that refer to other sets. For example, in a
simple notation: (<a, b, c>, <d, e>) (or in a more object-like
display: set(obj1.choices=set(a, b, c) ). There may be obj1..objN
objects in the outer set, and the amount of items in the choices sets
could also reach a high number. The objects in the choices sets might
overlap.

Now I want to have all possible combinations, like this: (a, d), (b,
d), (c, d), (a, e), (b, e), (c, e).

However, I've got a few catches that make an already (icky) recursive
function worse to use.

First of all, I want to be able to only choose things so that the
outer 'result sets' have the same length. For example, if you'd have
(<a, b>, <a, c>), you might pick (a, a) with a simple algorythm, the
basic behaviour of sets reducing it to (a) and thus having an improper
length. I could add yet another loop after calculating everything to
throw out any result sets with the improper length, but that would be
highly inefficient.

Does this do what you want?

result_set = set([])
seta = set(['a','b','c','d','e'])
setb = set(['a','c','f','g','h'])
for i in seta:
temp1 = setb.difference(set([i]))
for j in temp1:
result_set.add(tuple(set([i,j])))
for i in result_set:
print i

I figure there should be 4+5+3+5+5 results.
No ('a'), no ('c'). Has ('a','c') but not ('c','a').

## ('c', 'g')
## ('a', 'd')
## ('h', 'e')
## ('a', 'b')
## ('c', 'f')
## ('e', 'g')
## ('c', 'b')
## ('d', 'f')
## ('a', 'g')
## ('a', 'h')
## ('c', 'e')
## ('e', 'f')
## ('d', 'g')
## ('h', 'b')
## ('a', 'f')
## ('b', 'f')
## ('c', 'd')
## ('h', 'c')
## ('a', 'c')
## ('b', 'g')
## ('a', 'e')
## ('h', 'd')

>
Second, I'd hope to be able to say that objX should be assumed to have
made the choice z. In the first example I mentioned, if I said that
'obj1 == a', the only result sets that would come out would be (a, d)
and (a, e).

Like this?

result_set = set([])
seta = set(['a','b','c','d','e'])
setb = set(['a','c','f','g','h'])
target = 'a'
for i in seta:
temp1 = setb.difference(set([i]))
for j in temp1:
temp2 = set([i,j])
if target in temp2:
result_set.add(tuple(temp2))
for i in result_set:
print i

## ('a', 'f')
## ('a', 'g')
## ('a', 'd')
## ('a', 'e')
## ('a', 'h')
## ('a', 'b')
## ('a', 'c')

>
I've been toying with this problem for a while, but I've found out it
quickly gets slow, so I hope some people here could find a way to
write code like this that is efficient (and hopefully not rely on
recursion and 'fix up' loops like I've got mockups with right now).

Thank you for any suggestions you can offer.

Oct 25 '07 #2

happyhondje

On Oct 25, 8:44 pm, "mensana...@aol.com" <mensana...@aol.comwrote:

On Oct 25, 10:31 am, happyhon...@gmail.com wrote:

Hello everyone,

I've got a little issue, both programming and performance-wise. I have
a set, containing objects that refer to other sets. For example, in a
simple notation: (<a, b, c>, <d, e>) (or in a more object-like
display: set(obj1.choices=set(a, b, c) ). There may be obj1..objN
objects in the outer set, and the amount of items in the choices sets
could also reach a high number. The objects in the choices sets might
overlap.

Now I want to have all possible combinations, like this: (a, d), (b,
d), (c, d), (a, e), (b, e), (c, e).

However, I've got a few catches that make an already (icky) recursive
function worse to use.

First of all, I want to be able to only choose things so that the
outer 'result sets' have the same length. For example, if you'd have
(<a, b>, <a, c>), you might pick (a, a) with a simple algorythm, the
basic behaviour of sets reducing it to (a) and thus having an improper
length. I could add yet another loop after calculating everything to
throw out any result sets with the improper length, but that would be
highly inefficient.

Does this do what you want?

result_set = set([])
seta = set(['a','b','c','d','e'])
setb = set(['a','c','f','g','h'])
for i in seta:
temp1 = setb.difference(set([i]))
for j in temp1:
result_set.add(tuple(set([i,j])))
for i in result_set:
print i

I figure there should be 4+5+3+5+5 results.
No ('a'), no ('c'). Has ('a','c') but not ('c','a').

## ('c', 'g')
## ('a', 'd')
## ('h', 'e')
## ('a', 'b')
## ('c', 'f')
## ('e', 'g')
## ('c', 'b')
## ('d', 'f')
## ('a', 'g')
## ('a', 'h')
## ('c', 'e')
## ('e', 'f')
## ('d', 'g')
## ('h', 'b')
## ('a', 'f')
## ('b', 'f')
## ('c', 'd')
## ('h', 'c')
## ('a', 'c')
## ('b', 'g')
## ('a', 'e')
## ('h', 'd')

Second, I'd hope to be able to say that objX should be assumed to have
made the choice z. In the first example I mentioned, if I said that
'obj1 == a', the only result sets that would come out would be (a, d)
and (a, e).

Like this?

result_set = set([])
seta = set(['a','b','c','d','e'])
setb = set(['a','c','f','g','h'])
target = 'a'
for i in seta:
temp1 = setb.difference(set([i]))
for j in temp1:
temp2 = set([i,j])
if target in temp2:
result_set.add(tuple(temp2))
for i in result_set:
print i

## ('a', 'f')
## ('a', 'g')
## ('a', 'd')
## ('a', 'e')
## ('a', 'h')
## ('a', 'b')
## ('a', 'c')

I've been toying with this problem for a while, but I've found out it
quickly gets slow, so I hope some people here could find a way to
write code like this that is efficient (and hopefully not rely on
recursion and 'fix up' loops like I've got mockups with right now).

Thank you for any suggestions you can offer.

Almost, except for the one big issue that is missing: scalability. I
don't have a seta and setb, but rather set1..setN, with N varying
while the script is running. That's pretty much where the pain comes
in. I have some very (butt-ugly) code that works right now, but it's
more a hack than pretty (or efficient) code.

Thank you for your effort though, since your usage of set.difference()
gave me an idea on another issue I was wrestling with.

Oct 25 '07 #3

Raymond Hettinger

On Oct 25, 8:31 am, happyhon...@gmail.com wrote:

I've got a little issue, both programming and performance-wise. I have
a set, containing objects that refer to other sets. For example, in a
simple notation: (<a, b, c>, <d, e>) (or in a more object-like
display: set(obj1.choices=set(a, b, c) ). There may be obj1..objN
objects in the outer set, and the amount of items in the choices sets
could also reach a high number. The objects in the choices sets might
overlap.

Now I want to have all possible combinations, like this: (a, d), (b,
d), (c, d), (a, e), (b, e), (c, e).

However, I've got a few catches that make an already (icky) recursive
function worse to use.

First of all, I want to be able to only choose things so that the
outer 'result sets' have the same length. For example, if you'd have
(<a, b>, <a, c>), you might pick (a, a) with a simple algorythm, the
basic behaviour of sets reducing it to (a) and thus having an improper
length. I could add yet another loop after calculating everything to
throw out any result sets with the improper length, but that would be
highly inefficient.

Second, I'd hope to be able to say that objX should be assumed to have
made the choice z. In the first example I mentioned, if I said that
'obj1 == a', the only result sets that would come out would be (a, d)
and (a, e).

def cross_nodups(*args):
'Cross product after eliminating repeat elements, keeping constant
size'
ans = [[]]
for arg in args:
ans = [x+[y] for x in ans for y in arg if y not in x]
return ans

def choose_first(obj1, *args):
'Assume a choice of a first object'
return cross_nodups(obj1, *args[1:])

>>print cross_nodups('ab', 'acd', 'fg')

[['a', 'c', 'f'], ['a', 'c', 'g'], ['a', 'd', 'f'], ['a', 'd', 'g'],
['b', 'a', 'f'], ['b', 'a', 'g'], ['b', 'c', 'f'], ['b', 'c', 'g'],
['b', 'd', 'f'], ['b', 'd', 'g']]

>>print choose_first('a', s1,s2,s3)

[['a', 'c', 'f'], ['a', 'c', 'g'], ['a', 'd', 'f'], ['a', 'd', 'g']]

(and hopefully not rely on
recursion

Easy exercise of transforming recursion to iteration left to the
reader.
Raymond

Oct 26 '07 #4

Raymond Hettinger

Easy exercise of transforming recursion to iteration left to the

reader.

Ack! That part was already done.
Raymond

Oct 26 '07 #5

happyhondje

On Oct 26, 2:33 am, Raymond Hettinger <pyt...@rcn.comwrote:

On Oct 25, 8:31 am, happyhon...@gmail.com wrote:

I've got a little issue, both programming and performance-wise. I have
a set, containing objects that refer to other sets. For example, in a
simple notation: (<a, b, c>, <d, e>) (or in a more object-like
display: set(obj1.choices=set(a, b, c) ). There may be obj1..objN
objects in the outer set, and the amount of items in the choices sets
could also reach a high number. The objects in the choices sets might
overlap.

Now I want to have all possible combinations, like this: (a, d), (b,
d), (c, d), (a, e), (b, e), (c, e).

However, I've got a few catches that make an already (icky) recursive
function worse to use.

First of all, I want to be able to only choose things so that the
outer 'result sets' have the same length. For example, if you'd have
(<a, b>, <a, c>), you might pick (a, a) with a simple algorythm, the
basic behaviour of sets reducing it to (a) and thus having an improper
length. I could add yet another loop after calculating everything to
throw out any result sets with the improper length, but that would be
highly inefficient.

Second, I'd hope to be able to say that objX should be assumed to have
made the choice z. In the first example I mentioned, if I said that
'obj1 == a', the only result sets that would come out would be (a, d)
and (a, e).

def cross_nodups(*args):
'Cross product after eliminating repeat elements, keeping constant
size'
ans = [[]]
for arg in args:
ans = [x+[y] for x in ans for y in arg if y not in x]
return ans

def choose_first(obj1, *args):
'Assume a choice of a first object'
return cross_nodups(obj1, *args[1:])

>print cross_nodups('ab', 'acd', 'fg')

[['a', 'c', 'f'], ['a', 'c', 'g'], ['a', 'd', 'f'], ['a', 'd', 'g'],
['b', 'a', 'f'], ['b', 'a', 'g'], ['b', 'c', 'f'], ['b', 'c', 'g'],
['b', 'd', 'f'], ['b', 'd', 'g']]

>print choose_first('a', s1,s2,s3)

[['a', 'c', 'f'], ['a', 'c', 'g'], ['a', 'd', 'f'], ['a', 'd', 'g']]

(and hopefully not rely on
recursion

Easy exercise of transforming recursion to iteration left to the
reader.

Raymond

Wonderful, thank you Raymond!

I only had one problem with it still (damn I'm picky) and despite
prodding around in that generator expression which absolutely defies
my logic, I've been able to get it working.

In my situation, the choices are always sets rather than strings.
While it shouldn't make a difference (both are iterable), my sets
contain strings.. which end up being cut in pieces and matched
seperately with everything. I've solved this as follows:

def cross_nodups(*args):
'Cross product after eliminating repeat elements, keeping constant
size'
ans = [[]]
for arg in args:
ans = [x+[y] for x in ans for y in arg if y not in x]
return ans

def choose_first(obj1, *args):
'Assume a choice of a first object'
return cross_nodups(frozenset((obj1,)), *args[1:])

Gives results:

>>choose_first("H", set(["H", "R"]), set(["H", "R", "K", "S"]), set(["H", "R", "K", "S"]))

set([frozenset(['H', 'S', 'R']), frozenset(['H', 'K', 'R']),
frozenset(['H', 'K', 'S'])])

>>choose_first("HA", set(["HA", "RA"]), set(["HA", "RA", "KA", "SA"]), set(["HA", "RA", "KA", "SA"]))

set([frozenset(['KA', 'HA', 'RA']), frozenset(['SA', 'HA', 'RA']),
frozenset(['KA', 'HA', 'SA'])])

rather than

>>choose_first("HA", set(["HA", "RA"]), set(["HA", "RA", "KA", "SA"]), set(["HA", "RA", "KA", "SA"]))

set([frozenset(['A', 'SA', 'HA']), frozenset(['H', 'KA', 'SA']),
frozenset(['H', 'KA', 'RA']), frozenset(['A', 'KA', 'HA']),
frozenset(['H', 'HA', 'RA']), frozenset(['H', 'SA', 'HA']),
frozenset(['A', 'SA', 'RA']), frozenset(['A', 'KA', 'SA']),
frozenset(['A', 'HA', 'RA']), frozenset(['H', 'KA', 'HA']),
frozenset(['H', 'SA', 'RA']), frozenset(['A', 'KA', 'RA'])])

Now, this works great! Although I am still torn apart on a possible
optimization (that I'm not sure is an optimization yet):

- should I use the original cross_nodups() function you thought up and
convert all of its contents to a proper set of sets (a secondary loop
afterwards). This seems more efficient since I would not be doing a
lot of frozenset() and list() conversions while looping.
- however, the list() and frozenset() construct should reduce looping
quite a bit and essentially prevent a lot of iterations that would be
created due to the doubles in the original functions output.

Of course that depends on the contents, so for simplicities sake, say
that the algorythm is to run with 5-10 choices to make, each choice
being out of averaged 6 items that may or may not conflict with other
items.

Again, thank you very much Raymond, your snippet makes my monstrosity
quite ready for /dev/null. :)

Oct 26 '07 #6

happyhondje

def cross_nodups(*args):

'Cross product after eliminating repeat elements, keeping constant
size'
ans = [[]]
for arg in args:
ans = [x+[y] for x in ans for y in arg if y not in x]
return ans

def choose_first(obj1, *args):
'Assume a choice of a first object'
return cross_nodups(frozenset((obj1,)), *args[1:])

Oops, crap, I pasted the unchanged cross_nodups() you wrote. My
adjusted variety:

def cross_nodups(*args):
'Cross product after eliminating repeat elements, keeping constant
size'
ans = [[]]
for arg in args:
ans = [frozenset(list(x)+[y]) for x in ans for y in arg if y
not in x]
return set(ans)

Anyhow, thank you! :)

Oct 26 '07 #7

by: Gordon Williams | last post by:

Hi, I have to lists that I need to find the common numbers (2nd rounded to nearest integral) and I am wondering if there is a more efficient way of doing it. >>> a= >>> b= >>> ...

Python

Pythonic search of list of dictionaries

by: Bulba! | last post by:

Hello everyone, I'm reading the rows from a CSV file. csv.DictReader puts those rows into dictionaries. The actual files contain old and new translations of software strings. The dictionary...

Python

Calculate Form

by: Building Blocks | last post by:

Hi, All I need is a simle calculate form script which contains this: A script that can handle text input, radio buttons, checkboxes, and dropdowns. Each one of these variables will contain a...

Javascript

organizing elements of two sets for efficient search.

by: Amit | last post by:

Hi, I am trying to do the following: I have two sets of integers(both are disjoint). Based on some criteria, I choose one integer from each set and do one computationally expensive operation using...

C / C++

Set operations for lists: pythonic hints please!

by: jmdeschamps | last post by:

Working with several thousand tagged items on a Tkinter Canvas, I want to change different configurations of objects having a certain group of tags. I've used the sets module, on the tuple...

Python

pythonic equivalent of upvar?

by: David MacKay | last post by:

Dear Greater Py, <motivation note="reading this bit is optional"> I am writing a command-line reader for python. I'm trying to write something with the same brevity as perl's one-liner ...

Python

Merging two lists of data (Pythonic way)

by: SMB | last post by:

I have two lists of data like the following: LIST1 , ] LIST2 , 'label': 'First Name', 'width': 0L, 'separator': ',', 'height': 0L, 'type': 2L, 'order': 1L}, {'code': 14L, 'name': 'Last...

Python

Calculate slope of a data set or trend line

by: kux | last post by:

Hello everyone, I hope someone is out here who can help me with a simple calculation... I have a sales data base in access with monthly sales history by product. to make future predictions I...

Microsoft Access / VBA

seeking highly efficient caches scheme for demanding engineering computing?

by: Luna Moon | last post by:

seeking highly efficient caches scheme for demanding engineering computing? HI all, To same the time of costly function evaluation, I want to explore the possibility of caching. Suppose in...

C / C++

Navigating the Data Structures and Algorithms (DSA)

by: BarryA | last post by:

What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...

Algorithms / Advanced Math

Looking to do Android software development, any suggestions? Is flutter better?

by: nemocccc | last post by:

hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?

General

Is that possible of reading the .csv file in column wise and the column have different lengths ?

by: Sonnysonu | last post by:

This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...

C / C++

How to build RAID in BIOS?

by: Hystou | last post by:

There are some requirements for setting up RAID: 1. The motherboard and BIOS support RAID configuration. 2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...

Computer Hardware

What is ONU?

by: marktang | last post by:

ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However,...

General

Problem With Comparison Operator <=> in G++

by: Oralloy | last post by:

Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers,...

C / C++

Discussion: How does Zigbee compare with other wireless protocols in smart home applications?

by: tracyyun | last post by:

Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each...

General

AI Job Threat for Devs

by: agi2029 | last post by:

Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing,...

Career Advice

Access Europe - Using VBA to create a class based on a table - Wed 1 May

by: isladogs | last post by:

The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome a new...

Microsoft Access / VBA

An efficient, pythonic way to calculate result sets

Similar topics