<snip>I think this problem is related to integer partitioning, but it's not

>Wow, that's helpful. http://en.wikipedia.org/wiki/Subset_sum clarifies

If, by "integer partitioning," you mean the "subset sum

problem" (given a finite set S of integers, does S contain a subset

which sums up to some given integer k?), then you are right. *I'm

reasonably sure there's a polynomial reduction from your problem to

the subset sum problem. *That would make your problem NP-complete.

what I'm trying to do. Yes, it probably is NP-complete. Fortunately my

use case doesn't require that the process finish, just that it produce

a lot. I can exclude trivial cases (e.g., f' = f(S) + s_1 - s_1) with

some sort of simplifier.

Like I said, because this looks related to the subset sum problemThere are a few real-world constraints which make the problem easier,

(though harder, because you're asking for "all" such functions, for

some rigorous definition of "all," as per the previous sentence), I

suspect it's NP-complete. *However, there are probably some good

heuristics, such as if s1 has a large intersection with s0, it's

probably a good idea to use s1 in whatever formula you come up with.

but only by linear factors, I suspect. More on this below.

Other than that, I don't really have an idea. *Can you say what theThis is a problem in statistical estimation. Given n records made up

application of this algorithm would be? *Knowing the use case might

suggest a better approach.

of k variables, define a cell as the point in the cartesian product of

v_1 * v_2 * ... * v_k. I want to apply an estimator on the data in

each cell.

However, many cells are empty, and others are too sparse to support

estimation. One solution is to build lots of valid aggregations,

called "strata," for which estimates can be made for chunks of cells.

A "valid aggregation" is a sequence of *adjacent* cells (see below)

whose data are pooled (by whatever mechanism is relevant to the

estimator).

The cells are "elements" in my initial problem statement, and the

strata are the sets.

The constraints I mentioned are on the composition of strata: strata

can only contain sequences of adjacent cells, where adjacency is

defined for each variable. Two cells are adjacent if they are equal on

every variable except one, and on that one and by its definition, they

are adjacent.

Does this make it easier to understand, or more difficult? thanks --

PB.