On Feb 7, 6:30 am, "Mike Heywood" <michael.heyw...@easyjet.com> wrote:

Hi,

I am currently trying to automate a process whose manual results I have been studying for a while. The process simply identifies events that meet certain criteria, and at the moment it requires human interaction to decide what to do next (whether or not a separate process is run). I have set the process up to make this decision automatically based on a couple of calculated variables, but I am having a few problems optimising the data to find these variables.

In order to increase the 'correctness' of the system I need to run an optimisation weekly to take into account any new data collected. The collected data takes the following form:

x    Value   Result
6     37       A
6     19       A
7     25       A
8      9       N
9     15       A
11    18       A

What I need to do is find a formula that calculates a value for each x, above which the secondary process is run (A) and below which it isn't (N). This isn't as simple as fitting a best-fit line: in this example the best-fit line would take the form y = 134.63x^-0.9706, which would only be correct 50% of the time. If we instead use y = x^1.125 then it is correct 100% of the time. The key point is that it doesn't matter whether a point is 0.1 or 50 above the line, it will be actioned, and likewise it won't be, no matter how far below the line it falls.
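(To spell out that claim: with y = x^1.125 the cut-offs come out to roughly 7.5, 8.9, 10.4, 11.8 and 14.8 for x = 6, 7, 8, 9 and 11. Every A row in the table above sits above its cut-off, and the single N row (x = 8, value 9) sits below its cut-off of about 10.4, so all six rows are classified correctly.)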

At the moment I run a series of queries that take each row and combine it with a selection of m and c values (y = mx^c), and decide whether, for that pair of values, the row would be correctly predicted. When it has done this for every row (about 100x100 combinations) it looks for the combination that is most accurate. This is obviously time-consuming and, as there are upwards of 90 datasets, it creates loads of 'ghost' tables that bloat the database and can only be removed by compacting.
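For reference, the same brute-force search can be run entirely in memory, which avoids the temp tables altogether. The sketch below is one possible shape, not the exact queries described above: the table name tblData, the field names (x, Value, Result) and the m/c search ranges are all assumptions to be adjusted.

' Grid-search m and c over in-memory arrays so no temp tables
' are created. Assumes a table tblData with fields x, Value, Result.
Public Function BestPowerCurve() As String
    Dim rs As DAO.Recordset
    Dim xs() As Double, ys() As Double, acts() As Boolean
    Dim n As Long, i As Long
    Dim m As Double, c As Double
    Dim hits As Long, bestHits As Long
    Dim bestM As Double, bestC As Double

    Set rs = CurrentDb.OpenRecordset( _
        "SELECT x, [Value], Result FROM tblData", dbOpenSnapshot)
    ' Load the dataset once into arrays.
    Do While Not rs.EOF
        n = n + 1
        ReDim Preserve xs(1 To n): ReDim Preserve ys(1 To n)
        ReDim Preserve acts(1 To n)
        xs(n) = rs!x: ys(n) = rs![Value]
        acts(n) = (rs!Result = "A")   ' True if the row was actioned
        rs.MoveNext
    Loop
    rs.Close

    ' Try each (m, c) pair; count rows the curve classifies correctly,
    ' i.e. rows above the curve should be A, rows below should be N.
    For m = 0.1 To 10 Step 0.1          ' assumed search range for m
        For c = 0.05 To 5 Step 0.05     ' assumed search range for c
            hits = 0
            For i = 1 To n
                If (ys(i) > m * xs(i) ^ c) = acts(i) Then hits = hits + 1
            Next i
            If hits > bestHits Then
                bestHits = hits: bestM = m: bestC = c
            End If
        Next c
    Next m

    BestPowerCurve = "y = " & bestM & " * x ^ " & bestC & _
                     "  (" & bestHits & "/" & n & " correct)"
End Function

With the rows held in arrays, the 100x100 combinations per dataset run quickly and nothing is written back to the database, so there is nothing to compact.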

What I am hoping is that someone has done something similar before or has an idea of how I could speed up the process without losing accuracy or resolution (indeed I hope to increase both). I have tried exporting the data to Excel to use Solver, but that can't deal with the non-linearity, and I can't afford Premium Solver (although, having used the trial version, it does work). I have also tried looping in VBA code, but I can't avoid creating temp tables.

If anything written isn't clear or you need more data, please ask.

Mike

ps. I'm using Office 2003 on Windows XP Professional

I might be able to help. Let me give my perspective about it so that you can help me understand the problem. I had a little trouble following you in a few spots.

Equations of the form y = m x ^ c were called power curves in my engineering classes. Finding m and c is a problem that comes up in engineering quite often because many empirical relationships have this form. If the data creates a straight line on a logarithmic or semi-logarithmic plot, it's possible to adjust a least-squares best-fit line to the situation:

Y = m * X ^ c

log Y = log m + c * log X   (log can be any convenient base, applied to suitable X and Y values)

Thus the logarithms of the data points (Xi, Yi) can be used with the basic least-squares equations to find m and c, by using log Y in place of Y and log X in place of X: the straight-line fit gives c as the slope and log m as the intercept, so m is recovered by exponentiating the intercept. Note that a correlation coefficient should be used to determine how well the curve fits. Then for a given value you can use:

Sgn(Ys - m * Xs ^ c)

to determine whether a newly sampled point (Xs, Ys) is above (+1), on (0) or below (-1) the best-fit curve.
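To make that concrete, here is a minimal VBA sketch of the fit. It uses natural logs (any base works, as noted above), and the procedure names are my own invention:

' Fit y = m * x ^ c by least squares on (log x, log y).
' Returns m, c and the correlation coefficient r via ByRef arguments.
Public Sub FitPowerCurve(xs() As Double, ys() As Double, _
                         ByRef m As Double, ByRef c As Double, _
                         ByRef r As Double)
    Dim i As Long, n As Long
    Dim lx As Double, ly As Double
    Dim sx As Double, sy As Double
    Dim sxx As Double, syy As Double, sxy As Double

    n = UBound(xs) - LBound(xs) + 1
    For i = LBound(xs) To UBound(xs)
        lx = Log(xs(i)): ly = Log(ys(i))   ' natural log; any base works
        sx = sx + lx: sy = sy + ly
        sxx = sxx + lx * lx: syy = syy + ly * ly: sxy = sxy + lx * ly
    Next i

    ' Slope of the line is c; intercept is log m, so exponentiate.
    c = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    m = Exp((sy - c * sx) / n)
    r = (n * sxy - sx * sy) / _
        Sqr((n * sxx - sx * sx) * (n * syy - sy * sy))
End Sub

' Classify a new point: +1 above the curve, 0 on it, -1 below.
Public Function SideOfCurve(Xs As Double, Ys As Double, _
                            m As Double, c As Double) As Integer
    SideOfCurve = Sgn(Ys - m * Xs ^ c)
End Function

FitPowerCurve takes the raw x and y arrays, does the log transform internally, and returns m, c and the correlation coefficient in one pass; SideOfCurve then gives +1, 0 or -1 for a new sample, per the Sgn expression above.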

I show the basic equations here:

http://groups.google.com/group/comp....f7af5f9a498f39

James A. Fortune
CD********@FortuneJames.com