Bytes IT Community

Predictive or scoring solution for PostgreSQL ?

P: n/a
Hi,

Does anyone know a predictive or a database scoring solution for PostgreSQL ?

I'm looking for a system that can take a database of, for example, 100 000
records in total, of which 1000 records have one field set to YES, with
about 100 fields in the table.
The system should score the 100 fields to determine which fields matter
most for the 1000 records that have the YES value.
It should then derive a formula, apply the same scoring to the rest of the
database, and estimate (the predictive part) which of the remaining 90 000
records may have that field set to YES.

I hope my request is clear ... ;o)

I also hope someone has already heard of something like this and could
help me ;o)

best regards,
--
Hervé
---------------------------(end of broadcast)---------------------------
TIP 8: explain analyze is your friend

Nov 22 '05 #1
6 Replies

P: n/a
Hmmmm, it's been a while since I did this but...

This was with Sybase (it should be configurable with ODBC by now), but we used a
tool called ModelMAX (Advanced Software Applications, or A.S.A.), which could
select a sample of records and score them on the basis of fields (you need some
NOs as well). It produced 'C' code that would score non-flagged records on the
basis of the new results.

Our process was to select a sample of YES/NO records and split it into two
samples. (The Yes records are actually coded as '1' and the No records as '0'.)
The No records give the system something to differentiate against.

The first and larger sample was used to generate or train the neural net. Then
the second sample (with known values) was scored using the new model, and the
known result compared with the score.
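
The split-and-validate process described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the thread does not show how ModelMAX works internally, so a small logistic regression stands in for the neural net, the data are synthetic, and every name here (`split_sample`, `train_logistic`, `holdout_accuracy`) is made up for the example.

```python
import math
import random

def split_sample(records, train_fraction=0.7, seed=0):
    """Shuffle a labelled sample and split it into training and holdout parts."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def sigmoid(z):
    # Clamp z so math.exp cannot overflow on extreme inputs.
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))

def score(weights, features):
    """Return the modelled probability that the flag is 1 (YES)."""
    z = weights[0] + sum(w * x for w, x in zip(weights[1:], features))
    return sigmoid(z)

def train_logistic(records, epochs=200, learning_rate=0.1):
    """Fit an intercept and weights by stochastic gradient descent on log loss."""
    n_features = len(records[0][0])
    weights = [0.0] * (n_features + 1)  # weights[0] is the intercept
    for _ in range(epochs):
        for features, label in records:
            error = label - score(weights, features)
            weights[0] += learning_rate * error
            for i, x in enumerate(features):
                weights[i + 1] += learning_rate * error * x
    return weights

def holdout_accuracy(weights, holdout, threshold=0.5):
    """Compare the known flags in the holdout sample with thresholded scores."""
    hits = sum((score(weights, f) >= threshold) == bool(y) for f, y in holdout)
    return hits / len(holdout)

# Synthetic labelled sample: the flag is 1 (YES) when the first field is large.
rng = random.Random(42)
sample = []
for _ in range(400):
    x1, x2 = rng.random(), rng.random()
    sample.append(([x1, x2], 1 if x1 > 0.5 else 0))

train, holdout = split_sample(sample)      # larger part trains, smaller part validates
model = train_logistic(train)
accuracy = holdout_accuracy(model, holdout)
```

The holdout records are never seen during training, so `accuracy` gives a rough idea of how the formula would behave on records whose flag is unknown.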

Generally the score was a probability - of response or credit card application
approval or the like.

If the model is valid, the formula can be rolled out to the database.
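
One way to picture "rolling the formula out to the database" is to turn fitted coefficients into a SQL expression that PostgreSQL can evaluate for every unflagged row. The sketch below assumes a logistic-style formula; the helper `logistic_score_sql`, the coefficients, and the table and column names (`customers`, `id`, `flag`, `age`, `num_orders`) are all hypothetical and not taken from the thread.

```python
def logistic_score_sql(intercept, coefficients):
    """Build a SQL expression for 1 / (1 + exp(-(b0 + sum(b_i * col_i))))."""
    terms = [repr(float(intercept))]
    for column, weight in coefficients.items():
        # Double-quote identifiers so unusual column names stay valid SQL.
        quoted = '"' + column.replace('"', '""') + '"'
        terms.append(f"{float(weight)!r} * {quoted}")
    linear = " + ".join(terms)
    return f"1.0 / (1.0 + exp(-({linear})))"

# Hypothetical coefficients and column names, for illustration only.
expression = logistic_score_sql(-2.0, {"age": 0.03, "num_orders": 0.4})
query = f"SELECT id, {expression} AS score FROM customers WHERE flag IS NULL"
```

Running the generated query inside the database avoids exporting the unflagged records just to score them.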

The trick is that the tool needs to understand something about the fields
available for scoring. Domain and type, ranges and codings - if these are fixed
they are a one time setup.

Other tools do similar things - another was Knowledge Seeker from Angoss
Software - which built turnkey decision trees (this was fairly cheap depending
on the system it is running on). SAS also produced a turnkey modeling solution
(not cheap $$$$). You could also try SPSS (cheaper than SAS). Group 1 Software
also marketed an all-in-one Modeling Sol'n - Model 1 (I think) but I never
actually used it.

I'll dig around and see if I can find an article I wrote about this...

Marc A. Leith
President
redboxdata inc.

E-mail:ml****@redboxdata.com


Nov 22 '05 #2

P: n/a
Quoting Hervé Piedvache <fo*****@noos.fr>:
Hi,
Does anyone know a predictive or a database scoring solution for PostgreSQL

in response, Marc A. Leith wrote:
Hmmmm, it's been a while since I did this but...

Other tools do similar things - another was Knowledge Seeker from Angoss
Software - which built turnkey decision trees (this was fairly cheap depending
on the system it is running on). SAS also produced a turnkey modeling solution
(not cheap $$$$). You could also try SPSS (cheaper than SAS). Group 1 Software
also marketed an all-in-one Modeling Sol'n - Model 1 (I think) but I never
actually used it.

Would Joe Conway's PL/R procedural language be any help here? I'd guess
there's an R package to fit the bill, but then again I'm only on page 30
of Modern Applied Statistics in S-Plus. ;-)

Mike Mascari


Nov 22 '05 #3

P: n/a
Marc A. Leith wrote:
Other tools do similar things - another was Knowledge Seeker from Angoss
Software - which built turnkey decision trees (this was fairly cheap depending
on the system it is running on). SAS also produced a turnkey modeling solution
(not cheap $$$$). You could also try SPSS (cheaper than SAS). Group 1 Software
also marketed an all-in-one Modeling Sol'n - Model 1 (I think) but I never
actually used it.


Or try R (open source implementation of the S language, similar to
S-PLUS)...
http://www.r-project.org/

....along with PL/R:
http://www.joeconway.com/plr/

And see here for a variety of packages to do just about any kind of
analysis you can think of:
http://cran.r-project.org/

Some assembly required, but powerful and free.

HTH,

Joe


Nov 22 '05 #4

P: n/a
Quoting Mike Mascari <ma*****@mascari.com>:
Would Joe Conway's PL/R procedural language be any help here? I'd guess
there's an R package to fit the bill, but then again I'm only on page 30
of Modern Applied Statistics in S-Plus. ;-)

Mike Mascari


For a turnkey modeling solution, you need more than simple stat functions. These
solutions automatically transform or 'bucketize' the data and then analyze the
covariance between the score variables and the known result.
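
To make "bucketize, then analyze the covariance with the known result" concrete, here is a minimal Python sketch. It assumes equal-frequency buckets and a plain population covariance; the commercial tools mentioned here may well do something more elaborate, and the function names and toy data are invented for the example.

```python
def bucketize(values, n_buckets=4):
    """Map each value to an equal-frequency bucket index (0 .. n_buckets - 1)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    buckets = [0] * len(values)
    for rank, i in enumerate(order):
        buckets[i] = rank * n_buckets // len(values)
    return buckets

def covariance(xs, ys):
    """Population covariance between two equal-length numeric sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / len(xs)

def response_rate_by_bucket(buckets, flags):
    """Share of 1 (YES) flags within each bucket."""
    totals, hits = {}, {}
    for b, y in zip(buckets, flags):
        totals[b] = totals.get(b, 0) + 1
        hits[b] = hits.get(b, 0) + y
    return {b: hits[b] / totals[b] for b in sorted(totals)}

# Toy data: eight records, one numeric field, and the known 0/1 result.
income = [12, 15, 21, 30, 44, 58, 73, 90]
flag = [0, 0, 0, 1, 0, 1, 1, 1]

income_bucket = bucketize(income)                    # [0, 0, 1, 1, 2, 2, 3, 3]
rates = response_rate_by_bucket(income_bucket, flag)
cov = covariance(income_bucket, flag)                # positive: higher buckets respond more
```

Note that this simple ranking can place tied values in different buckets; a production tool would need a rule for ties and for missing values.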

They then select a smaller number of variables and use them to build a model -
this may be done with a back-propagation neural network, a more traditional
regression model, or some sort of decision tree or CHAID system. Model 1 uses 3
or 4 approaches and selects the one with the best (truest) fit.
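
The two steps in that paragraph, narrowing down the variables and then keeping whichever candidate model fits best, can be sketched as follows. This is an assumption-laden illustration, not how Model 1 or ModelMAX actually work: fields are ranked by absolute Pearson correlation with the flag, candidates are compared by Brier score on a holdout sample, and all field names, data, and stand-in models are hypothetical.

```python
import math

def correlation(xs, ys):
    """Pearson correlation; returns 0.0 when either sequence is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def top_fields(columns, flags, keep=2):
    """Rank fields by absolute correlation with the flag and keep the strongest."""
    ranked = sorted(
        columns,
        key=lambda name: abs(correlation(columns[name], flags)),
        reverse=True,
    )
    return ranked[:keep]

def brier_score(model, holdout):
    """Mean squared gap between predicted probability and the known 0/1 result."""
    return sum((model(row) - y) ** 2 for row, y in holdout) / len(holdout)

def best_model(candidates, holdout):
    """Return the name of the candidate with the lowest holdout Brier score."""
    return min(candidates, key=lambda name: brier_score(candidates[name], holdout))

# Hypothetical fields and flags, for illustration only.
columns = {
    "visits": [1, 2, 3, 4, 5, 6],
    "shoe_size": [9, 7, 8, 9, 7, 8],
    "tenure": [6, 5, 3, 4, 2, 1],
}
flags = [0, 0, 0, 1, 1, 1]
selected = top_fields(columns, flags)

# Two stand-in "models": a constant base rate and a simple rule on visits.
holdout = [({"visits": 2}, 0), ({"visits": 5}, 1), ({"visits": 6}, 1), ({"visits": 1}, 0)]
candidates = {
    "base_rate": lambda row: 0.5,
    "visits_rule": lambda row: 0.9 if row["visits"] >= 4 else 0.1,
}
winner = best_model(candidates, holdout)
```

Correlation only captures linear association, which is one reason bucketizing first can help: a field with a U-shaped relationship to the flag would otherwise be ranked too low.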

ModelMAX (and the like) have been honed over the last decade by teams of
statisticians and still generate models that are close but not yet equal to
those that our modeling team used to build. The difference was I could build a
model in a few hours (limited by the CPU on the PC) and they took several weeks
to hand tune the result.

Marc A. Leith
President
redboxdata inc.

E-mail:ml****@redboxdata.com


Nov 22 '05 #5

P: n/a
Marc A. Leith wrote:
Quoting Mike Mascari <ma*****@mascari.com>:

Would Joe Conway's PL/R procedural language be any help here? I'd guess
there's an R package to fit the bill, but then again I'm only on page 30
of Modern Applied Statistics in S-Plus. ;-)

For a turnkey modeling solution, you need more than simple stat functions. These
solutions automatically transform or 'bucketize' the data and then analyze the
covariance between the score variables and the known result.


I'm obviously not in any position to define what is needed here. I only
had business statistics in college as a requirement for an economics
degree many years ago. However, I will say that you may be
underestimating R's capabilities. It includes linear and non-linear
regression models, neural networks, time-series analysis, and a host
(and I mean 100's) of other models I have yet to fathom. I'd humbly
speculate that the core developers, including the chairman of the
statistics department at Oxford, would take issue with its
characterization as "simple stat functions". But what do I know... :-)

Mike Mascari



Nov 22 '05 #6

P: n/a
On Thu, 05 Feb 2004 07:45:41 -0500, Mike Mascari wrote:
Marc A. Leith wrote:
Quoting Mike Mascari <ma*****@mascari.com>:

Would Joe Conway's PL/R procedural language be any help here? I'd guess
there's an R package to fit the bill, but then again I'm only on page 30
of Modern Applied Statistics in S-Plus. ;-)

For a turnkey modeling solution, you need more than simple stat functions. These
solutions automatically transform or 'bucketize' the data and then analyze the
covariance between the score variables and the known result.

I'm obviously not in any position to define what is needed here. I only
had business statistics in college as a requirement for an economics
degree many years ago. However, I will say that you may be
underestimating R's capabilities. It includes linear and non-linear
regression models, neural networks, time-series analysis, and a host
(and I mean 100's) of other models I have yet to fathom. I'd humbly
speculate that the core developers, including the chairman of the
statistics department at Oxford, would take issue with its
characterization as "simple stat functions". But what do I know... :-)

Mike Mascari

Fair enough - I took a look at the links that Joe Conway provided, and R seems
very powerful and feature complete. My comment was unfair; please consider it
rephrased/withdrawn.

- BUT is it turnkey? The original question sought a 'system' to score the database.

SAS & SPSS can be configured to do this, as R likely can be, but does that make it a system?

The solutions I suggested can be run by someone with virtually no knowledge of
stats (not that I suggest this for complex issues). They can select an
appropriate model in minutes rather than needing someone with an MA to design a
solution.

Marc
Marc A. Leith
President
redboxdata inc.

e-mail: ma**@redboxdata.com
cell: (416) 737 0045
Nov 22 '05 #7
