ScientificPython - LeastSquareFit diverges

Harold Fellermann

Dear all,

I am trying to fit a powerlaw to a small dataset using
Scientific.Functions.LeastSquares fit.
Unfortunately, the algorithm seems to diverge and throws an
OverflowException.
Here is how I try it:

>>from Scientific.Functions.LeastSquares import leastSquaresFit

data = [

.... (2.5, 589.0, 0.10000000000000001),
.... (7.5, 442.0, 0.10000000000000001),
.... (12.5, 96.0, 0.10000000000000001),
.... (17.5, 36.0, 0.10000000000000001),
.... (22.5, 16.0, 0.10000000000000001),
.... (27.5, 7.0, 0.10000000000000001),
.... (32.5, 6.0, 0.10000000000000001),
.... (37.5, 3.0, 0.10000000000000001),
.... (42.5, 3.0, 0.10000000000000001),
.... (47.5, 1.0, 0.10000000000000001),
.... (52.5, 3.0, 0.10000000000000001),
.... (57.5, 1.0, 0.10000000000000001),
.... (67.5, 1.0, 0.10000000000000001),
.... (77.5, 2.0, 0.10000000000000001),
.... (82.5, 1.0, 0.10000000000000001),
.... (87.5, 2.0, 0.10000000000000001)
.... ]

>>>
def powerlaw((a,b),x) :

.... return a*x**b
....

>>params,chisq = leastSquaresFit(powerlaw,(10,-3),data)

Traceback (most recent call last):
File "<stdin>", line 1, in ?
File
"/usr/lib/python2.4/site-packages/Scientific/Functions/LeastSquares.py",
line 72, in leastSquaresFit
next_chi_sq, next_alpha = _chiSquare(model, next_p, data)
File
"/usr/lib/python2.4/site-packages/Scientific/Functions/LeastSquares.py",
line 22, in _chiSquare
f = model(parameters, point[0])
File "<stdin>", line 2, in powerlaw
File
"/usr/lib/python2.4/site-packages/Scientific/Functions/FirstDerivatives.py",
line 182, in __rpow__
return pow(other, self)
File
"/usr/lib/python2.4/site-packages/Scientific/Functions/FirstDerivatives.py",
line 171, in __pow__
raise OverflowError, "Numerical result of pow(%s,%s) out of range."
% (self.value,other.value-1)
OverflowError: Numerical result of pow(2.5,8376.79243687) out of range.

>>>

I added some debugging output in
/usr/lib/python-2.4/site-packages/Scientifc/Functions/LeastSquares.py
in the function _chiSquare that prints the fit parameters during the
Levenberg-Marquardt iteration.
The procedure seems do diverge after the first step:

((10, [1]), (-3, [0, 1]))
[(-67402.311817579117, [1]), (8377.7924368716158, [0, 1])]

Note that I could easily fit the above data using gnuplots internal
fitting procedure. Any idea what is going wrong here? Is it a known
problem? Are there any work arounds or other packages?

Any help is appreciated!

- harold -

Jul 18 '06 #1

Subscribe Post Reply

2339

Terry Reedy

"Harold Fellermann" <da******@googlemail.comwrote in message
news:11**********************@s13g2000cwa.googlegr oups.com...

I am trying to fit a powerlaw to a small dataset using
Scientific.Functions.LeastSquares fit.

This is a bit off-topic here, and normally better for the scipy list, but I
have some experience with nonlinear least squares.

Unfortunately, the algorithm seems to diverge and throws an
OverflowException.

Assuming the program is okay, this means that either the function
mismatches the data or the initial values are too far off to converge.

Here is how I try it:

>>>from Scientific.Functions.LeastSquares import leastSquaresFit

data = [

... (2.5, 589.0, 0.10000000000000001),
... (7.5, 442.0, 0.10000000000000001),
... (12.5, 96.0, 0.10000000000000001),

I presume that tuples are x, y, some_error_indicator. But the last does
not matter here.

>>>def powerlaw((a,b),x) :

... return a*x**b

Did you try plotting logx versus log y to see if you get approximately a
straight line? If so, the intercept and slope are estimates of loga and b.

...

>>>params,chisq = leastSquaresFit(powerlaw,(10,-3),data)

I presume (10,-3) is the starting (a,b). But, for instance 10*7.5**-3 =
..02, which has no relation to 442, whereas, for instance, 1000*7.5-.75 =
221, which is in the ballpark, at least. So (a,b)=(1000, -.75) might have
a chance.

Terry Jan Reedy

Jul 18 '06 #2

konrad.hinsen

On 18.07.2006, at 15:59, Harold Fellermann wrote:

>>>def powerlaw((a,b),x) :

... return a*x**b

Fitting power laws is a tricky business, you need a pretty good
initial guess to get convergence.

Note that I could easily fit the above data using gnuplots internal
fitting procedure. Any idea what is going wrong here? Is it a known
problem? Are there any work arounds or other packages?

My suggestion is to fit, at least as a first step, the logarithms of
your data points:

import Numeric as N

def powerlaw_log((a, b), x) :
return N.log(a) + b*N.log(x)

params1, chisq = leastSquaresFit(powerlaw_log, (10., -3.),
[(x, N.log(y)) for x, y, sigma in
data])
You can then use those parameters as starting values for fitting your
original problem:

params2, chisq = leastSquaresFit(powerlaw, params1, data)

Doing this for your data yields:

params1: [9469.9675999067185, -2.0881423620750521]

params2: [1591.4025775162165, -1.0112284948049179]

The big difference between the two fits is a further indicator for a
stability problem. I would trust the first set more than the second one.

As a general rule, the model to be fitted should be a smoothly
varying function of the parameters, and the same should be true for
the derivatives.

The second general rule is never to trust a non-linear fit algorithm
blindly. Look at your data first, see if the model can be a good fit,
and play with some paramater values to get a feeling for how they
influence the fit. Plotting your data set, it is immediately clear
that the first point ruins any nice power law behaviour. You might
thus prefer to do the fit without the first point, and you will get a
much better defined exponent:

params1: [31363.301954929859, -2.4047303053979046]
params2: [182522.2346197216, -2.9893640209815757]

Plotting the models corresponding to these two sets together with the
data, you will see that everything coincides well for large x values,
meaning that the first two points make all the difference - another
pointer towards a lack of stability in the fit.

Konrad.
--
---------------------------------------------------------------------
Konrad Hinsen
Centre de Biophysique Moléculaire, CNRS Orléans
Synchrotron Soleil - Division Expériences
Saint Aubin - BP 48
91192 Gif sur Yvette Cedex, France
Tel. +33-1 69 35 97 15
E-Mail: hi****@cnrs-orleans.fr
---------------------------------------------------------------------

Jul 18 '06 #3

Harold Fellermann

Thanks for your advices, Terry and Konrad,

using the linear fit as initial condition for the pawerlow fit works
pretty well for my data.
(I already had the two calculations but performed them vice versa ...
:-) Anyway, I had
the impression that the leastSquaresFit in Scientific Python is an
implementation of
the Levenberg Marquardt algorithm as it is presented in the Numerical
Recipes. Accoring
to reviews, this algorithm is not famous for its stability
(e.g. http://www.stanford.edu/class/cme302/wnnr/nr.html). Better
implementations
are out there (e.g. http://www.ics.forth.gr/~lourakis/levmar/). Are
there any plans to
improve the SciPy algorithm? Would it be a welcome contribution to
SciPy to work
this part out?

- harold -

Jul 19 '06 #4

Konrad Hinsen

On Jul 19, 2006, at 16:53, Harold Fellermann wrote:

:-) Anyway, I had
the impression that the leastSquaresFit in Scientific Python is an
implementation of
the Levenberg Marquardt algorithm as it is presented in the Numerical
Recipes.

True.

Accoring
to reviews, this algorithm is not famous for its stability
(e.g. http://www.stanford.edu/class/cme302/wnnr/nr.html). Better
implementations
are out there (e.g. http://www.ics.forth.gr/~lourakis/levmar/). Are
there any plans to
improve the SciPy algorithm? Would it be a welcome contribution to
SciPy to work
this part out?

Yes, definitely. And no, I have no plans to do it myself any time
soon. The current implementation has always been sufficient for my
needs, and time is scarce...

BTW, ScientificPython (http://dirac.cnrs-orleans.fr/
ScientificPython/) is not the same thing as SciPy (http://
www.scipy.org/). Both are scientific libraries for Python, but their
focus is different: ScientificPython aims at providing pythonic
modules for scientific computing, whereas SciPy's objective is to
provide Python interfaces to the large pool of scientific libraries
from the Fortran/C/C++ world. Scientific users of Python should
probably have both of them installed.

Konrad.
--
---------------------------------------------------------------------
Konrad Hinsen
Centre de Biophysique Moléculaire, CNRS Orléans
Synchrotron Soleil - Division Expériences
Saint Aubin - BP 48
91192 Gif sur Yvette Cedex, France
Tel. +33-1 69 35 97 15
E-Mail: hi****@cnrs-orleans.fr
---------------------------------------------------------------------

Jul 19 '06 #5

Similar topics

How to make a Gaussian Fit?

by: Xcal | last post by:

Anyone knows how to make a Gaussian fit to a histogram data using Python, or where I can find a library that helps me in this task? Tks everyone... Paxcal

Python

Class for Time Series Data

by: sdhyok | last post by:

Hi, I am trying to build up a system handling time series data a lot. Do you know any well-designed python class specially for time series data? Thanks in advance. Shin, Daehyok

Python

any Python equivalent of Math::Polynomial::Solve?

by: Just | last post by:

While googling for a non-linear equation solver, I found Math::Polynomial::Solve in CPAN. It seems a great little module, except it's not Python... I'm especially looking for its poly_root()...

Python

How to compile ScientificPython-2.4.9 under XP cygwin environment?

by: Zhuanshi He | last post by:

I try to compile ScientificPython-2.4.9 (http://starship.python.net/~hinsen/ScientificPython/) under Windows XP cygwin environment using python 2.3.3 ,and gcc (GCC) 3.3.3 (cygwin special). The...

Python

109

Top warning signs of bad code?

by: Andrew Thompson | last post by:

It seems most people get there JS off web sites, which is entirely logical. But it is also a great pity since most of that code is of such poor quality. I was looking through the JS FAQ for any...

Javascript

SciPy and NetCDF

by: skilpat | last post by:

I am going to be doing a lot of work with large data sets stored in various netCDF files, and after checking out the alternatives, I would really like to go with SciPy. The problem is that SciPy...

Python

deriving from float or int

by: Russ | last post by:

Does it ever make sense to derive a class from a basic type such as float or int? Suppose, for example, that I want to create a class for physical scalars with units. I thought about deriving from...

Python

Parallel Python

by: parallelpython | last post by:

Has anybody tried to run parallel python applications? It appears that if your application is computation-bound using 'thread' or 'threading' modules will not get you any speedup. That is because...

Python

Help with apppproximate value of pi = sqrt(6* (1/(1)2 + 1/(2)2 + 1/(3)2+ 1/(4)2 + 1/(

by: pyramid | last post by:

Hello I am working on one of my lab for this week, which calculates the approximate value of pi. Listed below is the actual problem, which I am posting here, so that you can see the different...

C / C++

How to turn on java script in a villaon keypad mobile phone

by: Charles Arthur | last post by:

How do i turn on java script on a villaon, callus and itel keypad mobile phone

Java

Merging data from multiple Excel files

by: ryjfgjl | last post by:

In our work, we often receive Excel tables with data in the same format. If we want to analyze these data, it can be difficult to analyze them because the data is spread across multiple Excel files...

Data Management

Navigating the Data Structures and Algorithms (DSA)

by: BarryA | last post by:

What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...

Algorithms / Advanced Math

Looking to do Android software development, any suggestions? Is flutter better?

by: nemocccc | last post by:

hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?

General

How to build RAID in BIOS?

by: Hystou | last post by:

There are some requirements for setting up RAID: 1. The motherboard and BIOS support RAID configuration. 2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...

Computer Hardware

Changing the language in Windows 10

by: Hystou | last post by:

Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can...

Windows Server

Problem With Comparison Operator <=> in G++

by: Oralloy | last post by:

Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers,...

C / C++

Maximizing Business Potential: The Nexus of Website Design and Digital Marketing

by: jinu1996 | last post by:

In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven...

Online Marketing

AI Job Threat for Devs

by: agi2029 | last post by:

Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing,...

Career Advice