Hi to all, I need to calculate the hpergeometric distribution:
choose(r, x) * choose(b, n-x)
p(x; r,b,n) = -----------------------------
choose(r+b, n)
choose(r,x) is the binomial coefficient
I use the factorial to calculate the above formula but since I am using
large numbers, the result of choose(a,b) (ie: the binomial coefficient)
is too big even for large int. I've tried the scipy library, but this
library calculates
the hypergeometric using the factorials too, so the problem subsist. Is
there any other libray or an algorithm to calculate
the hypergeometric distribution? The statistical package R can handle
such calculations but I don't want to use python R binding since I want
a standalone app.
Thanks a lot
Ale 24 6267
Raven wrote: Hi to all, I need to calculate the hpergeometric distribution:
choose(r, x) * choose(b, n-x) p(x; r,b,n) = ----------------------------- choose(r+b, n)
choose(r,x) is the binomial coefficient I use the factorial to calculate the above formula but since I am using large numbers, the result of choose(a,b) (ie: the binomial coefficient) is too big even for large int. I've tried the scipy library, but this library calculates the hypergeometric using the factorials too, so the problem subsist. Is there any other libray or an algorithm to calculate the hypergeometric distribution?
Use logarithms.
Specifically,
from scipy import special
def logchoose(n, k):
lgn1 = special.gammaln (n+1)
lgk1 = special.gammaln (k+1)
lgnk1 = special.gammaln (n-k+1)
return lgn1 - (lgnk1 + lgk1)
def gauss_hypergeom (x, r, b, n):
return exp(logchoose(r , x) +
logchoose(b, n-x) -
logchoose(r+b, n))
Or you could use gmpy if you need exact rational arithmetic rather than floating
point.
--
Robert Kern ro*********@gma il.com
"In the fields of hell where the grass grows high
Are the graves of dreams allowed to die."
-- Richard Harter
Raven wrote: Hi to all, I need to calculate the hpergeometric distribution:
choose(r, x) * choose(b, n-x) p(x; r,b,n) = ----------------------------- choose(r+b, n)
choose(r,x) is the binomial coefficient I use the factorial to calculate the above formula but since I am using large numbers, the result of choose(a,b) (ie: the binomial coefficient) is too big even for large int. I've tried the scipy library, but this library calculates the hypergeometric using the factorials too, so the problem subsist. Is there any other libray or an algorithm to calculate the hypergeometric distribution? The statistical package R can handle such calculations but I don't want to use python R binding since I want a standalone app. Thanks a lot Ale
Ale
I had this code lying about if it helps. I don't know if it's even
correct but it's non-recursive!
def Binomial( n, k ):
ret = 0
if k == 0:
ret = 1
elif k > 0:
a = range( n+1 )
a[0] = 1
for i in a[1:]:
a[i] = 1
for j in range(i-1,0,-1):
a[j] = a[j] + a[j-1]
ret = a[k]
return ret
Gerard
On Mon, 26 Dec 2005 12:18:55 -0800, Raven wrote: Hi to all, I need to calculate the hpergeometric distribution:
choose(r, x) * choose(b, n-x) p(x; r,b,n) = ----------------------------- choose(r+b, n)
choose(r,x) is the binomial coefficient I use the factorial to calculate the above formula but since I am using large numbers, the result of choose(a,b) (ie: the binomial coefficient) is too big even for large int.
Are you sure about that? Python long ints can be as big as you have enough
memory for. My Python can print 10L**10000 to the console with a barely
detectable pause, and 10L**100000 with about a ten second delay. (Most of
that delay is printing it, not calculating it.)
25206 is the first integer whose factorial exceeds 10L**100000, so even if
you are calculating the binomial coefficient using the most naive
algorithm, calculating the factorials and dividing, you should easily be
able to generate it for a,b up to 20,000 unless you have a severe
shortage of memory.
I've tried the scipy library, but this library calculates the hypergeometric using the factorials too, so the problem subsist.
What exactly is your problem? What values of hypergeometric( x; r,b,n) fail
for you?
--
Steven.
Raven <ba********@gma il.com> wrote: Hi to all, I need to calculate the hpergeometric distribution:
choose(r, x) * choose(b, n-x) p(x; r,b,n) = ----------------------------- choose(r+b, n)
choose(r,x) is the binomial coefficient I use the factorial to calculate the above formula but since I am using large numbers, the result of choose(a,b) (ie: the binomial coefficient) is too big even for large int. I've tried the scipy library, but this library calculates the hypergeometric using the factorials too, so the problem subsist. Is there any other libray or an algorithm to calculate the hypergeometric distribution? The statistical package R can handle such calculations but I don't want to use python R binding since I want a standalone app.
Have you tried with gmpy?
Alex
Thanks to all of you guys, I could resolve my problem using the
logarithms as proposed by Robert. I needed to calculate the factorial
for genomic data, more specifically for the number of genes in the
human genome i.e. about 30.000 and that is a big number :-)
I didn't know gmpy
Thanks a lot, really
Ale
On Sat, 31 Dec 2005 16:24:02 -0800, Raven wrote: Thanks to all of you guys, I could resolve my problem using the logarithms as proposed by Robert. I needed to calculate the factorial for genomic data, more specifically for the number of genes in the human genome i.e. about 30.000 and that is a big number :-) I didn't know gmpy Thanks a lot, really
Are you *sure* the existing functions didn't work? Did you try them? def log2(x):
.... return math.log(x)/math.log(2)
.... n = 0.0 for i in range(1, 300000): # ten times bigger than you need
.... n += log2(i)
.... n
5025564.6087276 665 t = time.time(); x = 2L**(int(n) + 1); time.time() - t
0.2664909362792 9688
That's about one quarter of a second to calculate 300,000 factorial
(approximately) , and it shows that the calculations are well within
Python's capabilities.
Of course, converting this 1.5 million-plus digit number to a string takes
a bit longer:
t = time.time(); len(str(x)); time.time() - t
1512846
6939.3762848377 228
A quarter of a second to calculate, and almost two hours to convert to a
string. Lesson one: calculations on longints are fast. Converting them to
strings is not.
As far as your original question goes, try something like this:
(formula from memory, may be wrong)
def bincoeff(n,r):
.... x = 1
.... for i in range(r+1, n+1):
.... x *= i
.... for i in range(1, n-r+1):
.... x /= i
.... return x
.... bincoeff(10, 0)
1 bincoeff(10, 1)
10 bincoeff(10, 2)
45 bincoeff(10, 3)
120 import time t = time.time(); L = bincoeff(30000, 7000); time.time() - t
28.317800045013 428
Less than thirty seconds to calculate a rather large binomial coefficient
exactly. How many digits?
len(str(L))
7076
If you are calculating hundreds of hypergeometric probabilities, 30
seconds each could be quite painful, but it certainly shows that Python is
capable of doing it without resorting to logarithms which may lose some
significant digits. Although, in fairness, the log function doesn't seem
to lose much accuracy for arguments in the range you are dealing with.
How long does it take to calculate factorials?
def timefact(n):
.... # calculate n! and return the time taken in seconds
.... t = time.time()
.... L = 1
.... for i in range(1, n+1):
.... L *= i
.... return time.time() - t
.... timefact(3000)
0.0549139976501 46484 timefact(30000) # equivalent to human genome
5.0699510574340 82 timefact(300000 ) # ten times bigger
4255.2370519638 062
Keep in mind, if you are calculating the hypergeometric probabilities
using raw factorials, you are doing way too much work.
--
Steven.
Thanks Steven for your very interesting post.
This was a critical instance from my problem: from scipy import comb comb(14354,174)
inf
The scipy.stats.dis tributions.hype rgeom function uses the scipy.comb
function, so it returned nan since it tries to divide an infinite. I
did not tried to write a self-made function using standard python as I
supposed that the scipy functions reached python's limits but I was
wrong, what a fool :-)
If you are calculating hundreds of hypergeometric probabilities, 30 seconds each could be quite painful, but it certainly shows that Python is capable of doing it without resorting to logarithms which may lose some significant digits. Although, in fairness, the log function doesn't seem to lose much accuracy for arguments in the range you are dealing with.
Yes I am calculating hundreds of hypergeometric probabilities so I need
fast calculations
Ale
"Raven" <ba********@gma il.com> writes: Yes I am calculating hundreds of hypergeometric probabilities so I need fast calculations
Can you use Stirling's approximation to get the logs of the factorials?
On Sun, 01 Jan 2006 14:24:39 -0800, Raven wrote: Thanks Steven for your very interesting post.
This was a critical instance from my problem:
from scipy import comb comb(14354,174) inf
Curious. It wouldn't surprise me if scipy was using floats, because 'inf'
is usually a floating point value, not an integer.
Using my test code from yesterday, I got: bincoeff(14354, 174)
111727771935623 249173533679580 244374733360180 53487854593870
070906374894056 044891924883461 446844023623444 09632515556732
335635231613081 458252082763952 387644418578294 54464446478336
901737770950418 910676375517833 240712336253706 19908633625448
310766773824486 162461253466677 378968915481668 98009878730510
574761395158405 427699564142041 306927336297233 05869285300247
645972456505830 620188961902165 086857407612722 931651840L
Took about three seconds on my system.
Yes I am calculating hundreds of hypergeometric probabilities so I need fast calculations
Another possibility, if you want exact integer maths rather than floating
point with logarithms, is to memoise the binomial coefficients. Something
like this:
# untested
def bincoeff(n,r, \
cache={}):
try:
return cache((n,r))
except KeyError:
x = 1
for i in range(r+1, n+1):
x *= i
for i in range(1, n-r+1):
x /= i
cache((n,r)) = x
return x
--
Steven. This thread has been closed and replies have been disabled. Please start a new discussion. Similar topics |
by: Michael Peuser |
last post by:
Hi,
I should like to make a distribution (using Tkinter), with standard DLLs
removed.
pythonXX.dll is no problem.
tcl und tk, which make the mass of mega bytes, cannot be removed because
_tkinter.pyd seems to expect them in the same directory
(Question 1) Is there a configration or path setting in py2exe/distutils I
|
by: Tan Thuan Seah |
last post by:
Hi all,
I am looking for a way to generate a random number given the variance of a
gaussian distribution(or normal distribution). The mean is 0 but the
variance will be a user input. Does C++ have any of this sort of generator
available? Or must I use some transformation to get a random number from a
standardised normal distribution and map it to my distribution? Any link and
references are welcome.
Thuan Seah
|
by: joshsackett |
last post by:
Hi all,
Is it possible to move the distribution database to a new folder/drive
without removing replication? I am attempting to do it the same way you
would move tempdb:
ALTER DATABASE distribution MODIFY FILE
(name = distmodel, filename = 'C:\DISTMOVED\distribution.MDF')
ALTER DATABASE distribution MODIFY FILE
(name = distmodel_log, filename = 'C:\DISTMOVED\distribution.LDF')
|
by: Edward Hua |
last post by:
Hi,
I'm wondering if there is a function written in C that
computes the inverse hypergeometric distribution (i.e., not the density
function, but the cumulative distribution) that's been written by anyone?
If it's in some online resource, please give me its URL; if it's in print,
plese give me the title of the book and/or where it is in the book.
Thanks in advance.
|
by: kabradley |
last post by:
Hello,
Thanks for looking at my post and hopefully having an answer or at least a suggestion to my problem. I currently work at a financial planning office that deals with many clients and accounts. Each client may have multiple accounts such as an individual, IRA, and possibly a joint tenant account. Each one of these accounts for the particular client is contained in a 'Portfolio'. For example, Joe Smith may have 3 accounts: Joe Smith...
| |
by: Peter Graf |
last post by:
Hi,
i tried to evaluate the 2F1(a,b,c,z)
hypergeometric function for a=1, b=2-0.1i,
c=2.5-0.01i and z=-5e8. I've used the numerical
recipes routine but this routine couldn't act with
this large value of z. Does someone know a C or
C++ routine, which can act with complex b, c and
large values of z.
|
by: Alex |
last post by:
Hi everybody,
I wonder if it is possible in python to produce random numbers
according to a user defined distribution?
Unfortunately the random module does not contain the distribution I
need :-(
Many thanks
axel
|
by: eGenix Team: M.-A. Lemburg |
last post by:
________________________________________________________________________
ANNOUNCING
eGenix.com mx Base Distribution
Version 3.1.1 for Python 2.6
Open Source Python extensions providing important and useful
services for Python programmers.
|
by: M.-A. Lemburg |
last post by:
Just to let you know: we also provide binaries and support for
Mac OS X Intel and PPC.
Thanks to Joe Strout for pinging us about this.
On 2008-10-15 17:41, eGenix Team: M.-A. Lemburg wrote:
--
Marc-Andre Lemburg
eGenix.com
|
by: Hystou |
last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can effortlessly switch the default language on Windows 10 without reinstalling. I'll walk you through it.
First, let's disable language synchronization. With a Microsoft account, language settings sync across devices. To prevent any complications,...
|
by: Hystou |
last post by:
Overview:
Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows Update option using the Control Panel or Settings app; it automatically checks for updates and installs any it finds, whether you like it or not. For most users, this new feature is actually very convenient. If you want to control the update process,...
| |
by: tracyyun |
last post by:
Dear forum friends,
With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each protocol has its own unique characteristics and advantages, but as a user who is planning to build a smart home system, I am a bit confused by the choice of these technologies. I'm particularly interested in Zigbee because I've heard it does some...
|
by: isladogs |
last post by:
The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM).
In this session, we are pleased to welcome a new presenter, Adolph Dupré who will be discussing some powerful techniques for using class modules.
He will explain when you may want to use classes instead of User Defined Types (UDT). For example, to manage the data in unbound forms.
Adolph will...
|
by: conductexam |
last post by:
I have .net C# application in which I am extracting data from word file and save it in database particularly. To store word all data as it is I am converting the whole word file firstly in HTML and then checking html paragraph one by one.
At the time of converting from word file to html my equations which are in the word document file was convert into image.
Globals.ThisAddIn.Application.ActiveDocument.Select();...
|
by: TSSRALBI |
last post by:
Hello
I'm a network technician in training and I need your help.
I am currently learning how to create and manage the different types of VPNs and I have a question about LAN-to-LAN VPNs.
The last exercise I practiced was to create a LAN-to-LAN VPN between two Pfsense firewalls, by using IPSEC protocols.
I succeeded, with both firewalls in the same network. But I'm wondering if it's possible to do the same thing, with 2 Pfsense firewalls...
|
by: adsilva |
last post by:
A Windows Forms form does not have the event Unload, like VB6. What one acts like?
|
by: muto222 |
last post by:
How can i add a mobile payment intergratation into php mysql website.
| |
by: bsmnconsultancy |
last post by:
In today's digital era, a well-designed website is crucial for businesses looking to succeed. Whether you're a small business owner or a large corporation in Toronto, having a strong online presence can significantly impact your brand's success. BSMN Consultancy, a leader in Website Development in Toronto offers valuable insights into creating effective websites that not only look great but also perform exceptionally well. In this comprehensive...
| |