I am trying to calculate the density function of a continuous random variable over a range in Julia using Distributions, but I am not able to define the range. I used the Truncated constructor to construct the distribution, but I have no idea how to define the range. By density function I mean P(a
Would appreciate any help. The distribution I'm using is Gamma btw!
Thanks
To get the maximum and minimum of the support of distribution d just write maximum(d) and minimum(d) respectively. Note that for some distributions this might be infinity, e.g. maximum(Normal()) is Inf.
What version of Julia and Distributions do you use? In Distributions v0.16.4, the range can easily be defined with the second and third arguments of Truncated.
julia> a = Gamma()
Gamma{Float64}(α=1.0, θ=1.0)
julia> b = Truncated(a, 2, 3)
Truncated(Gamma{Float64}(α=1.0, θ=1.0), range=(2.0, 3.0))
julia> p = rand(b, 1000);
julia> extrema(p)
(2.0007680527633305, 2.99864177354943)
You can read the documentation for Truncated by typing ?Truncated in the REPL and pressing Enter.
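For the probability of a range, P(a <= X <= b) is the difference of the CDF at the two endpoints, and the truncated density is the original density divided by that probability. Below is a minimal language-neutral sketch in Python (standard library only), not Distributions code. It assumes the default Gamma(α=1, θ=1) from the example above, which coincides with the Exponential(1) distribution, so the CDF has a closed form. The function names are made up for illustration. In Julia the same quantity would presumably be cdf(a, 3) - cdf(a, 2).

```python
import math

def gamma11_cdf(x):
    """CDF of Gamma(alpha=1, theta=1), i.e. the Exponential(1) distribution."""
    return 1.0 - math.exp(-x) if x > 0 else 0.0

def gamma11_pdf(x):
    """Density of Gamma(alpha=1, theta=1)."""
    return math.exp(-x) if x >= 0 else 0.0

def interval_probability(a, b):
    """P(a <= X <= b) for the untruncated distribution."""
    return gamma11_cdf(b) - gamma11_cdf(a)

def truncated_pdf(x, lower, upper):
    """Density of X conditioned on lower <= X <= upper."""
    if x < lower or x > upper:
        return 0.0
    return gamma11_pdf(x) / interval_probability(lower, upper)

print(interval_probability(2.0, 3.0))  # probability mass between 2 and 3
print(truncated_pdf(2.5, 2.0, 3.0))    # truncated density at 2.5
```

The truncated density integrates to 1 over [2, 3], which is why samples drawn from the truncated distribution never leave that interval.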
Related
I am following the book Statistical Rethinking, which has code in R, and I want to reproduce the same code in Julia. In the book, they compute the likelihood of six successes out of nine trials, where a success has a probability of 0.5. They achieve this using the following R code.
#R Code
dbinom(6, size = 9, prob=0.5)
#Out > 0.1640625
I am wondering how to do the same in Julia,
#Julia
using Distributions
b = Binomial(9,0.5)
# It's possible to draw a random value,
rand(b)
#Out > 5
But how do I look at a specific value such as six successes?
I'm sure you know this, but just to be sure: R's dbinom function is the probability density (mass) function for the Binomial distribution.
Julia's Distributions package makes use of multiple dispatch to just have one generic pdf function that can be called with any type of Distribution as the first argument, rather than defining a bunch of methods like dbinom, dnorm (for the Normal distribution). So you can do:
julia> using Distributions
julia> b = Binomial(9, 0.5)
Binomial{Float64}(n=9, p=0.5)
julia> pdf(b, 6)
0.1640625000000001
There is also cdf, which works in the same way and computes (maybe unsurprisingly) the cumulative distribution function.
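As a cross-check on the value above, the binomial mass function can be written out directly from its definition, C(n, k) p^k (1-p)^(n-k). This is a small Python sketch using only the standard library. The function names are illustrative and are not part of Distributions or R.

```python
import math

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p), straight from the definition."""
    return math.comb(n, k) * p**k * (1.0 - p)**(n - k)

def binomial_cdf(k, n, p):
    """P(X <= k), obtained by summing the mass function."""
    return sum(binomial_pmf(i, n, p) for i in range(k + 1))

print(binomial_pmf(6, 9, 0.5))  # 0.1640625
print(binomial_cdf(6, 9, 0.5))  # 0.91015625
```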
I have to compute probabilities under a Poisson distribution in Julia. I only know how to construct the Poisson distribution, but I need to compute the probability. Also, I have lambda values from 20 to 100.
using Distributions
Poisson()
The objects in Distributions.jl are like random variables. If you declare a value to be of a distribution, you can sample from it using rand, but there are a whole lot of other methods you can apply to it. Among them is pdf:
julia> X = Poisson(30)
Distributions.Poisson{Float64}(λ=30.0)
julia> pdf(X, 2)
4.2109303359780846e-11
julia> pdf(X, 0:1:10)
11-element Array{Float64,1}:
9.35762e-14
2.80729e-12
4.21093e-11
4.21093e-10
3.1582e-9
1.89492e-8
9.47459e-8
4.06054e-7
1.5227e-6
5.07567e-6
1.5227e-5
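To make explicit what pdf computes here, the Poisson mass function is exp(-λ) λ^k / k!. The Python sketch below (standard library only, hypothetical function name) evaluates it in log space so that large λ and k do not overflow, and loops over a few λ values in the 20 to 100 range mentioned in the question. The step of 20 is an arbitrary choice for illustration.

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam), computed via logs for numerical stability."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))

# Probability of observing exactly 2 events for several rates.
for lam in range(20, 101, 20):
    print(lam, poisson_pmf(2, lam))
```

For λ = 30 and k = 2 this agrees with the pdf(X, 2) value shown above.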
I have a random variable X and a transformation f and I would like to know the probability distribution function of f(X), at least approximately. In Mathematica there is TransformedDistribution, but I could not find something similar in R. As I said, some kind of approximative solution would be fine, too.
You can check the distr package. For instance, say that y = x^2+2x+1, where x is normally distributed with mean 2 and standard deviation 5. You can:
require(distr)
x<-Norm(2,5)
y<-x^2+2*x+1
#y@r gives random samples. We make a histogram.
hist(y@r(10000))
#y@d and y@p are the density and the cumulative distribution functions
y@d(80)
#[1] 0.002452403
y@p(80)
#[1] 0.8891796
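If only an approximate answer is needed, plain Monte Carlo sampling is another option: draw from X, apply f, and read probabilities off the empirical distribution. The Python sketch below (standard library only) is not how distr works internally; it is an independent illustration with made-up function names, a fixed seed, and an arbitrary sample size. Because this particular f(x) = x^2 + 2x + 1 equals (x + 1)^2, a closed-form CDF is also included as a sanity check. Estimates from sampling will differ slightly from run to run and from other approximation methods.

```python
import math
import random

def transform(x):
    """The transformation f from the example: y = x^2 + 2x + 1."""
    return x**2 + 2*x + 1

def sample_transformed(n, mean=2.0, sd=5.0, seed=0):
    """Draw n samples of f(X) with X ~ Normal(mean, sd)."""
    rng = random.Random(seed)
    return [transform(rng.gauss(mean, sd)) for _ in range(n)]

def empirical_cdf(samples, t):
    """Fraction of samples that are <= t, an estimate of P(f(X) <= t)."""
    return sum(1 for s in samples if s <= t) / len(samples)

def normal_cdf(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def exact_cdf(t, mean=2.0, sd=5.0):
    """Closed form for this f only: (X + 1)^2 <= t  <=>  |X + 1| <= sqrt(t)."""
    if t < 0:
        return 0.0
    r = math.sqrt(t)
    return normal_cdf((r - 1.0 - mean) / sd) - normal_cdf((-r - 1.0 - mean) / sd)

samples = sample_transformed(200_000)
print(empirical_cdf(samples, 80.0))  # Monte Carlo estimate of P(Y <= 80)
print(exact_cdf(80.0))               # closed-form value for comparison
```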
I use density() to do KDE,like
#Rscript#
x <- c(rep(1,3),rep(2,4),rep(3,5))
density(x)
Am I supposed to get a probability density function? If so, how do I reuse it to obtain a probability, e.g. what is P(x<=2) under my KDE function?
Thanks for sharing your ideas!
Because density() gives you a continuous KDE, the probability of any exact value is zero. You can only get probabilities of ranges, like P(x <= 1). In your case hist() should be the correct selection.
EDIT:
Please have a look here
https://stats.stackexchange.com/questions/78711/how-to-find-estimate-probability-density-function-from-density-function-in-r
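To illustrate how a range probability can be read off a kernel density estimate: with a Gaussian kernel (the default kernel of R's density()), the KDE is an equal-weight mixture of normals centred on the data points, so its CDF is the average of the corresponding normal CDFs. The Python sketch below uses only the standard library. The bandwidth h = 0.4 is an arbitrary assumption chosen for illustration and is not the bandwidth density() would select, so the number it prints is not what R would report.

```python
import math

def normal_cdf(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def kde_cdf(data, t, bandwidth):
    """P(X <= t) under a Gaussian KDE: the mean of per-point normal CDFs."""
    return sum(normal_cdf((t - xi) / bandwidth) for xi in data) / len(data)

# Same data as in the question: three 1s, four 2s, five 3s.
x = [1] * 3 + [2] * 4 + [3] * 5
h = 0.4  # hypothetical bandwidth, NOT R's default choice

print(kde_cdf(x, 2.0, h))  # P(x <= 2) under this KDE
```

Each data point sitting exactly at 2 contributes one half, which is why the result is lower than the empirical fraction of observations that are <= 2.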
I would like to evaluate the inverse Student's t-distribution function for small values, e.g., 1e-18, in Matlab. The degrees of freedom is 2.
Unfortunately, Matlab returns NaN:
tinv(1e-18,2)
NaN
However, if I use R's built-in function:
qt(1e-18,2)
-707106781
The result is sensible. Why can Matlab not evaluate the function for this small value? The Matlab and R results are quite similar down to about 1e-15, but for smaller values the difference is considerable:
tinv(1e-16,2)/qt(1e-16,2) = 1.05
Does anyone know what is the difference in the implemented algorithms of Matlab and R, and if R gives correct results, how could I effectively calculate the inverse t-distribution, in Matlab, for smaller values?
It appears that R's qt may use a completely different algorithm than Matlab's tinv. I think that you and others should report this deficiency to The MathWorks by filing a service request. By the way, in R2014b and R2015a, -Inf is returned instead of NaN for small values (about eps/8 and less) of the first argument, p. This is more sensible, but I think they should do better.
In the interim, there are several workarounds.
Special Cases
First, in the case of the Student's t-distribution, there are several simple analytic solutions to the inverse CDF or quantile function for certain integer parameters of ν. For your example of ν = 2:
% for v = 2
p = 1e-18;
x = (2*p-1)./sqrt(2*p.*(1-p))
which returns -7.071067811865475e+08. At a minimum, Matlab's tinv should include these special cases (they only do so for ν = 1). It would probably improve the accuracy and speed of these particular solutions as well.
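The ν = 2 closed form is easy to check outside Matlab. The Python sketch below (standard library only, illustrative function names) evaluates the same expression and verifies it against the ν = 2 CDF, F(x) = 1/2 + x / (2 sqrt(x^2 + 2)), at a moderate p where that CDF expression is not affected by cancellation.

```python
import math

def t2_quantile(p):
    """Quantile function of Student's t with 2 degrees of freedom."""
    return (2.0 * p - 1.0) / math.sqrt(2.0 * p * (1.0 - p))

def t2_cdf(x):
    """CDF of Student's t with 2 degrees of freedom (fine for moderate x)."""
    return 0.5 + x / (2.0 * math.sqrt(x * x + 2.0))

print(t2_quantile(1e-18))       # about -7.0710678e8, as in the Matlab snippet
print(t2_cdf(t2_quantile(0.3))) # round trip, should be close to 0.3
```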
Numeric Inverse
The tinv function is based on the betaincinv function. It appears that it may be this function that is responsible for the loss of precision for small values of the first argument, p. However, as suggested by the OP, one can use the CDF function, tcdf, and root-finding methods to evaluate the inverse CDF numerically. The tcdf function is based on betainc, which doesn't appear to be as sensitive. Using fzero:
p = 1e-18;
v = 2;
x = fzero(@(x)tcdf(x,v)-p, 0)
This returns -7.071067811865468e+08. Note that this method is not very robust for values of p close to 1.
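The same root-finding idea can be sketched without Matlab. The Python code below (standard library only) swaps in two substitutes, both assumptions of this sketch rather than what fzero or tcdf do: plain bisection instead of fzero, and the closed-form ν = 2 CDF instead of tcdf. The CDF is written as 1 / (s (s - x)) with s = sqrt(x^2 + 2) for x <= 0, which is algebraically equal to 1/2 + x / (2 s) but avoids cancellation deep in the lower tail. The bracket [-1e12, 0] is an arbitrary choice that happens to contain the root for p = 1e-18.

```python
import math

def t2_cdf(x):
    """CDF of Student's t with 2 degrees of freedom, stable in both tails."""
    s = math.sqrt(x * x + 2.0)
    if x <= 0.0:
        return 1.0 / (s * (s - x))
    return 1.0 - 1.0 / (s * (s + x))

def invert_cdf(cdf, p, lo, hi, iterations=200):
    """Bisection for cdf(x) = p, assuming cdf(lo) < p <= cdf(hi)."""
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if cdf(mid) < p:
            lo = mid   # root lies to the right of mid
        else:
            hi = mid   # root lies at or to the left of mid
    return 0.5 * (lo + hi)

x = invert_cdf(t2_cdf, 1e-18, lo=-1e12, hi=0.0)
print(x)  # about -7.0710678e8
```

Bisection only needs the CDF to be monotone, so it is slow but hard to break. Like the fzero approach, it is limited by how accurately the CDF itself can be evaluated for p close to 1.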
Symbolic Solutions
For more general cases, you can take advantage of symbolic math and variable precision arithmetic. You can use identities in terms of Gaussian hypergeometric functions, 2F1, as given here for the CDF. Thus, using solve and hypergeom:
% Supposedly valid for x^2 < v, but appears to work for your example
p = sym('1e-18');
v = sym(2);
syms x
F = 0.5+x*gamma((v+1)/2)*hypergeom([0.5 (v+1)/2],1.5,-x^2/v)/(sqrt(sym('pi')*v)*gamma(v/2));
sol_x = solve(p==F,x);
vpa(sol_x)
The tinv function is based on the betaincinv function. There is no equivalent function or even an incomplete Beta function in the Symbolic Math toolbox or MuPAD, but a similar 2F1 relation for the incomplete Beta function can be used:
p = sym('1e-18');
v = sym(2);
syms x
a = v/2;
F = 1-x^a*hypergeom([a 0.5],a+1,x)/(a*beta(a,0.5));
sol_x = solve(2*abs(p-0.5)==F,x);
sol_x = sign(p-0.5).*sqrt(v.*(1-sol_x)./sol_x);
vpa(sol_x)
Both symbolic schemes return results that agree to -707106781.186547523340184 using the default value of digits.
I've not fully validated the two symbolic methods above so I can't vouch for their correctness in all cases. The code also needs to be vectorized and will be slower than a fully numerical solution.