I am following the book (Statistical Rethinking) which has code in R and want to reproduce the same in code in Julia. In the book, they compute the likelihood of six successes out of 9 trials where a success, has a probability of 0.5. They achieve this using the following R code.
#R Code
dbinom(6, size = 9, prob=0.5)
#Out > 0.1640625
I am wondering how to do the same in Julia,
#Julia
using Distributions
b = Binomial(9,0.5)
# Its possible to look at random value,
rand(b)
#Out > 5
But how do I look at a specific value such as six successes?
I'm sure you know this but just to be sure the r dbinom function is the probability density (mass) function for the Binomial distribution.
Julia's Distributions package makes use of multiple dispatch to just have one generic pdf function that can be called with any type of Distribution as the first argument, rather than defining a bunch of methods like dbinom, dnorm (for the Normal distribution). So you can do:
julia> using Distributions
julia> b = Binomial(9, 0.5)
Binomial{Float64}(n=9, p=0.5)
julia> pdf(b, 6)
0.1640625000000001
There is also cdf which works in the same way to calculate (maybe unsurprisingly) for the cumulative density function.
Related
How do I use the Binomial function to solve this experiment:
number of trials -> n=18,
p=10%
success x=2
The answer is 28% . I am using Binomial(18, 0.1) but how I pass the n=2?
julia> d=Binomial(18,0.1)
Binomial{Float64}(n=18, p=0.1)
pdf(d,2)
How can I solve this in Julia?
What you want is the Probability Mass Function, aka the probability, that in a binomial experiment of n Bernoulli independent trials with a probability p of success on each individual trial, we obtain exactly x successes.
The way to answer this question in Julia is, using the Distribution package, to first create the "distribution" object with parameters n and p, and then call the function pdf to this object and the variable x:
using Distributions
n = 18 # number of trials in our experiments
p = 0.1 # probability of success of a single trial
x = 2 # number of successes for which we want to compute the probability/PMF
binomialDistribution = Binomial(n,p)
probOfTwoSuccesses = pdf(binomialDistribution,x)
Note that all the other probability related functions (like cdf, quantile, .. but also rand) work in the same way.. you first build the distribution object, that embed the specific distribution parameters, and then you call the function over the distribution object and the variable you are looking for, e.g. quantile(binomialDistribution,0.9) for 90% quantile.
My question is how to generate a sample in R from a logistic CDF with the inverse CDF method. The logistic density is p(θ) = exp(θ)/(1 + exp(θ))^2
Here is the algorithm for that method:
1: for t = 1 to T do
2: sample q(t) ∼ Unif(0, 1)
3: θ(t) ← F^−1(q(t))
4: end for
Here is my code but it just generates a vector of the same number. The result should be log-concave but obviously it would not be that if I put it in the histogram, so what is the problem?:
First define T as the number of draws you're taking from uniform distribution
T<-100000
sample_q<-runif(T,0,1)
It seems like plogis will give you the cumulative distribution function, so I suppose I can just take its inverse:
generate_samples_from_logistic_CDF <- function(p) {
for(t in length(T))
cdf<-plogis((1+exp(p)/(exp(p))))
inverse_cdf<-(1/cdf)
return(inverse_cdf)
}
should generate_samples_from_logistic_CDF(sample_q)
but instead it only gives me the same value for everything
Since the inverse CDF is already coded in R as qlogis(), this should work:
qlogis(runif(100000))
or if you want to do it "by hand" rather than using the built-in qlogis(), you can use R <- runif(100000); log(R/(1-R))
Note that rlogis(100000) should be more efficient.
One of your confusions is that "inverse" in the algorithm description above doesn't mean the multiplicative inverse or reciprocal (i.e. 1/x), but rather the function inverse (which in this case is log(q/(1-q)))
I am trying to calculate the density function of a continuos random variable in range in Julia using Distributions, but I am not able to define the range. I used Truncator constructor to construct the distribution, but I have no idea how to define the range. By density function I mean P(a
Would appreciate any help. The distribution I'm using is Gamma btw!
Thanks
To get the maximum and minimum of the support of distribution d just write maximum(d) and minimum(d) respectively. Note that for some distributions this might be infinity, e.g. maximum(Normal()) is Inf.
What version of Julia and Distributions du you use? In Distribution v0.16.4, it can be easily defined with the second and third arguments of Truncated.
julia> a = Gamma()
Gamma{Float64}(α=1.0, θ=1.0)
julia> b = Truncated(a, 2, 3)
Truncated(Gamma{Float64}(α=1.0, θ=1.0), range=(2.0, 3.0))
julia> p = rand(b, 1000);
julia> extrema(p)
(2.0007680527633305, 2.99864177354943)
You can see the document of Truncated by typing ?Truncated in REPL and enter.
I have to compute the probability of Poisson distribution in Julia. I just know how to get Poisson distribution. But i have to compute the probability. also i have lambda from 20 to 100.
using Distributions
Poisson()
The objects in Distributions.jl are like random variables. If you declare a value to be of a distribution, you can sample from it using rand, but there are a whole lot of other methods you can apply to it. Among them is pdf:
julia> X = Poisson(30)
Distributions.Poisson{Float64}(λ=30.0)
julia> pdf(X, 2)
4.2109303359780846e-11
julia> pdf(X, 0:1:10)
11-element Array{Float64,1}:
9.35762e-14
2.80729e-12
4.21093e-11
4.21093e-10
3.1582e-9
1.89492e-8
9.47459e-8
4.06054e-7
1.5227e-6
5.07567e-6
1.5227e-5
note: originally posted on Cross Validated (stats SE) on 07-26-2011, with no correct answers to date.
Background
I have a model, f, where Y=f(X)
X is an n x m matrix of samples from m parameters and Y is the n x 1 vector of model outputs.
f is computationally intensive, so I would like to approximate f using a multivariate cubic spline through (X,Y) points, so that I can evaluate Y at a larger number of points.
Question
Is there an R function that will calculate an arbitrary relationship between X and Y?
Specifically, I am looking for a multivariate version of the splinefun function, which generates a spline function for the univariate case.
e.g. this is how splinefun works for the univariate case
x <- 1:100
y <- runif(100)
foo <- splinefun(x,y, method = "monoH.FC")
foo(x) #returns y, as example
The test that the function interpolates exactly through the points is successful:
all(y == foo(1:100))
## TRUE
What I have tried
I have reviewed the mda package, and it seems that the following should work:
library(mda)
x <- data.frame(a = 1:100, b = 1:100/2, c = 1:100*2)
y <- runif(100)
foo <- mars(x,y)
predict(foo, x) #all the same value
however the function does not interpolate exactly through the design points:
all(y == predict(foo,x))
## FALSE
I also could not find a way to implement a cubic-spline in either the gam, marss, or earth packages.
Actually several packages can do it. The one I use is the "rms" package which has rcs, but the survival package also has pspline and the splines package has the ns function {}. "Natural splines" (constructed with ns) are also cubic splines. You will need to form multivariate fitting function with the '*' operator in the multivariate formula creating "crossed" spline terms.
that the example you offered was not sufficiently rich.
I guess I am confused that you want exact fits. R is a statistical package. Approximate estimation is the goal. Generally exact fits are more of a problem because they lead to multicollinearity.
Have a look at the DiceKriging package which was developed to undertake tasks like this.
http://cran.r-project.org/web/packages/DiceKriging/index.html
I've provided an example application at
https://stats.stackexchange.com/questions/13510/fitting-multivariate-natural-cubic-spline/65012#65012
I'm not sure if this is precisely what you are looking for, but you could try Tps() in the R package fields. It's meant for doing thin-plate splines interpolations (2D equivalent of cubic splines) for spatial data, but will take up to four covariates, although it will expect them to be euclidean x,y,z + time, so you need to be clear that you are selecting the correct options for your particular case. If you want to interpolate, set the smoothing parameter lambda to zero. You might also try the function polymars() in the R package polspline.