Does a randsample function exist in Julia? - julia

Is there a function like randsample in Julia?
Julia has a sample function. Is it the same as randsample in MATLAB? What is the difference between sample and rand in Julia?
Thanks very much

rand is defined in Base. It supports unweighted sampling with replacement. You can sample from a set of values, and that set can take many forms: for instance, with Distributions.jl you can sample from any of the distributions defined there. By default, rand() samples from a uniform distribution on the interval [0, 1).
sample is defined in the StatsBase.jl package. It supports sampling from a population with or without replacement, optionally weighted.
EDIT
A simple example of sampling without replacement:
julia> using StatsBase
julia> sample(1:5, 4, replace=false)
4-element Array{Int64,1}:
4
3
1
2
julia> sample(1:5, 5, replace=false)
5-element Array{Int64,1}:
3
4
2
5
1
julia> sample(1:5, 6, replace=false)
ERROR: Cannot draw more samples without replacement.
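To put the two functions side by side, here is a minimal sketch. The population, the weights, and the variable names are made up for illustration; the seed is only there to make runs repeatable.

```julia
using Random
using StatsBase  # provides sample and Weights

Random.seed!(1)  # illustrative seed, not required

population = 1:5

# Base.rand: unweighted, with replacement, so duplicates may occur
with_repl = rand(population, 4)

# StatsBase.sample without replacement: all drawn values are distinct
without_repl = sample(population, 4; replace=false)

# StatsBase.sample with (hypothetical) weights, with replacement
w = Weights([0.1, 0.1, 0.1, 0.1, 0.6])
weighted = sample(population, w, 1000)
```

With these weights, the value 5 should make up roughly 60% of weighted, while without_repl never repeats a value.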

Related

Computing the likelihood of data for Binomial Distribution

I am following the book Statistical Rethinking, which has code in R, and I want to reproduce the same in Julia. In the book, they compute the likelihood of six successes out of nine trials, where a success has a probability of 0.5. They achieve this using the following R code.
#R Code
dbinom(6, size = 9, prob=0.5)
#Out > 0.1640625
I am wondering how to do the same in Julia:
#Julia
using Distributions
b = Binomial(9,0.5)
# It's possible to draw a random value,
rand(b)
#Out > 5
But how do I look at a specific value such as six successes?
I'm sure you know this, but just to be sure: R's dbinom function is the probability mass function (the discrete counterpart of a density) for the Binomial distribution.
Julia's Distributions package makes use of multiple dispatch to have one generic pdf function that can be called with any type of Distribution as the first argument, rather than defining a separate function per distribution such as dbinom or dnorm (for the Normal distribution). So you can do:
julia> using Distributions
julia> b = Binomial(9, 0.5)
Binomial{Float64}(n=9, p=0.5)
julia> pdf(b, 6)
0.1640625000000001
There is also cdf, which works in the same way and (maybe unsurprisingly) computes the cumulative distribution function.
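As a rough map from the R functions to Distributions, here is a small sketch. The variable names are mine, and the grid of success probabilities at the end is only an illustration, not something taken from the book.

```julia
using Distributions

b = Binomial(9, 0.5)

p6    = pdf(b, 6)   # counterpart of dbinom(6, size = 9, prob = 0.5)
p_le6 = cdf(b, 6)   # counterpart of pbinom(6, size = 9, prob = 0.5)

# Cross-check against the closed form: C(9, 6) * 0.5^6 * 0.5^3
manual = binomial(9, 6) * 0.5^6 * 0.5^3

# Likelihood of 6 successes in 9 trials over a hypothetical grid of p values
grid = 0.1:0.2:0.9
likelihoods = [pdf(Binomial(9, p), 6) for p in grid]
```

Because pdf takes the distribution as its first argument, evaluating the likelihood for a different p just means constructing a different Binomial.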

Range for continuous distribution in Julia

I am trying to calculate the density function of a continuous random variable over a range in Julia using Distributions, but I am not able to define the range. I used the Truncated constructor to construct the distribution, but I have no idea how to define the range. By density function I mean P(a
Would appreciate any help. The distribution I'm using is Gamma btw!
Thanks
To get the maximum and minimum of the support of distribution d just write maximum(d) and minimum(d) respectively. Note that for some distributions this might be infinity, e.g. maximum(Normal()) is Inf.
What version of Julia and Distributions do you use? In Distributions v0.16.4, the range can be set with the second and third arguments of Truncated.
julia> a = Gamma()
Gamma{Float64}(α=1.0, θ=1.0)
julia> b = Truncated(a, 2, 3)
Truncated(Gamma{Float64}(α=1.0, θ=1.0), range=(2.0, 3.0))
julia> p = rand(b, 1000);
julia> extrema(p)
(2.0007680527633305, 2.99864177354943)
You can read the documentation for Truncated by typing ?Truncated in the REPL and pressing Enter.
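If what is wanted is the probability that the variable falls between two bounds (my reading of the question, so treat it as an assumption), the cdf of the untruncated distribution is enough. The sketch below uses the bounds 2 and 3 from the answer above and the lowercase truncated function, which I believe is the spelling in recent Distributions releases; with the older version mentioned above, Truncated(d, 2, 3) plays the same role.

```julia
using Distributions

d = Gamma()        # α = 1, θ = 1, as in the answer above
lo, hi = 2, 3      # bounds taken from the example above

# Probability mass of the untruncated Gamma inside [lo, hi]
prob = cdf(d, hi) - cdf(d, lo)

# Distribution restricted to [lo, hi]; its density is the original
# density rescaled by 1 / prob inside the interval
t = truncated(d, lo, hi)
scaled = pdf(d, 2.5) / prob
```

Here pdf(t, 2.5) should agree with scaled, and minimum(t) and maximum(t) return the two bounds.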

How to vectorize rmultinom?

Many R functions for simulating from probability distributions are vectorised. ?rmultinom says that dmultinom is not vectorized, hence I assume that also rmultinom is not. What is the most efficient way to execute rmultinom repeatedly across a set of probabilities?
For example:
p = matrix( c(0.1,0.2,0.3,0.4,0.2,0.3,0.4,0.1,0.3,0.4,0.2,0.1), ncol=4, nrow = 3, byrow = TRUE)
p is a 3 x 4 matrix of probabilities that sum to one for each row. The goal is to draw n samples of the given size for each row. For simplicity use n=1, size=1, i.e. the categorical distribution.
rmultinom(1,1,p) gives a 12 x 1 matrix. The desired result is a 4 x 3 matrix though, for which there is exactly 1 element equal to 1 for each column.
A for loop is possible but seems inefficient. Is there a better way to achieve this (for large matrices p)?

how to compute the probability of Poisson distribution in julia

I have to compute probabilities under a Poisson distribution in Julia. I only know how to construct the Poisson distribution, but I need to compute the probability. Also, my lambda ranges from 20 to 100.
using Distributions
Poisson()
The objects in Distributions.jl are like random variables. If you declare a value to be of a distribution, you can sample from it using rand, but there are a whole lot of other methods you can apply to it. Among them is pdf:
julia> X = Poisson(30)
Distributions.Poisson{Float64}(λ=30.0)
julia> pdf(X, 2)
4.2109303359780846e-11
julia> pdf(X, 0:1:10)
11-element Array{Float64,1}:
9.35762e-14
2.80729e-12
4.21093e-11
4.21093e-10
3.1582e-9
1.89492e-8
9.47459e-8
4.06054e-7
1.5227e-6
5.07567e-6
1.5227e-5
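Since the question mentions lambda going from 20 to 100, here is a minimal sketch that evaluates the probability for several rates. The step of 20 and the count k = 25 are arbitrary choices of mine. The dot syntax pdf.(X, 0:10) is the broadcasting form, which should be the safer spelling on newer Distributions versions.

```julia
using Distributions

λs = 20:20:100    # hypothetical grid of rates: 20, 40, 60, 80, 100
k = 25            # hypothetical count of interest

# P(X = k) for each rate λ
probs = [pdf(Poisson(λ), k) for λ in λs]

# P(X <= k) for each rate λ, via the cdf
cum_probs = [cdf(Poisson(λ), k) for λ in λs]

# Broadcasting over several counts for a single rate
X = Poisson(30)
first_eleven = pdf.(X, 0:10)
```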

Clustering and distance calculation in Julia

I have a collection of n coordinate points of the form (x,y,z). These are stored in an n x 3 matrix M.
Is there a built in function in Julia to calculate the distance between each point and every other point? I'm working with a small number of points so calculation time isn't too important.
My overall goal is to run a clustering algorithm, so if there is a clustering algorithm that I can look at that doesn't require me to first calculate these distances please suggest that too. An example of the data I would like to perform clustering on is below. Obviously I'd only need to do this for the z coordinate.
To calculate distances use the Distances package.
Given a matrix X, you can calculate pairwise distances between its columns. This means that your input points (your n objects) should be the columns of the matrix. (In your question you mention an n x 3 matrix, so you would have to transpose it with the transpose() function.)
Here is an example on how to use it:
>using Distances # install with Pkg.add("Distances")
>x = rand(3,2)
3x2 Array{Float64,2}:
0.27436 0.589142
0.234363 0.728687
0.265896 0.455243
>pairwise(Euclidean(), x, x)
2x2 Array{Float64,2}:
0.0 0.615871
0.615871 0.0
As you can see the above returns the distance matrix between the columns of X. You can use other distance metrics if you need to, just check the docs for the package.
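For a small number of points, the same distance matrix can also be built with only the standard library, which makes the row/column convention explicit. This is a sketch with made-up points stored one per row, as in the question; it is not how Distances computes it internally.

```julia
using LinearAlgebra  # for norm

# Hypothetical data: each row of M is one (x, y, z) point
M = [0.0 0.0 0.0;
     3.0 4.0 0.0;
     0.0 0.0 2.0]

n = size(M, 1)

# D[i, j] is the Euclidean distance between point i and point j
D = [norm(M[i, :] - M[j, :]) for i in 1:n, j in 1:n]
```

D is symmetric with zeros on the diagonal; for the points above, D[1, 2] is 5.0. As far as I know, recent versions of Distances also accept a dims keyword in pairwise, which avoids the transpose.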
Just for completeness, to add to @niczky12's answer: there is a package in Julia called Clustering which, as the name says, allows you to perform clustering.
A sample kmeans algorithm:
>>> using Clustering # Pkg.add("Clustering") if not installed
>>> X = rand(3, 100) # data, each column is a sample
>>> k = 10 # number of clusters
>>> r = kmeans(X, k)
>>> fieldnames(r)
8-element Array{Symbol,1}:
:centers
:assignments
:costs
:counts
:cweights
:totalcost
:iterations
:converged
The result is stored in the return value of kmeans (r), which contains the above fields. The two probably most interesting fields: r.centers contains the centers detected by the kmeans algorithm, and r.assignments contains the cluster to which each of the 100 samples belongs.
There are several other clustering methods in the same package. Feel free to dive into the documentation and apply the one that best suits your needs.
In your case, as your data is an N x 3 matrix you only need to transpose it:
M = rand(100, 3)
kmeans(M', k)
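Since the question says only the z coordinate matters, one option is to cluster on that column alone. The sketch below assumes random data and k = 3, both made up; the only requirement is that kmeans receives a matrix with one column per sample, so the z values are reshaped into a 1 x n matrix.

```julia
using Clustering

M = rand(100, 3)              # hypothetical data, one (x, y, z) point per row
z = reshape(M[:, 3], 1, :)    # 1 x 100 matrix: one feature, 100 samples
k = 3                         # hypothetical number of clusters

r = kmeans(z, k)

labels  = r.assignments       # cluster index for each of the 100 points
centers = r.centers           # 1 x k matrix of cluster centers along z
```

labels[i] then gives the cluster of the point in row i of M.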
