Does anyone know an algorithm to generate a random variable from the distribution $F(x) = \prod_{i=1}^n F_i(x)$ in R?
That's equivalent to saying $\Pr\{X_1 \le x, X_2 \le x, \ldots, X_n \le x\}$ for independent $X_i$'s. Generate from each of the $F_i$'s independently, and take the max. Independent generation gives you an ensemble meeting the individual distribution requirements, and taking the max meets the requirement of the joint probability statement.
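A minimal R sketch of this, assuming purely for illustration that each $F_i$ is a normal CDF with a different mean (the qnorm calls stand in for whatever generators you have for the individual $F_i$'s):

# Draw one variate from each F_i via inverse transform, then take the max
rmax <- function(n_draws, inv_cdfs) {
  sapply(seq_len(n_draws), function(j) {
    max(vapply(inv_cdfs, function(q) q(runif(1)), numeric(1)))
  })
}

inv_cdfs <- list(
  function(u) qnorm(u, mean = 0),
  function(u) qnorm(u, mean = 1),
  function(u) qnorm(u, mean = 2)
)
x <- rmax(10000, inv_cdfs)  # samples from F(x) = prod_i F_i(x)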
In the special case where the $F_i$'s are invertible and are the same for all $i$, this can be done with a single inverse transformation. Let's write $F_i(x) \equiv G(x)$ to avoid tripping over notation. Then $F(x) = G^n(x)$, and by the inverse transform theorem $G^n(X)$ is distributed uniformly between 0 and 1 for random variable $X$. Therefore, for $U \sim \mathrm{Uniform}(0,1)$ and starting with your statement about $F(x)$:

$F(X) = G^n(X) = U$
$G(X) = U^{1/n}$
$X = G^{-1}(U^{1/n})$
This gives a one-step method for generating the max of n independent and identically distributed random variates X, as opposed to having to generate n of them and take the max.
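As a concrete sketch, assuming for illustration that $G$ is the standard normal CDF (so $G^{-1}$ is qnorm):

n <- 5
u <- runif(10000)
x <- qnorm(u^(1/n))  # X = G^{-1}(U^{1/n}); each draw is the max of n iid normals

# Sanity check against the n-draws-and-max approach:
y <- replicate(10000, max(rnorm(n)))
qqplot(x, y)  # the points should lie close to the line y = x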
Suppose I have an arbitrary probability density function (PDF) defined as a function f, for example:
using Random, Distributions
# PDF with parameter θ ∈ (0,1): uniform over the
# segments [-1,0) and [0,1], zero elsewhere
f = θ -> x ->
    (-1 <= x < 0) ? θ^2 :
    (0 <= x <= 1) ? 1 - θ^2 :
    0
How can I sample values from a random variable with this PDF in Julia? (or alternatively, how can I at least simulate sampling from such a random variable?)
i.e. I want the equivalent of rand(Normal(), 10) for 10 values from a (standard) normal distribution, but I want to use the function f to define the distribution used (something like rand(f(0.4), 10), but this doesn't work).
(There is already an answer for discrete distributions at How can I write an arbitrary discrete distribution in Julia?; however, I want a continuous distribution. There are some details on creating a sampler at https://juliastats.org/Distributions.jl/v0.14/extends.html which I think might be useful, but I don't understand how to apply them. In R I've used the inverse CDF technique as described at https://blogs.sas.com/content/iml/2013/07/22/the-inverse-cdf-method.html to simulate such random variables, but I'm unsure how it might best be implemented in Julia.)
The first problem is that what you've provided is not a complete specification of a probability distribution, since it doesn't say anything about the distribution within the interval [-1, 0) or within the interval [0, 1]. So for the purposes of this answer I'm going to assume your distribution is uniform on each of these intervals. Given that, I would argue the most Julian way to implement your own distribution is to create a new subtype, in this case of ContinuousUnivariateDistribution. Example code follows:
using Distributions
struct MyDistribution <: ContinuousUnivariateDistribution
    theta::Float64
    function MyDistribution(theta::Float64)
        !(0 <= theta <= 1) && error("Invalid theta: $(theta)")
        new(theta)
    end
end

# Draw from [-1, 0) with probability theta^2, else from [0, 1)
function Distributions.rand(d::MyDistribution)::Float64
    if rand() < d.theta^2
        x = rand() - 1
    else
        x = rand()
    end
    return x
end

# Inverse CDF: the CDF rises linearly from 0 to theta^2 on [-1, 0),
# then linearly from theta^2 to 1 on [0, 1]
function Distributions.quantile(d::MyDistribution, p::Real)::Float64
    !(0 <= p <= 1) && error("Invalid probability input: $(p)")
    if p < d.theta^2
        x = -1.0 + (p / d.theta^2)
    else
        x = (p - d.theta^2) / (1 - d.theta^2)
    end
    return x
end
In the above code I have implemented rand and quantile methods for the new distribution, which is the minimum needed to make calls like rand(MyDistribution(0.4), 20) to sample 20 random numbers from the new distribution. See here for a list of other methods you may want to add to your new distribution type (depending on your use case, perhaps you won't bother).
Note, if efficiency is an issue, you may look into some of the methods that will allow you to minimise the number of d.theta^2 operations, e.g. Distributions.sampler. Alternatively, you could just store theta^2 internally in MyDistribution but always display the underlying theta. Up to you really.
Finally, you don't really need type annotations on function outputs. I've just included them for clarity.
I am asked to implement an algorithm to simulate from a Poisson($\lambda$) distribution using draws from an exponential distribution.
I was given the following identity:
$P(X = k) = P(X_1 + \cdots + X_k \le 1 < X_1 + \cdots + X_{k+1})$, for $k = 1, 2, \ldots$
Here $P(X = k)$ is the Poisson($\lambda$) probability, and the $X_i$ are i.i.d. exponential($\lambda$) random variables.
I wrote code to simulate the exponential distribution, but have no clue how to simulate the Poisson. Could anybody help me with this? Thanks a million.
My code:
n <- 1:k
u <- runif(k)
x <- -log(1 - u) / lambda  # k exponential(lambda) variates via inverse transform
I'm working on the assumption that you (or your instructor) want to do this from first principles rather than just calling the built-in Poisson generator. The algorithm is pretty straightforward: you count how many exponentials with the specified rate you can generate before their sum exceeds 1.
My R is rusty and this sounds like homework anyway, so I'll express it as pseudo-code:
count <- 0
sum <- 0
repeat {
    generate x ~ exp(lambda)
    sum <- sum + x
    if sum > 1
        break
    else
        count <- count + 1
}
The value of count after you break from the loop is your Poisson outcome for this trial. If you wrap this as a function, return count rather than breaking from the loop.
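For concreteness, here is one way the pseudo-code might translate into runnable R (the function name is mine; rexp is R's built-in exponential generator):

rpois_from_exp <- function(lambda) {
  count <- 0
  total <- 0
  repeat {
    total <- total + rexp(1, rate = lambda)  # one exponential inter-arrival time
    if (total > 1) break
    count <- count + 1
  }
  count  # the Poisson(lambda) outcome for this trial
}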
You can improve this computationally in a couple of ways. The first is to notice that the $1-U$ term used to generate the exponentials is itself uniform(0,1), so it can be replaced by just $U$. The more significant improvement comes from rewriting the stopping criterion: we want the largest $k$ such that $\sum_i -\log(U_i)/\lambda \le 1$, i.e. $\sum_i \log(U_i) \ge -\lambda$.
Now exponentiate both sides and simplify to get
$\prod_i U_i \ge e^{-\lambda}$.
The right-hand side is constant and can be pre-calculated, reducing the amount of work from $k+1$ log evaluations and additions to one exponentiation and $k+1$ multiplications:
count <- 0
product <- 1
threshold <- Exp(-lambda)
repeat {
    generate u ~ Uniform(0,1)
    product <- product * u
    if product < threshold
        break
    else
        count <- count + 1
}
Assuming you make the $U$-for-$(1-U)$ substitution in both implementations, they are algebraically equivalent and will yield identical answers to within floating-point precision for a given sequence of $U$'s.
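A runnable R version of the multiplicative form, as a direct translation of the pseudo-code above (the function name is mine):

rpois_product <- function(lambda) {
  threshold <- exp(-lambda)  # pre-calculated constant
  count <- 0
  product <- 1
  repeat {
    product <- product * runif(1)
    if (product < threshold) break
    count <- count + 1
  }
  count
}

One caveat: exp(-lambda) underflows to zero for very large lambda (around 745 in double precision), at which point the multiplicative form breaks down and the additive form is safer.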
You can use rpois to generate Poisson variates, as suggested above. However, my understanding of the question is that you wish to do so from first principles rather than using built-in functions. To do this, you can use the property of Poisson arrivals that the inter-arrival times are exponentially distributed. We therefore proceed as follows:
Step 1: Generate a (large) sample from the exponential distribution and form the vector of its cumulative sums. The k-th entry of this vector is the waiting time until the k-th Poisson arrival.
Step 2: Count how many arrivals fall within the unit time interval.
Step 3: Repeat steps 1 and 2 many times and gather the results into a vector.
This will be your sample from the Poisson distribution with the correct rate parameter.
The code:
lambda <- 20  # for example
out <- sapply(1:100000, function(i) {
  u <- runif(100)                # 100 draws is ample for lambda = 20
  x <- -log(1 - u) / lambda      # exponential inter-arrival times
  y <- cumsum(x)                 # arrival times
  sum(y <= 1)                    # number of arrivals in the unit interval
})
Then you can test the validity against the built-in generator via the Kolmogorov-Smirnov test (note that ks.test assumes continuous distributions, so it will warn about ties with discrete data like this):
ks.test(out, rpois(100000, lambda))
I have the pdf of a distribution. This distribution is not a standard distribution and no functions exist in R to sample from it. How do I sample from this pdf using R?
This is more of a statistics question, as it requires sampling, but in general, you can take this approach to the problem:
Find a distribution f whose pdf, when multiplied by some constant k, is always greater than or equal to the pdf of the distribution in question, g (that is, k*f(x) >= g(x) for all x).
For each sample, do the following steps:
Sample a random number x from the distribution f.
Calculate C = g(x)/(k*f(x)). This will be less than or equal to 1.
Draw a random number u from a uniform distribution U(0,1). If C < u, reject x and return to the first step (draw a new x). Otherwise keep x as the sample and continue sampling if desired.
This process is known as rejection sampling, and is often used to generate random numbers from non-uniform distributions.
The normal and uniform distributions are some of the more common choices for f, but you can use others. Generally you want the shapes of k*f(x) and g(x) to be close, so that you don't have to reject a lot of samples.
Here's an example implementation:
# n  is the sample size
# g  is the pdf you want to sample from (the target)
# rf is the sampling function for the proposal f
# df is the density function for the proposal f
# k  is a constant such that k * df(x) >= g(x) for all x
# ... is any necessary parameters for f
function.sample <- function(n, g, rf, df, k, ...) {
  results <- numeric(n)
  counter <- 0
  while (counter < n) {
    x <- rf(1, ...)       # propose a candidate from f
    x.pdf <- df(x, ...)   # proposal density at the candidate
    if (runif(1) <= g(x) / (k * x.pdf)) {  # accept with probability g(x)/(k*f(x))
      results[counter + 1] <- x
      counter <- counter + 1
    }
  }
  results  # return the accepted sample
}
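A hypothetical usage example (the target and constant here are mine, not from the original answer): sample from the triangular density g(x) = 2x on [0, 1] using a Uniform(0,1) proposal, with k = 2 so that k*dunif(x) >= g(x) on the support:

g <- function(x) 2 * x  # target density on [0, 1]
samples <- function.sample(1000, g, runif, dunif, k = 2)
hist(samples)  # should rise roughly linearly toward x = 1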
There are other methods to do random sampling, but this is usually the easiest, and it works well for most functions (unless their PDF is hard to calculate but their CDF isn't).
I want to compute the expected value of a multivariate function f(x) with respect to a Dirichlet distribution. My problem is "penta-nomial" (i.e. 5 variables), so deriving the expected value in closed form seems unreasonable. Is there a way to integrate it numerically and efficiently?
$f(x) = \sum_{i=0}^{4} x_i \log(n/x_i)$
where $x = (x_0, x_1, x_2, x_3, x_4)$ and $n$ is a constant.
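One possible approach, sketched here as an assumption of mine rather than an answer from the thread: plain Monte Carlo integration. Draw Dirichlet samples by normalising independent gamma variates, then average f over the draws. The parameter vector alpha and the constant n below are illustrative placeholders:

alpha <- rep(1, 5)  # Dirichlet parameters (assumed for illustration)
n <- 100            # the constant appearing in f (assumed)
m <- 100000         # number of Monte Carlo draws
draws <- matrix(rgamma(m * 5, shape = alpha), ncol = 5, byrow = TRUE)
draws <- draws / rowSums(draws)        # each row ~ Dirichlet(alpha)
mean(rowSums(draws * log(n / draws)))  # estimate of E[sum_i x_i*log(n/x_i)]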
I have a quadratic function f where f = function(x) {2 + .1*x + .23*(x*x)}. Let's say I have another quadratic function g where g = function(x) {3 + .4*x - .60*(x*x)}.
Now, I want to maximize f subject to the constraints 1. g > 0 and 2. 600 < x < 650.
I have tried the packages optim, constrOptim and optimize. optimize does one-dimensional optimization, but without constraints, and I couldn't understand constrOptim. I need to do this using R. Please help.
P.S. In this example the values may be erratic, as I have given two arbitrary quadratic functions, but basically I want to maximize a quadratic function subject to a quadratic constraint.
If you solve g(x) = 0 for x by the usual quadratic formula, that just gives you another set of bounds on x. If the x^2 coefficient is negative, then g(x) > 0 between the two roots; otherwise g(x) > 0 outside them, i.e. on (-Inf, x1) and (x2, Inf).
In this case, g(x) > 0 for -1.927 < x < 2.59, so your two constraints cannot be satisfied simultaneously (g(x) is LESS THAN 0 throughout 600 < x < 650).
But suppose your second condition were instead 1 < x < 5. Then you'd intersect the solution of g(x) > 0 with that interval to get 1 < x < 2.59, and then maximise f over that interval using standard univariate optimisation.
And you don't even need to run an optimisation algorithm. Your target f is quadratic. If the coefficient of x^2 is positive, the maximum will be at one of your limits of x, so you only have a small number of values to try. If the coefficient of x^2 is negative, the maximum is either at a limit or at the point where f(x) peaks (solve f'(x) = 0), if that lies within your limits.
So you can do this precisely, there's just a few conditions to test and then some intervals to compute and then some values of f at those interval limits to calculate.
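Using the hypothetical relaxed constraint 1 < x < 5 from above, the whole procedure fits in a few lines of R (a sketch; the variable names are mine):

f <- function(x) 2 + 0.1 * x + 0.23 * x^2
# Roots of g(x) = 3 + 0.4x - 0.60x^2 via the quadratic formula
a <- -0.60; b <- 0.4; cc <- 3
roots <- sort((-b + c(-1, 1) * sqrt(b^2 - 4 * a * cc)) / (2 * a))
# a < 0, so g(x) > 0 between the roots; intersect with 1 < x < 5
lo <- max(roots[1], 1)
hi <- min(roots[2], 5)
# f has a positive x^2 coefficient, so its maximum on [lo, hi] is at an endpoint
ends <- c(lo, hi)
ends[which.max(f(ends))]  # returns hi (about 2.59) here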