Efficiently calculating integral of a multivariate function on non-rectangular region? - r

I want to compute the expected value of a multivariate function f(x) with respect to a Dirichlet distribution. My problem is "penta-nomial" (i.e. 5 variables), so deriving the expected value in closed form seems unreasonable. Is there a way to numerically integrate it efficiently?
f(x) = \sum_{i=0}^{4} x_i \log(n / x_i)
where x = (x_0, x_1, x_2, x_3, x_4) and n is a constant.
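One possible approach is plain Monte Carlo: draw from the Dirichlet distribution and average f over the draws. The sketch below is a minimal illustration, not a definitive answer. The concentration vector alpha, the constant n and the sample size B are made-up values, log is assumed to be the natural logarithm, and Dirichlet draws are generated by normalising independent Gamma variables. Because the x_i sum to 1, f(x) = log(n) - sum(x_i log x_i), and the Beta marginals of the Dirichlet give E[x_i log x_i] = (alpha_i / alpha_0) (digamma(alpha_i + 1) - digamma(alpha_0 + 1)) with alpha_0 = sum(alpha), so a closed form is available as a cross-check.

```r
set.seed(1)
alpha <- c(2, 3, 1.5, 4, 2.5)  # hypothetical Dirichlet parameters (assumption)
n     <- 10                    # hypothetical constant n (assumption)
B     <- 1e5                   # number of Monte Carlo draws

# Dirichlet draws: normalise independent Gamma(alpha_i, 1) variables row by row.
# matrix() fills column-wise, so column i uses shape alpha[i].
g <- matrix(rgamma(B * length(alpha), shape = rep(alpha, each = B)), nrow = B)
x <- g / rowSums(g)

# f(x) = sum_i x_i * log(n / x_i), evaluated for every draw
f_vals      <- rowSums(x * log(n / x))
mc_estimate <- mean(f_vals)
mc_se       <- sd(f_vals) / sqrt(B)   # Monte Carlo standard error

# Closed-form cross-check based on the Beta marginals of the Dirichlet
a0          <- sum(alpha)
closed_form <- log(n) - sum(alpha / a0 * (digamma(alpha + 1) - digamma(a0 + 1)))

c(monte_carlo = mc_estimate, std_error = mc_se, closed_form = closed_form)
```

If the two numbers agree to within a few standard errors, the sampler and the function are probably coded correctly. The Monte Carlo part also works for functions with no closed form.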

Related

Using FFT in R to Determine Density Function for IID Sum

The goal is to compute the density function of a sum of n IID random variables via the density function of one of these random variables by:
1. Transform the density function into the characteristic function via fft
2. Raise the characteristic function to the nth power
3. Transform the resulting characteristic function into the density function of interest via fft(inverse=TRUE)
The below is my naive attempt at this:
sum_of_n <- function(density, n, xstart, xend, power_of_2)
{
  x <- seq(from=xstart, to=xend, by=(xend-xstart)/(2^power_of_2-1))
  y <- density(x)
  fft_y <- fft(y)
  fft_sum_of_y <- (fft_y ^ n)
  sum_of_y <- Re(fft(fft_sum_of_y, inverse=TRUE))
  return(sum_of_y)
}
In the above, density is an arbitrary density function: for example
density <- function(x){return(dgamma(x = x, shape = 2, rate = 1))}
n indicates the number of IID random variables being summed. xstart and xend are the start and end of the approximate support of the random variable. power_of_2 is the power of 2 length for the numeric vectors used. As I understand things, lengths of powers of two increase the efficiency of the fft algorithm.
I understand at least partially why the above does not work as intended in general. Firstly, the values themselves will not be scaled correctly, as fft(inverse=TRUE) does not normalize by default. However, I find that the values are still not correct when I divide by the length of the vector i.e.
sum_of_y <- sum_of_y / length(sum_of_y)
which based on my admittedly limited understanding of fft is the normalizing calculation. Secondly, the resulting vector will be out of phase due to (someone correct me on this if I am wrong) the shifting of the zero frequency that occurs when fft is performed. I have tried to use, for example, pracma's fftshift and ifftshift, but they do not appear to address this problem correctly. For symmetric distributions e.g. normal, this is not difficult to address since the phase shift is typically exactly half, so that an operation like
sum_of_y <- c(sum_of_y[(length(y)/2+1):length(y)], sum_of_y[1:(length(y)/2)])
works as a correction. However, for asymmetric distributions like the gamma distribution above this fails.
In conclusion, are there adjustments to the code above that will result in an appropriately scaled and appropriately shifted final density function for the IID sum?
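One way to get the scaling and the shift right is to treat the problem as a discrete convolution of probability masses. This is a sketch under stated assumptions, not a definitive fix. The function name sum_of_n_density and its return format are hypothetical. The idea is to multiply the sampled density by the grid spacing dx to get approximate masses, zero-pad so the circular convolution computed by fft cannot wrap around, divide the inverse transform by the padded length, and divide by dx again to return to a density. The support of the sum then starts at n * xstart with the same spacing, so no separate phase correction is needed.

```r
sum_of_n_density <- function(density, n, xstart, xend, power_of_2) {
  N  <- 2^power_of_2
  x  <- seq(from = xstart, to = xend, length.out = N)
  dx <- x[2] - x[1]

  # Approximate probability mass attached to each grid point
  p <- density(x) * dx

  # The n-fold sum lives on n * (N - 1) + 1 grid points; pad with zeros up to
  # the next power of 2 so that fft's circular convolution does not wrap around
  out_len <- n * (N - 1) + 1
  M       <- nextn(out_len, factors = 2)
  p_pad   <- c(p, numeric(M - N))

  # fft(inverse = TRUE) is unnormalised in R, hence the division by M
  conv <- Re(fft(fft(p_pad)^n, inverse = TRUE)) / M

  k <- seq_len(out_len)
  list(x = n * xstart + (k - 1) * dx,  # grid of the sum starts at n * xstart
       y = conv[k] / dx)               # convert masses back to a density
}

# Check with the gamma example: a sum of 3 Gamma(shape = 2, rate = 1)
# variables is Gamma(shape = 6, rate = 1)
dens <- function(x) dgamma(x = x, shape = 2, rate = 1)
res  <- sum_of_n_density(dens, n = 3, xstart = 0, xend = 30, power_of_2 = 12)
max_abs_error <- max(abs(res$y - dgamma(res$x, shape = 6, rate = 1)))
```

The accuracy depends on the grid spacing and on how much probability lies outside (xstart, xend), so both should be checked for other densities.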

Binomial Experiment

How do I use the Binomial function to solve this experiment:
number of trials -> n=18,
p=10%
success x=2
The answer should be 28%. I am using Binomial(18, 0.1), but how do I pass the number of successes, x=2?
julia> d=Binomial(18,0.1)
Binomial{Float64}(n=18, p=0.1)
pdf(d,2)
How can I solve this in Julia?
What you want is the probability mass function (PMF), i.e. the probability that a binomial experiment of n independent Bernoulli trials, each with probability p of success, yields exactly x successes.
The way to answer this question in Julia is to use the Distributions package: first create the "distribution" object with parameters n and p, and then call the function pdf on this object and the variable x:
using Distributions
n = 18 # number of trials in our experiments
p = 0.1 # probability of success of a single trial
x = 2 # number of successes for which we want to compute the probability/PMF
binomialDistribution = Binomial(n,p)
probOfTwoSuccesses = pdf(binomialDistribution,x)
Note that all the other probability-related functions (like cdf and quantile, but also rand) work in the same way: you first build the distribution object, which embeds the specific distribution parameters, and then you call the function on the distribution object and the variable you are looking for, e.g. quantile(binomialDistribution, 0.9) for the 90% quantile.
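As a cross-check outside Julia, the same number can be reproduced in R, the language used by the other questions on this page. This is only a sketch: it relies on base R's dbinom and choose, and the variable names are illustrative. It evaluates the PMF formula choose(n, x) * p^x * (1 - p)^(n - x) directly and compares it with the built-in function.

```r
n <- 18   # number of trials
p <- 0.1  # probability of success in a single trial
x <- 2    # number of successes of interest

# Binomial PMF written out by hand
pmf_by_hand <- choose(n, x) * p^x * (1 - p)^(n - x)

# The same quantity from the built-in binomial density function
pmf_builtin <- dbinom(x, size = n, prob = p)

round(c(by_hand = pmf_by_hand, builtin = pmf_builtin), 4)
```

Both values round to 0.2835, which agrees with the 28% quoted in the question.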

Defining exponential distribution in R to estimate probabilities

I have a bunch of random variables (X1,...,Xn) which are i.i.d. Exp(1/2) and represent the duration of a certain event. This distribution has an expected value of 2, but I am having problems defining it in R. I did some research and found something about so-called Monte Carlo simulation, but I can't find what I am looking for in it.
An example of what I want to estimate: let's say we have 10 random variables (X1,...,X10) distributed as above, and we want to determine, for example, the probability P([X1+...+X10<=25]).
Thanks.
You don't actually need Monte Carlo simulation in this case because:
If Xi ~ Exp(λ) then the sum (X1 + ... + Xk) ~ Erlang(k, λ) which is just a Gamma(k, 1/λ) (in (k, θ) parametrization) or Gamma(k, λ) (in (α,β) parametrization) with an integer shape parameter k.
From wikipedia (https://en.wikipedia.org/wiki/Exponential_distribution#Related_distributions)
So, P([X1+...+X10<=25]) can be computed by
pgamma(25, shape=10, rate=0.5)
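As a sanity check on the parametrization, the Erlang CDF can also be written as a Poisson tail: the sum of k exponential waiting times with rate λ is at most t exactly when a Poisson(λt) count is at least k. The short sketch below uses illustrative variable names and compares that identity with the pgamma call above.

```r
k      <- 10    # number of exponential variables being summed
lambda <- 0.5   # rate of each Exp(1/2) variable (mean 2)
t_max  <- 25    # threshold for the sum

# Gamma/Erlang CDF in the (shape, rate) parametrization
p_gamma <- pgamma(t_max, shape = k, rate = lambda)

# Equivalent Poisson tail: P(N >= k) with N ~ Poisson(lambda * t_max)
p_poisson <- 1 - ppois(k - 1, lambda = lambda * t_max)

c(gamma = p_gamma, poisson = p_poisson)
```

If the two numbers differed, the most likely cause would be passing 1/λ as rate instead of as scale.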
Are you aware of the rexp() function in R? Have a look at its documentation page by typing ?rexp in the R console.
A quick answer to your Monte Carlo estimation of desired probability:
mean(rowSums(matrix(rexp(1000 * 10, rate = 0.5), 1000, 10)) <= 25)
I have generated 1000 sets of 10 exponential samples, putting them into a 1000 * 10 matrix. We take the row sums and get a vector of 1000 entries. The proportion of values between 0 and 25 is an empirical estimate of the desired probability.
Thanks, this was helpful! Can I use replicate with this code, to make it look like this: F <- function(n, B=1000) mean(replicate(B,(rexp(10, rate = 0.5)))) ? I am unable to get the right result with it.
replicate here generates a matrix, too, but it is a 10 * 1000 matrix (as opposed to the 1000 * 10 one in my answer), so you now need to take colSums. Also, where did you put n?
The correct function would be
F <- function(n, B=1000) mean(colSums(replicate(B, rexp(10, rate = 0.5))) <= n)
For a non-Monte Carlo method for your given example, see the other answer. The exponential distribution is a special case of the gamma distribution, and the latter has an additivity property.
I am giving you the Monte Carlo method because you named it in your question, and because it is applicable beyond your example.
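To see how close the simulation gets, the Monte Carlo estimate can be put next to the exact gamma probability together with a rough standard error. This is a minimal sketch; the seed and the number of replications B are arbitrary choices, and the standard error uses the usual binomial formula for a proportion.

```r
set.seed(42)
B <- 1e5  # number of simulated sums (arbitrary choice)

# Each row holds one set of 10 i.i.d. Exp(rate = 0.5) draws
sums <- rowSums(matrix(rexp(B * 10, rate = 0.5), nrow = B, ncol = 10))

mc_estimate <- mean(sums <= 25)
mc_se       <- sqrt(mc_estimate * (1 - mc_estimate) / B)

# Exact value from the gamma distribution of the sum
exact <- pgamma(25, shape = 10, rate = 0.5)

c(monte_carlo = mc_estimate, std_error = mc_se, exact = exact)
```

With B = 1000 as in the answer, the standard error is about ten times larger, so agreement only to about two decimal places should be expected there.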

univariate nonlinear optimization with quadratic constraint in R

I have a quadratic function f where, f = function (x) {2+.1*x+.23*(x*x)}. Let's say I have another quadratic fn g where g = function (x) {3+.4*x-.60*(x*x)}
Now, I want to maximize f given the constraints 1. g>0 and 2. 600<x<650
I have tried the functions optim, constrOptim and optimize. optimize does one-dimensional optimization but without constraints, and I couldn't understand constrOptim. I need to do this using R. Please help.
P.S. In this example, the values may be erratic as I have given two random quadratic functions, but basically I want maximization of a quadratic fn given a quadratic constraint.
If you solve g(x)=0 for x by the usual quadratic formula, then that just gives you another set of bounds on x. If your x^2 coefficient is negative then g(x) > 0 between the two solutions x1 < x2; otherwise g(x) > 0 outside the solutions, i.e. within (-Inf, x1) and (x2, Inf).
In this case, g(x)>0 for -1.927 < x < 2.59. So in this case both your constraints cannot be simultaneously achieved (g(x) is LESS THAN 0 for 600<x<650).
But supposing your second condition was 1 < x < 5, then you'd just combine the solution from g(x)>0 with that interval to get 1 < x < 2.59, and then maximise f in that interval using standard univariate optimisation.
And you don't even need to run an optimisation algorithm. Your target f is quadratic. If the coefficient of x^2 is positive, the maximum is going to be at one of your limits of x, so you only have a small number of values to try. If the coefficient of x^2 is negative, then the maximum is either at a limit or at the point where f(x) peaks (solve f'(x)=0), if that point is within your limits.
So you can do this precisely, there's just a few conditions to test and then some intervals to compute and then some values of f at those interval limits to calculate.
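The procedure described above can be sketched in R as follows. This is an illustration rather than a general solver. It uses the hypothetical interval 1 < x < 5 from the answer instead of 600 < x < 650, which is infeasible. It hard-codes the two facts that hold for these particular f and g (g opens downward, f opens upward), and the variable names are made up.

```r
f <- function(x) 2 + 0.1 * x + 0.23 * x^2
g <- function(x) 3 + 0.4 * x - 0.60 * x^2

# Roots of g; polyroot() takes coefficients in increasing order of power
g_roots <- sort(Re(polyroot(c(3, 0.4, -0.60))))

# g opens downward, so g(x) > 0 strictly between its two roots;
# intersect that interval with the hypothetical bounds 1 < x < 5
lower <- max(1, g_roots[1])
upper <- min(5, g_roots[2])
stopifnot(lower < upper)  # otherwise the constraints cannot both hold

# f opens upward (0.23 > 0), so over [lower, upper] its maximum is at an endpoint
endpoints <- c(lower, upper)
x_best    <- endpoints[which.max(f(endpoints))]

c(x = x_best, f = f(x_best))
```

Because the original constraints are strict inequalities, the value found at an endpoint is a supremum that is approached but not attained. In practice one would either relax the constraints to allow equality or step slightly inside the interval.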

Probability transformation using R

I want to turn a continuous random variable X with cdf F_X(x) into a continuous random variable Y with cdf F_Y(y), and am wondering how to implement this in R.
For example, perform a probability transformation on data following normal distribution (X) to make it conform to a desirable Weibull distribution (Y).
(x=0 has CDF F(x=0)=0.5, CDF F(y)=0.5 corresponds to y=5, then x=0 corresponds to y=5 etc.)
There are many built in distribution functions, those starting with a 'p' will transform to a uniform and those starting with a 'q' will transform from a uniform. So the transform in your example can be done by:
y <- qweibull( pnorm( x ), 2, 6.0056 )
Then just change the functions and/or parameters for other cases.
The distr package may also be of interest for additional capabilities.
In general, you can transform an observation x on X to an observation y on Y by
1. getting the probability of X ≤ x, i.e. F_X(x),
2. then determining which observation y has the same probability.
That is, you want the probability P(Y ≤ y) = F_Y(y) to be the same as F_X(x).
This gives F_Y(y) = F_X(x).
Therefore y = F_Y^{-1}(F_X(x)),
where F_Y^{-1} is better known as the quantile function, Q_Y. The overall transformation from X to Y is summarized as: Y = Q_Y(F_X(X)).
In your particular example, from the R help, the distribution function for the normal distribution is pnorm and the quantile function for the Weibull distribution is qweibull, so you want to call pnorm first, then qweibull on the result.
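A short sketch of the transformation Y = Q_Y(F_X(X)) for this example is shown below. It assumes the standard normal for X and reuses the Weibull parameters from the other answer (shape 2, scale 6.0056). The sample size and seed are arbitrary. It checks that x = 0 maps to roughly y = 5 and that each transformed value has the same cumulative probability as its source value.

```r
set.seed(123)
x <- rnorm(1000)  # sample from X ~ N(0, 1) (assumed parameters)

# Y = Q_Y(F_X(X)): normal CDF first, then the Weibull quantile function
y <- qweibull(pnorm(x), shape = 2, scale = 6.0056)

# The median of X (x = 0) should map to the median of Y (about 5)
y_at_zero <- qweibull(pnorm(0), shape = 2, scale = 6.0056)

# Each y should have the same cumulative probability as its x
max_cdf_gap <- max(abs(pweibull(y, shape = 2, scale = 6.0056) - pnorm(x)))

c(y_at_zero = y_at_zero, max_cdf_gap = max_cdf_gap)
```

For values of x far in the upper tail, pnorm(x) rounds to 1 and the transform returns Inf, so extreme observations may need separate handling.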
