Linear Transformation of Dirichlet Distribution - linear-algebra

I am wondering what is the best way to represent the probability distribution of a linearly transformed Dirichlet distribution:
Let X = [x1, x2, ..., xn] with Σxi = 1 be a vector of multinomial probabilities that follows a Dirichlet distribution D_α (α a vector of n parameters).
Consider an m×n matrix A such that AX = Y with Σyi = 1 for all such X (note: A is derived from a Bayesian net).
If X follows a Dirichlet distribution, what distribution does Y follow?
I believe that if A is an invertible square matrix, then the map X ↦ AX is injective and surjective, and Y follows the distribution
Likelihood(Y) = D_α(A^(-1) Y)
But I am not sure what to do for other cases.
Thank you!
This paper may be useful, but it is beyond me:
https://onlinelibrary.wiley.com/doi/epdf/10.2307/3315988
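A minimal simulation sketch for exploring the distribution of Y = AX empirically (an illustration only: it assumes the gtools package for rdirichlet, and the 3×3 matrix A below is a made-up example whose columns each sum to 1, which is what guarantees Σyi = 1):
library(gtools)
alpha <- c(2, 3, 4)                      # example Dirichlet parameters
A <- matrix(c(0.7, 0.2, 0.1,
              0.1, 0.8, 0.1,
              0.2, 0.3, 0.5), nrow = 3)  # each column sums to 1
det(A)                                   # nonzero, so A is invertible here
X <- rdirichlet(10000, alpha)            # one Dirichlet draw per row
Y <- X %*% t(A)                          # each row is y = A %*% x
summary(rowSums(Y))                      # all equal to 1, as required
pairs(Y, pch = ".", main = "Empirical distribution of Y = AX")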

Related

Random number simulation in R

I have been going through some random number simulation routines and found that R doesn't have a built-in function for the Pareto distribution.
rpareto is given as
rpareto <- function(n, a, l) {
  rp <- l * ((1 - runif(n))^(-1/a) - 1)
  rp
}
Can someone explain the intuitive meaning behind this?
It's a well-known result that if X is a continuous random variable with CDF F(.), then Y = F(X) has a Uniform distribution on [0, 1].
This result can be used to draw random samples of any continuous random variable whose CDF is known: generate u, a Uniform(0, 1) random variable and then determine the value of x for which F(x) = u.
In specific cases, there may well be more efficient ways of sampling from F(.), but this will always work as a fallback.
It's likely (I haven't checked the accuracy of the code myself, but it looks about right) that the body of your function solves F(x) = u for known u in order to generate a random variable with a Pareto distribution. You can check it with a little algebra after getting the CDF from the Wikipedia page on the Pareto distribution.
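In fact, a little algebra shows the generator above matches the Pareto Type II (Lomax) CDF F(x) = 1 - (1 + x/l)^(-a): setting u = F(x) and solving for x gives x = l*((1 - u)^(-1/a) - 1). A quick empirical check (a sketch with made-up parameter values):
rpareto <- function(n, a, l) l * ((1 - runif(n))^(-1/a) - 1)
set.seed(1)
a <- 3; l <- 2                                 # illustrative shape and scale
x <- rpareto(1e5, a, l)
F <- function(x) 1 - (1 + x/l)^(-a)            # Pareto II (Lomax) CDF
plot(ecdf(x), xlim = c(0, 10), main = "Empirical vs theoretical CDF")
curve(F(x), add = TRUE, col = "red", lwd = 2)  # the two curves should coincide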

Empirical CDF vs Theoretical CDF in R

I want to check the "probability integral transform" theorem using R.
Let's suppose X is an exponential random variable with lambda = 5.
I want to check that the random variable U = F_X(X) = 1 - exp(-5*X) has a Uniform(0, 1) distribution.
How would you do it?
I would start in this way:
nsample <- 1000
lambda <- 5
x <- rexp(nsample, lambda)  # 1000 exponential observations
u <- 1 - exp(-lambda * x)   # CDF evaluated at x
Then I need to find the CDF of u and compare it with the CDF of a Uniform(0, 1).
For the empirical CDF of u I could use the ecdf function:
ECDF_u <- ecdf(u)  # empirical CDF of U
Now I should create the theoretical CDF of a Uniform(0, 1) and plot it on the same graph as the ECDF in order to compare the two.
Can you help with the code?
You are almost there. You don't need to compute the ECDF yourself – qqplot will take care of this. All you need is your sample (u) and data from the distribution you want to check against. The lazy (and not quite correct) approach would be to check against a random sample drawn from a uniform distribution:
qqplot(runif(nsample), u)
But of course, it is better to plot against the theoretical quantiles:
# the actual plot
qqplot(qunif(ppoints(length(u))), u)
# add a line
qqline(u, distribution = qunif, col = "red", lwd = 2)
Looks pretty good to me.
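To answer the original question more directly, you can also overlay the empirical CDF on the theoretical Uniform(0, 1) CDF, reusing u and ECDF_u from above (a sketch; the ks.test line is an optional formal check):
plot(ECDF_u, main = "Empirical CDF of u vs Uniform(0,1) CDF")
curve(punif(x), from = 0, to = 1, add = TRUE, col = "red", lwd = 2)  # theoretical CDF
ks.test(u, "punif")  # Kolmogorov-Smirnov test against Uniform(0,1)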

Efficiently calculating integral of a multivariate function on non-rectangular region?

I want to compute the expected value of a multivariate function f(x) with respect to a Dirichlet distribution. My problem is "penta-nomial" (i.e. 5 variables), so calculating the explicit form of the expected value seems unreasonable. Is there a way to numerically integrate it efficiently?
f(x) = Σ_{i=0}^{4} x_i * log(n / x_i)
where x = (x_0, x_1, x_2, x_3, x_4) and n is a constant.
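One simple way to approximate such an expectation is plain Monte Carlo: draw many samples from the Dirichlet distribution and average f over them. A minimal sketch, assuming the gtools package for rdirichlet and made-up values for alpha and n:
library(gtools)
alpha   <- c(1, 2, 3, 4, 5)             # illustrative Dirichlet parameters
n_const <- 100                          # the constant n in f
set.seed(42)
X <- rdirichlet(1e5, alpha)             # one Dirichlet sample per row
vals <- rowSums(X * log(n_const / X))   # f(x) = sum_i x_i * log(n / x_i), row-wise
c(estimate = mean(vals), std_err = sd(vals) / sqrt(length(vals)))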

Testing ratio of density distributions for normality

I have a normal distribution and a uniform distribution. I want to calculate a ratio: the density of the normal distribution over the density of the uniform. Then I want to test this ratio for normality.
ht <- runif(3000, 1, 18585056) # Uniform distribution
hm <- rnorm(35, 10000000, 5000000) # Normal distribution
hmd <- density(hm, from=0, to=18585056) # Kernel density of distributions over range
htd <- density(ht, from=0, to=18585056)
ratio <- hmd$y/htd$y # Ratio of kernel density values
The distributions hm and ht above are examples of what my experimental data shows; the vectors I will actually be using are not randomly generated in R.
I know that I can get a good idea of normality from the correlation coefficient of a Q-Q plot:
qqp <- qqnorm(hm)
cor(qqp$x,qqp$y)
For hm, which is normally distributed, this gives a value close to 1.
Is there a way of determining the normality of the density vectors? e.g. hmd and ratio.
(Additional information: hm and ht are modelling homozygous and heterozygous SNPs across a genome of length 18585056)
First, this is really a statistics question; you should consider posting it on stats.stackexchange.com - you are likely to get a better answer.
Second, the short answer to your question is that "testing the ratio of two density functions for normality" is not a meaningful idea. As mentioned in the comment, the ratio of two density functions is not a density function. Among other things, a density function must integrate to 1 over (-Inf,+Inf), which this ratio will not (generally).
It is meaningful, however, to test if the distribution of the ratio of two random variables is normal. If you know that the numerator is normally distributed and the denominator is uniformly distributed, then the ratio will definitely not be normally distributed, as demonstrated below in the discussion of the slash distribution.
If you do not know the distributions of the numerator and denominator, but just have random samples, you should calculate the ratio of the random variates and test that for normality. In your case (with minor edits):
set.seed(123)
ht <- runif(3000, 1, 18585056)
hm <- rnorm(3500, 10000000, 5000000)
Z <- sample(hm,1000)/sample(ht,1000) # numer. and denom. must be same length
par(mfrow=c(1,2))
# histogram of Z
hist(Z,xlim=c(-5,5), breaks=c(-Inf,seq(-5,5,0.2),Inf),freq=F, ylim=c(0,.4))
# normal Q-Q plot
qqnorm(Z,ylim=c(-5,5))
qqline(Z,xlim=c(-5,5),lty=2,col="blue")
Clearly, the ratio distribution is not normal.
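If you want a formal test to go alongside the plots (not shown above), the Shapiro-Wilk test in base R works on samples of up to 5000 points; applied to the Z defined above it rejects normality decisively:
shapiro.test(Z)  # very small p-value: reject normality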
Slash Distribution
In the special case where
X ~ N(0, 1) with density φ(x), -Inf < x < Inf, and
Y ~ U(0, 1) with density 1 for 0 ≤ y ≤ 1 and 0 elsewhere,
the ratio Z = X/Y has density [φ(0) - φ(z)]/z^2.
That is, a random variable formed as the ratio of two independent random variables, with the numerator distributed as N(0, 1) and the denominator distributed as U(0, 1), has the slash distribution defined above. We can show this in R as follows:
set.seed(123)
X <- rnorm(10000)
Y <- runif(10000)
Z <- X/Y
dslash <- function(x) (dnorm(0)-dnorm(x))/x^2
x <- seq(-5,5,0.02)
par(mfrow=c(1,2))
hist(Z,xlim=c(-5,5), breaks=c(-Inf,seq(-5,5,0.2),Inf),freq=F, ylim=c(0,.4))
lines(x,dslash(x),xlim=c(-5,5),col="red")
lines(x,dnorm(x),xlim=c(-5,5),col="blue",lty=2)
qqnorm(Z,ylim=c(-5,5))
qqline(Z,xlim=c(-5,5),lty=2,col="blue")
The bars represent the histogram of Z = X/Y, the red curve is the slash distribution, and the blue curve is the pdf of N[0,1] for reference. Because the red curve is "bell shaped" there is a temptation to think that Z is normally distributed, just with a larger variance. The Q-Q plot shows clearly that this is not the case. The tails of the slash distribution are much larger than would be expected from a normal distribution.

Probability transformation using R

I want to turn a continuous random variable X with CDF F_X(x) into a continuous random variable Y with a desired CDF F_Y(y), and I am wondering how to implement it in R.
For example, perform a probability transformation on data following a normal distribution (X) to make it conform to a desired Weibull distribution (Y).
(Say x = 0 has CDF F_X(0) = 0.5, and F_Y(y) = 0.5 corresponds to y = 5; then x = 0 corresponds to y = 5, and so on.)
There are many built-in distribution functions: those starting with 'p' transform to a uniform and those starting with 'q' transform from a uniform. So the transform in your example can be done by:
y <- qweibull( pnorm( x ), 2, 6.0056 )
Then just change the functions and/or parameters for other cases.
The distr package may also be of interest for additional capabilities.
In general, you can transform an observation x on X to an observation y on Y by:
getting the probability of X ≤ x, i.e. F_X(x);
then determining which observation y has the same probability, i.e. you want the probability Y ≤ y, which is F_Y(y), to equal F_X(x).
This gives F_Y(y) = F_X(x), and therefore y = F_Y^(-1)(F_X(x)),
where F_Y^(-1) is better known as the quantile function Q_Y. The overall transformation from X to Y is summarized as Y = Q_Y(F_X(X)).
In your particular example, from the R help, the distribution function for the normal distribution is pnorm and the quantile function for the Weibull distribution is qweibull, so you want to first call pnorm and then qweibull on the result.
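A quick sanity check of this transformation, reusing the illustrative shape and scale values (2 and 6.0056) from the earlier answer:
set.seed(1)
x <- rnorm(10000)                                    # X ~ N(0, 1)
y <- qweibull(pnorm(x), shape = 2, scale = 6.0056)   # Y = Q_Y(F_X(X))
# compare the transformed sample with the target Weibull distribution
qqplot(qweibull(ppoints(length(y)), 2, 6.0056), y,
       xlab = "Theoretical Weibull quantiles", ylab = "Transformed sample")
abline(0, 1, col = "red", lwd = 2)
ks.test(y, "pweibull", 2, 6.0056)                    # formal goodness-of-fit check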
