density of x^2 when x has a triangular distribution - r

I would like to find the density of x^2 given the distribution of x. My samples currently follow a triangular distribution (min = 1, max = 6, mode = 3). How can I generate samples of x^2 in R, given that x is triangular? Any help would be appreciated.
library(extraDistr) # package names are case-sensitive in R
sample_size <- 100
x <- rtri(sample_size, 1, 6, 3) # extraDistr::rtri(n, a = min, b = max, c = mode)
I can work out the density by hand by inverting the CDF, but that gives two pdf pieces: one for x^2 in (1, 9) and another for x^2 in (9, 36). I do not know how to turn this into sampling in R when there are two functions. Should I use half the sample size for each piece?
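For what it's worth, here is a minimal sketch of one way to approach this. Because y = x^2 is monotone on [1, 6], squaring each triangular draw already gives a draw of x^2, so the two pdf pieces never need separate samplers. The share of samples landing in (1, 9) is P(x < 3) = (3 - 1)/(6 - 1) = 0.4, not one half. The sketch uses base R only; rtri_inv is a hypothetical helper written for illustration (rtri from extraDistr could be used in its place).

```r
# Hypothetical helper, not part of any package:
# inverse-CDF sampler for a triangular distribution on [a, b] with mode c.
rtri_inv <- function(n, a, b, c) {
  u  <- runif(n)
  fc <- (c - a) / (b - a)                        # CDF value at the mode
  ifelse(u < fc,
         a + sqrt(u * (b - a) * (c - a)),        # rising branch, x <= c
         b - sqrt((1 - u) * (b - a) * (b - c)))  # falling branch, x > c
}

set.seed(42)
x <- rtri_inv(1e5, a = 1, b = 6, c = 3)
y <- x^2                             # samples of x^2, no second sampler needed

range(y)                             # stays inside [1, 36]
mean(y < 9)                          # share in the first piece, close to 0.4
d <- density(y, from = 1, to = 36)   # kernel estimate of the density of x^2
# plot(d) would show the two pieces joining at y = 9
```

If the hand-derived pdf pieces are available, overlaying them on the density estimate is a quick consistency check.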

Related

How to randomly generate (x, y) points following a linear equation?

I have an equation y=9x+6. I want to extract 10 random points from this function. How should I proceed?
Generate 10 random x-values, in this example uniformly distributed (function runif), and then calculate the corresponding y-values from your equation.
You can control the x-range by passing different min and max arguments to runif.
x <- runif(10, min = 0, max = 1)
y <- 9*x+6
plot(x,y)

Convert uniform draws to normal distributions with known mean and std in R

I use the sensitivity package in R. In particular, I want to use sobolroalhs because its sampling procedure for the inputs allows evaluation of models with a large number of parameters. The function samples all inputs uniformly on [0,1]. It is stated that the desired distributions need to be obtained as follows:
####################
# Test case: dealing with non-uniform distributions
x <- sobolroalhs(model = NULL, factors = 3, N = 1000, order =1, nboot=0)
# X1 follows a log-normal distribution:
x$X[,1] <- qlnorm(x$X[,1])
# X2 follows a standard normal distribution:
x$X[,2] <- qnorm(x$X[,2])
# X3 follows a gamma distribution:
x$X[,3] <- qgamma(x$X[,3],shape=0.5)
# toy example
toy <- function(x){rowSums(x)}
y <- toy(x$X)
tell(x, y)
print(x)
plot(x)
I have non-zero means and standard deviations for some input parameters that I want to sample from a normal distribution. For others, I want to sample uniformly within a defined range (e.g. [0.03,0.07] instead of [0,1]). I tried using built-in R functions such as
SA$X[,1] <- rnorm(1000, mean = 579, sd = 21)
but I am afraid this procedure breaks the sampling design of the package, and it gave odd results for the sensitivity indices. Hence, I think I need to keep the uniform draws from the sobolroalhs function and use each sampled value in [0, 1] when drawing from the desired distribution (as a density draw, I think?). Does this make sense to anyone, and does anyone know how I could sample from the right distributions following the syntax from the package description?
You can specify mean and sd in qnorm. So modify lines like this:
x$X[,2] <- qnorm(x$X[,2])
to something like this:
x$X[,2] <- qnorm(x$X[,2], mean = 579, sd = 21)
Similarly, you could use the min and max parameters of qunif to get values in a given range.
Of course, it's also possible to transform standard normals or uniforms to the ones you want using things like X <- 579 + 21*Z or Y <- 0.03 + 0.04*U, where Z is a standard normal and U is standard uniform, but for some distributions those transformations aren't so simple and using the q* functions can be easier.
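A small base-R sketch of that equivalence, for illustration only. Here u is a stand-in for one column of uniform draws (such as a column of x$X); the mean, sd and range values are the ones from the question.

```r
set.seed(1)
u <- runif(5)                        # stand-in for one column of uniform draws

# Route 1: quantile functions with the target parameters
x_norm <- qnorm(u, mean = 579, sd = 21)
x_unif <- qunif(u, min = 0.03, max = 0.07)

# Route 2: transform standard draws by hand
z       <- qnorm(u)                  # standard normal quantiles of u
x_norm2 <- 579 + 21 * z
x_unif2 <- 0.03 + 0.04 * u

all.equal(x_norm, x_norm2)           # both routes agree for the normal
all.equal(x_unif, x_unif2)           # and for the uniform
```

Either route keeps the original uniform design intact, because each value is a deterministic function of the corresponding draw in [0, 1].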

Extract approximate probability density function (pdf) in R from random sampling

I have n>2 independent continuous random variables (RVs). For example, say I have 4 uniform RVs with different lower and upper bounds:
W~U[-1,5], X~U[0,1], Y~U[0,2], Z~U[0.5,2]
I am trying to find the approximate PDF of the sum of these RVs, i.e. of T=W+X+Y+Z. As I don't need a closed-form solution, I have sampled 1 million points from each of them to get 1 million samples of T. Is it possible in R to get the approximate PDF, or a way to get the approximate probability P(t<T), from the samples I have drawn? For example, is there an easy way to calculate P(0.5<T) in R? My priority is to get the probability first, even if getting the density function is not possible.
Thanks
Consider the ecdf function:
set.seed(123)
W <- runif(1e6, -1, 5)
X <- runif(1e6, 0, 1)
Y <- runif(1e6, 0, 2)
Z <- runif(1e6, 0.5, 2)
T <- Reduce(`+`, list(W, X, Y, Z))
cdfT <- ecdf(T)
1 - cdfT(0.5) # Pr(T > 0.5)
# [1] 0.997589
See How to calculate cumulative distribution in R? for more details.
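If an approximate density function is also wanted, one possible sketch in base R combines density() with approxfun(). The names total and pdf_T are made up for this example, and the sample size is reduced to keep it quick.

```r
set.seed(123)
n <- 1e5
total <- runif(n, -1, 5) + runif(n, 0, 1) + runif(n, 0, 2) + runif(n, 0.5, 2)

# Tail probability straight from the sample; same value as 1 - ecdf(total)(0.5)
p_emp <- mean(total > 0.5)

# Kernel density estimate on a grid, wrapped as a callable approximate pdf
dens  <- density(total)
pdf_T <- approxfun(dens$x, dens$y, yleft = 0, yright = 0)

pdf_T(4)                             # estimated density at t = 4

# Trapezoid-rule check that the estimate integrates to roughly 1
area <- sum(diff(dens$x) * (head(dens$y, -1) + tail(dens$y, -1)) / 2)
area
```

The kernel estimate smooths the edges of the support slightly, so probabilities are better taken from the sample or the ecdf than from integrating pdf_T.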

Generating random numbers in a specific interval

I want to generate some Weibull random numbers in a given interval. For example 20 random numbers from the Weibull distribution with shape 2 and scale 30 in the interval (0, 10).
The rweibull function in R produces random numbers from a Weibull distribution with given shape and scale values. Can someone please suggest a method? Thank you in advance.
Use the distr package. It makes this kind of thing very easy.
require(distr)
#we create the distribution
d<-Truncate(Weibull(shape=2,scale=30),lower=0,upper=10)
#The d object has four slots: d,r,p,q that correspond to the [drpq] prefix of standard R distributions
#This extracts 10 random numbers
d@r(10)
#get a histogram
hist(d@r(10000))
Using base R, you can generate random numbers, keep those that fall into the target interval, and generate more if you end up with fewer than you need.
rweibull_interval <- function(n, shape, scale = 1, min = 0, max = 10) {
  # Oversample, then keep only the draws inside (min, max)
  weib_rnd <- rweibull(10*n, shape, scale)
  weib_rnd <- weib_rnd[weib_rnd > min & weib_rnd < max]
  if (length(weib_rnd) < n) {
    # Not enough accepted draws: recurse for the remainder
    return(c(weib_rnd, rweibull_interval(n - length(weib_rnd), shape, scale, min, max)))
  } else {
    return(weib_rnd[1:n])
  }
}
set.seed(1)
rweibull_interval(20, 2, 30, 0, 10)
[1] 9.308806 9.820195 7.156999 2.704469 7.795618 9.057581 6.013369 2.570710 8.430086 4.658973
[11] 2.715765 8.164236 3.676312 9.987181 9.969484 9.578524 7.220014 8.241863 5.951382 6.934886
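Another base-R option, sketched here as an alternative to rejection, is inverse-CDF sampling: draw uniforms between the CDF values at the two bounds and push them through the quantile function. rweibull_trunc and its lower/upper arguments are hypothetical names for this example.

```r
# Hypothetical helper: truncated Weibull draws via the inverse CDF.
rweibull_trunc <- function(n, shape, scale = 1, lower = 0, upper = Inf) {
  p_lo <- pweibull(lower, shape, scale)   # CDF at the lower bound
  p_hi <- pweibull(upper, shape, scale)   # CDF at the upper bound
  u <- runif(n, p_lo, p_hi)               # uniforms restricted to [p_lo, p_hi]
  qweibull(u, shape, scale)               # map back to the Weibull scale
}

set.seed(1)
w <- rweibull_trunc(20, shape = 2, scale = 30, lower = 0, upper = 10)
length(w)                                 # exactly 20 draws, none discarded
range(w)                                  # all inside the interval
```

This avoids discarding draws, which matters when the interval carries little probability mass and rejection would throw most samples away.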

Calculate probability of point on 2d density surface

If I calculate the 2d density surface of two vectors like in this example:
library(MASS)
a <- rnorm(1000)
b <- rnorm(1000, sd=2)
f1 <- kde2d(a, b, n = 100)
I get the following surface
filled.contour(f1)
The z-value is the estimated density.
My question now is: Is it possible to calculate the probability of a single point, e.g. a = 1, b = -4
[as I'm not a statistician this is maybe the wrong wording. Sorry for that. I would like to know - if this is possible at all - with which probability a point occurs.]
Thanks for every comment!
If you specify an area, then that area has a probability with respect to your density function. Of course a single point does not have a probability different from zero. But it does have a non-zero density at that point. What is that then?
The density at a point is the limit of the probability of a small area around that point, divided by the size of that area, as the area shrinks to zero. (It was actually rather hard to state that correctly; it needed a few tries and is still not optimal.)
All this is really basic calculus. It is also fairly easy to write a routine to calculate the integral of that density over the area, although I imagine MASS has standard ways to do it that use more sophisticated integration techniques. Here is a quick routine that I threw together based on your example:
library(MASS)
n <- 100
a <- rnorm(1000)
b <- rnorm(1000, sd=2)
f1 <- kde2d(a, b, n = 100)
lims <- c(min(a),max(a),min(b),max(b))
filled.contour(f1)
prob <- function(f,xmin,xmax,ymin,ymax,n,lims){
  ixmin <- max( 1, n*(xmin-lims[1])/(lims[2]-lims[1]) )
  ixmax <- min( n, n*(xmax-lims[1])/(lims[2]-lims[1]) )
  iymin <- max( 1, n*(ymin-lims[3])/(lims[4]-lims[3]) )
  iymax <- min( n, n*(ymax-lims[3])/(lims[4]-lims[3]) )
  avg <- mean(f$z[ixmin:ixmax,iymin:iymax])
  probval <- (xmax-xmin)*(ymax-ymin)*avg
  return(probval)
}
prob(f1,0.5,1.5,-4.5,-3.5,n,lims)
# [1] 0.004788993
prob(f1,-1,1,-1,1,n,lims)
# [1] 0.2224353
prob(f1,-2,2,-2,2,n,lims)
# [1] 0.5916984
prob(f1,0,1,-1,1,n,lims)
# [1] 0.119455
prob(f1,1,2,-1,1,n,lims)
# [1] 0.05093696
prob(f1,-3,3,-3,3,n,lims)
# [1] 0.8080565
lims
# [1] -3.081773 4.767588 -5.496468 7.040882
Caveat: the routine seems right and gives reasonable answers, but it has not undergone anywhere near the scrutiny I would give a production function.
The z-value here is called a "probability density" rather than a "probability". As comments have pointed out, if you want an estimated probability you will need to integrate the estimated density to find the volume under your estimated surface.
However, if what you want is the probability density at a particular point, then you can use:
kde2d(a, b, n=1, lims=c(1, 1, -4, -4))$z[1,1]
# [1] 0.006056323
This will calculate a 1x1 "grid" with a single density estimate for the point you want.
A plot confirming that it worked:
z0 <- kde2d(a, b, n=1, lims=c(1, 1, -4, -4))$z[1,1]
filled.contour(
  f1,
  plot.axes = {
    contour(f1, levels=z0, add=TRUE)
    abline(v=1, lty=3)
    abline(h=-4, lty=3)
    axis(1); axis(2)
  }
)
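To tie density and probability together: for a small rectangle around a point, the probability is approximately the density at the point times the rectangle's area. A rough sketch, assuming MASS is installed; the half-width of 0.05 and the 21-point grid are arbitrary choices for illustration.

```r
library(MASS)

set.seed(1)
a <- rnorm(1000)
b <- rnorm(1000, sd = 2)

# Point estimate of the density at (1, -4)
z0 <- kde2d(a, b, n = 1, lims = c(1, 1, -4, -4))$z[1, 1]

# Density on a fine grid over a small square centred on (1, -4)
hw    <- 0.05                                  # half-width (arbitrary)
small <- kde2d(a, b, n = 21, lims = c(1 - hw, 1 + hw, -4 - hw, -4 + hw))
area  <- (2 * hw)^2

p_rect   <- mean(small$z) * area   # approximate P(point falls in the square)
p_approx <- z0 * area              # density at the centre times the area

c(p_rect = p_rect, p_approx = p_approx)        # the two should nearly agree
```

As the square shrinks, the ratio p_rect / area approaches z0, which is the limit statement about densities in numerical form.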
