Efficient sampling from a truncated gamma distribution in R

After searching the forum, I did not find a similar question; if I missed one, please let me know. I would really appreciate it.
I need to generate N (can be 10000 or more) sample points from a gamma distribution with given shape and scale parameters and given lower/upper bounds in R.
I know how to do it with a for loop, but it is not efficient.
library(distr)
get_sample_gamma <- function(shape, scale, lb, ub) {
  v <- rgamma(n = 10000, shape = shape, scale = scale)
  # check which elements of v fall inside [lb, ub]
  # if some are outside the range, count the number of points inside as M
  # keep generating the remaining N - M points until all N points are obtained
}
This is not efficient.
Any more efficient solutions would be appreciated.

See R Programs for Truncated Distributions by Saralees Nadarajah and Samuel Kotz.
Using their code on page 4:
qtrunc <- function(p, spec, a = -Inf, b = Inf, ...) {
  tt <- p
  G <- get(paste("p", spec, sep = ""), mode = "function")    # CDF, e.g. pgamma
  Gin <- get(paste("q", spec, sep = ""), mode = "function")  # quantile function, e.g. qgamma
  tt <- Gin(G(a, ...) + p * (G(b, ...) - G(a, ...)), ...)
  return(tt)
}

rtrunc <- function(n, spec, a = -Inf, b = Inf, ...) {
  x <- u <- runif(n, min = 0, max = 1)
  x <- qtrunc(u, spec, a = a, b = b, ...)
  return(x)
}
Now v <- rtrunc(10000, "gamma", lb, ub, shape = shape, scale = scale) should do the job.
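For example, a quick sanity check with purely illustrative parameter values (shape = 2, scale = 1.5 and bounds [2, 6] are my own choices, not values from the question):
set.seed(1)
v <- rtrunc(10000, "gamma", a = 2, b = 6, shape = 2, scale = 1.5)
range(v)                            # every draw lies inside [2, 6]
hist(v, breaks = 50, freq = FALSE)
Because each uniform draw is mapped through the quantile function restricted to [lb, ub], nothing is rejected: the cost is a single vectorised call to qgamma, no matter how tight the bounds are.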

Related

Integrating under a curve in R

I apologise if this is a duplicate; I've read answers to similar questions to no avail.
I'm trying to integrate under a curve, given a specific formula (below) for said integration.
As a toy example, here's some data:
library(deSolve)

Antia_Model <- function(t, y, p1) {
  r <- p1[1]; k <- p1[2]; p <- p1[3]; o <- p1[4]
  P <- y[1]; I <- y[2]
  dP <- r*P - k*P*I
  dI <- p*I*(P/(P + o))
  list(c(dP, dI))
}

r <- 0.25; k <- 0.01; p <- 1; o <- 1000  # note that r can range between 0.1 and 10 in this model
parms <- c(r, k, p, o)

P0 <- 1; I0 <- 1
N0 <- c(P0, I0)
TT <- seq(0.1, 50, 0.1)

results <- lsoda(N0, TT, Antia_Model, parms, verbose = FALSE)
P <- results[, 2]; I <- results[, 3]
As I understand it, I should be able to use the auc() function from the MESS package (or can I just use the integrate() function? I'm not sure), which I think should look something like this:
auc(P, TT, from = x1, to = x2, type = "spline")
Though I don't really understand how to use the "from" and "to" arguments, or how to incorporate "u" from the original integration formula...
Using the integrate() function seems more intuitive, but if I try:
u <- 1
integrand <- function(P) {u*P}
q <- integrate(integrand, lower = 0, upper = Inf)
I get this error:
# Error in integrate(integrand, lower = 0, upper = Inf) :
# the integral is probably divergent
As you can tell, I'm pretty lost, so any help would be greatly appreciated! Thank you so much! :)
integrand is technically acceptable, but right now it is the identity function f(x) = x. The area under it over [0, Inf) is infinite, i.e. divergent.
From the documentation of integrate the first argument is:
an R function taking a numeric first argument and returning a numeric vector of the same length. Returning a non-finite element will generate an error.
If instead you use a pulse function:
pulse <- function(x) {ifelse(x < 5 & x >= 0, 1, 0)}
integrate(pulse, lower = 0, upper = Inf)
#> 5 with absolute error < 8.5e-05
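To connect this back to the ODE output, here is a minimal sketch of the area-under-P computation (an assumption on my part that this is the quantity you want; x1 and x2 are your own choices, so min(TT) and max(TT) below are only placeholders):
library(MESS)
auc(TT, P, from = min(TT), to = max(TT), type = "spline")  # note the argument order: x (time) first, then y

# An equivalent route with integrate(): interpolate the discrete solution, then integrate the interpolant
P_fun <- splinefun(TT, P)
integrate(P_fun, lower = min(TT), upper = max(TT))$value
Either way, "from" and "to" are simply the time bounds of the area you want; if u is just a constant factor, it scales the result by u.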

How to make a faster approximation of data with sinusoids

I have a function that generates a sine wave
my.sin <- function(vec,a,f,p) a*sin(f*vec+p)
vec = vector indices
a = amplitude
f = frequency
p = phase
I also have some data my_var that I want to approximate with several sinusoids
set.seed(22)
my_var <- rnorm(100)
plot(my_var,t="l")
There is also a fitness function that calculates the approximation error for the sum of two sinusoids, but there can be any number of sinusoids:
fit <- function(x, test = FALSE) {
  vec <- 1:length(my_var)
  s1 <- my.sin(vec = vec, a = x[1], f = x[2], p = x[3])
  s2 <- my.sin(vec = vec, a = x[4], f = x[5], p = x[6])
  res_sin <- s1 + s2
  err <- sqrt(sum((res_sin - my_var)^2))
  if (test) return(res_sin)
  return(-err)
}
Next, I use a genetic algorithm to find the best solution.
library(GA)

GA <- ga("real-valued",
         fitness = fit,
         lower = rep(-5, 6),
         upper = rep(5, 6),
         maxiter = 300,
         popSize = 300)

sol <- tail(GA@solution, 1)[1, ]
ga_sin <- fit(sol, test = TRUE)
lines(ga_sin, col = 4, lwd = 2)

best_sin_comb <- matrix(sol, ncol = 3, byrow = TRUE)
colnames(best_sin_comb) <- c("amplitude", "frequency", "phase")
print(best_sin_comb)
Result:
      amplitude  frequency     phase
[1,] -0.3435402  1.5458888 1.8904578
[2,] -0.4326791 -0.4886035 0.5606401
My question is: can the approximation be made more efficient in terms of time spent, perhaps with a different search algorithm or something else?
I would also like to keep compatibility with the function my.sin.
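One thing worth trying (a sketch, not from the original post): keep the GA for a coarse global search and then polish its best solution with a local optimiser such as optim(), which typically needs far fewer function evaluations. fit() is reused unchanged, so compatibility with my.sin is preserved.
# fnscale = -1 makes optim() maximise fit(), matching the GA's convention
refined <- optim(par = sol, fn = fit, method = "BFGS",
                 control = list(fnscale = -1, maxit = 1000))
refined_sin <- fit(refined$par, test = TRUE)
lines(refined_sin, col = 2, lwd = 2)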

Vectorising density of mixture Gaussian distribution and integrating/plotting in R

I'm trying to write the density of a mixture Gaussian distribution raised to an arbitrary power, b, in R. Currently, I have two methods that work, but I would prefer to avoid a for loop.
dnorm_mix_tempered_unnorm <- function(x, w, m, s, b) {
  value <- 0
  for (i in 1:length(w)) {
    value <- value + w[i] * dnorm(x, mean = m[i], sd = s[i])
  }
  value <- value^b
  return(value)
}
Alternatively, I can vectorise this to avoid the for loop:
dnorm_mix_tempered_unnorm <- function(x, w, m, s, b) {
  return(sum(w * dnorm(x, mean = m, sd = s))^b)
}
Both of these give the same result, but the second is more efficient since it is vectorised. Next, I need to normalise this so that the density integrates to 1, which I do with:
dnorm_mix_tempered <- function(x, weights, means, sds, beta) {
  norm_constant <- integrate(function(x) dnorm_mix_tempered_unnorm(x, w = weights, m = means,
                                                                   s = sds, b = 1/beta),
                             lower = -Inf, upper = Inf)$value
  value <- dnorm_mix_tempered_unnorm(x, w = weights, m = means, s = sds, b = 1/beta) / norm_constant
  return(value)
}
If I define dnorm_mix_tempered_unnorm with for loops, this works with no problem, and I can use curve() to plot the density. But if I define dnorm_mix_tempered_unnorm by using vectorisation, then I get the following error:
Error in integrate(function(x) dnorm_mix_tempered_unnorm(x, w = weights, :
evaluation of function gave a result of wrong length
Does anyone know what is going on when I am vectorising instead and trying to integrate?
Thanks in advance,
R.
A possible option is
dnorm_mix_tempered_unnorm <- function(x, w, m, s, b) {
  # one column per mixture component, one row per element of x
  return(rowSums(mapply(function(w_i, m_i, s_i) w_i * dnorm(x, mean = m_i, sd = s_i), w, m, s))^b)
}
But I think it is quite similar to your first proposal.
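For what it's worth, the root cause of the error is that integrate() hands the integrand a whole vector of x values and expects a vector of the same length back, while sum() collapses both the mixture components and those x values into a single number. A sketch of another way around this, keeping the scalar version and vectorising it over x explicitly:
dnorm_mix_scalar <- function(x, w, m, s, b) sum(w * dnorm(x, mean = m, sd = s))^b
dnorm_mix_tempered_unnorm <- function(x, w, m, s, b) {
  # sapply() feeds the scalar version one x value at a time and returns a vector
  sapply(x, dnorm_mix_scalar, w = w, m = m, s = s, b = b)
}
Vectorize(dnorm_mix_scalar, "x") would achieve the same thing.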

Best way to solve an integral including a nonparametric density and distribution

Suppose that I want to solve a function containing two integrals (this is an example; the actual function is uglier), where a and b are the boundaries, c and d are known parameters, and f(x) and F(x) are the density and distribution function of the random variable x. In my problem f(x) and F(x) are estimated nonparametrically, so I know their values only at certain specific values of x. How would you set up the integral?
I did:
# Create the data
val <- runif(300, min = 1, max = 10)  # use the uniform distribution
CDF <- (val - 1)/(10 - 1)
pdf <- 1 / (10 - 1)
data <- data.frame(val = val, CDF = CDF, pdf = pdf)

c <- 2
d <- 1
# Inner integral
integrand1 <- function(x) {
  i <- which.min(abs(x - data$val))
  FF <- data$CDF[i]
  ff <- data$pdf[i]
  (1 - FF)^(c/d) * ff
}

# Vectorize the inner integral
Integrand1 <- Vectorize(integrand1)

# Outer integral
integrand2 <- function(x) {
  i <- which.min(abs(x - data$val))
  FF <- data$CDF[i]
  ff <- data$pdf[i]
  (quadgk(Integrand1, x, 10) / FF) * c * ff
}

# Vectorize the outer integral
Integrand2 <- Vectorize(integrand2)

# Solve
require(pracma)
quadgk(Integrand2, 1, 10)
The integral is extremely slow. Is there a better way to solve this? Thank you.
---------EDIT---------
In my problem the pdf and CDF are computed from a vector of values v as follows:
# Create the original data
v <- runif(300, min = 1, max = 10)
require(np)
# Compute the CDF and pdf
v.CDF.bw <- npudistbw(dat = v, bandwidth.compute = TRUE, ckertype = "gaussian")
v.pdf.bw <- npudensbw(dat = v, bandwidth.compute = TRUE, ckertype = "gaussian")
# Extend v on a grid (I add this step because the v vector in my data
# is not very large. In this way I approximate the estimated pdf and CDF
# on a grid)
val <- seq(from = min(v), to = max(v), length.out = 1000)
data <- data.frame(val)
CDF <- npudist(bws = v.CDF.bw, newdata = data$val, edat = data )
pdf <- npudens(bws = v.pdf.bw, newdata = data$val, edat = data )
data$CDF <- CDF$dist
data$pdf <- pdf$dens
Have you considered using approxfun?
It takes vectors x and y and gives you a function that linearly interpolates between those. So for example, try
x <- runif(1000) + runif(1000) + 2*(runif(1000)^2)
dx <- density(x)
fa <- approxfun(dx$x, dx$y)
curve(fa, 0, 2)
fa(0.4)
You should be able to call it using your gridded evaluations, and it may be faster than what you're doing (as well as more accurate).
(edit: yes, as you say, splinefun should be fine if it's fast enough for your needs)
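To make that concrete here, a minimal sketch (assuming the data frame data with columns val, CDF and pdf built above): build the two interpolating functions once and use them inside the integrands in place of the which.min() lookup; the piecewise-constant lookup is hard on the adaptive quadrature, whereas an interpolant is smooth and cheap to evaluate.
FF <- approxfun(data$val, data$CDF, rule = 2)  # interpolated CDF
ff <- approxfun(data$val, data$pdf, rule = 2)  # interpolated pdf

library(pracma)
inner <- function(x) (1 - FF(x))^(c/d) * ff(x)  # already vectorised
outer_int <- Vectorize(function(x) (quadgk(inner, x, 10) / FF(x)) * c * ff(x))
quadgk(outer_int, 1, 10)
splinefun(data$val, data$CDF) and splinefun(data$val, data$pdf) would slot in the same way if you prefer spline interpolation.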

Solve an integral in R: the integrand is a function of the solution of an ODE

I would like to compute an integral where the integrand is a function of the solution of an ODE.
In order to solve the integral, R needs to solve an ODE for each value the integration algorithm uses. This is what I have done so far:
require(deSolve)

# Function to be passed to zvode in order to solve the ODE
ODESR <- function(t, state, parameters) {
  with(as.list(c(state, parameters)), {
    dPSI <- -kappa*PSI + 0.5*sigma^2*PSI^2
    dPHI <- kappa*theta*PSI
    return(list(c(dPSI, dPHI)))
  })
}

# For a given value of p this code should return the solution of the integral
pdfSRP <- function(p) {
  integrand <- function(u) {
    state <- c(PSI = u*1i, PHI = 0)
    out <- as.complex(zvode(y = state, times = times, parms = parameters, fun = ODESR)[2, 2:3])
    Re(exp(out[2] + out[1]*x)*exp(-u*1i*p))
  }
  integrate(f = integrand, lower = -Inf, upper = Inf)$value/(2*pi)
}
For the following given values:
parameters <- c(kappa = 1, theta = 0.035, sigma = 0.05)
times <- c(0,1)
x <- 0.1
running:
pdfSRP(p = 2)
produces the following error:
Error in eval(expr, envir, enclos) : object 'PSI' not found
I just cannot figure out why. I'm quite sure it is due to a syntax error, because running:
integrand <- function(u) {
  state <- c(PSI = u*1i, PHI = 0)
  out <- as.complex(zvode(y = state, times = times, parms = parameters, fun = ODESR)[2, 2:3])
  Re(exp(out[2] + out[1]*x)*exp(-u*1i*p))
}
with p <- 2 and (for example) u <- 3 works.
Can you help me spot the mistake?
It seems to be a vectorisation problem with the integrand input u. If I understand correctly, PSI should be a single number for each evaluation, not a vector of numbers (otherwise there is a dimensional mismatch between PSI and PHI). Hence
integrand <- Vectorize(integrand)
should resolve your issue. From ?integrate:
f must accept a vector of inputs and produce a vector of function evaluations at those points.
However, this leads to a different error.
pdfSRP(p = 2)
## Error in integrate(f = integrand, lower = -Inf, upper = Inf) :
## the integral is probably divergent
If we plot the integrand, we may spot the divergence problem
p <- 2
par(mfrow = c(1, 2))
curve(integrand, -1e3, 1e3, n = 100)
curve(integrand, -1e3, 1e3, n = 1e3)
Assuming the integrand converges sufficiently fast to zero in both tails, the apparent divergence of the integral could be a result of numerical imprecision. We can increase precision by increasing the number of subdivisions for the integral, which does give a result that, judging by the plot, looks plausible.
pdfSRP <- function(p) {
  int <- integrate(f = integrand, lower = -Inf, upper = Inf,
                   subdivisions = 1e3)
  int$value/(2*pi)
}
## [1] 2.482281e-06
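If the tails really are negligible beyond the plotted range (an assumption based on the curves above), another option is to integrate over a wide finite interval instead of (-Inf, Inf), which sidesteps the divergence heuristic altogether:
# a sketch: uses the vectorised integrand and the global p, exactly as above
pdfSRP_finite <- function(p) {
  integrate(f = integrand, lower = -1e3, upper = 1e3, subdivisions = 1e3)$value/(2*pi)
}
pdfSRP_finite(p = 2)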
