I apologise if this is a duplicate; I've read answers to similar questions to no avail.
I'm trying to integrate under a curve, given a specific formula (below) for said integration.
As a toy example, here's some data:
library(deSolve)  # lsoda() below comes from deSolve

Antia_Model <- function(t, y, p1) {
  r <- p1[1]; k <- p1[2]; p <- p1[3]; o <- p1[4]
  P <- y[1]; I <- y[2]
  dP <- r*P - k*P*I
  dI <- p*I*(P/(P + o))
  list(c(dP, dI))
}
r <- 0.25; k <- 0.01; p <- 1; o <- 1000  # note that r can range between 0.1 and 10 in this model
parms <- c(r, k, p, o)
P0 <- 1; I0 <- 1
N0 <- c(P0, I0)
TT <- seq(0.1, 50, 0.1)
results <- lsoda(N0, TT, Antia_Model, parms, verbose = FALSE)
P <- results[,2]; I <- results[,3]
As I understand it, I should be able to use the auc() function from the MESS package (or can I just use the integrate() function? I'm not sure), with a call that should look something like this:
auc(TT, P, from = x1, to = x2, type = "spline")
Though I don't really understand how to use the "from" and "to" arguments, or how to incorporate "u" from the original integration formula...
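For concreteness, here is the kind of call I have in mind (x1 and x2 would just be times inside the simulated range [0.1, 50]; from the MESS docs, auc() takes the x values first and the y values second):

library(MESS)
## e.g. area under P(t) between t = 1 and t = 10 (illustrative limits)
auc(TT, P, from = 1, to = 10, type = "spline")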
Using the integrate() function seems more intuitive, but if I try:
u <- 1
integrand <- function(P) {u*P}
q <- integrate(integrand, lower = 0, upper = Inf)
I get this error:
# Error in integrate(integrand, lower = 0, upper = Inf) :
# the integral is probably divergent
As you can tell, I'm pretty lost, so any help would be greatly appreciated! Thank you so much! :)
Your integrand is technically acceptable, but with u <- 1 it is just the identity function f(x) = x, and the area under that over [0, Inf) is infinite, i.e. the integral diverges.
From the documentation of integrate the first argument is:
an R function taking a numeric first argument and returning a numeric vector of the same length. Returning a non-finite element will generate an error.
If instead you use a pulse function:
pulse <- function(x) {ifelse(x < 5 & x >= 0, 1, 0)}
integrate(pulse, lower = 0, upper = Inf)
#> 5 with absolute error < 8.5e-05
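If the goal is the area under the ODE solution itself, one option (a sketch, assuming you want a finite window rather than [0, Inf)) is to interpolate the numerical solution with base R's splinefun() and hand the interpolant to integrate():

## interpolate P(t) from the lsoda output, then integrate the interpolant
P_fun <- splinefun(TT, P)
integrate(P_fun, lower = 1, upper = 10)  # finite, illustrative limits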
I am working on code that uses the uniroot function to approximate the root of an equation. I am trying to plot the behaviour of the function being passed through uniroot as the value of a free variable changes:
library(Deriv)
f1 <- function(s) {
  (1 - 2*s)^(-3/2) * exp((8*s)/(1 - 2*s))
}
f2 <- function(s) {
  log(f1(s))
}
f3 <- Deriv(f2, 's')
f4 <- Deriv(f3, 's')
f5 <- Deriv(f4, 's')
upp_s <- 1/2 - 1e-20
f_est <- function(x) {
  f3a <- function(s) { f3(s = s) - x }
  s_ <- uniroot(f3a, lower = -9, upper = upp_s)$root
  return(s_)
}
plot(f_est, from = 0, to = 100, col = "red", main = "header")
The output of f_est works as expected. However, when passed through the plot function, uniroot seems to break:
> plot(f_est, from = 0, to=100, col="red", main="header")
Error in uniroot(f3a, lower = -9, upper = upp_s) :
f() values at end points not of opposite sign
In addition: Warning messages:
1: In if (is.na(f.lower)) stop("f.lower = f(lower) is NA") :
the condition has length > 1 and only the first element will be used
2: In if (is.na(f.upper)) stop("f.upper = f(upper) is NA") :
  the condition has length > 1 and only the first element will be used
The function is set up so that the values of f3a at the endpoints passed to uniroot always have opposite signs, and there is always exactly one real root. I have also checked that the endpoints are non-missing when f_est is run by itself. I've tried vectorising the functions involved, to no avail.
Why is this happening?
I was able to get most of the way there with
upp_s <- 0.497
plot(Vectorize(f_est), from = 0.2, to = 100)
Not only is 1/2 - epsilon exactly equal to 1/2 in floating point when epsilon is too small (1e-20 is far below the machine epsilon of about 2.2e-16), but I also found that f3() gives NaN for values >= 0.498. Setting upp_s to 0.497 worked OK.
plot() applied to a function calls curve(), which needs a function that can take a vector of x values.
The curve broke with "f() values at end points not of opposite sign" if I started the curve from 0.1; I didn't dig in further and try to diagnose what was going wrong.
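As an aside, a hedged one-line alternative to Vectorize() is an explicit sapply() wrapper, which makes the looping over x visible:

f_est_v <- function(x) sapply(x, f_est)  # vectorise over x by looping
plot(f_est_v, from = 0.2, to = 100)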
PS. It is generally more numerically stable and efficient to do computations directly on the log scale where possible. In this case, that means using
f2 <- function(s) { (-3/2)*log(1-2*s) + (8*s)/(1-2*s) }
instead of
f1 <- function(s) {
  (1 - 2*s)^(-3/2) * exp((8*s)/(1 - 2*s))
}
f2_orig <- function(s) {
  log(f1(s))
}
## check
all.equal(f2(0.25), f2_orig(0.25)) ## TRUE
Doing this and setting the lower bound of uniroot() to -500 lets us get pretty close to the zero boundary (although it looks both analytically and computationally as though the function diverges to -∞ as x goes to 0).
f3 <- Deriv(f2, 's')
upp_s <- 1/2 - 1e-10
lwr_a <- -500
f_est <- function(x) {
  f3a <- function(s) { f3(s = s) - x }
  s_ <- uniroot(f3a, lower = lwr_a, upper = upp_s)$root
  return(s_)
}
plot(Vectorize(f_est), from = 0.005, to = 100, log = "x")
You can also solve this analytically, or ask caracas (an R interface to sympy) to do it for you:
library(caracas)
x <- symbol("x"); s <- symbol("s")
## peek at f3() guts to find the expression for the derivative;
## could also do the whole thing in caracas/sympy
solve_sys((11 +16*(s/(1-s*2)))/(1-s*2), x, list(s))
sol <- function(x) { (2*x - sqrt(32*x + 9) -3)/(4*x) }
curve(sol, add = TRUE, col = 2)
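A quick hedged sanity check that the closed form agrees with the root finder (uniroot's default tolerance is loose, so compare with a generous tolerance):

all.equal(f_est(2), sol(2), tolerance = 1e-3)  # expect TRUE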
I'm trying to write the density of a Gaussian mixture distribution raised to an arbitrary power, b, in R. Currently I have two methods that work, but I would prefer to avoid the for loop.
dnorm_mix_tempered_unnorm <- function(x, w, m, s, b) {
  value <- 0
  for (i in 1:length(w)) {
    value <- value + w[i] * dnorm(x, mean = m[i], sd = s[i])
  }
  value <- value^b
  return(value)
}
Alternatively, I can vectorise this to avoid the for loop:
dnorm_mix_tempered_unnorm <- function(x, w, m, s, b) {
  return(sum(w * dnorm(x, mean = m, sd = s))^b)
}
Both of these give the same result, and the second seems more efficient since it is vectorised. But next I need to normalise this so that the density integrates to 1, which I do using:
dnorm_mix_tempered <- function(x, weights, means, sds, beta) {
  norm_constant <- integrate(function(x)
    dnorm_mix_tempered_unnorm(x, w = weights, m = means, s = sds, b = 1/beta),
    lower = -Inf, upper = Inf)$value
  value <- dnorm_mix_tempered_unnorm(x, w = weights, m = means, s = sds,
                                     b = 1/beta) / norm_constant
  return(value)
}
If I define dnorm_mix_tempered_unnorm with for loops, this works with no problem, and I can use curve() to plot the density. But if I define dnorm_mix_tempered_unnorm by using vectorisation, then I get the following error:
Error in integrate(function(x) dnorm_mix_tempered_unnorm(x, w = weights, :
evaluation of function gave a result of wrong length
Does anyone know what is going on when I am vectorising instead and trying to integrate?
Thanks in advance,
R.
The problem is that integrate() calls your function with a whole vector of x values at once, and sum() collapses the contributions of all those points into a single number, so the result has the wrong length. A possible option is
dnorm_mix_tempered_unnorm <- function(x, w, m, s, b) {
  # one column per mixture component, one row per element of x
  rowSums(mapply(function(w, m, s) w * dnorm(x, mean = m, sd = s),
                 w, m, s))^b
}
But I think it is quite similar to your first proposal.
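Another hedged option, staying closest to your original scalar version, is to let Vectorize() handle the looping over x (dnorm_mix_tempered_unnorm_v is just an illustrative name):

dnorm_mix_tempered_unnorm_v <- Vectorize(
  function(x, w, m, s, b) sum(w * dnorm(x, mean = m, sd = s))^b,
  vectorize.args = "x"
)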
I would like to compute an integral where the integrand is a function of the solution of an ODE.
In order to solve the integral, R needs to solve an ODE for each value the integration algorithm uses. This is what I have done so far:
require(deSolve)
# Function to be passed to zvode in order to solve the ODE
ODESR <- function(t, state, parameters) {
  with(as.list(c(state, parameters)), {
    dPSI <- -kappa*PSI + 0.5*sigma^2*PSI^2
    dPHI <- kappa*theta*PSI
    return(list(c(dPSI, dPHI)))
  })
}
# For a given value of p this code should return the solution of the integral
pdfSRP <- function(p) {
  integrand <- function(u) {
    state <- c(PSI = u*1i, PHI = 0)
    out <- as.complex(zvode(y = state, times = times, parms = parameters,
                            fun = ODESR)[2, 2:3])
    Re(exp(out[2] + out[1]*x) * exp(-u*1i*p))
  }
  integrate(f = integrand, lower = -Inf, upper = Inf)$value/(2*pi)
}
For the following given values:
parameters <- c(kappa = 1, theta = 0.035, sigma = 0.05)
times <- c(0,1)
x <- 0.1
running:
pdfSRP(p = 2)
produces the following error:
Error in eval(expr, envir, enclos) : object 'PSI' not found
I just cannot figure out why. I'm quite sure it is due to a syntax error, because running:
integrand <- function(u) {
  state <- c(PSI = u*1i, PHI = 0)
  out <- as.complex(zvode(y = state, times = times, parms = parameters,
                          fun = ODESR)[2, 2:3])
  Re(exp(out[2] + out[1]*x) * exp(-u*1i*p))
}
with p <- 2 and (for example) u <- 3 works.
Can you help me spot the mistake?
It seems to be a vectorisation problem in the integrand's input u. If I understand correctly, PSI should be a single number for each ODE solve, not a vector of numbers (a vector would also create a dimension mismatch between PSI and PHI). In fact, when integrate() passes a vector u, c(PSI = u*1i, PHI = 0) names the state elements PSI1, PSI2, ..., which is exactly why 'PSI' is not found. Hence
integrand <- Vectorize(integrand)
should resolve your issue. From ?integrate:
f must accept a vector of inputs and produce a vector of function evaluations at those points.
However, this leads to a different error.
pdfSRP(p = 2)
## Error in integrate(f = integrand, lower = -Inf, upper = Inf) :
## the integral is probably divergent
If we plot the integrand, we may spot the divergence problem
p <- 2
par(mfrow = c(1, 2))
curve(integrand, -1e3, 1e3, n = 100)
curve(integrand, -1e3, 1e3, n = 1e3)
Assuming the integrand converges to zero sufficiently fast in both tails, the reported divergence can be an artifact of numerical imprecision. We can increase the precision by increasing the number of subintervals for the integral, which does give a result that matches what the plot heuristically suggests.
pdfSRP <- function(p) {
  # integrand is the Vectorize()d version defined above
  int <- integrate(f = integrand, lower = -Inf, upper = Inf,
                   subdivisions = 1e3)
  int$value/(2*pi)
}
pdfSRP(p = 2)
## [1] 2.482281e-06
After searching the forum, I did not find similar questions. If I missed one, please let me know; I would really appreciate it.
I need to generate N (possibly 10000 or more) sample points from a gamma distribution with given shape and scale parameters and a lower/upper bound in R.
I know how to do it with a for loop, but it is not efficient.
library(distr)
get_sample_gamma <- function(shape, scale, lb, ub) {
  v <- rgamma(n = 10000, shape = shape, scale = scale)
  # check which elements of v lie in [lb, ub];
  # if some fall outside, count the number of in-range points as M
  # and generate the remaining N - M points until all N points are obtained
}
This is not efficient.
Any more efficient solutions would be appreciated.
See R Programs for Truncated Distributions by Saralees Nadarajah and Samuel Kotz.
Using their code on page 4:
qtrunc <- function(p, spec, a = -Inf, b = Inf, ...) {
  tt <- p
  G <- get(paste("p", spec, sep = ""), mode = "function")
  Gin <- get(paste("q", spec, sep = ""), mode = "function")
  tt <- Gin(G(a, ...) + p*(G(b, ...) - G(a, ...)), ...)
  return(tt)
}
rtrunc <- function(n, spec, a = -Inf, b = Inf, ...) {
  x <- u <- runif(n, min = 0, max = 1)
  x <- qtrunc(u, spec, a = a, b = b, ...)
  return(x)
}
Now v <- rtrunc(10000, "gamma", lb, ub, shape=shape, scale=scale) should do the job.
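For the gamma case specifically, the same inverse-CDF idea can be written directly with pgamma()/qgamma(); this sketch (rtrunc_gamma is just an illustrative name) draws all N points in one vectorised pass:

rtrunc_gamma <- function(n, shape, scale, lb, ub) {
  Fa <- pgamma(lb, shape = shape, scale = scale)  # CDF at the lower bound
  Fb <- pgamma(ub, shape = shape, scale = scale)  # CDF at the upper bound
  u  <- runif(n, min = Fa, max = Fb)              # uniforms on [F(lb), F(ub)]
  qgamma(u, shape = shape, scale = scale)         # map back through the quantile function
}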
Here is my data (the x and y columns are the relevant ones):
https://www.dropbox.com/s/b61a7enhoa0p57p/Simple1.csv
What I need is to fit a polyline (a piecewise linear function) to the data. The MATLAB code that does this is:
spline_fit.m:
function [score, params] = spline_fit(points, x, y)
    min_f = min(x) - 1;
    max_f = max(x);
    points = [min_f points max_f];
    params = zeros(length(points)-1, 2);
    score = 0;
    for i = 1:length(points)-1
        in = (x > points(i)) & (x <= points(i+1));
        if sum(in) > 2
            p = polyfit(x(in), y(in), 1);
            pred = p(1)*x(in) + p(2);
            score = score + norm(pred - y(in));
            params(i, :) = p;
        else
            params(i, :) = nan;
        end
    end
test.m:
%Find the parameters
r = [100,250,400];
p = fminsearch('spline_fit', r, [], x, y)
[score, param] = spline_fit(p, x, y)
%Plot the result
y1 = zeros(size(x));
p1 = [-inf, p, inf];
for i = 1:size(param, 1)
    in = (x > p1(i)) & (x <= p1(i+1));
    y1(in) = x(in)*param(i,1) + param(i,2);
end
[x1, I] = sort(x);
y1 = y1(I);
plot(x,y,'x',x1,y1,'k','LineWidth', 2)
And this works fine, producing the following optimised breakpoints: [102.9842, 191.0006, 421.9912]
I've implemented the same idea in R:
library(pracma)

spline_fit <- function(x, xx, yy) {
  min_f <- min(xx) - 1
  max_f <- max(xx)
  points <- c(min_f, x, max_f)
  params <- array(0, c(length(points)-1, 2))
  score <- 0
  for (i in 1:(length(points)-1)) {
    inn <- (xx > points[i]) & (xx <= points[i+1])
    if (sum(inn) > 2) {
      p <- polyfit(xx[inn], yy[inn], 1)
      pred <- p[1]*xx[inn] + p[2]
      score <- score + norm(as.matrix(pred - yy[inn]), "F")
      params[i,] <- p
    } else {
      params[i,] <- NA
    }
  }
  score
}
But I get very bad results:
> fminsearch(spline_fit,c(100,250,400), xx = Simple1$x, yy = Simple1$y)
$xval
[1] 100.1667 250.0000 400.0000
$fval
[1] 4452.761
$niter
[1] 2
As you can see, it stops after 2 iterations and doesn't produce good points.
I'll be very glad for any help in resolving this issue.
Also, if anyone knows how to implement this in C# using any free library, that would be even better. I know where to get polyfit, but not fminsearch.
The problem here is that the likelihood surface is very badly behaved -- there are both multiple minima and discontinuous jumps -- which will make the results you get with different optimizers almost arbitrary. I will admit that MATLAB's optimizers are remarkably robust, but I would say that it's pretty much a matter of chance (and where you start) whether an optimizer will get to the global minimum for this case, unless you use some form of stochastic global optimization such as simulated annealing.
I chose to use R's built-in optimizer (which uses Nelder-Mead by default) rather than fminsearch from the pracma package.
spline_fit <- function(x, xx = Simple1$x, yy = Simple1$y) {
  min_f <- min(xx) - 1
  max_f <- max(xx)
  points <- c(min_f, x, max_f)
  params <- array(0, c(length(points)-1, 2))
  score <- 0
  for (i in 1:(length(points)-1)) {
    inn <- (xx > points[i]) & (xx <= points[i+1])
    if (sum(inn) > 2) {
      p <- polyfit(xx[inn], yy[inn], 1)
      pred <- p[1]*xx[inn] + p[2]
      score <- score + norm(as.matrix(pred - yy[inn]), "F")
      params[i,] <- p
    } else {
      params[i,] <- NA
    }
  }
  score
}
library(pracma) ## for polyfit
Simple1 <- read.csv("Simple1.csv")
opt1 <- optim(c(100, 250, 400), fn = spline_fit, xx = Simple1$x, yy = Simple1$y)
opt1$par
## [1] 102.4365 201.5835 422.2503
This is better than the fminsearch results, but still different from the MATLAB results, and worse than them:
## Matlab results:
matlab_fit <- c(102.9842, 191.0006, 421.9912)
spline_fit(matlab_fit, xx = Simple1$x, yy = Simple1$y)
## 3724.3
opt1$value
## 3755.5 (worse)
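Given how rugged the surface is, a stochastic pass can help before a local polish. Here is a hedged sketch with base R's simulated-annealing method (the settings are illustrative, not tuned):

## global-ish pass with simulated annealing, then a Nelder-Mead polish
sann <- optim(c(100, 250, 400), spline_fit, method = "SANN",
              xx = Simple1$x, yy = Simple1$y,
              control = list(maxit = 5000))
polished <- optim(sann$par, spline_fit, xx = Simple1$x, yy = Simple1$y)
polished$value  # compare against 3724.3 and 3755.5 above

There is no guarantee this reaches the global minimum either, but restarting it a few times gives a sense of how variable the local minima are.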
The bbmle package offers an experimental (and not very well documented) set of tools for exploring optimization surfaces:
library(bbmle)
ss <- slice2D(fun = spline_fit, opt1$par, nt = 51)
library(lattice)
A 2D "slice" around the optim-estimated parameters. The circles show the optim fit (solid) and the minimum value within each slice (open).
png("splom1.png")
print(splom(ss))
dev.off()
A 'slice' between the matlab and optim fits shows that the surface is quite rugged:
ss2 <- bbmle:::slicetrans(matlab_fit, opt1$par, spline_fit)
png("slice1.png")
print(plot(ss2))
dev.off()