R: match.fun does not see a function from an imported package

I've imported the package "stabledist", which includes the function rstable. When I do this:
my_fun <- function(function_from_library)
{
  function_from_library <- match.fun(function_from_library)
  print(some_data <- function_from_library)
}
my_fun(5, rstable(5, alpha = 1.7, beta = 0, gamma = 1.0, delta = 1.0))
I get the error: "Error in match.fun(some_distr) : 'rstable(5, alpha = 1.7, beta = 0, gamma = 1, delta = 1)' is not a function, character or symbol"
Everything works fine when match.fun is deleted. Is there any way to import the library so that it is visible to other functions? Or can I just skip match.fun?

This is how I implemented my suggestion:
library(stabledist)
my_fun <- function(function_from_library, ...)
{
  function_from_library <- match.fun(function_from_library)
  print(some_data <- function_from_library(...))
}
my_fun(rstable, 5, alpha = 1.7, beta = 0, gamma = 1.0, delta = 1.0)
#[1] 1.4600308688 -0.0004999279 1.9301805374 -1.3276383194 0.9137183709
It also requires knowing how to use the ellipsis mechanism (...) for passing an arbitrary number of arguments to a function, as Roland had additionally commented. The print call will not actually create a data vector of values in your workspace; to do that you need to assign ("<-") the result outside the function body (and then the print() call is not needed either).
library(stabledist)
my_fun <- function(function_from_library, ...)
{
  function_from_library <- match.fun(function_from_library)
  function_from_library(...)
}
some_data <- my_fun(rstable, 5, alpha = 1.7, beta = 0, gamma = 1.0, delta = 1.0)
some_data
# 5 random values are printed at console.
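Note that match.fun also accepts the function's name as a character string, so the same wrapper works if you pass "rstable" in quotes (a small illustration, assuming stabledist is attached as above):
some_data <- my_fun("rstable", 5, alpha = 1.7, beta = 0, gamma = 1.0, delta = 1.0)
# match.fun("rstable") looks the name up on the search path,
# so this works as long as library(stabledist) has been called.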

How to create a function within function

I have a problem creating a function to impute my generated missing values. I have a data-generating function, a missing-value-generating function, and a missing-value-imputing function. How can I combine them into one function?
# generate data
# (rvm() is not base R; it appears to come from the CircStats / circular family of packages)
data <- function(n, alpha, kappa, miu){
  X = rvm(n, alpha, kappa)
  delta = rvm(n, 0, kappa)
  epsilon = rvm(n, 0, kappa)
  x = (X + delta)%%(2*pi)
  Y = (alpha + X)%%(2*pi)
  y = (Y + epsilon)%%(2*pi)
  sample = cbind(x, y)
  return(sample)
}
# generate missing values (ampute() and mice() are from the mice package)
misVal <- ampute(data = data(10, 0.7854, 5, 0), prop = 0.25, bycases = FALSE)
# impute missing values
impData <- mice(misVal, m = 5, maxit = 50, meth = 'pmm', seed = 500)
summary(impData)
Combine all three functions into one large custom function...
nameYourFunction <- function(n, alpha, kappa, miu){
  X = rvm(n, alpha, kappa)
  delta = rvm(n, 0, kappa)
  epsilon = rvm(n, 0, kappa)
  x = (X + delta)%%(2*pi)
  Y = (alpha + X)%%(2*pi)
  y = (Y + epsilon)%%(2*pi)
  sample = cbind(x, y)
  # generate missing values
  misVal <- ampute(data = sample, prop = 0.25, bycases = FALSE)
  # impute missing values
  impData <- mice(misVal, m = 5, maxit = 50, meth = 'pmm', seed = 500)
  return(impData)
}
Then to run...
final_data <- nameYourFunction(n = 10, alpha = 0.7854, kappa = 5, miu = 0)
summary(final_data)
Obviously you may want to rename the function based on your own preferences.
If you wanted something more flexible, like being able to easily supply arguments to the other functions called within nameYourFunction, you would add them to the argument list in the first line of code. So it might end up looking more like...
nameYourFunction <- function(n,alpha,kappa,miu,prop,m,maxit,meth,seed){...}
Then supplying those values to the function call like...
final_data <- nameYourFunction(n = 10, alpha = 0.7854, kappa = 5, miu = 0, prop = 0.25, m = 5, maxit = 50, meth = 'pmm', seed = 500)
And removing the hard coded values from within the custom function. I would probably recommend against this though as that is a lot of arguments to keep track of!
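A compromise (just a sketch of the idea above, not code from the original answer) is to give those extra arguments sensible defaults, so callers only override what they need:
nameYourFunction <- function(n, alpha, kappa, miu,
                             prop = 0.25, m = 5, maxit = 50,
                             meth = 'pmm', seed = 500) {
  X = rvm(n, alpha, kappa)
  delta = rvm(n, 0, kappa)
  epsilon = rvm(n, 0, kappa)
  x = (X + delta)%%(2*pi)
  Y = (alpha + X)%%(2*pi)
  y = (Y + epsilon)%%(2*pi)
  sample = cbind(x, y)
  misVal <- ampute(data = sample, prop = prop, bycases = FALSE)
  impData <- mice(misVal, m = m, maxit = maxit, meth = meth, seed = seed)
  return(impData)
}
# Defaults are used unless overridden explicitly:
final_data <- nameYourFunction(n = 10, alpha = 0.7854, kappa = 5, miu = 0)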

R: Numerical integration returns a wrong result for a smooth function but does not fail completely

I got a very unlikely, but a very dangerous numerical error while integrating thousands of sufficiently well-behaved functions in R using the built-in integrate function.
Story (can be skipped). My problem is connected with maximum likelihood and is based on a highly non-linear function (of 10–20 parameters) for which no analytical expression exists, so one evaluation requires computing thousands of integrals. I have produced an MWE that reproduces this error. For the optimisation of this function, due to multiple local optima, I am trying 1000 points for 1000 iterations (with derivative-free methods like particle swarm from hydroPSO and differential evolution from DEoptim), so just for one model I have to compute more than a billion integrals (!), and there are 200 candidate models, each of which requires later hot-start re-estimation, so the total number of integrals is well over a trillion. I would like to find the fastest solution that gives sufficient accuracy.
The function is a product of two density functions (gamma or similar) times some positive expression, and the density of the sum is computed according to the formula f_{X+Y}(z) = int_{supp Y} f_{X,Y}(z - y, y) dy. I cannot use convolution because X and Y are not independent in the general case. The support of Y in my case is (-Inf, 0]. The scale parameter of the distribution is very small (the model is GARCH-like), so very often the standard integration routine would fail on a function that is non-zero only on a very small section of the negative half-line (like [-0.02, -0.01], where it takes huge values, and 0 everywhere else), and R's integrate would often return the machine epsilon because it could not find points in that range where the function took values much greater than zero. In order to combat this problem, I stretch the function around zero by the inverse of the scale parameter and rescale the integrand accordingly, i.e. compute integrate(function(x) f(x * scale) * scale, ...)$value (in the MWE below this rescaling is built into the integrand via the numstab argument). However, sometimes this rescaling also failed, so I implemented a safety check: if the value of the scaled integral is suspiciously low (i.e. < 1e-8), the integral is recomputed without rescaling. The rescaled function was working like a charm, returning nice values where the non-scaled one failed, and in the rare cases where the rescaled function returned a machine epsilon, the non-rescaled one worked.
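A minimal sketch of that rescale-then-fall-back logic (the helper name safe_integrate and the exact call are mine, for illustration only):
# Substitute x = y * scale: the integral of f over (-Inf, 0] equals the integral
# of f(y * scale) * scale over (-Inf, 0]. If the rescaled result looks degenerate
# (near machine epsilon), retry without rescaling.
safe_integrate <- function(f, scale, tol = 1e-8) {
  scaled <- integrate(function(y) f(y * scale) * scale, lower = -Inf, upper = 0)$value
  if (scaled > tol) return(scaled)
  integrate(f, lower = -Inf, upper = 0)$value
}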
Until today, when integration of the rescaled function suddenly yielded a value of 1.5 instead of 3.5. Of course the result passed the safety check (it is a plausible value, not machine epsilon, and some other values were smaller than this, so it was within the common range). It turned out that in roughly 0.1% of all cases, integrate under-estimated the integral. The MWE is below.
First, we define the function of x and an optional parameter numstab that defines the scaling.
cons <- -0.020374721416129591
sc <- 0.00271245601724757383
sh <- 5.704
f <- function(x, numstab = 1) dgamma(cons - x * numstab, shape = sh, scale = sc) * dgamma(-x * numstab, shape = sh, scale = sc) * numstab
Next, we plot it to make sure that the scaling works correctly.
curve(f, -0.06, 0, n = 501, main = "Unscaled f", bty = "n")
curve(f(x, sc), -0.06 / sc, 0, n = 501, main = "Scaled f", bty = "n")
And then we check this integral by summation:
sum(f(seq(-0.08, 0, 1e-6))) * 1e-6 # True value, 3.575294
sum(f(seq(-30, 0, 1e-4), numstab = sc)) * 1e-4 # True value, 3.575294
str(integrate(f, -Inf, 0)) # Gives 3.575294
# $ value : num 3.58
# $ abs.error : num 1.71e-06
# $ subdivisions: int 10
str(integrate(f, -Inf, 0, numstab = sc))
# $ value : num 1.5 # WTF?!
# $ abs.error : num 0.000145 # WTF?!
# $ subdivisions: int 2
It stops after just two subdivisions! Now, in order to see what is going on during the integration, we create a global object and update it every time the integration routine evaluates the function.
global.eval.f <- list()
f.trace <- function(x, numstab = 1) {
  this.f <- f(x, numstab)
  global.eval.f[[length(global.eval.f) + 1]] <<- list(x = x, f = this.f)
  return(this.f)
}
integrate(f.trace, -Inf, 0)
Now, we visualise this integration process.
library(animation)
l <- length(global.eval.f)
mycols <- rainbow(l, end = 0.72, v = 0.8)
saveGIF({
  for (i in 1:l) {
    par(mar = c(4, 4, 2, 0.3))
    plot(xgrid <- seq(-0.1, -0.01, length.out = 301), f(xgrid), type = "l", bty = "n", xlab = "x", ylab = "f(x)", main = "Function without stabilisation")
    for (j in 1:(l2 <- length(this.x <- global.eval.f[[i]]$x))) lines(rep(this.x[j], 2), c(0, global.eval.f[[i]]$f[j]), col = mycols[i], type = "b", pch = 16, cex = 0.6)
    legend("topleft", paste0("Quadrature: ", i), bty = "n")
    text(rep(-0.1, l2), seq(325, 25, length.out = l2), labels = formatC(sort(this.x), format = "e", digits = 2), adj = 0, col = ifelse(sort(this.x) > -0.1 & sort(this.x) < -0.01, mycols[i], "black"), cex = 0.9)
  }
}, movie.name = "stab-off-quad.gif", interval = 1 / 3, ani.width = 400, ani.height = 300)
And the same thing, but on a different scale.
global.eval.f <- list()
integrate(f.trace, -Inf, 0, numstab = sc)
l <- length(global.eval.f)
mycols <- rainbow(l, end = 0.7, v = 0.8)
saveGIF({
  for (i in 1:l) {
    par(mar = c(4, 4, 2, 0.3))
    plot(xgrid <- seq(-0.1 / sc, -0.01 / sc, length.out = 301), f(xgrid, sc), type = "l", bty = "n", xlab = "x", ylab = "f(x)", main = "Function with stabilisation")
    for (j in 1:(l2 <- length(this.x <- global.eval.f[[i]]$x))) lines(rep(this.x[j], 2), c(0, global.eval.f[[i]]$f[j]), col = mycols[i], type = "b", pch = 16, cex = 0.6)
    legend("topleft", paste0("Quadrature: ", i), bty = "n")
    text(rep(-0.1 / sc, l2), seq(325 * sc, 25 * sc, length.out = l2), labels = formatC(sort(this.x), format = "e", digits = 2), adj = 0, col = ifelse(sort(this.x) > -0.1 / sc & sort(this.x) < -0.01 / sc, mycols[i], "black"), cex = 0.9)
  }
}, movie.name = "stab-on-quad.gif", interval = 1 / 3, ani.width = 400, ani.height = 300)
The problem is, I cannot try various stabilising multipliers for the function because I have to compute this integral a trillion times, so even on the super-computer cluster this takes weeks. Besides that, tightening rel.tol to 1e-5 helped a bit, but I am not sure whether this guarantees success (and tightening it to 1e-7 slowed down the computations in some cases). I have also looked at the Fortran code of the quadrature just to see the integration rule.
The timings can be seen below (I added an extra attempt with a lower tolerance).
How can I make sure that the integration routine will not produce such wrong results for such a function, and the integration will still be fast?
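One possible mitigation (a sketch added here, not part of the original post) is to give integrate finite limits that are guaranteed to bracket all of the mass, derived from an extreme gamma quantile, so the initial quadrature rule cannot overlook the bump entirely:
# Sketch: bound the support with an extreme gamma quantile, then integrate on a
# finite interval (in the scaled coordinates) instead of (-Inf, 0].
left <- (cons - qgamma(1 - 1e-12, shape = sh, scale = sc)) / sc  # scaled left limit
str(integrate(f, left, 0, numstab = sc))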

Passing arguments and default arguments in nested functions

I have a function, x_pdf, that is supposed to calculate x*dfun(x|params) where dfun is a probability density function and params is a list of named parameters. It is defined inside of another function, int_pdf, that is supposed to integrate x_pdf between specified bounds:
int_pdf <- function(lb = 0, ub = Inf, dfun, params){
  x_pdf <- function(X, dfun, params){X * do.call(function(X){dfun(x=X)}, params)}
  out <- integrate(f = x_pdf, lower = lb, upper = ub, subdivisions = 100L)
  out
}
Note that, given my defaults for the lower and upper bound of integration, I expect that when the function is run with only the params specified it will return the mean of x.
I have a second function, int_gb2, that is a wrapper for int_pdf intended to specialize it to the generalized beta distribution of the second kind.
library(GB2)
int_gb2 <- function(lb = 0, ub = Inf, params){
  int_pdf(lb, ub, dfun = dgb2, params = get("params"))
}
When I run the function as follows:
GB2_params <- list(shape1 = 3.652, scale = 65797, shape2 = 0.3, shape3 = 0.8356)
int_gb2(params = GB2_params)
I get:
Error in do.call(what = function(X) { :
argument "params" is missing, with no default
I have spent multiple hours tweaking this, and I have managed to generate some alternative error messages, but always with respect to a missing x, X, or params.
There seem to be two problems here, both related to passing arguments: in the first, too many arguments are being passed; in the second, too few.
First off, in your x_pdf definition, you use an anonymous function that takes a single argument (function(X){dfun(x=X)}), but you also try to pass additional arguments (the params list) to said anonymous function with do.call, which will throw an error. That part should instead look something like this:
do.call(dfun, c(list(x = X), params))
Now, you've defined x_pdf to require 3 arguments: X, dfun, and params; but when you call x_pdf with integrate you're not passing the dfun and params arguments, which again will throw an error. You could get around that by passing dfun and params, too:
integrate(f = x_pdf, lower=lb, upper=ub, subdivisions = 100L, dfun, params)
But perhaps a neater solution would be to just remove the additional arguments from the definition of x_pdf (since dfun and params are already defined in the enclosing environment), for a more compact result:
int_pdf <- function(lb = 0, ub = Inf, dfun, params){
  x_pdf <- function(X) X * do.call(dfun, c(list(x = X), params))
  integrate(f = x_pdf, lower = lb, upper = ub, subdivisions = 100L)
}
With this definition of int_pdf, everything should work as you expect:
GB2_params <- list(shape1 = 3.652, scale = 65797, shape2 = 0.3, shape3 = 0.8356)
int_gb2(params = GB2_params)
#> Error in integrate(f = x_pdf, lower = lb, upper = ub, subdivisions = 100L):
#> the integral is probably divergent
Oh. Are the example parameters missing a decimal point from the scale argument?
GB2_params$scale <- 6.5797
int_gb2(params = GB2_params)
#> 4.800761 with absolute error < 0.00015
Extra bits
We could also use some functional programming to create a function factory to make it easy to create functions for finding moments other than the first one:
moment_finder <- function(n, c = 0) {
  function(f, lb = -Inf, ub = Inf, params = NULL, ...) {
    integrand <- function(x) {
      (x - c) ^ n * do.call(f, c(list(x = x), params))
    }
    integrate(f = integrand, lower = lb, upper = ub, ...)
  }
}
To find the mean, you would just create a function to find the first moment:
find_mean <- moment_finder(1)
find_mean(dnorm, params = list(mean = 2))
#> 2 with absolute error < 1.2e-05
find_mean(dgb2, lb = 0, params = GB2_params)
#> 4.800761 with absolute error < 0.00015
For variance, you'd have to find the second central moment:
find_variance <- function(f, ...) {
  mean <- find_mean(f, ...)$value
  moment_finder(2, c = mean)(f, ...)
}
find_variance(dnorm, params = list(mean = 2, sd = 4))
#> 16 with absolute error < 3.1e-07
find_variance(dgb2, lb = 0, params = GB2_params)
#> 21.67902 with absolute error < 9.2e-05
Alternatively, we could generalise further and find the expected value of any transformation, rather than just moments:
ev_finder <- function(transform = identity) {
  function(f, lb = -Inf, ub = Inf, params = NULL, ...) {
    integrand <- function(x) {
      transform(x) * do.call(f, c(list(x = x), params))
    }
    integrate(f = integrand, lower = lb, upper = ub, ...)
  }
}
Now moment_finder would be a special case:
moment_finder <- function(n, c = 0) {
  ev_finder(transform = function(x) (x - c) ^ n)
}
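For example (my own illustration, not from the original answer), the expected value of exp(X) under a standard normal is exp(1/2):
find_exp_ev <- ev_finder(transform = exp)
find_exp_ev(dnorm, params = list(mean = 0, sd = 1))
#> approximately exp(0.5) = 1.648721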
Created on 2018-02-17 by the reprex package (v0.2.0).
If you've read this far, you might also enjoy Advanced R by Hadley Wickham.
More extra bits
@andrewH I understood from your comment that you might be looking to find means of truncated distributions, e.g. find the mean for the part of the distribution above the mean of the entire distribution.
To do that, it's not enough to just integrate the first moment's integrand up from the mean value: you'll also have to rescale the PDF in the integrand to make it a proper PDF again after the truncation (make up for the lost probability mass, if you will, in a "hand wave-y" figure of speech). You can do that by dividing by the integral of the original PDF over the support of the truncated one.
Here's the code to better convey what I mean:
library(purrr)
library(GB2)
find_mass <- moment_finder(0)
find_mean <- moment_finder(1)
GB2_params <- list(shape1 = 3.652, scale = 6.5797, shape2 = 0.3, shape3 = 0.8356)
dgb2p <- invoke(partial, GB2_params, ...f = dgb2) # pre-apply parameters
# Mean value
(mu <- find_mean(dgb2p, lb = 0)$value)
#> [1] 4.800761
# Mean for the truncated distribution below the mean
(lower_mass <- find_mass(dgb2p, lb = 0, ub = mu)$value)
#> [1] 0.6108409
(lower_mean <- find_mean(dgb2p, lb = 0, ub = mu)$value / lower_mass)
#> [1] 2.40446
# Mean for the truncated distribution above the mean
(upper_mass <- find_mass(dgb2p, lb = mu)$value)
#> [1] 0.3891591
(upper_mean <- find_mean(dgb2p, lb = mu)$value / upper_mass)
#> [1] 8.562099
lower_mean * lower_mass + upper_mean * upper_mass
#> [1] 4.800761

integrate quadratic b-splines in R

I am working with a function that depends on a quadratic B-spline interpolation estimated up front by the cobs function in the R package of the same name. The estimated knots and corresponding coefficients are given in the code below.
Further on, I require the integral of this function from 0 to some value, for example 0.6 or 0.7. Since my function is strictly positive, the integral should increase as the upper bound of the integral increases. However, this is not the case for some values, as shown here for 0.6 and 0.7:
library(cobs)
b <- 0.6724027
xi1 <- 0.002541667
xi2 <- 2.509625
knots <- c(5.000010e-06, 8.700000e-05, 3.420000e-04, 1.344000e-03, 5.292000e-03, 2.082900e-02, 8.198800e-02, 3.227180e-01, 1.270272e+00, 5.000005e+00)
coef <- c(2.509493, 2.508141, 2.466733, 2.378368, 2.239769, 2.063977, 1.874705, 1.601780, 1.288163, 1.262683, 1.432729)
fn <- function(x) {
  z <- (2 - b) * (cobs:::.splValue(2, knots, coef, x, 0) - 2 * x * xi1) / xi2 - b
  return(z)
}
x <- seq(0, 0.7, 0.0001)
plot(x, fn(x), type = 'l')
integrate(f = fn, 0, 0.6)
# 0.1049019 with absolute error < 1.2e-15
integrate(f = fn, 0, 0.7)
# 0.09714124 with absolute error < 1.1e-15
I know I could integrate directly on the cobs:::.splValue function, and transform the results correspondingly. However, I am interested to know why this strange behaviour occurs.
I think that the algorithm used by the function "integrate" is not behaving well for those conditions. For example, if you modify the lower limits, it works as expected:
> integrate(f = fn, 0.1, 0.6)
0.06794357 with absolute error < 7.5e-16
> integrate(f = fn, 0.1, 0.7)
0.07432096 with absolute error < 8.3e-16
This is common with numerical integration methods; you have to choose the approach on a case-by-case basis.
I'm using the trapezoidal rule to integrate over the same region and it works well (original code):
composite.trapezoid <- function(f, a, b, n) {
  if (is.function(f) == FALSE) {
    stop('f must be a function with one parameter (variable)')
  }
  h <- (b - a) / n
  j <- 1:(n - 1)
  xj <- a + j * h
  approx <- (h / 2) * (f(a) + 2 * sum(f(xj)) + f(b))
  return(approx)
}
> composite.trapezoid(f = fn, 0, 0.6, 10000)
[1] 0.1079356
> composite.trapezoid(f = fn, 0, 0.7, 10000)
[1] 0.1143195
If we analyze the behavior of the integral close to the 0.65 region, we can see that there is a problem with the first approach (it is not smooth):
tst = sapply(seq(0.5, 0.8, length.out = 100), function(upper) {
  integrate(f = fn, 0, upper)[[1]]
})
plot(seq(0.5, 0.8, length.out = 100), tst)
and that the trapezoid rule behaves better:
tst2 = sapply(seq(0.5, 0.8, length.out = 100), function(upper) {
  composite.trapezoid(f = fn, 0, upper, 10000)[[1]]
})
plot(seq(0.5, 0.8, length.out = 100), tst2)
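Another option (a sketch I am adding, not part of the original answer) is to exploit the piecewise structure of the B-spline: split the integration range at the spline knots and let integrate handle each smooth piece separately, then sum the pieces:
# Integrate piecewise between the knots that fall inside [lb, ub], so the
# adaptive quadrature never has to straddle a knot of the spline.
integrate_piecewise <- function(f, lb, ub, knots) {
  brk <- sort(unique(c(lb, knots[knots > lb & knots < ub], ub)))
  sum(mapply(function(a, b) integrate(f, a, b)$value,
             head(brk, -1), tail(brk, -1)))
}
integrate_piecewise(fn, 0, 0.6, knots)
integrate_piecewise(fn, 0, 0.7, knots)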

R use apply when arguments contain arrays and scalars

I want to avoid the following loop:
for(i in 1:2){
  vectVal[i] = myFunc(M[,,i], S[,,i], phi2, sig2)
}
by using the apply function.
The problem is that the arguments passed to apply contain both arrays (M and S) and scalars (phi2 and sig2).
I tried the following:
apply(M,3,myFunc, S = S, phi2 = phi2, sig2 = sig2)
which resulted in an error message because S is an array and not a matrix as required in myFunc (see below):
Here is reproducible code:
M = array(data = c(
  0.5, 0.7, 0.45,
  0.5, 0.3, 0.45,
  0.5, 0.7, 0.3,
  0.5, 0.3, 0.7,
  0.5, 0.7, 0.45,
  0.5, 0.3, 0.55),
  dim = c(3, 2, 2)
)
S = array(data = c(
  0.7723229, -0.2149794, -0.2159068,
  -0.2149794, 0.7723229, -0.2083123,
  -0.2159068, -0.2083123, 0.7723229,
  0.7723229, -0.2149794, -0.2149794,
  -0.2149794, 0.7723229, -0.1783025,
  -0.2149794, -0.1783025, 0.7723229,
  0.7723229, -0.2149794, -0.2176665,
  -0.2149794, 0.7723229, -0.2111496,
  -0.2176665, -0.2111496, 0.7723229),
  dim = c(3, 3, 2)
)
phi2 = 0.5
sig2 = 0.3
myFunc = function(M, S, phi2, sig2){
  valMult = M[,1] %*% diag(S)
  valEnd = valMult + phi2 - sig2
  return(valEnd)
}
vectVal = vector(length = 2)
for(i in 1:2){
  vectVal[i] = myFunc(M[,,i], S[,,i], phi2, sig2)
}
vectVal
Does anyone have an idea?
One (not particularly efficient) way would be to use plyr to split your arrays into lists (each element of a list is a slice along the third dimension of the corresponding array). You could then use mapply to run your function like so:
require(plyr)
ml <- alply(M, 3)
sl <- alply(S, 3)
mapply(myFunc, ml, sl, phi2, sig2)
#        1        2
# 1.474333 1.358484
Update:
A more vectorised alternative (but still not as fast as the for loop with %*% [see @JorisMeys's comment below]) is to take the diag of each slice of S and then use element-wise multiplication with colSums to achieve the same result:
s <- apply(S,3,diag)
colSums( M[,1,] * s ) + phi2 - sig2
# [1] 1.474333 1.358484
Update, update:
@JorisMeys has written a vectorised extractor function for getting the diagonal elements of 3D square arrays. Check this out.
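If you'd rather stay in base R (my own alternative, not from the answers above), you can simply iterate over the third-dimension index, which keeps the arrays intact and avoids plyr:
# Same computation as the for loop, written as an index-based sapply.
sapply(seq_len(dim(M)[3]), function(i) myFunc(M[, , i], S[, , i], phi2, sig2))
# [1] 1.474333 1.358484   (matches the for-loop / mapply result)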
