I was working with a project and I used the VaR() function from the PerformanceAnalytics package to calculate Value-at-risk. I wanted to find out the probability of a stock generating making a loss of 1% or more. I found a solution to the problem by plugging numbers in to the probability variable, and controlling to see if it was approaching -1%. However, I was curious if it was possible to flip the formula so that I can just plug in the output and then the function will produce what would have been the input.
Produced the loss at 97.5% probability:
VaR(DNOlog, p = 0.975)
Produced a loss of -1% by changing the probability until it fit:
VaR(DNOlog, p = 0.6512184)
Let's get a reproducible example to demonstrate how you would go about this:
library(PerformanceAnalytics)
set.seed(2)
returns <- rnorm(1000, sd = 0.01)
This gives us a sensible result from VaR
VaR(returns, p = 0.975)
#> [,1]
#> VaR -0.01893631
To reverse this, we can use uniroot. This is a function which uses an iterative approach to finding the input value that makes a function return 0:
inverse_VaR <- function(x, target) {
f <- function(p) VaR(x, p)[1, 1] - target
uniroot(f, c(0.6, 0.99999), tol = .Machine$double.eps)$root
}
In our example, if we want to find the p value that makes VaR give an output of -0.01 with our vector returns, we can do:
inverse_VaR(returns, -0.01)
#> [1] 0.848303
And to show this works, we can do:
VaR(returns, 0.848303)
#> [,1]
#> VaR -0.009999999
Created on 2022-04-16 by the reprex package (v2.0.1)
What you want is the inverse function. If it is not too expensive to compute a lot of values of your function, then you can get a good approximation of this by computing many x-y pairs and then getting y as a function of x. Since you don't really say what your function is, I will use a simple function y = x + sin(x) as an example.
x = seq(0,6, 0.01)
y = x + sin(x)
InverseFunction = approxfun(y,x)
## Test with an example
InverseFunction(4) ## gives 4.967601
x1 = 4.967601
x1 + sin(x1) ## 3.999991
If you want more accuracy, use a smaller spacing between the x's.
Related
Please consider the following:
My aim is to draw random survival times from a flexible parametric multi-state model fitted with flexsurvreg (more specifically msfit.flexsurvreg) and applying some hazard ratio (HR, in this example set to 0.2).
I found an example to generate random survival times using any hazard function here. This is also were I apply the HR.
Problem
With the actual data, I receive an error once the HR is below the value of 0.2: Error in uniroot((function(x) { : no sign change found in 1000 iterations.
This does not happen in the reproducible example below.
Questions
Is there another, better way than the one below to draw random survival times while applying a HR?
Can someone indicate why the "no sign change" error may occur and how this can be fixed?
Any help is greatly appreciated!
Minimal reproducible example
# Load package
library(flexsurv)
#> Loading required package: survival
# Load data
data("bosms4")
# Define hazard ratio
hr <- 0.2
# Fit model (weibull)
crwei <- flexsurvreg(formula = Surv(years, status) ~ trans + shape(trans),
data = bosms3, dist = "weibull")
# Create transition matrix
Q <- rbind(c(NA,1,2),c(NA,NA,3), c(NA,NA,NA))
# Capture parameters
pars <- pars.fmsm(crwei, trans=Q, newdata=data.frame(trans=1:3))
# Code from https://eurekastatistics.com/generating-random-survival-times-from-any-hazard-function/ ----
inverse = function(fn, min_x, max_x){
# Returns the inverse of a function for a given range.
# E.g. inverse(sin, 0, pi/2)(sin(pi/4)) equals pi/4 because 0 <= pi/4 <= pi/2
fn_inv = function(y){
uniroot((function(x){fn(x) - y}), lower=min_x, upper=max_x)[1]$root
}
return(Vectorize(fn_inv))
}
integrate_from_0 = function(fn, t){
int_fn = function(t) integrate(fn, 0, t)
result = sapply(t, int_fn)
value = unlist(result["value",])
msg = unlist(result["message",])
value[which(msg != "OK")] = NA
return(value)
}
random_survival_times = function(hazard_fn, n, max_time=10000){
# Given a hazard function, returns n random time-to-event observations.
cumulative_density_fn = function(t) 1 - exp(-integrate_from_0(hazard_fn, t))
inverse_cumulative_density_fn = inverse(cumulative_density_fn, 0, max_time)
return(inverse_cumulative_density_fn(runif(n)))
}
# Run with data ----
random_survival_times(hazard_fn = function(t){crwei$dfns$h(t, pars[[1]][1], pars[[1]][2]) * hr}, n = 100)
#> Error in integrate(fn, 0, t): non-finite function value
# Adapt random_survival time function replacing 0 with 0.1 ----
random_survival_times <- function(hazard_fn, n, max_time=10000){
# Given a hazard function, returns n random time-to-event observations.
cumulative_density_fn = function(t) 1 - exp(-integrate_from_0(hazard_fn, t))
inverse_cumulative_density_fn = inverse(cumulative_density_fn, 0.1, max_time)
return(inverse_cumulative_density_fn(runif(n)))
}
# Run again with data ----
random_survival_times(hazard_fn = function(t){crwei$dfns$h(t, pars[[1]][1], pars[[1]][2]) * hr}, n = 100)
#> Error in uniroot((function(x) {: f() values at end points not of opposite sign
# Adapt inverse adding extendedInt = "yes" ----
inverse <- function(fn, min_x, max_x){
# Returns the inverse of a function for a given range.
# E.g. inverse(sin, 0, pi/2)(sin(pi/4)) equals pi/4 because 0 <= pi/4 <= pi/2
fn_inv <- function(y){
uniroot((function(x){fn(x) - y}), lower=min_x, upper=max_x,
extendInt = "yes" # extendInt added because of error on some distributions: "Error in uniroot((function(x) { : f() values at end points not of opposite sign. Solution found here: https://stackoverflow.com/questions/38961221/uniroot-solution-in-r
)[1]$root
}
return(Vectorize(fn_inv))
}
# Run again with data ----
res <- random_survival_times(hazard_fn = function(t){crwei$dfns$h(t, pars[[1]][1], pars[[1]][2]) * hr}, n = 100)
res[1:5]
#> [1] 1.074281 13.688663 30.896637 159.643827 15.805103
Created on 2022-10-18 with reprex v2.0.2
This method of sampling survival times basically works by sampling a random uniform(0,1) number p and finding x for which the survival probability is p. The uniroot step is used to solve S(x) = p by a numerical search. In your case, it is having difficulty finding a solution after 1000 steps.
I've seen this happen, and fixed by adding, e.g. uniroot(..., maxiter=10000) to tell it to try a bit harder to find the solution. That's always been enough in my tests, though those may be limited. If that doesn't work, I'd advise digging in by hand and examining the survival curve that you are trying to invert - it may be invalid due to some parameter value being extreme.
(This kind of thing is done in the function qgeneric in the flexsurv package, though it borrows a vectorised version of uniroot from the rstpm2 package which is faster.)
I used Kernel estimation to get a non parametric probability density function. Then, I want to compare the tails 'distance' between two Kernel distribution of continuous variables, using Kullback-leiber divergence. I have tried the following code:
kl_l <- function(x,y) {
integrand <- function(x,y) {
f.x <- fitted(density(x, bw="nrd0"))
f.y <- fitted(density(y, bw="nrd0"))
return((log(f.x)-log(f.y))*f.x)
}
return(integrate(integrand, lower=-Inf,upper=quantile(density(x, bw="nrd0"),0.25))$value)
#the Kullback-leiber equation
}
When I run kl_l(a,b) for a, b = 19 continuous variables, it returns a warning
Error in density(y, bw = "nrd0") : argument "y" is missing, with no default
Is there any way to calculate this?
(If anyone wants to see the actual equation: https://www.bankofengland.co.uk/-/media/boe/files/working-paper/2019/attention-to-the-tails-global-financial-conditions-and-exchange-rate-risks.pdf page 13.)
In short, I think you just need to move the f.x and f.y outside the integrand (and possibly replace fitted with approxfun):
kl_l <- function(x, y) {
f.x <- approxfun(density(x, bw = "nrd0"))
f.y <- approxfun(density(y, bw = "nrd0"))
integrand <- function(z) {
return((log(f.x(z)) - log(f.y(z))) * f.x(z))
}
return(integrate(integrand, lower = -Inf, upper = quantile(density(x, bw="nrd0"), 0.25))$value)
#the Kullback-leiber equation
}
Expanding a little:
Looking at the paper you referenced, it appears as though you need to first create the two fitted distributions f and g. So if your variable a contains observations under the 1-standard-deviation increase in global financial conditions, and b contains the observations under average global financial conditions, you can create two functions as in your example:
f <- approxfun(density(a))
g <- approxfun(density(b))
Then define the integrand:
integrand <- function(x) log(f(x) / g(x)) * f(x)
The upper bound:
upper <- quantile(density(b, bw = "nrd0"), 0.25)
And finally do the integration on x within the specified bounds. Note that each value of x in the numerical computation has to go into both f and g; in your function kl_l, the x and y were separately going into the integrand, which I think is incorrect; and in any case, integrate will only have operated on the first variable.
integrate(integrand, lower = -Inf, upper = upper)$value
One thing to check for is that approxfun returns NA for values outside the range specified in the density, which can mess up your operation, so you'll need to adjust for those (if you expect the density to go to zero, for example).
I have the following likelihood function which I used in a rather complex model (in practice on a log scale):
library(plyr)
dcustom=function(x,sd,L,R){
R. = (log(R) - log(x))/sd
L. = (log(L) - log(x))/sd
ll = pnorm(R.) - pnorm(L.)
return(ll)
}
df=data.frame(Range=seq(100,500),sd=rep(0.1,401),L=200,U=400)
df=mutate(df, Likelihood = dcustom(Range, sd,L,U))
with(df,plot(Range,Likelihood,type='l'))
abline(v=200)
abline(v=400)
In this function, the sd is predetermined and L and R are "observations" (very much like the endpoints of a uniform distribution), so all 3 of them are given. The above function provides a large likelihood (1) if the model estimate x (derived parameter) is in between the L-R range, a smooth likelihood decrease (between 0 and 1) near the bounds (of which the sharpness is dependent on the sd), and 0 if it is too much outside.
This function works very well to obtain estimates of x, but now I would like to do the inverse: draw a random x from the above function. If I would do this many times, I would generate a histogram that follows the shape of the curve plotted above.
The ultimate goal is to do this in C++, but I think it would be easier for me if I could first figure out how to do this in R.
There's some useful information online that helps me start (http://matlabtricks.com/post-44/generate-random-numbers-with-a-given-distribution, https://stats.stackexchange.com/questions/88697/sample-from-a-custom-continuous-distribution-in-r) but I'm still not entirely sure how to do it and how to code it.
I presume (not sure at all!) the steps are:
transform likelihood function into probability distribution
calculate the cumulative distribution function
inverse transform sampling
Is this correct and if so, how do I code this? Thank you.
One idea might be to use the Metropolis Hasting Algorithm to obtain a sample from the distribution given all the other parameters and your likelihood.
# metropolis hasting algorithm
set.seed(2018)
n_sample <- 100000
posterior_sample <- rep(NA, n_sample)
x <- 300 # starting value: I chose 300 based on your likelihood plot
for (i in 1:n_sample){
lik <- dcustom(x = x, sd = 0.1, L = 200, R =400)
# propose a value for x (you can adjust the stepsize with the sd)
x.proposed <- x + rnorm(1, 0, sd = 20)
lik.proposed <- dcustom(x = x.proposed, sd = 0.1, L = 200, R = 400)
r <- lik.proposed/lik # this is the acceptance ratio
# accept new value with probablity of ratio
if (runif(1) < r) {
x <- x.proposed
posterior_sample[i] <- x
}
}
# plotting the density
approximate_distr <- na.omit(posterior_sample)
d <- density(approximate_distr)
plot(d, main = "Sample from distribution")
abline(v=200)
abline(v=400)
# If you now want to sample just a few values (for example, 5) you could use
sample(approximate_distr,5)
#[1] 281.7310 371.2317 378.0504 342.5199 412.3302
I want to find the optimum of a function fitted through a scatter plot with poly.
Example data:
x <- c(32,64,96,118,126,144,152.5,158)
y <- c(99.5,104.8,108.5,100,86,64,35.3,15)
I get the function with
poly(lm(y ~ poly(x, 3)))
But when I want to use optimize,
o <- optimize(f = lm(y ~ poly(x, 3, raw=TRUE)), interval=c(0,150))
I get
Error in (function (arg) : could not find function "f"
How do I need to call optimize and possible helper functions to get the optimum (maximum in this case)?
Optimization: usually looking for a minimum
Optimize function states:
The function optimize searches the interval from lower to upper for a minimum or maximum of the function f with respect to its first argument.
and
maximum logical. Should we maximize or minimize (the default)
So your formula would look for the minimum.
Optimization of a function
lmdoes not return a function of x, it returns a list of elements (coefficients, intercepts, etc.), which you can use for your polynom.
What you need to do is to create a function evaluate.polynom
Which will return the value of P(x) knowing the coefficients returned by lm
Edit: Checking results and caveats of optimize
Result
X2<- x^2; X3<-x^3; df= data.frame(y = y, x = x, X2 = X2, X3 = X3)
L<-lm(y ~ X3 + X2 + x, data = df ) ### not being familiar with poly I prefer to do this
P<-function(x){ L$coefficients[1] + x^3 * L$coefficients[2] + x^2*L$coefficients[3] + x*L$coefficients[4] }
o<- optimize(f = P, interval = c(0,150), maximum = TRUE)
It says that maximum is 92.
Is it correct?
library(ggplot2);qplot(x = 0:150, y = P(0:150), geom = "line")+theme_bw()
We can clearly see that our polynom reaches it maximal value on the edge, but there is a local maximum that is found by optimize. If you really want the maximum on your interval, I suggest evaluating your polynom on the edges of the interval too.
Is the fit good?
The fit we calculated with lm is correct, we did not make a mistake there.
I'm trying to get a 95% confidence interval around some predicted values, but am not capable of achieving this.
Basically, I estimated a growth curve like this:
set.seed(123)
dat=data.frame(size=rnorm(50,10,3),age=rnorm(50,5,2))
S <- function(t,ts,C,K) ((C*K)/(2*pi))*sin(2*pi*(t-ts))
sommers <- function(t,Linf,K,t0,ts,C)
Linf*(1-exp(-K*(t-t0)-S(t,ts,C,K)+S(t0,ts,C,K)))
model <- nls(size~sommers(age,Linf,K,t0,ts,C),data=dat,
start=list(Linf=10,K=4.7,t0=2.2,C=0.9,ts=0.1))
I have independent size measurements, for which I would like to predict the age. Therefore, the inverse of the function, which is not very straightforward, I calculated like this:
model.out=coef(model)
S.out <- function(t)
((model.out[[4]]*model.out[[2]])/(2*pi))*sin(2*pi*(t-model.out[[5]]))
sommers.out <- function(t)
model.out[[1]]*(1-exp(-model.out[[2]]*(t-model.out[[3]])-S.out(t)+S.out(model.out[[3]])))
inverse = function (f, lower = -100, upper = 100) {
function (y) uniroot((function (x) f(x) - y), lower = lower, upper = upper)[1]
}
sommers.inverse = inverse(sommers.out, 0, 25)
x= sommers.inverse(10) #this works with my complete dataset, but not with this fake one
Although this works fine, I need to know the confidence interval (95%) around this estimate (x). For linear models there is for example "predict(... confidence=)". I could also bootstrap the function somehow to get the quantiles associated with the parameters (didn't find how), to then use the extremes of those to calculate the maximum and minimum values predictable. But that doesn't really look like the good way of doing this....
Any help would be greatly appreciated.
EDIT after answer:
So this worked (explained in the book of Ben Bolker, see answer):
vmat = mvrnorm(1000, mu = coef(mfit), Sigma = vcov(mfit))
dist = numeric(1000)
for (i in 1:1000) {dist[i] = sommers_inverse(9.938,vmat[i,])}
quantile(dist, c(0.025, 0.975))
On the rather bad fake data I gave, this works of course rather horrible. But on the real data (which I have a problem recreating), this is ok!
Unless I'm mistaken, you're going to have to use either regular (parametric) bootstrapping or a method called either "population predictive intervals" (e.g., see section 5 of chapter 7 of Bolker 2008), which assumes that the sampling distributions of your parameters are multivariate Normal. However, I think you may have bigger problems, unless I've somehow messed up your model in adapting it ...
Generate data (note that random data may actually bad for testing your model - see below ...)
set.seed(123)
dat <- data.frame(size=rnorm(50,10,3),age=rnorm(50,5,2))
S <- function(t,ts,C,K) ((C*K)/(2*pi))*sin(2*pi*(t-ts))
sommers <- function(t,Linf,K,t0,ts,C)
Linf*(1-exp(-K*(t-t0)-S(t,ts,C,K)+S(t0,ts,C,K)))
Plot the data and the initial curve estimate:
plot(size~age,data=dat,ylim=c(0,16))
agevec <- seq(0,10,length=1001)
lines(agevec,sommers(agevec,Linf=10,K=4.7,t0=2.2,ts=0.1,C=0.9))
I had trouble with nls so I used minpack.lm::nls.lm, which is slightly more robust. (There are other options here, e.g. calculating the derivatives and providing the gradient function, or using AD Model Builder or Template Model Builder, or using the nls2 package.)
For nls.lm we need a function that returns the residuals:
sommers_fn <- function(par,dat) {
with(c(as.list(par),dat),size-sommers(age,Linf,K,t0,ts,C))
}
library(minpack.lm)
mfit <- nls.lm(fn=sommers_fn,
par=list(Linf=10,K=4.7,t0=2.2,C=0.9,ts=0.1),
dat=dat)
coef(mfit)
## Linf K t0 C ts
## 10.6540185 0.3466328 2.1675244 136.7164179 0.3627371
Here's our problem:
plot(size~age,data=dat,ylim=c(0,16))
lines(agevec,sommers(agevec,Linf=10,K=4.7,t0=2.2,ts=0.1,C=0.9))
with(as.list(coef(mfit)), {
lines(agevec,sommers(agevec,Linf,K,t0,ts,C),col=2)
abline(v=t0,lty=2)
abline(h=c(0,Linf),lty=2)
})
With this kind of fit, the results of the inverse function are going to be extremely unstable, as the inverse function is many-to-one, with the number of inverse values depending sensitively on the parameter values ...
sommers_pred <- function(x,pars) {
with(as.list(pars),sommers(x,Linf,K,t0,ts,C))
}
sommers_pred(6,coef(mfit)) ## s(6)=9.93
sommers_inverse <- function (y, pars, lower = -100, upper = 100) {
uniroot(function(x) sommers_pred(x,pars) -y, c(lower, upper))$root
}
sommers_inverse(9.938, coef(mfit)) ## 0.28
If I pick my interval very carefully I can get back the correct answer ...
sommers_inverse(9.938, coef(mfit), 5.5, 6.2)
Maybe your model will be better behaved with more realistic data. I hope so ...