Find optimum of function fitted with poly - r

I want to find the optimum of a function fitted through a scatter plot with poly.
Example data:
x <- c(32,64,96,118,126,144,152.5,158)
y <- c(99.5,104.8,108.5,100,86,64,35.3,15)
I get the function with
poly(lm(y ~ poly(x, 3)))
But when I want to use optimize,
o <- optimize(f = lm(y ~ poly(x, 3, raw=TRUE)), interval=c(0,150))
I get
Error in (function (arg) : could not find function "f"
How do I need to call optimize and possible helper functions to get the optimum (maximum in this case)?

Optimization: usually looking for a minimum
The documentation for optimize states:
The function optimize searches the interval from lower to upper for a minimum or maximum of the function f with respect to its first argument.
and
maximum: logical. Should we maximize or minimize (the default)?
So by default your call would look for the minimum.
Optimization of a function
lm does not return a function of x; it returns a fitted-model object (coefficients, residuals, etc.), which you can use to evaluate your polynomial.
What you need to do is create a function, say evaluate.polynom, which returns the value of P(x) from the coefficients returned by lm.
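For instance, a minimal sketch of that idea (reusing the x and y from the question; wrapping predict() is just one possible way to turn the fitted model into an ordinary function, as an alternative to writing the coefficients out by hand as done below):
fit <- lm(y ~ poly(x, 3, raw = TRUE))
P <- function(x) predict(fit, newdata = data.frame(x = x))    # P(x) as a function optimize can call
o <- optimize(f = P, interval = c(0, 150), maximum = TRUE)    # maximum = TRUE to look for the peak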
Edit: Checking results and caveats of optimize
Result
X2 <- x^2; X3 <- x^3
df <- data.frame(y = y, x = x, X2 = X2, X3 = X3)
L <- lm(y ~ X3 + X2 + x, data = df) ### not being familiar with poly, I prefer to do this
P <- function(x) {
  L$coefficients[1] + x^3 * L$coefficients[2] + x^2 * L$coefficients[3] + x * L$coefficients[4]
}
o <- optimize(f = P, interval = c(0, 150), maximum = TRUE)
It says that the maximum is at about x = 92.
Is it correct?
library(ggplot2); qplot(x = 0:150, y = P(0:150), geom = "line") + theme_bw()
We can clearly see that our polynomial reaches its maximal value at the edge of the interval, but there is a local maximum inside it, which is what optimize found. If you really want the maximum on your interval, I suggest evaluating the polynomial at the edges of the interval too.
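For example, a quick check reusing the P and o defined above:
P(c(0, o$maximum, 150))   # value at the left edge, at the interior optimum, and at the right edge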
Is the fit good?
The fit we calculated with lm is correct; we did not make a mistake there.
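As a quick visual check (a base-graphics sketch reusing the data and the P defined above):
plot(x, y, pch = 19)                       # the raw scatter
curve(P, from = 0, to = 160, add = TRUE)   # the fitted cubic on top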

Related

Determine what is the break point for the slope change in R

I'm trying to implement a "change point" analysis, or a multiphase regression using nls() in R.
Here's some fake data I've made. The formula I want to use to fit the data is:
$y = \beta_0 + \beta_1x + \beta_2\max(0,x-\delta)$
What this is supposed to do is fit the data up to a certain point with a certain intercept and slope ($\beta_0$ and $\beta_1$), then, after a certain x value ($\delta$), augment the slope by $\beta_2$. That's what the whole max thing is about. Before the $\delta$ point, it'll equal 0, and $\beta_2$ will be zeroed out.
So, here's my function to do this:
changePoint <- function(x, b0, slope1, slope2, delta) {
  b0 + (x * slope1) + (max(0, x - delta) * slope2)
}
And I try to fit the model this way
nls(y ~ changePoint(x, b0, slope1, slope2, delta),
    data = data,
    start = c(b0 = 50, slope1 = 0, slope2 = 2, delta = 48))
I chose those starting parameters, because I know those are the starting parameters, because I made the data up.
However, I get this error:
Error in nlsModel(formula, mf, start, wts) :
singular gradient matrix at initial parameter estimates
Have I just made unfortunate data? I tried fitting this on real data first, and was getting the same error, and I just figured that my initial starting parameters weren't good enough.
At first I thought it could be a problem resulting from the fact that max is not vectorized, but that's not true. It does make it a pain to work with changePoint, wherefore the following modification:
changePoint <- function(x, b0, slope1, slope2, delta) {
  b0 + (x * slope1) + (sapply(x - delta, function(t) max(0, t)) * slope2)
}
This R-help mailing list post describes one way in which this error may result: the rhs of the formula is overparameterized, such that changing two parameters in tandem gives the same fit to the data. I can't see how that is true of your model, but maybe it is.
In any case, you can write your own objective function and minimize it. The following function gives the squared error for data points (x,y) and a certain value of the parameters (the weird argument structure of the function is to account for how optim works):
sqerror <- function(par, x, y) {
  sum((y - changePoint(x, par[1], par[2], par[3], par[4]))^2)
}
Then we say:
optim(par = c(50, 0, 2, 48), fn = sqerror, x = x, y = data)
And see:
$par
[1] 54.53436800 -0.09283594 2.07356459 48.00000006
Note that for my fake data (x <- 40:60; data <- changePoint(x, 50, 0, 2, 48) + rnorm(21, 0, 0.5)) there are lots of local optima (the squared-error surface has many local minima), depending on the initial parameter values you give. I suppose if you wanted to take this seriously you'd call the optimizer many times with random initial parameters and examine the distribution of results.
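A rough sketch of that multi-start idea (reusing x, data, changePoint and sqerror from above; the number of restarts and the jitter scale are arbitrary choices):
set.seed(1)
starts <- replicate(50, c(50, 0, 2, 48) + rnorm(4, sd = 5), simplify = FALSE)   # jittered starting values
fits <- lapply(starts, function(p) optim(par = p, fn = sqerror, x = x, y = data))
best <- fits[[which.min(sapply(fits, function(f) f$value))]]                    # keep the smallest squared error
best$par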
Just wanted to add that you can do this with many other packages. If you want to get an estimate of uncertainty around the change point (something nls cannot do), try the mcp package.
# Simulate the data
df = data.frame(x = 1:100)
df$y = c(rnorm(20, 50, 5), rnorm(80, 50 + 1.5*(df$x[21:100] - 20), 5))
# Fit the model
model = list(
  y ~ 1,    # Intercept
  ~ 0 + x   # Joined slope
)
library(mcp)
fit = mcp(model, df)
Let's plot it with a prediction interval (green line). The blue density is the posterior distribution for the change point location:
# Plot it
plot(fit, q_predict = T)
You can inspect individual parameters in more detail using plot_pars(fit) and summary(fit).

Is it possible to flip a formula in R?

I was working on a project and used the VaR() function from the PerformanceAnalytics package to calculate Value-at-Risk. I wanted to find out the probability of a stock generating a loss of 1% or more. I found a solution by plugging numbers into the probability argument and checking whether the result approached -1%. However, I was curious whether it is possible to flip the formula, so that I can just plug in the output and the function will produce what would have been the input.
Produced the loss at 97.5% probability:
VaR(DNOlog, p = 0.975)
Produced a loss of -1% by changing the probability until it fit:
VaR(DNOlog, p = 0.6512184)
Let's get a reproducible example to demonstrate how you would go about this:
library(PerformanceAnalytics)
set.seed(2)
returns <- rnorm(1000, sd = 0.01)
This gives us a sensible result from VaR
VaR(returns, p = 0.975)
#> [,1]
#> VaR -0.01893631
To reverse this, we can use uniroot. This is a function which uses an iterative approach to finding the input value that makes a function return 0:
inverse_VaR <- function(x, target) {
  f <- function(p) VaR(x, p)[1, 1] - target
  uniroot(f, c(0.6, 0.99999), tol = .Machine$double.eps)$root
}
In our example, if we want to find the p value that makes VaR give an output of -0.01 with our vector returns, we can do:
inverse_VaR(returns, -0.01)
#> [1] 0.848303
And to show this works, we can do:
VaR(returns, 0.848303)
#> [,1]
#> VaR -0.009999999
What you want is the inverse function. If it is not too expensive to compute a lot of values of your function, then you can get a good approximation of this by computing many x-y pairs and then getting y as a function of x. Since you don't really say what your function is, I will use a simple function y = x + sin(x) as an example.
x = seq(0,6, 0.01)
y = x + sin(x)
InverseFunction = approxfun(y,x)
## Test with an example
InverseFunction(4) ## gives 4.967601
x1 = 4.967601
x1 + sin(x1) ## 3.999991
If you want more accuracy, use a smaller spacing between the x's.
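The same idea can be tied back to the VaR question above. Here is a sketch that assumes the returns vector from the earlier answer and that VaR is monotone enough in p over this range for interpolation to make sense (p_for_loss is just an illustrative name):
library(PerformanceAnalytics)
ps <- seq(0.6, 0.999, by = 0.001)
vars <- sapply(ps, function(p) VaR(returns, p)[1, 1])   # many (p, VaR) pairs
p_for_loss <- approxfun(vars, ps)                        # interpolate p as a function of the VaR value
p_for_loss(-0.01)                                        # should be close to the uniroot result above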

How to calculate Kullback-Leibler divergence of kernel estimation in R

I used kernel estimation to get a non-parametric probability density function. Then I want to compare the tail 'distance' between two kernel density estimates of continuous variables, using Kullback-Leibler divergence. I have tried the following code:
kl_l <- function(x, y) {
  integrand <- function(x, y) {
    f.x <- fitted(density(x, bw = "nrd0"))
    f.y <- fitted(density(y, bw = "nrd0"))
    return((log(f.x) - log(f.y)) * f.x)
  }
  return(integrate(integrand, lower = -Inf, upper = quantile(density(x, bw = "nrd0"), 0.25))$value)
  # the Kullback-Leibler equation
}
When I run kl_l(a, b) for a and b (19 continuous variables), it returns an error:
Error in density(y, bw = "nrd0") : argument "y" is missing, with no default
Is there any way to calculate this?
(If anyone wants to see the actual equation: https://www.bankofengland.co.uk/-/media/boe/files/working-paper/2019/attention-to-the-tails-global-financial-conditions-and-exchange-rate-risks.pdf page 13.)
In short, I think you just need to move the f.x and f.y outside the integrand (and possibly replace fitted with approxfun):
kl_l <- function(x, y) {
  f.x <- approxfun(density(x, bw = "nrd0"))
  f.y <- approxfun(density(y, bw = "nrd0"))
  integrand <- function(z) {
    return((log(f.x(z)) - log(f.y(z))) * f.x(z))
  }
  return(integrate(integrand, lower = -Inf, upper = quantile(density(x, bw = "nrd0"), 0.25))$value)
  # the Kullback-Leibler equation
}
Expanding a little:
Looking at the paper you referenced, it appears as though you need to first create the two fitted distributions f and g. So if your variable a contains observations under the 1-standard-deviation increase in global financial conditions, and b contains the observations under average global financial conditions, you can create two functions as in your example:
f <- approxfun(density(a))
g <- approxfun(density(b))
Then define the integrand:
integrand <- function(x) log(f(x) / g(x)) * f(x)
The upper bound:
upper <- quantile(density(b, bw = "nrd0"), 0.25)
And finally do the integration on x within the specified bounds. Note that each value of x in the numerical computation has to go into both f and g; in your function kl_l, the x and y were separately going into the integrand, which I think is incorrect; and in any case, integrate will only have operated on the first variable.
integrate(integrand, lower = -Inf, upper = upper)$value
One thing to check for is that approxfun returns NA for values outside the range specified in the density, which can mess up your operation, so you'll need to adjust for those (if you expect the density to go to zero, for example).
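Putting this together, here is a hedged end-to-end sketch on simulated data. The samples a and b are placeholders for your own series, the densities are forced to zero outside their grids and floored at machine epsilon to avoid log(0), the lower integration bound uses the KDE grid minimum rather than -Inf, and the sample 25% quantile of b is used as the tail cut-off for simplicity:
set.seed(1)
a <- rnorm(500)             # placeholder for sample 1
b <- rnorm(500, sd = 1.2)   # placeholder for sample 2

f <- approxfun(density(a), yleft = 0, yright = 0)
g <- approxfun(density(b), yleft = 0, yright = 0)

integrand <- function(x) {
  fx <- pmax(f(x), .Machine$double.eps)   # floor to avoid log(0)
  gx <- pmax(g(x), .Machine$double.eps)
  log(fx / gx) * fx
}

integrate(integrand, lower = min(density(a)$x), upper = quantile(b, 0.25))$value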

Storing coefficients and vectors using the boot package in R

I am estimating a model of the kind y = x + s(z), where s(z) is a non-parametric function. I want to use the bootstrap to get the standard error for the coefficient on x and the confidence bands for the function s(z). Basically, the result of my estimation gives a coefficient for x, therefore a 1x1 object, and an n x 1 vector for s(z). You can do this using the gam() function from the gam package. For my needs I am using a hand-written function which returns a list named result, from which I have result$betax as the coefficient on x, and result$curve, which stores the vector of values (the estimation of s(z) gives a set of values corresponding to a curve). I am bootstrapping using the boot package as follows
result.boot <- boot(data, myfunction, R = 3, sim = "parametric",
                    ran.gen = myfunction.sim, mle = myfunction.mle)
I obtain the following error message
Error in boot(pdata, myfunction, R = 3, sim = "parametric",
ran.gen = myfunction.sim, : incorrect number of subscripts on matrix
I guess it should instead give out a vector of coefficients on x, on which I will compute the standard error, and an n x n matrix of s(z) values, on which I will compute a s.e. for each row, allowing me to have a confidence interval for the s(z) curve. I suppose this is related to the fact that the output of my function is given by
est <- list("betax" = betax, "curve" = s.z, "residuals"=res)
return(est)
How could I solve this?
To reproduce the issue it is possible to use the gam function:
y = runif(16, min=0, max=1)
x = runif(16, min=0, max=0.5)
z = runif(16, min=0, max=0.3)
require(gam)
est <- gam(y ~ x + s(z))
while I am doing
est <- myfunction(y, x, z)
The solution is to vectorize the output of the hand-written function, making it compatible with the boot procedure, which requires the results of the statistic to be stored in a vector.
est <- myfunction(y, x, z)
good.output <- matrix(c(betax, s.z), ncol=1)
This will let the boot function work properly. Then you just extract the corresponding elements of result.boot$t and compute the statistics you like.
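For completeness, here is a simplified sketch using gam() as a stand-in for the hand-written function (reusing the y, x, z simulated in the question). It uses an ordinary non-parametric bootstrap rather than the sim = "parametric" setup above; the point is only that the statistic returns a plain numeric vector, which boot then stores row-wise in result.boot$t:
library(gam)
library(boot)

dat <- data.frame(y = y, x = x, z = z)
z_grid <- data.frame(x = mean(x), z = seq(min(z), max(z), length.out = 16))   # fixed grid for the curve

boot_stat <- function(data, indices) {
  fit <- gam(y ~ x + s(z), data = data[indices, ])
  c(coef(fit)["x"], predict(fit, newdata = z_grid))   # coefficient on x, then the s(z) curve values
}

result.boot <- boot(dat, boot_stat, R = 100)
apply(result.boot$t, 2, sd)   # one bootstrap standard error per stored element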

Given a random variable with probability density function f(x), how to compute the expected value of this random variable in R?

If you want to compute the expected value, just compute:
$E(X) = \int x f(x)\,dx$, integrating over the whole domain of $X$.
The integration can easily be done using the function integrate().
Say you have a standard normal density function (you can easily define your own density function):
f <- function(x) {
  1 / sqrt(2 * pi) * exp((-1/2) * x^2)
}
You calculate the expected value simply by:
f2 <- function(x) { x * f(x) }
integrate(f2, -Inf, Inf)
Note that sometimes you need to use Vectorize() for your function; this is necessary to get integrate to work. For more info, see the help pages of integrate() and Vectorize().
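For example, with a hypothetical density written in a non-vectorized way (the function names here are just illustrative):
f_scalar <- function(x) if (x < 0) 0 else 5 * exp(-5 * x)   # Exp(5) density; if() only handles a scalar x
f_vec <- Vectorize(f_scalar)                                # now safe to pass a vector of x values
integrate(function(x) x * f_vec(x), 0, Inf)$value           # expected value, approximately 0.2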
Does it help to know that the expectation E is the integral of x*f(x) dx for x in (-inf, inf)?
You could also use the inverse sampling transform. All you need is the cumulative distribution function F(x) of your random variable X. It uses the fact that if X has CDF F (and pdf f(x)), then U = F(X) is uniform on (0, 1), so X = F^-1(U). This means that you can sample from a uniform variable and then transform it through F^-1(U) to get a sample from X. You can then take the mean of your sample.
Here is an example for the exponential distribution with parameter lambda = 5, mean = 1/5, F(x) = 1 - exp(-lambda * x) and F^-1(u) = -log(1 - u) / lambda.
sample_exp = function(n, lambda = 5) {
  u = runif(n)
  y = -log(1 - u) / lambda
  mean(y)
}
n = seq(10, 4000, 10)
res = sapply(n, sample_exp)
plot(n, res, type = "l", xlab = "sample size",
ylab = "Estimated mean", main = "True mean = 0.2")
The code above produces a plot of the estimated mean as a function of the sample size; the estimate settles near the true mean of 0.2 as the sample size grows.
