Minimize absolute error in R

Here's my setup:
obs1 <- c(1, 1, 1)
obs2 <- c(0, 1, 2)
obs3 <- c(0, 0, 3)
absoluteError <- function(obs, x) {
  return(sum(abs(obs - x)))
}
Example:
> absoluteError(obs2, 1)
[1] 2
For a given vector of observations, I'd like to find the value x that minimizes the absolute error between the observations and a vector of all x. For instance, the argument that minimizes absoluteError(obs1, x) is clearly x = 1, because this gives an error of 0. How do I find such a minimizer for an arbitrary vector of observations? I'd imagine this is a linear programming problem, but I've never implemented one in R before.

The median of obs is a minimizer for the absolute error. The following is a sketch of how one might try proving this:
Let m be the median of a set of n observations, obs, and write f(obs, x) for the absolute error between obs and a value x.
Case n is odd:
Consider f(obs, m + delta) where delta is some nonzero number. Suppose delta is positive: at least (n-1)/2 + 1 observations are less than or equal to m, and each of these contributes exactly delta more error than it did at m, while each of the remaining (at most (n-1)/2) observations contributes at most delta less. So f(obs, m + delta) - f(obs, m) >= delta > 0. (The same argument, with |delta|, works for negative delta.) Thus f(obs, m + delta) > f(obs, m) for any nonzero delta, so m is the unique minimizer of f in this case.
Case n is even:
Basically the same logic as above, except that in this case any number between the two innermost numbers in the set is a minimizer.
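A quick numerical check of this, using the absoluteError function and obs2 from the question (a sanity check, not a proof):
grid <- seq(min(obs2) - 1, max(obs2) + 1, by = 0.01)     # candidate values of x
errs <- sapply(grid, function(x) absoluteError(obs2, x))
grid[which.min(errs)]   # 1, which equals median(obs2)
median(obs2)            # 1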

I am not sure this answer is correct, and even if it is I am not sure this is what you want. Nevertheless, I am taking a stab at it.
I think you are talking about 'Least absolute deviations', a form of regression that differs from 'Least Squares'.
If so, I found this R code for solving Least absolute deviations regression:
fabs = function(beta0, x, y) {
  b0 = beta0[1]
  b1 = beta0[2]
  n = length(x)
  llh = 0
  for (i in 1:n) {
    r2 = (y[i] - b0 - b1 * x[i])
    llh = llh + abs(r2)
  }
  llh
}
g = optim(c(1, 1), fabs, x = x, y = y)
I found the code here:
http://www.stat.colostate.edu/~meyer/hw12ans.pdf
Assuming you are talking about Least absolute deviations, you might not be interested in the above code if you want a solution in R from scratch rather than a solution that uses optim.
The above code is for a regression line with an intercept and one slope. I modified the code as follows to handle a regression with just an intercept:
y <- c(1,1,1)
x <- 1:length(y)
fabs = function(beta0, x, y) {
  b0 = beta0[1]
  b1 = 0
  n = length(x)
  llh = 0
  for (i in 1:n) {
    r2 = (y[i] - b0 - b1 * x[i])
    llh = llh + abs(r2)
  }
  llh
}
# The commands to get the estimator
g = optim(c(1), fabs, x = x, y = y, method = 'Brent',
          lower = (min(y) - 5), upper = (max(y) + 5))
g
I was not familiar with (i.e., had not heard of) Least absolute deviations until tonight. So, hopefully my modifications are fairly reasonable.
With y <- c(1,1,1) the parameter estimate is 1 (which I think you said is the correct answer):
$par
[1] 1
$value
[1] 1.332268e-15
$counts
function gradient
NA NA
$convergence
[1] 0
$message
NULL
With y <- c(0,1,2) the parameter estimate is 1:
$par
[1] 1
$value
[1] 2
$counts
function gradient
NA NA
$convergence
[1] 0
$message
NULL
With y <- c(0,0,3) the parameter estimate is 0 (which you said is the correct answer):
$par
[1] 8.613159e-10
$value
[1] 3
$counts
function gradient
NA NA
$convergence
[1] 0
$message
NULL
If you want R code from scratch, there is additional R code in the file at the link above which might be helpful.
Alternatively, perhaps it might be possible to extract the relevant code from the source file.
Alternatively, perhaps someone else can provide the desired code (and correct any errors on my part) in the next 24 hours.
If you come up with code from scratch please post it as an answer as I would love to see it myself.

lad = function(x, y) {
  SAD = function(beta, x, y) {
    return(sum(abs(y - (beta[1] + beta[2] * x))))
  }
  d = lm(y ~ x)
  ans1 = optim(par = c(d$coefficients[1], d$coefficients[2]),
               method = "Nelder-Mead", fn = SAD, x = x, y = y)
  coe = setNames(ans1$par, c("(Intercept)", deparse(substitute(x))))
  fitted = setNames(ans1$par[1] + ans1$par[2] * x, c(1:length(x)))
  res = setNames(y - fitted, c(1:length(x)))
  results = list(coefficients = coe, fitted.values = fitted, residuals = res)
  class(results) = "lad"
  return(results)
}
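A possible usage example on simulated data (illustrative, not from the question):
set.seed(1)
x <- 1:20
y <- 2 + 0.5 * x + rnorm(20)
fit <- lad(x, y)
fit$coefficients   # LAD estimates of intercept and slope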

Related

Maximum-Likelihood Estimation of three parameter reverse Weibull model implementation in R

I'm implementing maximum-likelihood estimation in R for a three-parameter reverse Weibull model and have some trouble getting plausible results, which include:
bad optimization results and unwanted optimx behaviour. Besides these, I wonder how I could make use of parscale in this model.
Here is my implementation attempt:
To generate data I use the probability integral transform:
#Generate N points distributed as mu - sigma*Wei(alph)
gen.wei <- function(N, theta) {
  alph  <- theta[1]
  mu    <- theta[2]
  sigma <- theta[3]
  return(
    mu - sigma * (-log(runif(N)))**(1/alph)
  )
}
Now I define the Log-Likelihood and negative Log-Likelihood to use optimx optimization:
#LL----
ll.wei <- function(theta, x) {
  N <- length(x)
  alph  <- theta[1]
  mu    <- theta[2]
  sigma <- theta[3]
  val <- sum(ifelse(
    x <= mu,
    log(alph/sigma) + (alph-1) * log((mu-x)/sigma) - ((mu-x)/sigma)**(alph-1),
    -Inf
  ))
  return(val)
}
#Negative LL----
nll.wei <- function(theta, x) {
  return(-ll.wei(theta = theta, x = x))
}
Afterwards I define the analytical gradient of the negative LL. Remark: There are points at which the negative LL isn't differentiable (the upper end-point mu)
gradnll.wei <- function(theta, x) {
  N <- length(x)
  alph  <- theta[1]
  mu    <- theta[2]
  sigma <- theta[3]
  argn <- (mu-x)/sigma
  del.alph <- sum(ifelse(x <= mu,
    1/alph + log(argn) - log(argn) * argn**(alph-1),
    0
  ))
  del.mu <- sum(ifelse(x <= mu,
    (alph-1)/(mu-x) - (alph-1)/sigma * argn**(alph-2),
    0))
  del.sigma <- sum(ifelse(x <= mu,
    ((alph-1)*argn**(alph-1) - alph)/sigma,
    0))
  return(-c(del.alph, del.mu, del.sigma))
}
Finally I try to optimize using the optimx package and the methods Nelder-Mead (derivative free) and BFGS (my LL is kinda smooth, there's just one point, which is problematic).
#MLE for Weibull
library(optimx)   # for optimx()
mle.wei <- function(start, sample) {
  optimx(
    par = start,
    fn = nll.wei,
    gr = gradnll.wei,
    method = c("BFGS"),
    x = sample
  )
}
theta.s <- c(4, 1, 1/2)             #test for parameters
sample <- gen.wei(100, theta.s)     #generate 100 data points distributed like theta.s
mle.wei(start = c(8, 4, 2), sample) #MLE Estimation
To my surprise I get the following error:
Error in optimx.check(par, optcfg$ufn, optcfg$ugr, optcfg$uhess, lower, :
Cannot evaluate function at initial parameters
I checked manually: Both nll and gradnll are finite at the initial parameters...
If I switch to optim instead of optimx I get a result, but a pretty bad one:
$par
[1] 8.178674e-01 9.115766e-01 1.745724e-06
$value
[1] -1072.786
$counts
function gradient
574 100
$convergence
[1] 1
$message
NULL
So it doesn't converge. If I don't supply the gradient to BFGS, there isn't a result. If I use Nelder-Mead instead:
$par
[1] 1.026393e+00 9.649121e-01 9.865624e-18
$value
[1] -3745.039
$counts
function gradient
502 NA
$convergence
[1] 1
$message
NULL
So it is also very bad...
My questions are:
1. Should I, instead of defining the ll outside of the support as -Inf, give it a very large negative value like -1e20 to circumvent -Inf errors, or does it not matter?
2. Like the first one, but for the gradient: technically the ll isn't defined outside of the support, but since the likelihood is 0 (albeit constant) outside of the support, is it smart to define the gradnll as 0 outside?
3. I checked the implementation of the MLE estimator fgev from the evd package and saw that they use the BFGS method but don't supply the gradient, even though the gradient does exist. My question is therefore whether there are situations where it is counterproductive to supply the gradient since it isn't defined everywhere (as in my case and the evd case)?
4. I got an error of the "argument x matches multiple formal arguments" type in optimx but not in optim, which surprised me. What am I doing wrong in supplying my functions and data to the optimx function?
Thank you very much in advance!
Re 4: That's kind of a bug in optimx, but one that's hard to avoid. It uses x as a variable name when calculating a numerical gradient; you also use it as an "additional parameter" to your functions. You can work around that by renaming your argument, e.g. call it xdata in your functions.
Re 1 & 2: There are several techniques to handle boundary problems in optimization. Setting to a big constant value tends not to work: if the optimizer goes out of bounds, it finds the objective function really flat. If the exact boundary is legal, then pushing the parameter to the boundary and adding a penalty sometimes works. If the exact boundary is illegal, you might be able to reflect: e.g. if mu > 0 is a requirement, sometimes replacing mu by abs(mu) in the objective function gets things to work. Sometimes the best solution is to get rid of the boundary by transforming the parameters.
Edited to add some more details:
For this problem, it looks to me as though transformations of the parameters might work. I think alpha and sigma must both be positive. Setting alpha <- exp(theta[1]) and sigma <- exp(theta[3]) will guarantee that. Limits on mu are harder, but I think mu > max(xdata) is needed, so mu <- max(xdata) + exp(theta[2]) should keep it in bounds. Of course, making these changes messes up your gradient formula and starting values.
As to resources: I'm afraid I don't know any. This advice is based on years of painful experience.
https://web.ncf.ca/nashjc/optimx202112/ has a version of the package that deals with at least some variable clashes in the dot args. There are some separate cleanups to be done before this goes on CRAN, but the package should be more or less robust at the moment.
JN

GRG Nonlinear R

I want to translate my Excel Solver model into a model in R. I need to find 3 sets of coordinates which minimize the distance to 5 other given coordinates. I've made a program which calculates a distance matrix and outputs the minimal distance from each input to the given coordinates. I want to minimize this function by changing the input, i.e. I want to find the coordinates such that the sum of minimal distances is minimized. I tried several methods to do so; see the code below. (Yes, my distance matrix function might be somewhat clunky, but this is because I had to reduce the input to one variable in order to run some algorithms such as nloptr, which would give warnings otherwise.) I've also seen some other questions (such as GRG Non-Linear Least Squares (Optimization)) but they did not change/improve the solution.
# First half of p describes x coordinates, second half the y coordinates (yes, that's clunky)
p <- c(2,4,6,5,3,2) # initial points
x_given <- c(2,2.5,4,4,5)
y_given <- c(9,5,7,1,2)
f <- function(Coordinates){
  # Predefining
  Term_1 <- NULL
  Term_2 <- NULL
  x <- NULL
  Distance <- NULL
  min_prob <- NULL
  l <- length(Coordinates)
  l2 <- length(x_given)
  half_length <- l/2
  s <- l2*half_length
  Distance_Matrix <- matrix(c(rep(1,s)), nrow=half_length)
  # Creating the distance matrix
  for (k in 1:half_length){
    for (i in 1:l2){
      Term_1[i] <- (Coordinates[k]-x_given[i])^2
      Term_2[i] <- (Coordinates[k+half_length]-y_given[i])^2
      Distance[i] <- sqrt(Term_1[i]+Term_2[i])
      Distance_Matrix[k,i] <- Distance[i]
    }
  }
  d <- Distance_Matrix
  # Find the minimum in each row; that's what we want to obtain and minimize
  for (l in 1:nrow(d)){
    min_prob[l] <- min(d[l,])
  }
  som <- sum(min_prob)
  return(som)
}
# Minimise
sol<-optim(p,f)
x<-sol$par[1:3]
y<-sol$par[4:6]
plot(x_given,y_given)
points(x,y,pch=19)
The solution, however, is clearly not optimal. I've tried to use the nloptr function, but I'm not sure which algorithm to use. Which algorithm can I use, or can I use/program another function which solves this problem? Thanks in advance (and sorry for the detailed, long question).
Look at the output of optim. It reached the iteration limit and had not yet converged.
> optim(p, f)
$`par`
[1] 2.501441 5.002441 5.003209 5.001237 1.995857 2.000265
$value
[1] 0.009927249
$counts
function gradient
501 NA
$convergence
[1] 1
$message
NULL
Although the result is not that different, you will need to increase the number of iterations to get convergence. If that is still unacceptable then try different starting values (see the multi-start sketch after the output below).
> optim(p, f, control = list(maxit = 1000))
$`par`
[1] 2.502806 4.999866 5.000000 5.003009 1.999112 2.000000
$value
[1] 0.005012449
$counts
function gradient
755 NA
$convergence
[1] 0
$message
NULL
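A simple multi-start sketch along those lines (random restarts around p; the number of restarts and the spread sd = 2 are arbitrary choices):
set.seed(42)
starts <- replicate(20, p + rnorm(length(p), sd = 2), simplify = FALSE)
fits <- lapply(starts, function(s) optim(s, f, control = list(maxit = 1000)))
best <- fits[[which.min(sapply(fits, function(fit) fit$value))]]
best$par
best$value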

Optim: non-finite finite-difference value in L-BFGS-B

I'm trying to maximize a likelihood with R's 'optim.' I get the error "non-finite finite-difference value."
I'm using L-BFGS-B because I have to constrain the 11th parameter (Bernoulli "p") to be 0<=p<=1. Since I need this constraint, I can't use a nongradient method like "Nelder-Mead." Any thoughts on how I can fix this? It worked fine with simulated data!
Note that I'm using a floor function in here because discrete values are needed for the "Trials" parameters (params 1 through 10).
library(rmutil)
Nhat <- c(14335,15891,2700,1218,2213,10985,4985,8738,13878)
sdNhat <- sqrt(c(26915344,6574096,175561,51529,71824,12166144,145924,2808976,3319684))
C <- c(313,410,38,30,69,175,132,193,240)
LL1 <- vector()
LL2 <- vector()
NLL <- function(data, par){
  for (i in 1:length(Nhat)){
    LL1[i] <- dnorm(Nhat[i], par[i], sdNhat[i], log=TRUE)
    LL2[i] <- dbetabinom(C[i], floor(par[i]), par[length(Nhat)+1], par[length(Nhat)+2], log=TRUE)
  }
  -1*(sum(LL1)+sum(LL2))
}
out <- optim(par=c(floor(Nhat*runif(length(Nhat),0.9,1.1)),0.02,3),
             fn=NLL, data=list(Nhat=Nhat,sdNhat=sdNhat,C=C),
             method='L-BFGS-B',
             lower=c(rep(min(Nhat),length(Nhat)),0.0001,1),
             upper=c(rep(min(Nhat),length(Nhat)),0.9999,2))
You are getting an error because the boundaries you are setting for parameters 1 to 9 are identical. Thus, you have to adjust upper=c(rep(min(Nhat),length(Nhat)),0.9999,2)) (or lower) so that the bounds actually form an interval.
You said that only the 10th parameter (you actually wrote 11th, but I guess that's a typo) has to be bounded between 0 and 1, so this would work:
set.seed(1)
out <- optim(par=c(floor(Nhat*runif(length(Nhat),0.9,1.1)),0.02,3),
             fn=NLL, data=list(Nhat=Nhat,sdNhat=sdNhat,C=C),
             method='L-BFGS-B',
             lower=c(rep(-Inf,length(Nhat)),0,-Inf),
             upper=c(rep(Inf,length(Nhat)),1,Inf))
out
# $par
# [1] 13660.61522882 15482.96819195  2730.66273051  1310.04511624  2077.45269032 11857.94955470
# [7]  5417.09464008  9016.57472573 14234.22972586     0.02165253   826.21691430
#
# $value
# [1] 116.2657

What are the results in the dt function?

Can someone explain the results of a typical dt function call? The help page says that I should receive the density function. However, in my code below, what does the first value, 0.2067, represent? The second value?
x<-seq(1,10)
dt(x, df=3)
[1] 0.2067483358 0.0675096607 0.0229720373 0.0091633611 0.0042193538 0.0021748674
[7] 0.0012233629 0.0007369065 0.0004688171 0.0003118082
Two things were confused here:
dt gives you the density; this is why it decreases for large values of x:
x<-seq(1,10)
dt(x, df=3)
[1] 0.2067483358 0.0675096607 0.0229720373 0.0091633611 0.0042193538 0.0021748674
[7] 0.0012233629 0.0007369065 0.0004688171 0.0003118082
pt gives the distribution function. This is the probability of being smaller than or equal to x, which is why the values go to 1 as x increases:
pt(x, df=3)
[1] 0.8044989 0.9303370 0.9711656 0.9859958 0.9923038 0.9953636 0.9970069 0.9979617 0.9985521 0.9989358
A "probability density" is not really a true probability, since probabilities are bounded in [0,1] while densities are not. The integral of densities across their domain is normalized to exactly 1. So densities are really the first derivatives of the probability function. This code may help:
plot(x = seq(-10, 10, length = 100),
     y = dt(seq(-10, 10, length = 100), df = 3))
The value of 0.207 for dt at x=1 says that at x=1 the cumulative probability is increasing at a rate of 0.207 per unit increase in x. (And since the t-distribution is symmetric, that is also the value of dt with 3 df at -1.)
A bit of coding to instantiate the dt(x,df=3) function (see ?dt) and then integrate it:
> dt3 <- function(x) { gamma((4)/2)/(sqrt(3*pi)*gamma(3/2))*(1+x^2/3)^-((3+1)/2) }
> dt3(1)
[1] 0.2067483
> integrate(dt3, -Inf, Inf)
1 with absolute error < 7.2e-08
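One can also check numerically that dt is the slope of pt (a quick finite-difference sanity check):
h <- 1e-6
(pt(1 + h, df = 3) - pt(1, df = 3)) / h   # approximately 0.2067, i.e. dt(1, df = 3)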

Solving equations in R similar to the Excel solver parameters function

I have a question concerning the possibility of solving equations in R and doing the same in Excel.
However, I want to do it with R to show my colleagues that R is better :)
Here is the equation:
f0<-1e-9
t_pw<-30e-9
a<-30.7397582453682
c<-6.60935546184612
P<-1-exp((-t_pw)*f0*exp(-a*(1-b/c)^2))
I want to find the b value for which P = 0.5. In Excel we can do it by selecting the P value column, setting it to 0.5, and then using the Solver parameters function.
I don't know which method is best, or whether there is another way to do it.
Thanks.
I have a strong suspicion that your equation was supposed to include -t_pw/f0, not -t_pw*f0, and that t_pw was supposed to be 3.0e-9, not 30e-9.
Pfun <- function(b, f0=1e-9, t_pw=3.0e-9,
                 a=30.7397582453682,
                 c=6.60935546184612) {
  1-exp((-t_pw)/f0*exp(-a*(1-b/c)^2))
}
Then @Lyzander's uniroot() suggestion works fine:
u1 <- uniroot(function(x) Pfun(x)-0.5,c(6,10))
The estimated value here is 8.05.
par(las=1,bty="l")
curve(Pfun,from=0,to=10,xname="b")
abline(h=0.5,lty=2)
abline(v=u1$root,lty=3)
If you want to solve an equation, the simplest thing to do is to use uniroot, which is in base R.
f0 <- 1e-9
t_pw <- 30e-9
a <- 30.7397582453682
c <- 6.60935546184612
func <- function(b) {
  1 - exp((-t_pw)*f0*exp(-a*(1-b/c)^2)) - 0.5
}
#interval is the range of values of b to look for a solution
#it can be -Inf, Inf
> uniroot(func, interval=c(-1000, 1000), extendInt='yes')
Error in uniroot(func, interval = c(-1000, 1000), extendInt = "yes") :
no sign change found in 1000 iterations
As you see above, my uniroot call fails. This is because there is no solution to your equation, which is easy to see as well: exp(-t_pw*f0*<positive number between 0 and 1>) is practically (very close to) 1, so your equation becomes 1 - 1 - 0.5 = 0, which doesn't hold. You can see the same with a plot:
curve(func) #same result for curve(func, from=-1000, to=1000)
In this function the result will be -0.5 for any b.
So one fast way to do it is uniroot, but probably for a different equation.
And a working example:
myfunc2 <- function(x) x - 2
> uniroot(myfunc2, interval=c(0,10))
$root
[1] 2
$f.root
[1] 0
$iter
[1] 1
$init.it
[1] NA
$estim.prec
[1] 8
