Calculating the maximum of a PDF in R

I have a PDF of the form
f(x) = 20x(1-x)^3 for x in [0,1]
and I would like to calculate the maximum of f(x).
I've done the following:
integral <- function(x) {
  20 * x * (1 - x)^3
}
max(integral(x))
but I can't figure out where I've gone wrong...

The problem is due to numerical approximation: you are only evaluating f on a coarse grid of x values. If you reduce the step size, you get much closer to the true maximum:
max(integral(seq(0,1,by=.1))) # your first answer
max(integral(seq(0,1,by=.0001))) # what you are looking for
However, I'd use an optimisation procedure to converge on the maximum:
optimise(integral, lower=0, upper=1, maximum = TRUE)
$maximum
[1] 0.2499993  # the value of x at which f attains its maximum
$objective
[1] 2.109375   # the maximum value of f
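As a quick analytic cross-check, the derivative of f factors neatly, so the result of optimise() can be confirmed by hand (or with R's symbolic D()):
## d/dx 20*x*(1-x)^3 = 20*(1-x)^3 - 60*x*(1-x)^2 = 20*(1-x)^2*(1-4*x),
## which vanishes on (0,1) at x = 1/4, where f(1/4) = 135/64 = 2.109375.
D(expression(20 * x * (1 - x)^3), "x")   # symbolic derivative
integral(1/4)                            # 2.109375, matching $objective above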

Related

Optim: non-finite finite-difference value in L-BFGS-B

I'm trying to maximize a likelihood with R's optim. I get the error "non-finite finite-difference value."
I'm using L-BFGS-B because I have to constrain the 11th parameter (the Bernoulli "p") to be 0 <= p <= 1. Since I need this box constraint, I can't use an unconstrained method like Nelder-Mead. Any thoughts on how I can fix this? It worked fine with simulated data!
Note that I'm using a floor function here because discrete values are needed for the "Trials" parameters (params 1 through 10).
library(rmutil)
Nhat <- c(14335,15891,2700,1218,2213,10985,4985,8738,13878)
sdNhat <-
  sqrt(c(26915344,6574096,175561,51529,71824,12166144,145924,2808976,3319684))
C <- c(313,410,38,30,69,175,132,193,240)
LL1 <- vector()
LL2 <- vector()
NLL <- function(data, par){
  for (i in 1:length(Nhat)){
    LL1[i] <- dnorm(Nhat[i], par[i], sdNhat[i], log=TRUE)
    LL2[i] <- dbetabinom(C[i], floor(par[i]), par[length(Nhat)+1], par[length(Nhat)+2], log=TRUE)
  }
  -1*(sum(LL1) + sum(LL2))
}
out <- optim(par=c(floor(Nhat*runif(length(Nhat),0.9,1.1)),0.02,3),
             fn=NLL, data=list(Nhat=Nhat,sdNhat=sdNhat,C=C),
             method='L-BFGS-B',
             lower=c(rep(min(Nhat),length(Nhat)),0.0001,1),
             upper=c(rep(min(Nhat),length(Nhat)),0.9999,2))
You are getting an error because the lower and upper boundaries you set for parameters 1 to 9 are identical. You have to adjust upper=c(rep(min(Nhat),length(Nhat)),0.9999,2) (or lower) so that each parameter has an interval to search over.
You said that only the 10th parameter (you actually wrote 11th, but I guess that's a typo) has to be bounded between 0 and 1, so this would work:
set.seed(1)
out <- optim(par=c(floor(Nhat*runif(length(Nhat),0.9,1.1)),0.02,3),
             fn=NLL, data=list(Nhat=Nhat,sdNhat=sdNhat,C=C),
             method='L-BFGS-B',
             lower=c(rep(-Inf,length(Nhat)),0,-Inf),
             upper=c(rep(Inf,length(Nhat)),1,Inf))
out
# $par
# [1] 13660.61522882 15482.96819195 2730.66273051 1310.04511624 2077.45269032 11857.94955470
# [7] 5417.09464008 9016.57472573 14234.22972586 0.02165253 826.21691430
#
# $value
# [1] 116.2657
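If you would rather keep finite box constraints on the abundance parameters, a sketch along the same lines, assuming a range of roughly ±50% around Nhat is acceptable (the 50% figure is only an illustration):
out2 <- optim(par=c(floor(Nhat*runif(length(Nhat),0.9,1.1)),0.02,3),
              fn=NLL, data=list(Nhat=Nhat,sdNhat=sdNhat,C=C),
              method='L-BFGS-B',
              lower=c(0.5*Nhat, 0.0001, 1),      # lower strictly below upper for every parameter
              upper=c(1.5*Nhat, 0.9999, Inf))    # p stays in (0,1); last parameter unbounded above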

R Estimating parameters of binomial distribution

I'm trying to estimate the parameters n and p of a binomial distribution by maximum likelihood in R.
I'm using the function optim from the stats package, but I get an error.
This is my code:
xi = rbinom(100, 20, 0.5)  # Sample
n = length(xi)             # Sample size
# Log-Likelihood
lnlike <- function(theta){
  log(prod(choose(theta[1],xi))) + sum(xi*log(theta[2])) +
    (n*theta[1] - sum(xi))*log(1-theta[2])
}
# Optimizing
optim(theta <- c(10,.3), lnlike, hessian=TRUE)
Error in optim(theta <- c(10, 0.3), lnlike, hessian = TRUE) :
function cannot be evaluated at initial parameters
Has anyone done this? Which function should I use?
tl;dr you're going to get a likelihood of zero (and thus a negative-infinite log-likelihood) if the response variable is greater than the binomial N, which is the theoretical maximum value of the response. In most practical problems N is taken as known and only the probability is estimated. If you do want to estimate N, you need to (1) constrain it to be >= the largest value in the sample, and (2) do something special to optimize over a parameter that must be discrete (this is an advanced/tricky problem).
The first part of this answer shows debugging strategies for identifying the problem; the second illustrates a strategy for optimizing over N and p simultaneously (by brute force over a reasonable range of N).
Setup:
set.seed(101)
n <- 100
xi <- rbinom(n, size=20, prob=0.5) # Sample
Log-likelihood function:
lnlike <- function(theta){
  log(prod(choose(theta[1],xi))) + sum(xi*log(theta[2])) +
    (n*theta[1] - sum(xi))*log(1-theta[2])
}
Let's break this down.
theta <- c(10,0.3) ## starting values
lnlike(c(10,0.3)) ## -Inf
OK, the log-likelihood is -Inf at the starting value. Not surprising that optim() can't work with that.
Let's work through the terms.
log(prod(choose(theta[1],xi))) ## -Inf
OK, we're already in trouble on the first term.
prod(choose(theta[1],xi)) ## 0
The product is zero ... why?
choose(theta[1],xi)
## [1] 120 210 10 0 0 10 120 210 0 0 45 210 1 0
Lots of zeros. Why? Which values of xi are producing them? The first few values of xi are:
## [1] 7 6 9 12 11 9 7 6
Aha! We're OK for 7, 6, 9 ... but in trouble with 12.
badvals <- (choose(theta[1],xi)==0)
all(badvals==(xi>10)) ## TRUE
So the problem is that the starting value theta[1] = 10 is smaller than some of the observations, which makes the likelihood exactly zero. If you really want to estimate N as well as p, you can do it by brute-force enumeration over reasonable values of N ...
## likelihood function
llik2 <- function(p,n) {
  -sum(dbinom(xi, prob=p, size=n, log=TRUE))
}
## possible N values (from max(xi) = 15 up to 50)
nvec <- max(xi):50
Lvec <- numeric(length(nvec))
for (i in 1:length(nvec)) {
  ## optim() wants method="Brent"/lower/upper for 1-D optimization
  Lvec[i] <- optim(par=0.5, fn=llik2, n=nvec[i], method="Brent",
                   lower=0.001, upper=0.999)$val
}
nvec[which.min(Lvec)] ## 20
par(las=1,bty="l")
plot(nvec,Lvec,type="b")
Why do you get into trouble?
If you do lnlike(c(10, 0.3)), you get -Inf. That's why the error message complains about lnlike rather than about optim.
Often n is known and only p needs to be estimated. In that situation, both the moment estimator and the maximum likelihood estimator are available in closed form, and no numerical optimization is needed, so it is rather unusual to want to estimate n.
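For instance, a quick sketch assuming the size n = 20 used to simulate xi is treated as known:
p_hat <- mean(xi) / 20   # closed-form MLE of p when n is known
p_hat                    # should be close to the true value 0.5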
If you do want to estimate n, you have to be aware that it is constrained. Check:
range(xi) ## 5 15
Your observations have range [5, 15], so n >= 15 is required. How can you pass an initial value of 10? The search over n should start from a large value and move downward until it reaches max(xi), so you might try 30 as the initial value for n.
Additionally, you don't need to define lnlike in the current way. Do this:
lnlike <- function(theta, x) -sum(dbinom(x, size = theta[1], prob = theta[2], log = TRUE))
optim is most often used for minimization (though it can do maximization), so I have put a minus sign in the function to obtain the negative log-likelihood. In this way, you are minimizing lnlike with respect to theta.
You should also pass your observations xi as an additional argument to lnlike, rather than taking them from the global environment.
Naive try with optim:
In my comment I already said that I don't believe using optim to estimate n will work, because n must be an integer while optim is designed for continuous variables. These errors and warnings should convince you.
optim(c(30,.3), fn = lnlike, x = xi, hessian = TRUE)
Error in optim(c(30, 0.3), fn = lnlike, x = xi, hessian = TRUE) :
non-finite finite-difference value [1]
In addition: There were 15 or more warnings (use warnings() to see the first 15)
> warnings()
Warning messages:
1: In dbinom(x, size = theta[1], prob = theta[2], log = TRUE) : NaNs produced
2: In dbinom(x, size = theta[1], prob = theta[2], log = TRUE) : NaNs produced
3: In dbinom(x, size = theta[1], prob = theta[2], log = TRUE) : NaNs produced
4: In dbinom(x, size = theta[1], prob = theta[2], log = TRUE) : NaNs produced
5: In dbinom(x, size = theta[1], prob = theta[2], log = TRUE) : NaNs produced
Solution?
Ben has shown you a way: instead of letting optim estimate n, do a manual grid search over n. For each candidate n, perform a univariate optimization with respect to p (in fact, there is no need for numerical optimization even here, since given n the MLE of p has a closed form). In this way you obtain a profile likelihood for n, and you then pick the n on the grid that minimizes this profile negative log-likelihood.
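A minimal sketch of that grid search, using the closed-form estimate p = mean(xi)/n for each candidate n (reusing xi from above; the upper limit of 50 mirrors Ben's choice):
nvals <- max(xi):50
nll_prof <- sapply(nvals, function(n) {
  p_hat <- mean(xi)/n                                    # closed-form MLE of p for this n
  -sum(dbinom(xi, size = n, prob = p_hat, log = TRUE))   # profile negative log-likelihood
})
nvals[which.min(nll_prof)]   # the n minimizing the profile negative log-likelihood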
Ben has provided you full details, and I shall not repeat that. Nice (and swift) work, Ben!

What are the results in the dt function?

Can someone explain the results of a typical dt call? The help page says that I should receive the density function. However, in my code below, what does the first value, 0.2067, represent? The second value?
x<-seq(1,10)
dt(x, df=3)
[1] 0.2067483358 0.0675096607 0.0229720373 0.0091633611 0.0042193538 0.0021748674
[7] 0.0012233629 0.0007369065 0.0004688171 0.0003118082
Two things are being confused here:
dt gives you the density; this is why it decreases for large x:
x<-seq(1,10)
dt(x, df=3)
[1] 0.2067483358 0.0675096607 0.0229720373 0.0091633611 0.0042193538 0.0021748674
[7] 0.0012233629 0.0007369065 0.0004688171 0.0003118082
pt gives the distribution function: the probability of being smaller than or equal to x.
This is why the values approach 1 as x increases:
pt(x, df=3)
[1] 0.8044989 0.9303370 0.9711656 0.9859958 0.9923038 0.9953636 0.9970069 0.9979617 0.9985521 0.9989358
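As a quick sketch of the relationship, pt is the integral of dt up to x:
integrate(dt, lower = -Inf, upper = 1, df = 3)   # ~0.8045, the same as pt(1, df = 3)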
A "probability density" is not really a true probability, since probabilities are bounded in [0,1] while densities are not. The integral of densities across their domain is normalized to exactly 1. So densities are really the first derivatives of the probability function. This code may help:
plot(x = seq(-10, 10, length=100),
     y = dt(seq(-10, 10, length=100), df=3))
The value of 0.207 for dt at x=1 says that at x=1 the cumulative probability is increasing at a rate of 0.207 per unit increase in x. (And since the t-distribution is symmetric, that is also the value of dt with 3 df at x=-1.)
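A small numerical sketch of that interpretation: the density at x = 1 matches the finite-difference slope of pt there.
(pt(1.001, df = 3) - pt(0.999, df = 3)) / 0.002   # ~0.2067, i.e. dt(1, df = 3)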
A bit of coding to instantiate the dt(x,df=3) function (see ?dt) and then integrate it:
> dt3 <- function(x) { gamma((4)/2)/(sqrt(3*pi)*gamma(3/2))*(1+x^2/3)^-((3+1)/2) }
> dt3(1)
[1] 0.2067483
> integrate(dt3, -Inf, Inf)
1 with absolute error < 7.2e-08

Finding the Maximum of a Function with numerical derivatives in R

I wish to numerically find the maximum of the function multiplied by Beta 3 shown on p346 of the following link when tau=30:
http://www.ssc.upenn.edu/~fdiebold/papers/paper49/Diebold-Li.pdf
They give the answer on p347 as 0.0609.
I would like to confirm this numerically in R. I.e. to take the derivative and find the value where it reaches zero.
library(numDeriv)
x <- 30
testh <- function(lambda){ ((1-exp(-lambda*30))/(lambda*30)) - exp(-lambda*30) }
grad_h <- function(lambda){
  val <- grad(testh, lambda)
  return(val^2)
}
OptLam <- optimize(f=grad_h, interval=c(0.0001,120), tol=0.0000000000001)
I take the square of the gradient as I want the minimum to be at zero.
Unfortunately, the answer comes back as Lambda=120!! With lambda at 120 the value of the objective function is 5.36e-12.
By working by hand I can find a lower value of the numerical derivative that is closer to zero (it is also close to the analytical value given above):
grad_h(0.05977604)
## [1] 4.24494e-12
Why is the function above not finding this lower value? I have set the tolerance very tight, so it should be able to find this optimal value.
Is it possible to correct the existing method so that it gives the correct answer?
Is there a better way to find the maximum of a function numerically in R using its gradient?
For example, is there an optimizer that looks for a zero of the gradient rather than trying to find a minimum or maximum?
You can use uniroot to find where the derivative is 0 (a root of the gradient), rather than minimizing its square. This might work for you:
grad_h <- function(lambda){
  val <- grad(testh, lambda)
  return(val)
}
## The root
res <- uniroot(grad_h, c(0,120), tol=1e-10)
## see it
ls <- seq(0.001, 1, length=1000)
plot(ls, testh(ls), col="salmon")
abline(v=res$root, col="steelblue", lwd=2, lty=2)
text(x=res$root, y=testh(res$root),
     labels=sprintf("(%f, %s)", res$root,
                    format(testh(res$root), scientific = TRUE)), adj=-0.1)
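As a rough cross-check, you could also skip the derivative entirely and maximise testh directly over a modest interval (the interval below is just an assumption); the result should agree with the 0.0609 quoted in the question:
optimize(testh, interval = c(1e-4, 1), maximum = TRUE)$maximum   # ~0.0609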

R minimize absolute error

Here's my setup
obs1<-c(1,1,1)
obs2<-c(0,1,2)
obs3<-c(0,0,3)
absoluteError<-function(obs,x){
return(sum(abs(obs-x)))
}
Example:
> absoluteError(obs2,1)
[1] 2
For an arbitrary vector of observations, I'd like to find the value x that minimizes the absolute error between the observations and a vector of all x's. For instance, clearly the minimizer of absoluteError(obs1, x) is x=1, because this results in an error of 0. How do I find a minimizer for an arbitrary vector of observations? I'd imagine this is a linear programming problem, but I've never implemented one in R before.
The median of obs is a minimizer for the absolute error. The following is a sketch of how one might try proving this:
Let the median of a set of n observations, obs, be m. Call the absolute error between obs and m f(obs,m).
Case n odd:
Consider f(obs, m+delta) where delta is some nonzero number. Suppose delta is positive: then at least (n-1)/2 + 1 observations (those less than or equal to m, including m itself) each have their individual error increased by exactly delta, while the remaining at most (n-1)/2 observations each have their error decreased by at most delta. So f(obs, m+delta) - f(obs, m) >= delta > 0. The same argument applies when delta is negative. Thus f(obs, m+delta) > f(obs, m) for any nonzero delta, so m is a minimizer of f, and in this case the only one.
Case n even:
Basically the same logic as above, except that in this case any number between the two innermost numbers of the sorted set is a minimizer.
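A quick numerical sketch of the claim, using the question's own obs2 and absoluteError:
optimize(function(x) absoluteError(obs2, x), interval = range(obs2))$minimum   # ~1
median(obs2)                                                                   # 1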
I am not sure this answer is correct, and even if it is I am not sure this is what you want. Nevertheless, I am taking a stab at it.
I think you are talking about 'Least absolute deviations', a form of regression that differs from 'Least Squares'.
If so, I found this R code for solving Least absolute deviations regression:
fabs = function(beta0, x, y){
  b0 = beta0[1]
  b1 = beta0[2]
  n = length(x)
  llh = 0
  for(i in 1:n){
    r2 = (y[i] - b0 - b1*x[i])
    llh = llh + abs(r2)
  }
  llh
}
g = optim(c(1,1), fabs, x=x, y=y)
I found the code here:
http://www.stat.colostate.edu/~meyer/hw12ans.pdf
Assuming you are talking about Least absolute deviations, you might not be interested in the above code if you want a solution in R from scratch rather than a solution that uses optim.
The above code is for a regression line with an intercept and one slope. I modified the code as follows to handle a regression with just an intercept:
y <- c(1,1,1)
x <- 1:length(y)
fabs = function(beta0, x, y){
  b0 = beta0[1]
  b1 = 0
  n = length(x)
  llh = 0
  for(i in 1:n){
    r2 = (y[i] - b0 - b1*x[i])
    llh = llh + abs(r2)
  }
  llh
}
# The commands to get the estimator
g = optim(c(1), fabs, x=x, y=y, method='Brent', lower=(min(y)-5), upper=(max(y)+5))
g
I was not familiar with (i.e., had not heard of) Least absolute deviations until tonight. So, hopefully my modifications are fairly reasonable.
With y <- c(1,1,1) the parameter estimate is 1 (which I think you said is the correct answer):
$par
[1] 1
$value
[1] 1.332268e-15
$counts
function gradient
NA NA
$convergence
[1] 0
$message
NULL
With y <- c(0,1,2) the parameter estimate is 1:
$par
[1] 1
$value
[1] 2
$counts
function gradient
NA NA
$convergence
[1] 0
$message
NULL
With y <- c(0,0,3) the parameter estimate is 0 (which you said is the correct answer):
$par
[1] 8.613159e-10
$value
[1] 3
$counts
function gradient
NA NA
$convergence
[1] 0
$message
NULL
If you want R code from scratch, there is additional R code in the file at the link above which might be helpful.
Alternatively, perhaps it might be possible to extract the relevant code from the source file.
Alternatively, perhaps someone else can provide the desired code (and correct any errors on my part) in the next 24 hours.
If you come up with code from scratch please post it as an answer as I would love to see it myself.
lad <- function(x, y){
  # objective: sum of absolute deviations from the line beta[1] + beta[2]*x
  SAD <- function(beta, x, y) {
    return(sum(abs(y - (beta[1] + beta[2] * x))))
  }
  # least-squares fit provides the starting values
  d <- lm(y ~ x)
  ans1 <- optim(par=c(d$coefficients[1], d$coefficients[2]),
                method="Nelder-Mead", fn=SAD, x=x, y=y)
  coe <- setNames(ans1$par, c("(Intercept)", substitute(x)))
  fitted <- setNames(ans1$par[1] + ans1$par[2]*x, c(1:length(x)))
  res <- setNames(y - fitted, c(1:length(x)))
  results <- list(coefficients=coe, fitted.values=fitted, residuals=res)
  class(results) <- "lad"
  return(results)
}
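A brief usage sketch of this lad() function on made-up data, with a least-squares fit for comparison:
set.seed(1)
x <- 1:20
y <- 2 + 0.5*x + rnorm(20)
fit <- lad(x, y)
fit$coefficients    # LAD intercept and slope
coef(lm(y ~ x))     # least-squares coefficients for comparison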
