I have the following function that draws some data from a chi-squared distribution and compares the distribution of X to a known chi-squared distribution using maximum likelihood. This procedure is simulated nSims times. (I compare these results to results from a permutation test, but that code is excluded.)
chi2c <- function(xdf=2, yObs=100, xObs=100, nSims=1000, nPerm=500, alpha=0.05){
  simResults <- sapply(1:nSims, function(x){
    # Draw variables
    x <- rchisq(xObs, df=xdf)
    # Other variables not relevant here
    # [[snip]]
    # Permutation test
    # [[snip]]
    # Calculate the statistics necessary for maximum likelihood
    n <<- length(x)
    sumlx <<- sum(log(x))
    sumx <<- sum(x)
    # Calculate the maximum likelihood estimate
    dfhat <- optimize(f=c2ll, interval=c(1, 10), maximum=TRUE)$maximum
    # Calculate the test statistic: -2 times the log likelihood ratio
    llr <- -2 * (c2ll(2) - c2ll(dfhat))
    # Compare the test statistic to its asymptotic dist: chi-squared
    lReject <- llr > qchisq(1 - alpha, df=1)
    # Provide the results
    # [[snip]]
  })
  # Calculate means across simulations
  rowMeans(simResults)
}
This function calls c2ll, the chi-squared log-likelihood function:
c2ll <- function(dfHat){
  -n * log(gamma(dfHat/2)) - n * (dfHat/2) * log(2) +
    (dfHat/2 - 1) * sumlx - sumx/2
}
This function does just what I would like and is accurate, but I don't understand why I have to set the maximum likelihood statistics (n, sumlx, and sumx) globally to get it to work; optimize doesn't find them if I only set them inside the function using <-. I tried setting them inside of optimize, but that didn't work either. Thanks for your help.
Charlie
R has lexical scoping, which means that functions look for variables in the environment in which they were defined. c2ll is defined in the global environment, so it doesn't see your definitions of n, sumx, and sumlx inside the function. S, on the other hand, uses dynamic scoping, which behaves as you expect (i.e. it looks for variables in the scope in which the function is called). Computer scientists generally believe that dynamic scoping was a dead-end bad idea, and that lexical scoping is the way to go.
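Here is a minimal illustration of the difference (nothing to do with your likelihood code, just the scoping rule):
n <- 1                # defined in the global environment
f <- function() n     # f is defined in the global environment too
g <- function() {
  n <- 100            # local to g; with lexical scoping, f() never sees this
  f()
}
g()
# [1] 1    (dynamic scoping would have returned 100)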
As a practical matter, what can you do about this?
Well, there are a couple options...
First, you can define c2ll locally, inside the simulation function where n, sumlx, and sumx are computed:
n <- length(x)
sumlx <- sum(log(x))
sumx <- sum(x)
c2ll <- function(dfHat){
-n * log(gamma(dfHat/2)) - n * (dfHat/2) * log(2) + (dfHat/2 - 1) * sumlx - sumx/2
}
dfhat <- optimize(f=c2ll, interval=c(1, 10), maximum=TRUE)$maximum
Second, you can have c2ll take additional parameters and pass them through optimize, which forwards extra named arguments to the function it optimizes.
#in global env
c2ll <- function(dfHat, n, sumlx, sumx){
  -n * log(gamma(dfHat/2)) - n * (dfHat/2) * log(2) +
    (dfHat/2 - 1) * sumlx - sumx/2
}
#...
#in function
n <- length(x)
sumlx <- sum(log(x))
sumx <- sum(x)
dfhat <- optimize(f=c2ll, interval=c(1, 10), n=n, sumlx=sumlx, sumx=sumx, maximum=TRUE)$maximum
Both are clean options that preserve the encapsulation of your functions.
Your simulation function returns a single logical value, so simResults is a logical vector. Rather than using rowMeans, just use mean(simResults); the results look reasonably sensible, at least to the extent that:
> chi2c(alpha=0.05)
[1] 0.057
> chi2c(alpha=0.5)
[1] 0.503
Your problem stems from the lexical scoping rules that R uses; see the language definition manual for more detail. In short, your function c2ll looks for the variables in the environment in which it was defined.
To avoid that problem you have to pass n, sumlx and sumx explicitly as arguments to your function, or define your function locally inside chi2c directly.
This is quite a common question; there are a lot of interesting examples on SO.
I am having two related problems with the Deriv() function from the Deriv package (from CRAN).
Problem (1): I believe that the code in the rule for differentiating dbinom() w.r.t "prob" is incorrect. Evidence for this (and my proposed correction) is shown in the following code. I would have preferred to attach text files containing the code, but as far as I can see there is no way to do this.
#
# Script demo01.R.
#
library(Deriv)
# Plot dbinom as a function of probability.
plot(function(p){dbinom(3,8,p)},from=0,to=1,xlab="parameter \"prob\"",
ylab="binomial probability",main="dbinom")
abline(v=3/8,col="red")
readline("Go? ")
# Plot the derivative of dbinom, with respect to prob, calculated by
# Deriv(), as a function of probability.
Ddb <- Deriv(dbinom,"prob")
plot(function(p){Ddb(3,8,p)},from=0,to=1,xlab="parameter \"prob\"",
ylab="",main="derivative of dbinom")
readline("Go? ")
# Replace what I believe to be incorrect code for the rule for
# differentiating dbinom() with what I believe to be correct code.
# This rule should, strictly speaking, be placed in a new environment
# rather than over-writing the existing rule, but this seems to
# break down when second derivatives are taken.
drule[["dbinom"]] <- alist(x=NULL,size=NULL,prob={
.e1 <- 1 - prob
.e2 <- size - 1
if (x == 0)
-(x * .e1^(x - 1))
else if (x == size)
prob^.e2 * size
else size*(dbinom(x-1,.e2,prob) - dbinom(x,.e2,prob)) *
(if (log) dbinom(x, size, prob) else 1)
})
# Plot the derivative of dbinom, with respect to prob, calculated by
# the corrected version of Deriv(), as a function of probability.
Ddb <- Deriv(dbinom,"prob")
plot(function(p){Ddb(3,8,p)},from=0,to=1,xlab="parameter \"prob\"",
ylab="",main="derivative of dbinom, corrected")
abline(v=3/8,col="red")
abline(h=0,col="blue")
You will observe that the derivative of dbinom() should be positive for prob < 3/8 and negative for prob > 3/8. My corrected version has this property, whereas the derivative produced by the uncorrected version is negative everywhere.
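As an independent sanity check (no Deriv involved), a central finite difference on dbinom() shows the expected sign change at prob = 3/8:
# d/dp dbinom(3, 8, p), approximated by central differences:
fd <- function(p, h = 1e-6) (dbinom(3, 8, p + h) - dbinom(3, 8, p - h)) / (2 * h)
fd(0.2)   # positive, since 0.2 < 3/8
fd(0.6)   # negative, since 0.6 > 3/8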
Can anyone confirm that I am right about there being a bug in the Deriv package? (I.e. that I am not making some sort of stupid mistake?)
Problem (2). I "crosschecked" the calculations performed by Deriv() by applying this function to a "roll-your-own" version of dbinom() for which no special rule is needed. I also applied the (corrected version) of Deriv() to dbinom(). The code that I used is as follows:
#
# Script demo02.
#
library(Deriv)
# Replace what I believe to be incorrect code for the rule for
# differentiating dbinom() with what I believe to be correct code.
# This rule should, strictly speaking, be placed in a new environment
# rather than over-writing the existing rule, but this seems to
# break down when second derivatives are taken.
drule[["dbinom"]] <- alist(x=NULL,size=NULL,prob={
.e1 <- 1 - prob
.e2 <- size - 1
if (x == 0)
-(x * .e1^(x - 1))
else if (x == size)
prob^.e2 * size
else size*(dbinom(x-1,.e2,prob) - dbinom(x,.e2,prob)) *
(if (log) dbinom(x, size, prob) else 1)
},log=NULL)
fooB1 <- function(x,prob,size) {
dbinom(x,size,prob)
}
fooB2 <- function(x,prob,size) {
choose(size,x)*prob^x*(1-prob)^(size-x)
}
dfooB1 <- Deriv(fooB1,"prob")
dfooB2 <- Deriv(fooB2,"prob")
d2fooB1 <- Deriv(fooB1,"prob",nderiv=2)
d2fooB2 <- Deriv(fooB2,"prob",nderiv=2)
vB1 <- fooB1(x=3,prob=0.6,size=8)
vB2 <- fooB2(x=3,prob=0.6,size=8)
dB1 <- dfooB1(x=3,prob=0.6,size=8)
dB2 <- dfooB2(x=3,prob=0.6,size=8)
d2B1 <- d2fooB1(x=3,prob=0.6,size=8)
d2B2 <- d2fooB2(x=3,prob=0.6,size=8)
If you run this code you will see that the function values (vB1 and vB2) agree, both having the value 0.123863. Likewise the first derivative values dB1 and dB2 agree: -0.9289728.
However the second derivatives disagree. The value of d2B1 is 1.769472, whereas the value of d2B2 is 2.064384. I have no idea which (if either) of these answers is correct.
Something (the chain rule?) is not working as it should.
Is there any action that I can take to resolve this discrepancy?
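One independent benchmark would be a finite-difference second derivative of dbinom() itself (a quick sketch), which could then be compared against d2B1 and d2B2:
# Central second difference of dbinom(3, 8, p) at p = 0.6:
h <- 1e-4
p <- 0.6
(dbinom(3, 8, p + h) - 2 * dbinom(3, 8, p) + dbinom(3, 8, p - h)) / h^2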
I am learning how to draw a log-likelihood graph. Please allow me briefly introduce what I want to do specifically:
Assume we have the data/vector as below:
set.seed(123)
sample <- rpois(50, 1.65)
And the log_like function is given as below:
log_like_graph <- function(lambda){
X <- as.matrix(sample) # not sure whether this is necessary for one-parameter distribution.
N <- nrow(X)
logLik <- N*log(lambda) - lambda*N*mean(X)
return(loglik)
}
log_like_graph <- Vectorize(log_like_graph)
# set range of lambda
lambda_vals <- seq(-10,10,by=1)
log_vals <- outer(lambda_vals,log_like_graph)
Based on the above lambda_vals and log_vals, I expect to produce a plot like below:
However, when I execute the last command, log_vals <- outer(lambda_vals, log_like_graph), I get the following error:
Error in as.vector(x, mode) :
cannot coerce type 'closure' to vector of type 'any'
Could you please help me solve this problem? Thank you very much!
(FYI: I mainly follow the youtube video https://www.youtube.com/watch?v=w3drLH-DFpE&ab_channel=CalebLikesR that teaches to draw the curve for a log-likelihood function, although it uses normal distribution for demonstration.)
A couple of things I see: there is no need to vectorise log_like_graph, as you can just pass lambda values into it with sapply rather than outer; you are passing lambda_vals < 0, but the support of lambda is >= 0; and I don't think your log-likelihood function is correct (I think it should be -N * lambda - sum(lfactorial(X)) + log(lambda) * sum(X), but it is easier/more accurate to use dpois(..., log=TRUE)).
So fixing these things
# data
set.seed(123)
samples <- rpois(50, 1.65)
# The log-likelihood becomes
log_like_graph <- function(X, lambda){
  N <- NROW(X)
  logLik <- -N * lambda - sum(lfactorial(X)) + log(lambda) * sum(X)
  return(logLik)
}
# set lambda >= 0 and take smaller steps (0.01) for a smoother curve
lambda_vals <- seq(0,10,by=0.01)
# loop through lambda values, calculating the log-likelihood at each value
ll1 <- sapply(lambda_vals, function(i) log_like_graph(samples, i))
plot(lambda_vals, ll1, type="l")
This can also be done with dpois(..., log=TRUE) :
ll2 <- sapply(lambda_vals, function(i) sum(dpois(samples, lambda=i, log=TRUE)))
all.equal(ll1, ll2)
# [1] TRUE
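As a further sanity check (using the objects created above), the curve should peak at the analytical MLE for the Poisson rate, which is simply the sample mean:
# Mark the analytical MLE on the plot and compare it with the grid maximum
abline(v = mean(samples), lty = 2)
lambda_vals[which.max(ll1)]   # should be close to mean(samples)
mean(samples)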
I'm implementing maximum-likelihood estimation in R for a three-parameter reverse Weibull model and have some trouble getting plausible results, which include bad optimization results and unwanted optimx behaviour. Besides these, I wonder how I could make use of parscale in this model.
Here is my implementation attempt:
To generate data I use the probability integral transform:
#Generate N sigma*RWei(alph)-mu distributed points
gen.wei <- function(N, theta) {
  alph <- theta[1]
  mu <- theta[2]
  sigma <- theta[3]
  return(
    mu - sigma * (- log (runif(N)))**(1/alph)
  )
}
Now I define the log-likelihood and negative log-likelihood for use with optimx optimization:
#LL----
ll.wei <- function(theta,x) {
  N <- length(x)
  alph <- theta[1]
  mu <- theta[2]
  sigma <- theta[3]
  val <- sum(ifelse(
    x <= mu,
    log(alph/sigma) + (alph-1) * log( (mu-x)/sigma) - ( (mu-x)/sigma)**(alph-1),
    -Inf
  ))
  return(val)
}
#Negative LL----
nll.wei <- function(theta,x) {
  return(-ll.wei(theta=theta, x=x))
}
Afterwards I define the analytical gradient of the negative LL. Remark: There are points at which the negative LL isn't differentiable (the upper end-point mu)
gradnll.wei <- function(theta,x) {
  N <- length(x)
  alph <- theta[1]
  mu <- theta[2]
  sigma <- theta[3]
  argn <- (mu-x)/sigma
  del.alph <- sum(ifelse(x <= mu,
    1/alph + log(argn) - log(argn) * argn**(alph-1),
    0
  ))
  del.mu <- sum(ifelse(x <= mu,
    (alph-1)/(mu-x) - (alph-1)/sigma * argn**(alph-2),
    0))
  del.sigma <- sum(ifelse(x <= mu,
    ((alph-1)*argn**(alph-1)-alph)/sigma,
    0))
  return (-c(del.alph, del.mu, del.sigma))
}
Finally I try to optimize using the optimx package and the methods Nelder-Mead (derivative-free) and BFGS (my LL is kinda smooth; there's just one problematic point).
#MLE for Weibull
library(optimx)

mle.wei <- function(start,sample) {
  optimx(
    par=start,
    fn = nll.wei,
    gr = gradnll.wei,
    method = c("BFGS"),
    x = sample
  )
}
theta.s <- c(4,1,1/2) #test for parameters
sample <- gen.wei(100, theta.s) #generate 100 data points distributed like theta.s
mle.wei(start=c(8,4, 2), sample) #MLE Estimation
To my surprise I get the following error:
Error in optimx.check(par, optcfg$ufn, optcfg$ugr, optcfg$uhess, lower, :
Cannot evaluate function at initial parameters
I checked manually: Both nll and gradnll are finite at the initial parameters...
If I switch to optim instead of optimx I get a result, but a pretty bad one:
$par
[1] 8.178674e-01 9.115766e-01 1.745724e-06
$value
[1] -1072.786
$counts
function gradient
574 100
$convergence
[1] 1
$message
NULL
So it doesn't converge. If I don't supply the gradient to BFGS, there isn't a result. If I use Nelder-Mead instead:
$par
[1] 1.026393e+00 9.649121e-01 9.865624e-18
$value
[1] -3745.039
$counts
function gradient
502 NA
$convergence
[1] 1
$message
NULL
So it is also very bad...
My questions are:
1. Should I, instead of defining the ll outside of the support as -Inf, give it a very high negative value like -1e20 to circumvent -Inf errors, or does it not matter?
2. Like the first one, but for the gradient: technically the ll isn't defined outside of the support, but since the likelihood is 0 (albeit constant) outside of the support, is it smart to define gradnll as 0 outside?
3. I checked the implementation of the MLE estimator fgev from the evd package and saw that they use the BFGS method but don't supply the gradient even though the gradient does exist. My question is therefore whether there are situations where it is counterproductive to supply the gradient since it isn't defined everywhere (as in my case and the evd case).
4. I got an error of the "argument x matches multiple formal arguments" type in optimx but not in optim, which surprised me. What am I doing wrong in supplying my functions and data to the optimx function?
Thank you very much in advance!
Re 4: That's kind of a bug in optimx, but one that's hard to avoid. It uses x as a variable name when calculating a numerical gradient; you also use it as an "additional parameter" to your functions. You can work around that by renaming your argument, e.g. call it xdata in your functions.
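For example, a minimal sketch of that rename (thin wrappers around your existing functions; the name xdata is just a suggestion):
# Wrappers whose data argument cannot clash with optimx's internal 'x':
nll.wei2     <- function(theta, xdata) nll.wei(theta, x = xdata)
gradnll.wei2 <- function(theta, xdata) gradnll.wei(theta, x = xdata)

library(optimx)
optimx(par = c(8, 4, 2), fn = nll.wei2, gr = gradnll.wei2,
       method = "BFGS", xdata = sample)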
Re 1 & 2: There are several techniques to handle boundary problems in optimization. Setting to a big constant value tends not to work: if the optimizer goes out of bounds, it finds the objective function really flat. If the exact boundary is legal, then pushing the parameter to the boundary and adding a penalty sometimes works. If the exact boundary is illegal, you might be able to reflect: e.g. if mu > 0 is a requirement, sometimes replacing mu by abs(mu) in the objective function gets things to work. Sometimes the best solution is to get rid of the boundary by transforming the parameters.
Edited to add some more details:
For this problem, it looks to me as though transformations of the parameters might work. I think alpha and sigma must both be positive. Setting alpha <- exp(theta[1]) and sigma <- exp(theta[3]) will guarantee that. Limits on mu are harder, but I think mu > max(xdata) is needed, so mu <- max(xdata) + exp(theta[2]) should keep it in bounds. Of course, making these changes messes up your gradient formula and starting values.
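A rough sketch of that reparameterisation (the wrapper name, starting values, and use of a numerical gradient are illustrative, not from the original post):
# Map an unconstrained theta to (alph, mu, sigma) before calling the
# question's negative log-likelihood:
nll.wei.tr <- function(theta, xdata) {
  alph  <- exp(theta[1])                # alpha > 0
  mu    <- max(xdata) + exp(theta[2])   # mu > max(xdata)
  sigma <- exp(theta[3])                # sigma > 0
  nll.wei(c(alph, mu, sigma), x = xdata)
}

# Optimise on the unconstrained scale, then map the estimates back:
fit <- optim(c(0, 0, 0), nll.wei.tr, xdata = sample, method = "BFGS")
c(alph  = exp(fit$par[1]),
  mu    = max(sample) + exp(fit$par[2]),
  sigma = exp(fit$par[3]))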
As to resources: I'm afraid I don't know any. This advice is based on years of painful experience.
https://web.ncf.ca/nashjc/optimx202112/ has a version of the package that deals with at least some variable clashes in the dot args. There are some separate cleanups to be done before this goes on CRAN, but the package should be more or less robust at the moment.
JN
In numerical analysis we students are required to implement code in R that, given a function f(x), finds its Fourier interpolation t_N(x) and computes the interpolation error
$\|f-t_{N}\|^{2}=\int_{0}^{2\pi}|f(x)-t_{N}(x)|^{2}\,dx$
for a variety of different $N$.
I first tried to compute the d-coefficients according to this formula:
$d = \frac 1N M y$
with M denoting the DFT matrix and y denoting the vector of equidistant function values, $y_j = f(x_j)$ at the nodes $x_j = \frac{2\pi j}{N}$
for $j = 0,\dots,N-1$.
My goal was to come up with a sum of the form
$t_{N}(x) = \sum_{k=0}^{N-1} d_k e^{ikx}$
which would be easier to integrate later, term by term.
f <- function(x) 3/(6+4*cos(x)) #first function to compare with
g <- function(x) sin(32*x) #second one
xj <- function(x,n) 2*pi*x/n
M <- function(n){
  w = exp(-2*pi*1i/n)
  m = outer(0:(n-1),0:(n-1))
  return(w^m)
}
y <- function(n){
  f(xj(0:(n-1),n))
}
transformFunction <- function(n, f){
  d = 1/n * t(M(n)) %*% f(xj(0:(n-1),n))
  script <- paste(d[1])
  for(i in 2:n)
    script <- paste0(script,paste0("+",d[i],"*exp(1i*x*",i,")"))
  #trans <- sum(d[1:n] * exp(1i*x*(0:(n-1))))
  return(script)
}
The main purpose of transformFunction was, initially, to return a function, or rather a mathematical expression, which could then be used to define my Fourier interpolation function. The problem is, based on my fairly limited knowledge, that I cannot integrate functions that still have sums nested in them (which is why I commented out the corresponding line in the code).
Out of absolute desperation I then tried to paste the summands together as text, only to parse them again as an expression.
So the main question that remains is: how do I return mathematical expressions in a manner that allows me to use them as a function and later on integrate them?
I am sincerely sorry for any misunderstanding or confusion, as well as my seemingly amateurish coding.
Thanks in advance!
A function in R can return an object of any class, including objects of class function. Hence, you can make trans a function of x and return that.
Since the integrate function requires a vectorized function, we use Vectorize before outputting.
transformFunction <- function(n, f){
  d = 1/n * t(M(n)) %*% f(xj(0:(n-1),n))
  ## Output function
  trans <- function(x) sum(d[1:n] * exp(1i*x*(0:(n-1))))
  ## Vectorize output for the integrate function
  Vectorize(trans)
}
To integrate, now simply make a new variable with the output of transformFunction:
myint <- transformFunction(n = 10,f = f)
Test: (integrate can only handle real-valued functions)
integrate(function(x) Re(myint(x)),0,2)$value +
1i*integrate(function(x) Im(myint(x)),0,2)$value
# [1] 1.091337-0.271636i
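From here, the interpolation error from the question can be approximated the same way, reusing f and myint from above (the integrand is real-valued, so a single integrate() call suffices):
# Squared L2 error of the order-10 interpolant over [0, 2*pi]
err <- integrate(function(x) Mod(f(x) - myint(x))^2, lower = 0, upper = 2*pi)$value
err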
I'm now fitting a location model using non-parametric smoothing to estimate the parameters. One of the smoothed parameters is the lambda that I have to optimize, so I decided to use the nlminb function to achieve this.
However, my program gives me the same $par value even though it iterates 150 times and makes 200 evaluations (by default); it simply returns the start value as $par (that is, 0.000001). I think there must be something wrong with my program.
My program looks like this (note: w is the parameter that I want to optimize and LOO stands for leave-one-out):
BEGIN
Myfunc <- function(w, n1, n2, v1, v2, g)
{ ## open loop for main function
  ## DATA generation
  # generate data from group 1 and 2
  # for each group: discretise the continuous to binary
  # newdata <- combine the groups 1 and 2
  ## MODEL construction
  countError <- 0
  n <- nrow(newdata)
  for (k in 1:n)
  { # open loop for leave-one-out
    # construct model based on n-1 object using smoothing method
    # classify omitted object
    countError <- countError + countE
  } # close loop for LOO process
  Error <- countError / n # error rate counted from LOO procedure
  return(Error) # The Average ERROR Rate from LOO procedure
} # close loop for Myfunc

library(stats)
nlminb(start=0.000001, Myfunc, lower=0.000001, upper=0.999999,
       control=list(eval.max=100, iter.max=100))
END
Could someone help me? Your concerns and guidance are highly appreciated and really needed.
Hashibah,
Statistics PhD Student
In your question, you provide nlminb with a univariate starting value. If you are doing univariate optimisation, it is probably worth looking at optimize. If your function is multivariate, then you need to call nlminb slightly differently.
You need to define the objective function such that the parameters to optimize over are provided as a vector, which is the first argument. Other inputs to the objective function should be provided as subsequent arguments.
For example (modified from the nlminb help page):
X <- rnbinom(100, mu = 10, size = 10)
hdev <- function(par, x) {
-sum(dnbinom(x, mu = par[1], size = par[2], log = TRUE))
}
nlminb(start = c(9, 12), hdev, x = X)
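If the parameter really is the single bounded value w from the question, optimize is the simpler tool. A toy sketch (the objective here is a stand-in for the leave-one-out error rate; Myfunc's extra data arguments would be passed by name, just like x above):
# Stand-in objective with a minimum at w = 0.3; replace with Myfunc in practice
toy_error <- function(w) (w - 0.3)^2
optimize(toy_error, interval = c(1e-6, 1 - 1e-6))$minimum
# e.g. optimize(Myfunc, interval = c(1e-6, 1 - 1e-6), n1 = ..., n2 = ..., v1 = ..., v2 = ..., g = ...)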