Problems with the "Deriv" R package - r

I am having two related problems with the Deriv() function from the Deriv package (from CRAN).
Problem (1): I believe that the code in the rule for differentiating dbinom() w.r.t "prob" is incorrect. Evidence for this (and my proposed correction) is shown in the following code. I would have preferred to attach text files containing the code, but as far as I can see there is no way to do this.
#
# Script demo01.R.
#
library(Deriv)
# Plot dbinom as a function of probability.
plot(function(p){dbinom(3,8,p)},from=0,to=1,xlab="parameter \"prob\"",
ylab="binomial probability",main="dbinom")
abline(v=3/8,col="red")
readline("Go? ")
# Plot the derivative of dbinom, with respect to prob, calculated by
# Deriv(), as a function of probability.
Ddb <- Deriv(dbinom,"prob")
plot(function(p){Ddb(3,8,p)},from=0,to=1,xlab="parameter \"prob\"",
ylab="",main="derivative of dbinom")
readline("Go? ")
# Replace what I believe to be incorrect code for the rule for
# differentiating dbinom() with what I believe to be correct code.
# This rule should, strictly speaking, be placed in a new environment
# rather than over-writing the existing rule, but this seems to
# break down when second derivatives are taken.
drule[["dbinom"]] <- alist(x=NULL,size=NULL,prob={
.e1 <- 1 - prob
.e2 <- size - 1
if (x == 0)
-(x * .e1^(x - 1))
else if (x == size)
prob^.e2 * size
else size*(dbinom(x-1,.e2,prob) - dbinom(x,.e2,prob)) *
(if (log) dbinom(x, size, prob) else 1)
})
# Plot the derivative of dbinom, with respect to prob, calculated by
# the corrected version of Deriv(), as a function of probability.
Ddb <- Deriv(dbinom,"prob")
plot(function(p){Ddb(3,8,p)},from=0,to=1,xlab="parameter \"prob\"",
ylab="",main="derivative of dbinom, corrected")
abline(v=3/8,col="red")
abline(h=0,col="blue")
You will observe that the derivative of dbinom() should be positive for prob < 3/8 and negative for prob > 3/8. My corrected version has this property, whereas the derivative produced by the uncorrected version is negative everywhere.
Can anyone confirm that I am right about there being a bug in the Deriv package? (I.e. that I am not making some sort of stupid mistake?)
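One quick way to sanity-check the sign behaviour, independent of Deriv, is to compare a central-difference approximation of the derivative with the textbook identity d/dp dbinom(x, size, p) = size * (dbinom(x-1, size-1, p) - dbinom(x, size-1, p)); both should change sign at prob = 3/8. A minimal sketch:
# Independent check, no Deriv involved.
p <- seq(0.01, 0.99, by = 0.01)
h <- 1e-6
num  <- (dbinom(3, 8, p + h) - dbinom(3, 8, p - h)) / (2 * h)  # central difference
anal <- 8 * (dbinom(2, 7, p) - dbinom(3, 7, p))                # textbook identity
max(abs(num - anal))   # should be tiny
range(p[anal > 0])     # positive exactly for p below 3/8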
Problem (2). I "crosschecked" the calculations performed by Deriv() by applying this function to a "roll-your-own" version of dbinom() for which no special rule is needed. I also applied the corrected version of Deriv() to dbinom() itself. The code that I used is as follows:
#
# Script demo02.R.
#
library(Deriv)
# Replace what I believe to be incorrect code for the rule for
# differentiating dbinom() with what I believe to be correct code.
# This rule should, strictly speaking, be placed in a new environment
# rather than over-writing the existing rule, but this seems to
# break down when second derivatives are taken.
drule[["dbinom"]] <- alist(x=NULL,size=NULL,prob={
.e1 <- 1 - prob
.e2 <- size - 1
if (x == 0)
-(x * .e1^(x - 1))
else if (x == size)
prob^.e2 * size
else size*(dbinom(x-1,.e2,prob) - dbinom(x,.e2,prob)) *
(if (log) dbinom(x, size, prob) else 1)
},log=NULL)
fooB1 <- function(x, prob, size) {
    dbinom(x, size, prob)
}
fooB2 <- function(x, prob, size) {
    choose(size, x) * prob^x * (1-prob)^(size-x)
}
dfooB1 <- Deriv(fooB1,"prob")
dfooB2 <- Deriv(fooB2,"prob")
d2fooB1 <- Deriv(fooB1,"prob",nderiv=2)
d2fooB2 <- Deriv(fooB2,"prob",nderiv=2)
vB1 <- fooB1(x=3,prob=0.6,size=8)
vB2 <- fooB2(x=3,prob=0.6,size=8)
dB1 <- dfooB1(x=3,prob=0.6,size=8)
dB2 <- dfooB2(x=3,prob=0.6,size=8)
d2B1 <- d2fooB1(x=3,prob=0.6,size=8)
d2B2 <- d2fooB2(x=3,prob=0.6,size=8)
If you run this code you will see that the function values (vB1 and vB2) agree, both having the value 0.123863. Likewise the first derivative values dB1 and dB2 agree: -0.9289728.
However the second derivatives disagree. The value of d2B1 is 1.769472, whereas the value of d2B2 is 2.064384. I have no idea which (if either) of these answers is correct.
Something (the chain rule?) is not working as it should.
Is there any action that I can take to resolve this discrepancy?
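One way I can think of to break the tie, without digging into Deriv's internals, is to differentiate the closed-form density with base R's D(), which knows nothing about drule. A sketch, with choose(8, 3) = 56 folded in as a constant:
# Cross-check with base R's symbolic D(); no Deriv rules are involved here.
expr <- quote(56 * p^3 * (1 - p)^5)   # dbinom(3, 8, p) written out
d1 <- D(expr, "p")
d2 <- D(d1, "p")
p <- 0.6
eval(d1)   # first derivative at p = 0.6
eval(d2)   # second derivative at p = 0.6
In my quick check the first derivative agrees with dB1 and dB2, and the second derivative agrees with d2B2 (about 2.064), but please verify.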

Related

Solve non-linear equations using "nleqslv" package

I tried to solve these non-linear equations using nleqslv, but it does not work well. I do know the reason why it does not: I didn't separate the two unknowns onto different sides of the equations.
My questions are:
1. Are there any other packages that could solve this kind of equation?
2. Is there any effective way in R to rearrange the equations so that they meet the requirements of the nleqslv package?
Thank you guys.
Here is the code; p[1] and p[2] are the two unknowns I want to solve for.
dslnex <- function(p){
    p <- numeric(2)
    0.015 = sum(exp(Calib2$Median_Score*p[1]+p[2])*weight_pd_bad)
    cum_dr <- 0
    for (i in 1:length(label)){
        cum_dr[i] <- exp(Calib2$Median_Score*p[1]+p[2][1:i]*weight_pd_bad[1:i]/0.015
    }
    mid <- 0
    for (i in 1:length(label)){
        mid[i] <- sum(cum_dr[1:i])/2
    }
    0.4 = (sum(mid*weight_pd_bad)-0.5)/(0.5*(1-0.015))
}
pstart<-c(-0.000679354,-4.203065891)
z<- nleqslv(pstart, dslnex, jacobian=TRUE,control=list(btol=.01))
Following up on my comment, I have rewritten your function as follows, correcting errors and inefficiencies.
Errors and other changes are noted in inline comments.
# no need to use dslnex as the name for your function
# dslnex <- function(p){
# any valid name will do
f <- function(p) {
    # do not do this:
    # you are overwriting p as passed by nleqslv
    # p <- numeric(2)
    # declare the return vector
    y <- numeric(2)
    y[1] <- 0.015 - sum(exp(Calib2$Median_Score*p[1]+p[2])*weight_pd_bad)
    # do not do this:
    # cum_dr is initialized as a scalar and will be made into a vector
    # which is grown as each new element is inserted (can be very inefficient)
    # cum_dr <- 0
    # so declare cum_dr to be a vector with length(label) elements
    cum_dr <- numeric(length(label))
    for (i in 1:length(label)) {
        # closing ")" added so this line parses; check that it sits where you intend
        cum_dr[i] <- exp(Calib2$Median_Score*p[1]+p[2][1:i]*weight_pd_bad[1:i]/0.015)
    }
    # same problem as above
    # mid <- 0
    mid <- numeric(length(label))
    for (i in 1:length(label)) {
        mid[i] <- sum(cum_dr[1:i])/2
    }
    y[2] <- 0.4 - (sum(mid*weight_pd_bad)-0.5)/(0.5*(1-0.015))
    # return vector y
    y
}
pstart <- c(-0.000679354, -4.203065891)
z <- nleqslv(pstart, f, jacobian=TRUE, control=list(btol=.01))  # call the renamed function f
nleqslv is intended for solving systems of equations of the form f(x) = 0, and the system must be square: your function must return a vector with the same length as the x vector.
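For reference, here is a minimal, self-contained square system in that form (a toy example unrelated to your data), just to show the shape nleqslv expects:
library(nleqslv)
# Toy square system: x1^2 + x2^2 = 2 and exp(x1 - 1) + x2^3 = 2,
# rewritten as f(x) = 0; the returned vector y has the same length as x.
g <- function(x) {
    y <- numeric(2)
    y[1] <- x[1]^2 + x[2]^2 - 2
    y[2] <- exp(x[1] - 1) + x[2]^3 - 2
    y
}
nleqslv(c(2, 0.5), g)$x   # converges to (1, 1)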
You should now be able to proceed, provided your system of equations has a solution and there are no further errors in your equations. I have my doubts about the [1:i] in the expression for cum_dr and about the expression for mid[i]. The loop calculating mid can probably be written as a single statement: mid <- cumsum(cum_dr)/2. Up to you.

Vectorising two similar functions in R works for one

Today I came across a problem: two almost identical functions work as intended before vectorisation, but after it, one works fine and the other returns an error.
I am examining the robustness of various estimators with respect to different transformations of residuals and aggregating functions. Quantile Regression and Least Median of Squares are particular cases of what I am doing.
So I wrote the following code to see how the Least Trimean of Squares would work, and found that it works fine if the model parameters are supplied as separate arguments, but fails if they come in a vector. I need the first form for plotting (it is convenient to use outer(...) to get a value matrix for persp, or just to supply f(x, y) to persp3d from library(rgl)), but the second form for estimation (R optimisers expect a vector of inputs as the first argument over which the minimisation is done).
MWE:
set.seed(105)
N <- 204
x <- rlnorm(N)
y <- 1 + x + rnorm(N)*sqrt(.1+.2*x+.3*x^2)
# A simple linear model with heteroskedastic errors
resfun <- function(x) return(x^2)
# Going to minimise a function of squared residuals...
distfun <- function(x) return(mean(quantile(x, c(0.25, 0.5, 0.5, 0.75))))
# ...which in this case is the trimean
penalty <- function(theta0, theta1) {
    r <- y - theta0 - theta1*x
    return(distfun(resfun(r)))
}
pen2 <- function(theta) {
    r <- y - theta[1] - theta[2]*x
    return(distfun(resfun(r)))
}
penalty(1, 1) # 0.5352602
pen2(c(1, 1)) # 0.5352602
vpenalty <- Vectorize(penalty)
vpen2 <- Vectorize(pen2)
vpenalty(1, 1) # 0.5352602
vpen2(c(1, 1))
Error in quantile.default(x, c(0.25, 0.5, 0.5, 0.75)) :
missing values and NaN's not allowed if 'na.rm' is FALSE
Why does vpen2, being vectorised pen2, choke even on a single input?
As jogo pointed out, Vectorize makes vpen2 loop over the elements of the input vector, so pen2 is called on each element separately and never sees theta[2]. The right way to go is to use something like
a <- matrix(..., ncol=2)
apply(a, 1, pen2)
This will return a vector of pen2 values, one for each row of the matrix.
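A quick sketch of that pattern, reusing the pen2, x and y defined in the question:
# Evaluate pen2 on a small grid of (theta0, theta1) pairs, one pair per row.
a <- as.matrix(expand.grid(theta0 = c(0.5, 1, 1.5), theta1 = c(0.5, 1, 1.5)))
vals <- apply(a, 1, pen2)   # one penalty value per row of a
cbind(a, vals)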

To find the distance between two roots in R

Suppose I have a function f(x) that is well defined on an interval I. I want to find the greatest and the smallest roots of f(x) and then take the difference between them. What is a good way to program this?
To be precise, f can at worst be a rational function like (1+x)/(1-x). It should be a (high-degree) polynomial most of the time. I only need to know the result numerically, to some precision.
I am thinking about the following:
1. Convert f(x) into a form recognizable by R. (I can do this.)
2. Use R to list all roots of f(x) on I. (I found that the uniroot function only gives me one root.)
3. Use R to find the maximum and minimum elements in the list. (Should be possible once I convert it to a vector.)
4. Take the difference of the two roots. (Should be trivial.)
I am stuck on step (2) and I do not know what to do. My professor gave a brute-force solution, suggesting that I:
1. Divide the interval I into one million pieces.
2. Evaluate f at each end point and find the end points where f >= 0.
3. Choose the maximum and minimum elements from the set formed in step 2.
4. Take the difference between them.
I feel this way is not very efficient and might not work for all f in general, but I am having trouble implementing it even for quadratics. I do not know how to do step (2) either. So I want to ask for a hint or some toy examples.
At this point I am trying to implement the following code:
Y=rep(0,200)
dim(Y)=c(100,2)
for(i in 1:100){
X=rnorm(9,0,1)
Z=rnorm(16,0,1)
a=0.64
b=a*sum(Z^2)/sum(X^2)
root_intervals <- function(f, interval, n = 1e6) {
xvals <- seq(interval[1], interval[2], length = n)
yvals <- f(xvals)
ypos <- yvals > 0
x1 <- which(tail(ypos, -1) != head(ypos, -1))
x2 <- x1 + 1
## so all the zeroes we can see are between x1 and x2
return(cbind(xvals[x1], xvals[x2]))
}
Up to here everything is okay, but when I try to extract the roots into Y[i,1] and Y[i,2] with
Y[i,1]=(ri<-root intervals(function(x)(x/(a*x+b))^{9/2}*(1/((1-a)+a*(1-a)/b*x))^4-0.235505, c(0,40),n=1e6)[1]
I found I cannot evaluate it anymore. R keep telling me
Error: unexpected symbol in:
"}
Y[i,1]=(ri<-root intervals"
and I got stuck. I really appreciate everyone's help as I am feeling lost.
I checked the function's expression many times using the plot function and it has no syntax mistakes. Also I believe it is well defined for all x in the interval.
This should give you a good start on the brute force solution. You're right, it's not elegant, but for relatively simple univariate functions, evaluating 1 million points is trivial.
root_intervals <- function(f, interval, n = 1e6) {
    xvals <- seq(interval[1], interval[2], length = n)
    yvals <- f(xvals)
    ypos <- yvals > 0
    x1 <- which(ypos[-1] != head(ypos, -1))
    x2 <- x1 + 1
    ## so all the zeroes we can see are between x1 and x2
    return(cbind(xvals[x1], xvals[x2]))
}
This function returns a two column matrix of x values, where the function changes sign between column 1 and column 2:
f1 <- function (x) 0.05 * x^5 - 2 * x^4 + x^3 - x^2 + 1
> (ri <- root_intervals(f1, c(-10, 10), n = 1e6))
[,1] [,2]
[1,] -0.6372706 -0.6372506
[2,] 0.8182708 0.8182908
> f1(ri)
[,1] [,2]
[1,] -3.045326e-05 6.163467e-05
[2,] 2.218895e-05 -5.579081e-05
Wolfram Alpha confirms results on the specified interval.
The top and bottom rows will be the min and max intervals found. These intervals (over which the function changes sign) are precisely what uniroot wants for its interval argument, so you could use it to solve for the (more) exact roots. Of course, if the function changes sign twice within one interval (or any even number of times), it won't be picked up, so choose a big n!
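A sketch of that last step, reusing f1 and ri from above:
# Polish each bracketing interval with uniroot(), then take the spread of the roots.
roots <- apply(ri, 1, function(iv) uniroot(f1, interval = iv)$root)
roots                 # refined roots, one per sign change
diff(range(roots))    # distance between the smallest and largest root found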
Response to edited question:
Looks like you're trying to define a bunch of functions, but your edits have syntax errors. Here's what I think you're trying to do (this first part might take some more work to get right):
my_funs <- list()
Y = rep(0, 200)
dim(Y) = c(100, 2)
for (i in 1:100) {
    X = rnorm(9, 0, 1)
    Z = rnorm(16, 0, 1)
    a = 0.64
    b = a*sum(Z^2)/sum(X^2)
    my_funs[[i]] <- function(x) {(x/(a*x+b))^{9/2} * (1/((1-a)+a*(1-a)/b*x))^4 - 0.235505}
}
Here's using the root_intervals on the first of your generated functions.
> root_intervals(my_funs[[1]], interval = c(0, 40))
[,1] [,2]
[1,] 0.8581609 0.8582009
[2,] 11.4401314 11.4401714
Notice the output: a matrix, with each root of the function lying between the value in column 1 and the value in column 2. Being a matrix, it can't be assigned to a single element of Y. If you want a single root, use uniroot, with each row supplying the lower and upper bounds. This is left as an exercise to the reader.

Errors when attempting constrained optimisation using optim()

I have been using the Excel solver to handle the following problem:
Solve for a, b and c in the equation:
y = a*b*c*x/((1 - c*x)(1 - c*x + b*c*x))
subject to the constraints
0 < a < 100
0 < b < 100
0 < c < 100
f(x[1]) < 10
f(x[2]) > 20
f(x[3]) < 40
where I have about 10 (x, y) value pairs. I minimize the sum of abs(y - f(x)). In Excel I can constrain both the coefficients and the range of values that the function can take at each x.
I tried nls (without trying to impose the constraints), and while Excel provided estimates for almost any starting values I cared to provide, nls almost never returned an answer.
I switched to using optim, but I'm having trouble applying the constraints.
This is where I have gotten so far:
best <- function(p, x, y) {
    sum(abs(y - p[1]*p[2]*p[3]*x / ((1 - p[3]*x)*(1 - p[3]*x + p[2]*p[3]*x))))
}
p <- c(1, 1, 1)
x <- c(.1, .5, .9)
y <- c(5, 26, 35)
optim(p, best, x = x, y = y)
I did this to add the first set of constraints:
optim(p,best,x=x,y=y,method="L-BFGS-B",lower=c(0,0,0),upper=c(100,100,100))
I get the error "ERROR: ABNORMAL_TERMINATION_IN_LNSRCH"
and end up with a higher value of the error ($value). So it seems like I am doing something wrong. I couldn't figure out how to apply my other set of constraints at all.
Could someone provide me a basic idea how to solve this problem that a non-statistician can understand? I looked at a lot of posts and looked in a few R books. The R books stopped at the simplest use of optim.
The absolute value introduces a singularity: you may want to use a square instead, especially for gradient-based methods (such as L-BFGS). The denominator of your function can be zero. The fact that the parameters appear in products and that you allow them to be (arbitrarily close to) zero can also cause problems.
You can try other optimizers (there is a complete list on the Optimization task view) until you find one for which the optimization converges.
x0 <- c(.1,.5,.9)
y0 <- c(5,26,35)
p <- c(1,1,1)
lower <- 0*p
upper <- 100 + lower
f <- function(p, x=x0, y=y0) {
    sum((y - p[1]*p[2]*p[3]*x / ((1 - p[3]*x)*(1 - p[3]*x + p[2]*p[3]*x)))^2)
}
library(dfoptim)
nmkb(p, f, lower=lower, upper=upper) # Converges
library(Rvmmin)
Rvmmin(p, f, lower=lower, upper=upper) # Does not converge
library(DEoptim)
DEoptim(f, lower, upper) # Does not converge
library(NMOF)
PSopt(f, list(min=lower, max=upper))[c("xbest", "OFvalue")] # Does not really converge
DEopt(f, list(min=lower, max=upper))[c("xbest", "OFvalue")] # Does not really converge
library(minqa)
bobyqa(p, f, lower, upper) # Does not really converge
As a last resort, you can always use a grid search.
library(NMOF)
r <- gridSearch(
    f,
    lapply(seq_along(p), function(i) seq(lower[i], upper[i], length = 200))
)
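If memory serves, gridSearch returns the best level combination and its objective value in fields along these lines (check ?gridSearch in NMOF to confirm the exact names):
r$minlevels   # parameter values at the smallest objective value found
r$minfun      # the corresponding objective value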

'optimize' not finding variables inside a function call

I have the following function that draws some data from a chi-squared distribution and compares the distribution of X to a known chi-squared distribution using maximum likelihood. This procedure is simulated nSims times. (I compare these results to results from a permutation test, but that code is excluded.)
chi2c <- function(xdf=2, yObs=100, xObs=100, nSims=1000, nPerm=500, alpha=0.05){
    simResults <- sapply(1:nSims, function(x){
        # Draw variables
        x <- rchisq(xObs, df=xdf)
        # Other variables not relevant here
        # [[snip]]
        # Permutation test
        # [[snip]]
        # Calculate the statistics necessary for maximum likelihood
        n <<- length(x)
        sumlx <<- sum(log(x))
        sumx <<- sum(x)
        # Calculate the maximum likelihood estimate
        dfhat <- optimize(f=c2ll, interval=c(1, 10), maximum=TRUE)$maximum
        # Calculate the test statistic: -2 times the log likelihood ratio
        llr <- -2 * (c2ll(2) - c2ll(dfhat))
        # Compare the test statistic to its asymptotic dist: chi-squared
        lReject <- llr > qchisq(1 - alpha, df=1)
        # Provide the results
        # [[snip]]
    })
    # Calculate means across simulations
    rowMeans(simResults)
}
This function calls c2ll, the chi-squared log-likelihood function:
c2ll <- function(dfHat){
    -n * log(gamma(dfHat/2)) - n * (dfHat/2) * log(2) +
        (dfHat/2 - 1) * sumlx - sumx/2
}
This function does just what I would like and is accurate, but I don't understand why I have to set the maximum likelihood statistics (n, sumlx, and sumx) globally, with <<-, to get it to work; optimize doesn't find them if I only set them inside the function using <-. I tried setting them inside of optimize, but that didn't work either. Thanks for your help.
Charlie
R has lexical scoping, which means that functions look for variables in the environment in which they were defined. c2ll is defined in the global environment, so it doesn't see your definitions of n, sumlx, and sumx inside the function. S, on the other hand, uses dynamic scoping, which behaves as you expect (i.e. it looks for variables in the scope in which the function was called). Computer scientists generally believe that dynamic scoping was a dead-end bad idea, and that lexical scoping is the way to go.
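A tiny illustration of the point (throwaway names, nothing to do with your code):
# g() is defined in the global environment, so it looks up n there,
# not in the environment of whoever calls it.
n <- 1
g <- function() n + 1
h <- function() { n <- 10; g() }
h()   # returns 2 under lexical scoping, not 11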
As a practical matter, what can you do about this?
Well, there are a couple options...
First, you can define your function locally:
n <- length(x)
sumlx <- sum(log(x))
sumx <- sum(x)
c2ll <- function(dfHat){
    -n * log(gamma(dfHat/2)) - n * (dfHat/2) * log(2) + (dfHat/2 - 1) * sumlx - sumx/2
}
dfhat <- optimize(f=c2ll, interval=c(1, 10), maximum=TRUE)$maximum
Second, you can have c2ll take additional parameters, and pass those parameters through optimize.
# in the global environment
c2ll <- function(dfHat, n, sumlx, sumx){
    -n * log(gamma(dfHat/2)) - n * (dfHat/2) * log(2) +
        (dfHat/2 - 1) * sumlx - sumx/2
}
# ...
# inside your function
n <- length(x)
sumlx <- sum(log(x))
sumx <- sum(x)
dfhat <- optimize(f=c2ll, interval=c(1, 10), n=n, sumlx=sumlx, sumx=sumx, maximum=TRUE)$maximum
Both are clean options that preserve the encapsulation of your functions.
Your simResults function returns a logical vector. Rather than using rowMeans, just use mean(simResults) and the results look reasonably sensible ... at least to the extent that:
> chi2c(alpha=0.05)
[1] 0.057
> chi2c(alpha=0.5)
[1] 0.503
Your problem stems from the lexical scoping rules that R uses. See the language definition manual for more. In short, your function c2ll looks for the variables in the environment in which it was defined.
To avoid that problem, you have to pass n, sumlx and sumx explicitly as arguments to your function, or define your function locally inside the chi2c function directly.
This is quite a common question; there are a lot of interesting examples on SO.
