Analytical gradient for bisection method nested within objective function - r

I'm attempting to fit parameters to a data set using optim() in R. The objective function requires iterative root-solving for equation G, so that the predicted values p bring the values of G (nested within the objective function) to 0 (or as close to 0 as possible; I use 50 iterations of the bisection method for stability).
Here is the problem: I would really prefer to include an analytical gradient for optim(), but I suspect it isn't possible for an iterated function. However, before I give up on the analytical gradient, I wanted to run this problem by everyone here and see if there might be a solution I'm overlooking. Any thoughts?
Note: before settling on the bisection method, I tried other root-solving methods, but all non-bracketing methods (Newton, etc.) seem to be unstable.
Below is a reproducible example of the problem. With the provided data set and the starting values for optim(), the algorithm converges just fine without an analytical gradient, but it doesn't perform so well for other sets of data and starting values.
#the data set includes two input variables (x1 and x2)
#the response values are k successes out of n trials
x1=c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
1, 1, 1, 1, 1.5, 1.5, 1.5, 1.5, 1.75, 1.75, 1.75, 1.75, 2, 2,
2, 2, 2.25, 2.25, 2.25, 2.25, 2.5, 2.5, 2.5, 2.5, 2.75, 2.75,
2.75, 2.75, 3, 3, 3, 3, 3.25, 3.25, 3.25, 3.25, 3.5, 3.5, 3.5,
3.5, 3.75, 3.75, 3.75, 3.75, 4, 4, 4, 4, 4.25, 4.25, 4.25, 4.25,
4.5, 4.5, 4.5, 4.5, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1.5, 1.5, 1.5, 1.5,
1.5, 1.5, 1.5, 1.5, 1.5, 1.5, 1.5, 1.5, 1.5, 1.5, 1.5, 1.5, 1.5,
1.5, 1.5, 1.5, 1.5, 1.5, 1.5, 1.5, 1.5, 1.5, 1.5, 1.75, 1.75,
1.75, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75,
1.75, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75,
1.75, 1.75, 1.75, 1.75, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2.25, 2.25, 2.25,
2.25, 2.25, 2.25, 2.25, 2.25, 2.25, 2.25, 2.25, 2.25, 2.25, 2.25,
2.25, 2.25, 2.25, 2.25, 2.25, 2.25, 2.25, 2.25, 2.25, 2.25, 2.25,
2.25, 2.25, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5,
2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5,
2.5, 2.5, 2.5, 2.5, 2.5, 2.75, 2.75, 2.75, 2.75, 2.75, 2.75,
2.75, 2.75, 2.75, 2.75, 2.75, 2.75, 2.75, 2.75, 2.75, 2.75, 2.75,
2.75, 2.75, 2.75, 2.75, 2.75, 2.75, 2.75, 2.75, 2.75, 2.75, 2.75,
3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
3, 3, 3, 3, 3, 3.25, 3.25, 3.25, 3.25, 3.25, 3.25, 3.25, 3.25,
3.25, 3.25, 3.25, 3.25, 3.25, 3.25, 3.25, 3.25, 3.25, 3.25, 3.25,
3.25, 3.25, 3.25, 3.25, 3.25, 3.25, 3.25, 3.25, 3.25, 3.5, 3.5,
3.5, 3.5, 3.5, 3.5, 3.5, 3.5, 3.5, 3.5, 3.5, 3.5, 3.5, 3.5, 3.5,
3.5, 3.5, 3.5, 3.5, 3.5, 3.5, 3.5, 3.5, 3.5, 3.5, 3.5, 3.5, 3.5,
3.75, 3.75, 3.75, 3.75, 3.75, 3.75, 3.75, 3.75, 3.75, 3.75, 3.75,
3.75, 3.75, 3.75, 3.75, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,
4, 4, 4, 4, 4, 4, 4, 4.25, 4.25, 4.25, 4.25, 4.25, 4.25, 4.25,
4.25, 4.25, 4.25, 4.25, 4.25, 4.25, 4.25, 4.25, 4.25, 4.25, 4.25,
4.25, 4.25, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5,
4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5)
x2=c(0, 0, 0, 0, 0, 0, 0, 0, 0.1, 0.1, 0.1, 0.1, 0.15, 0.15, 0.15,
0.15, 0.2, 0.2, 0.2, 0.2, 0.233, 0.233, 0.233, 0.267, 0.267,
0.267, 0.267, 0.3, 0.3, 0.3, 0.3, 0.333, 0.333, 0.333, 0.333,
0.367, 0.367, 0.367, 0.367, 0.4, 0.4, 0.4, 0.4, 0.433, 0.433,
0.433, 0.433, 0.467, 0.467, 0.467, 0.5, 0.5, 0.5, 0.5, 0.55,
0.55, 0.55, 0.55, 0.6, 0.6, 0.6, 0.6, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0.1, 0.1, 0.1, 0.1, 0.2, 0.2, 0.2, 0.2, 0.267,
0.267, 0.267, 0.267, 0.333, 0.333, 0.333, 0.333, 0.4, 0.4, 0.4,
0.4, 0.467, 0.467, 0.467, 0.467, 0.55, 0.55, 0.55, 0.55, 0.15,
0.15, 0.15, 0.15, 0.233, 0.233, 0.233, 0.233, 0.3, 0.3, 0.3,
0.3, 0.367, 0.367, 0.367, 0.367, 0.433, 0.433, 0.433, 0.433,
0.5, 0.5, 0.5, 0.6, 0.6, 0.6, 0.6, 0.1, 0.1, 0.1, 0.1, 0.2, 0.2,
0.2, 0.2, 0.267, 0.267, 0.267, 0.267, 0.333, 0.333, 0.333, 0.333,
0.4, 0.4, 0.4, 0.4, 0.467, 0.467, 0.467, 0.467, 0.55, 0.55, 0.55,
0.55, 0.15, 0.15, 0.15, 0.15, 0.233, 0.233, 0.233, 0.233, 0.3,
0.3, 0.3, 0.3, 0.367, 0.367, 0.367, 0.367, 0.433, 0.433, 0.433,
0.433, 0.5, 0.5, 0.5, 0.5, 0.6, 0.6, 0.6, 0.6, 0.1, 0.1, 0.1,
0.1, 0.2, 0.2, 0.2, 0.267, 0.267, 0.267, 0.267, 0.333, 0.333,
0.333, 0.333, 0.4, 0.4, 0.4, 0.4, 0.467, 0.467, 0.467, 0.467,
0.55, 0.55, 0.55, 0.55, 0.15, 0.15, 0.15, 0.15, 0.233, 0.233,
0.233, 0.233, 0.3, 0.3, 0.3, 0.3, 0.367, 0.367, 0.367, 0.367,
0.433, 0.433, 0.433, 0.433, 0.5, 0.5, 0.5, 0.5, 0.6, 0.6, 0.6,
0.6, 0.1, 0.1, 0.1, 0.1, 0.2, 0.2, 0.2, 0.2, 0.267, 0.267, 0.267,
0.267, 0.333, 0.333, 0.333, 0.333, 0.4, 0.4, 0.4, 0.4, 0.467,
0.467, 0.467, 0.467, 0.55, 0.55, 0.55, 0.55, 0.15, 0.15, 0.15,
0.15, 0.233, 0.233, 0.233, 0.3, 0.3, 0.3, 0.3, 0.367, 0.367,
0.367, 0.367, 0.433, 0.433, 0.433, 0.433, 0.5, 0.5, 0.5, 0.6,
0.6, 0.6, 0.6, 0.1, 0.1, 0.1, 0.1, 0.2, 0.2, 0.2, 0.2, 0.267,
0.267, 0.267, 0.267, 0.333, 0.333, 0.333, 0.333, 0.4, 0.4, 0.4,
0.4, 0.467, 0.467, 0.467, 0.467, 0.55, 0.55, 0.55, 0.55, 0.15,
0.15, 0.15, 0.15, 0.233, 0.233, 0.233, 0.233, 0.3, 0.3, 0.3,
0.3, 0.367, 0.367, 0.367, 0.367, 0.433, 0.433, 0.433, 0.433,
0.5, 0.5, 0.5, 0.5, 0.6, 0.6, 0.6, 0.6, 0.1, 0.1, 0.1, 0.1, 0.2,
0.2, 0.2, 0.2, 0.267, 0.267, 0.267, 0.267, 0.333, 0.333, 0.333,
0.15, 0.15, 0.15, 0.15, 0.233, 0.233, 0.233, 0.233, 0.3, 0.3,
0.3, 0.3, 0.367, 0.367, 0.367, 0.367, 0.433, 0.433, 0.433, 0.433,
0.1, 0.1, 0.1, 0.1, 0.2, 0.2, 0.2, 0.2, 0.267, 0.267, 0.267,
0.267, 0.333, 0.333, 0.333, 0.333, 0.4, 0.4, 0.4, 0.4, 0.15,
0.15, 0.15, 0.15, 0.233, 0.233, 0.233, 0.233, 0.3, 0.3, 0.3,
0.3, 0.367, 0.367, 0.367, 0.367, 0.433, 0.433, 0.433, 0.433)
k=c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 2, 0, 1, 3,
3, 3, 3, 3, 3, 4, 2, 5, 3, 4, 7, 8, 5, 4, 5, 5, 4, 5, 5, 5, 6,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 2, 0, 1, 1, 1, 1, 0, 1, 0, 1, 0, 0, 0, 3, 2, 4, 1, 2,
3, 4, 2, 2, 4, 4, 3, 1, 2, 0, 3, 4, 5, 5, 0, 0, 0, 0, 0, 0, 1,
0, 0, 1, 1, 2, 1, 2, 2, 0, 3, 1, 0, 2, 4, 6, 5, 5, 4, 5, 5, 5,
1, 0, 0, 0, 2, 1, 0, 1, 3, 2, 1, 1, 3, 4, 3, 4, 5, 5, 5, 5, 8,
6, 7, 6, 6, 5, 7, 0, 0, 0, 0, 2, 1, 0, 0, 0, 0, 2, 1, 1, 3, 3,
2, 1, 3, 6, 2, 5, 3, 3, 5, 6, 5, 5, 5, 1, 0, 1, 1, 2, 1, 1, 1,
3, 4, 2, 5, 5, 3, 4, 4, 6, 4, 6, 5, 6, 5, 5, 5, 5, 4, 5, 5, 0,
0, 0, 0, 0, 2, 0, 2, 3, 3, 3, 2, 3, 3, 1, 4, 4, 4, 4, 3, 5, 6,
5, 5, 5, 5, 5, 1, 4, 1, 2, 2, 3, 4, 2, 5, 5, 5, 5, 5, 4, 5, 7,
6, 7, 6, 5, 5, 5, 7, 5, 5, 5, 5, 5, 0, 1, 0, 0, 3, 2, 3, 3, 1,
2, 2, 2, 4, 2, 3, 2, 5, 5, 5, 5, 4, 6, 5, 6, 5, 5, 6, 5, 3, 5,
2, 4, 5, 3, 5, 5, 6, 4, 4, 5, 5, 5, 6, 6, 5, 5, 5, 5, 5, 5, 5,
5, 5, 5, 0, 0, 2, 0, 3, 2, 3, 2, 3, 4, 3, 4, 5, 5, 5, 5, 6, 4,
6, 4, 5, 7, 5, 5, 5, 6, 5, 5, 2, 3, 4, 4, 4, 4, 5, 5, 5, 6, 5,
5, 5, 5, 5, 4, 6, 5, 5, 5, 6, 5, 5, 5, 5, 5, 5, 5, 1, 0, 2, 0,
3, 5, 2, 2, 4, 5, 4, 5, 6, 6, 4, 5, 4, 5, 4, 5, 5, 5, 5, 5, 5,
6, 5, 5, 5, 5, 5, 5, 5, 5, 5, 1, 4, 1, 4, 4, 4, 4, 4, 3, 6, 5,
4, 3, 5, 4, 5, 6, 6, 5, 6, 5, 4, 5, 5, 5, 6, 5, 5, 5, 11, 5,
12, 5, 5, 5, 5, 4, 5, 5, 5)
n=c(5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
5, 5, 5, 5, 5, 5, 5, 5, 6, 5, 5, 5, 5, 6, 5, 5, 5, 5, 5, 5, 5,
6, 5, 6, 5, 5, 5, 5, 7, 5, 6, 8, 8, 6, 5, 6, 5, 5, 5, 5, 5, 6,
5, 5, 5, 5, 7, 11, 8, 7, 5, 5, 5, 5, 7, 5, 5, 5, 5, 5, 5, 5,
4, 5, 5, 5, 6, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
5, 5, 5, 4, 5, 5, 5, 6, 5, 5, 5, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5,
5, 6, 6, 5, 5, 5, 5, 6, 5, 5, 5, 5, 5, 7, 6, 7, 6, 5, 5, 5, 5,
5, 5, 5, 5, 5, 6, 5, 6, 6, 5, 5, 5, 5, 5, 6, 5, 5, 5, 5, 5, 5,
8, 6, 7, 6, 6, 5, 7, 5, 5, 5, 5, 6, 5, 5, 5, 7, 7, 6, 5, 6, 5,
5, 5, 5, 6, 6, 4, 6, 6, 5, 5, 6, 6, 5, 5, 5, 5, 5, 5, 7, 5, 5,
4, 5, 5, 5, 5, 5, 5, 5, 5, 6, 4, 6, 5, 6, 5, 5, 5, 5, 4, 5, 5,
5, 5, 6, 6, 5, 6, 5, 4, 5, 6, 5, 5, 5, 5, 5, 5, 5, 5, 6, 5, 5,
6, 5, 5, 5, 5, 5, 5, 6, 5, 6, 7, 4, 6, 5, 5, 5, 5, 5, 5, 4, 5,
7, 6, 7, 6, 5, 5, 5, 7, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 6, 6, 5,
5, 5, 5, 5, 5, 4, 5, 6, 5, 5, 5, 5, 5, 7, 5, 6, 5, 5, 6, 5, 5,
5, 5, 5, 5, 5, 5, 6, 6, 5, 5, 5, 5, 5, 6, 6, 5, 5, 5, 5, 5, 5,
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 6,
5, 6, 5, 6, 7, 5, 5, 5, 6, 5, 5, 4, 5, 5, 5, 5, 6, 5, 5, 5, 6,
5, 5, 5, 5, 5, 5, 6, 5, 5, 5, 6, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
5, 4, 5, 5, 5, 5, 5, 5, 5, 7, 6, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
5, 6, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 6,
5, 5, 5, 5, 5, 5, 6, 6, 5, 6, 5, 5, 5, 5, 5, 6, 5, 5, 5, 11,
5, 12, 5, 5, 5, 5, 4, 5, 5, 5)
#low_high contains the lower and upper bounds for the bisection method
low_high=vector("list",2)
low_high[["low"]]=rep(0,length(x1))
low_high[["high"]]=rep(1,length(x1))
low_high_list=rep(list(low_high),50)
ll=function(theta)
{
names(theta)=c("b1","m1","b2","m2")
b1=theta[["b1"]]
m1=theta[["m1"]]
b2=theta[["b2"]]
m2=theta[["m2"]]
#bisection function is used to find y which makes G=0
bisection_function=function(prv,nxt)
{
low_high=prv
#G and y are both vectors of the length of the data set (in this example, 469)
y=(low_high[["low"]]+low_high[["high"]])/2
G=-1+(x1/((log(-y/(y-1))-b1)/m1))+(x2/((log(-y/(y-1))-b2)/m2))
low_high[["low"]][G>0]=y[G>0]
low_high[["high"]][G<0]=y[G<0]
return(low_high)
}
#Reduce is the fastest method I've found so far
#(in other words, if there is a better way, I'm certainly open to suggestions!)
low_high=Reduce(bisection_function,low_high_list)
p=(low_high[["low"]]+low_high[["high"]])/2
#sum of log likelihood for binomial distribution
loglik=sum(log((gamma(1+n)/(gamma(1+k)*(gamma(1+n-k))))*((p^k)*((1-p)^(n-k)))))
return(loglik)
}
theta.start=c(b1=-10,m1=10,b2=-10,m2=10)
mle=optim(theta.start,ll,control=list(fnscale=-1),hessian=TRUE)
Thanks!!

Using Vincent's suggestions, I was able to supply an analytic gradient via implicit differentiation. In case anyone else has a similar problem, I have included reproducible code below (to be added after the code included in the question).
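For readers following along, the implicit-differentiation step that the code relies on can be written out as follows. The notation here is mine rather than the original poster's: \theta_j stands for any one of b1, m1, b2, m2, the index i runs over observations, and G_i is G evaluated at the i-th values of x1 and x2. The first line assumes \partial G_i / \partial p_i is nonzero at the root.

```latex
\begin{aligned}
G_i(p_i, \theta) = 0 \;&\Rightarrow\;
  \frac{\partial p_i}{\partial \theta_j}
  = -\,\frac{\partial G_i / \partial \theta_j}{\partial G_i / \partial p_i}, \\
\ell(\theta) &= \sum_i \left[ k_i \log p_i + (n_i - k_i) \log (1 - p_i) \right] + \text{const}, \\
\frac{\partial \ell}{\partial \theta_j}
  &= \sum_i \left( \frac{k_i}{p_i} - \frac{n_i - k_i}{1 - p_i} \right)
     \frac{\partial p_i}{\partial \theta_j}.
\end{aligned}
```

In the code, fd_any() corresponds to the first line and dll() to the bracketed factor in the last line multiplied by \partial p_i / \partial \theta_j.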
Gexpression=parse(text="-1+(x1/((log(-p/(p-1))-b1)/m1))+(x2/((log(-p/(p-1))-b2)/m2))")
nested=function(theta)
{
names(theta)=c("b1","m1","b2","m2")
b1=theta[["b1"]]
m1=theta[["m1"]]
b2=theta[["b2"]]
m2=theta[["m2"]]
#bisection function is used to find y which makes G=0
bisection_function=function(prv,nxt)
{
low_high=prv
#G and y are both vectors of the length of the data set (in this example, 469)
y=(low_high[["low"]]+low_high[["high"]])/2
G=-1+(x1/((log(-y/(y-1))-b1)/m1))+(x2/((log(-y/(y-1))-b2)/m2))
low_high[["low"]][G>0]=y[G>0]
low_high[["high"]][G<0]=y[G<0]
return(low_high)
}
low_high=Reduce(bisection_function,low_high_list)
p=(low_high[["low"]]+low_high[["high"]])/2
return(p)
}
gr=function(theta)
{
names(theta)=c("b1","m1","b2","m2")
b1=theta[["b1"]]
m1=theta[["m1"]]
b2=theta[["b2"]]
m2=theta[["m2"]]
p=nested(theta)
# dll applies the chain rule: given d_any = dp/d(parameter), it returns the
# per-observation derivative of the log-likelihood with respect to that parameter
dll=function(d_any) (((k / p) * d_any) - (((n - k) / (1 - p))*d_any))
#fd_any returns dp/d(parameter) by implicit differentiation of G: -(dG/d(parameter))/(dG/dp)
fd_any=function(any) eval(parse(text=paste("-((",as.character(list(D(Gexpression,any))),")/(",as.character(list(D(Gexpression,'p'))),"))",sep="")))
DLb1=dll(fd_any("b1"))
DLb2=dll(fd_any("b2"))
DLm1=dll(fd_any("m1"))
DLm2=dll(fd_any("m2"))
DLb1[is.na(DLb1)]=0
DLb2[is.na(DLb2)]=0
DLm1[is.na(DLm1)]=0
DLm2[is.na(DLm2)]=0
colSums(cbind(b1=DLb1,m1=DLm1,b2=DLb2,m2=DLm2))
}
hs=function(theta)
{
names(theta)=c("b1","m1","b2","m2")
b1=theta[["b1"]]
m1=theta[["m1"]]
b2=theta[["b2"]]
m2=theta[["m2"]]
p=nested(theta)
fd_any_fun=function(any) paste("(-((",as.character(list(D(Gexpression,any))),")/(",as.character(list(D(Gexpression,'p'))),")))",sep="")
dll_fun=function(d_any_fun) paste("((k / p) * (",d_any_fun,")) - (((n - k) / (1 - p))*(",d_any_fun,"))",sep="")
hb1b1=eval(parse(text=D(parse(text=dll_fun(fd_any_fun("b1"))),"b1")))
hb1m1=eval(parse(text=D(parse(text=dll_fun(fd_any_fun("b1"))),"m1")))
hb1b2=eval(parse(text=D(parse(text=dll_fun(fd_any_fun("b1"))),"b2")))
hb1m2=eval(parse(text=D(parse(text=dll_fun(fd_any_fun("b1"))),"m2")))
hb1b1[is.na(hb1b1)]=0
hb1m1[is.na(hb1m1)]=0
hb1b2[is.na(hb1b2)]=0
hb1m2[is.na(hb1m2)]=0
hb1b1=sum(hb1b1)
hb1m1=sum(hb1m1)
hb1b2=sum(hb1b2)
hb1m2=sum(hb1m2)
h1=c(hb1b1,hb1m1,hb1b2,hb1m2)
hm1b1=eval(parse(text=D(parse(text=dll_fun(fd_any_fun("m1"))),"b1")))
hm1m1=eval(parse(text=D(parse(text=dll_fun(fd_any_fun("m1"))),"m1")))
hm1b2=eval(parse(text=D(parse(text=dll_fun(fd_any_fun("m1"))),"b2")))
hm1m2=eval(parse(text=D(parse(text=dll_fun(fd_any_fun("m1"))),"m2")))
hm1b1[is.na(hm1b1)]=0
hm1m1[is.na(hm1m1)]=0
hm1b2[is.na(hm1b2)]=0
hm1m2[is.na(hm1m2)]=0
hm1b1=sum(hm1b1)
hm1m1=sum(hm1m1)
hm1b2=sum(hm1b2)
hm1m2=sum(hm1m2)
h2=c(hm1b1,hm1m1,hm1b2,hm1m2)
hb2b1=eval(parse(text=D(parse(text=dll_fun(fd_any_fun("b2"))),"b1")))
hb2m1=eval(parse(text=D(parse(text=dll_fun(fd_any_fun("b2"))),"m1")))
hb2b2=eval(parse(text=D(parse(text=dll_fun(fd_any_fun("b2"))),"b2")))
hb2m2=eval(parse(text=D(parse(text=dll_fun(fd_any_fun("b2"))),"m2")))
hb2b1[is.na(hb2b1)]=0
hb2m1[is.na(hb2m1)]=0
hb2b2[is.na(hb2b2)]=0
hb2m2[is.na(hb2m2)]=0
hb2b1=sum(hb2b1)
hb2m1=sum(hb2m1)
hb2b2=sum(hb2b2)
hb2m2=sum(hb2m2)
h3=c(hb2b1,hb2m1,hb2b2,hb2m2)
hm2b1=eval(parse(text=D(parse(text=dll_fun(fd_any_fun("m2"))),"b1")))
hm2m1=eval(parse(text=D(parse(text=dll_fun(fd_any_fun("m2"))),"m1")))
hm2b2=eval(parse(text=D(parse(text=dll_fun(fd_any_fun("m2"))),"b2")))
hm2m2=eval(parse(text=D(parse(text=dll_fun(fd_any_fun("m2"))),"m2")))
hm2b1[is.na(hm2b1)]=0
hm2m1[is.na(hm2m1)]=0
hm2b2[is.na(hm2b2)]=0
hm2m2[is.na(hm2m2)]=0
hm2b1=sum(hm2b1)
hm2m1=sum(hm2m1)
hm2b2=sum(hm2b2)
hm2m2=sum(hm2m2)
h4=c(hm2b1,hm2m1,hm2b2,hm2m2)
h=rbind(h1,h2,h3,h4)
return(h)
}
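The gradient seems to work fine. For some reason, the Hessian matrix estimated by optim() differs from the one calculated in hs(). One likely contributor is that hs() differentiates the gradient expressions with respect to each parameter while holding p fixed, so the terms that arise from p itself depending on theta are left out. The resulting standard errors are of the same order of magnitude, at least: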
# Standard errors from optim Hessian
sqrt(abs(diag(solve(mle$hessian))))
# Standard errors from analytic Hessian
sqrt(abs(diag(solve(hs(mle$par)))))
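A cheap way to gain confidence in an implicit-differentiation gradient is to compare it with a finite-difference gradient. The sketch below does this on a deliberately tiny, made-up problem: the function G_toy, the single parameter a, and the data k_toy and n_toy are all hypothetical, and uniroot() stands in for the hand-rolled bisection used above.

```r
# Toy implicit model (hypothetical, NOT the G from the question):
# p(a) is the root in (0, 1) of G_toy(p, a) = log(p / (1 - p)) + p - a
G_toy <- function(p, a) log(p / (1 - p)) + p - a

# Bracketing root-finder; uniroot() replaces the manual bisection here
solve_p <- function(a) {
  uniroot(function(p) G_toy(p, a),
          interval = c(1e-10, 1 - 1e-10), tol = 1e-12)$root
}

# Made-up binomial data: k_toy successes out of n_toy trials
k_toy <- c(2, 3, 1, 4)
n_toy <- c(5, 5, 5, 5)

# Log-likelihood with the root-solve nested inside
ll_toy <- function(a) {
  p <- solve_p(a)
  sum(dbinom(k_toy, n_toy, p, log = TRUE))
}

# Analytic gradient via implicit differentiation: dp/da = -(dG/da) / (dG/dp)
gr_toy <- function(a) {
  p <- solve_p(a)
  dG_dp <- 1 / (p * (1 - p)) + 1
  dG_da <- -1
  dp_da <- -dG_da / dG_dp
  sum(k_toy / p - (n_toy - k_toy) / (1 - p)) * dp_da
}

# Compare against a central finite difference at an arbitrary point
a0 <- 0.3
h <- 1e-5
numeric_gr <- (ll_toy(a0 + h) - ll_toy(a0 - h)) / (2 * h)
analytic_gr <- gr_toy(a0)

# Supply the gradient to optim(); BFGS actually uses it
fit <- optim(a0, ll_toy, gr_toy, method = "BFGS",
             control = list(fnscale = -1))
```

For the model in this post, the analogous call would presumably be optim(theta.start, ll, gr, method="BFGS", control=list(fnscale=-1), hessian=TRUE). Note that optim()'s default Nelder-Mead method does not use a supplied gradient, so a gradient-based method has to be requested explicitly.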

rows get deleted when using match to sort dataframe

I have an original data frame with many rows (I know some of them are replicated):
> dput(DATA)
structure(list(N_b = c(5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5), N_l = c(4,
4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,
4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 3, 3, 3, 3, 3, 3, 3, 3, 3,
3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5,
5, 5, 5, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4,
4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 3, 3, 3, 3, 3, 3,
3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5,
5, 5, 5, 5, 5, 5, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4,
4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 3, 3, 3,
3, 3, 3, 3, 3, 3, 3, 3), S = c(12, 12, 12, 12, 12, 12, 12, 12,
12, 12, 12, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 9, 9,
9, 9, 9, 9, 9, 9, 9, 9, 9, 12, 12, 12, 12, 12, 12, 12, 12, 12,
12, 12, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 9, 9, 9,
9, 9, 9, 9, 9, 9, 9, 9, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12,
12, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 9, 9, 9, 9, 9,
9, 9, 9, 9, 9, 9, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12,
15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 9, 9, 9, 9, 9, 9,
9, 9, 9, 9, 9, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 15,
15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 9, 9, 9, 9, 9, 9, 9,
9, 9, 9, 9, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 15, 15,
15, 15, 15, 15, 15, 15, 15, 15, 15, 9, 9, 9, 9, 9, 9, 9, 9, 9,
9, 9), Proposed.Girder3 = c(0.52, 0.52, 0.52, 0.52, 0.52, 0.52,
0.52, 0.52, 0.52, 0.52, 0.52, 0.65, 0.65, 0.65, 0.65, 0.65, 0.65,
0.65, 0.65, 0.65, 0.65, 0.65, 0.51, 0.51, 0.51, 0.51, 0.51, 0.51,
0.51, 0.51, 0.51, 0.51, 0.51, 0.52, 0.52, 0.52, 0.52, 0.52, 0.52,
0.52, 0.52, 0.52, 0.52, 0.52, 0.65, 0.65, 0.65, 0.65, 0.65, 0.65,
0.65, 0.65, 0.65, 0.65, 0.65, 0.51, 0.51, 0.51, 0.51, 0.51, 0.51,
0.51, 0.51, 0.51, 0.51, 0.51, 0.52, 0.52, 0.52, 0.52, 0.52, 0.52,
0.52, 0.52, 0.52, 0.52, 0.52, 0.65, 0.65, 0.65, 0.65, 0.65, 0.65,
0.65, 0.65, 0.65, 0.65, 0.65, 0.51, 0.51, 0.51, 0.51, 0.51, 0.51,
0.51, 0.51, 0.51, 0.51, 0.51, 0.52, 0.52, 0.52, 0.52, 0.52, 0.52,
0.52, 0.52, 0.52, 0.52, 0.52, 0.65, 0.65, 0.65, 0.65, 0.65, 0.65,
0.65, 0.65, 0.65, 0.65, 0.65, 0.51, 0.51, 0.51, 0.51, 0.51, 0.51,
0.51, 0.51, 0.51, 0.51, 0.51, 0.52, 0.52, 0.52, 0.52, 0.52, 0.52,
0.52, 0.52, 0.52, 0.52, 0.52, 0.65, 0.65, 0.65, 0.65, 0.65, 0.65,
0.65, 0.65, 0.65, 0.65, 0.65, 0.51, 0.51, 0.51, 0.51, 0.51, 0.51,
0.51, 0.51, 0.51, 0.51, 0.51, 0.52, 0.52, 0.52, 0.52, 0.52, 0.52,
0.52, 0.52, 0.52, 0.52, 0.52, 0.65, 0.65, 0.65, 0.65, 0.65, 0.65,
0.65, 0.65, 0.65, 0.65, 0.65, 0.51, 0.51, 0.51, 0.51, 0.51, 0.51,
0.51, 0.51, 0.51, 0.51, 0.51), Lanes = c(4, 4, 4, 4, 4, 4, 4,
4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 3, 3, 3, 3, 3, 3,
3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5,
5, 5, 5, 5, 5, 5, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4,
4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 3, 3, 3,
3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 5, 5,
5, 5, 5, 5, 5, 5, 5, 5, 5, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4,
4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,
4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 3, 3, 3, 3, 3, 3, 3, 3, 3,
3, 3), UG = c(100, 100, 100, 100, 100, 100, 100, 100, 100, 100,
100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100,
100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 108, 108, 108,
108, 108, 108, 108, 108, 108, 108, 108, 108, 108, 108, 108, 108,
108, 108, 108, 108, 108, 108, 108, 108, 108, 108, 108, 108, 108,
108, 108, 108, 108, 116, 116, 116, 116, 116, 116, 116, 116, 116,
116, 116, 116, 116, 116, 116, 116, 116, 116, 116, 116, 116, 116,
116, 116, 116, 116, 116, 116, 116, 116, 116, 116, 116, 124, 124,
124, 124, 124, 124, 124, 124, 124, 124, 124, 124, 124, 124, 124,
124, 124, 124, 124, 124, 124, 124, 124, 124, 124, 124, 124, 124,
124, 124, 124, 124, 124, 84, 84, 84, 84, 84, 84, 84, 84, 84,
84, 84, 84, 84, 84, 84, 84, 84, 84, 84, 84, 84, 84, 84, 84, 84,
84, 84, 84, 84, 84, 84, 84, 84, 92, 92, 92, 92, 92, 92, 92, 92,
92, 92, 92, 92, 92, 92, 92, 92, 92, 92, 92, 92, 92, 92, 92, 92,
92, 92, 92, 92, 92, 92, 92, 92, 92), CSi = c(0.498857761911128,
0.506171857609652, 0.491697098095741, 0.5060648860829, 0.51602587099039,
0.49808311021839, 0.484326916022697, 0.486261403372008, 0.484696645284676,
0.542438052075464, 0.501306385491762, 0.634937543079967, 0.642078412670016,
0.618943708143363, 0.642001779473278, 0.658268730337476, 0.630133634378208,
0.61289410586889, 0.615963132516221, 0.615769133902813, 0.686518342284576,
0.63848257046785, 0.477839632977349, 0.481308189937141, 0.466821213798956,
0.484416044616133, 0.495362194700848, 0.47320175377938, 0.46075484570102,
0.462933293434182, 0.46296944030225, 0.519970813725933, 0.478798800223883,
0.499649613847278, 0.50349372475143, 0.490922567156329, 0.506660508807011,
0.514932254618641, 0.497406049605651, 0.483910162470329, 0.484700178543721,
0.483690038460146, 0.541097742382397, 0.49864706679875, 0.638103261594521,
0.645030188324246, 0.622321358649241, 0.644932774331382, 0.661080914216008,
0.633266424051986, 0.616403425794446, 0.619662818975923, 0.619368626094409,
0.689062534462536, 0.641640617456748, 0.481809699169021, 0.484780552887199,
0.471041353094871, 0.489040154106175, 0.499309812059152, 0.477094533923277,
0.46479660834156, 0.467205762187312, 0.465930259455921, 0.524495868736496,
0.482408228794972, 0.498202392725583, 0.502618858440184, 0.489487329812287,
0.503840707835284, 0.514021291777706, 0.495297263755732, 0.482202022708633,
0.483839116286323, 0.539456419577533, 0.539456419855875, 0.498082441376597,
0.630858086293792, 0.63756198028618, 0.615358704038841, 0.637489319425201,
0.653397114261802, 0.625957013049853, 0.609464834716713, 0.612676084901444,
0.612530536192196, 0.612530533035217, 0.63430804238461, 0.48126980512503,
0.484644526574109, 0.470238034678857, 0.487935539905689, 0.49887970982208,
0.476533589513863, 0.464212956954452, 0.466465412750473, 0.46642671330667,
0.52379164029609, 0.48210024308779, 0.495313482556363, 0.499430830726606,
0.486650554549094, 0.501074567462105, 0.511559881655238, 0.492318751733689,
0.479463896518796, 0.480962859032664, 0.479819940340815, 0.536420385604673,
0.494978560935791, 0.628848475181058, 0.63411772566777, 0.613650360338718,
0.637687298501148, 0.651062927764633, 0.624780782896341, 0.608015537732378,
0.609978147167127, 0.610267677247537, 0.679026578215092, 0.630653747823922,
0.484330062840347, 0.483272947533652, 0.469030546777778, 0.486654560445457,
0.497498231247353, 0.475287888336171, 0.46299937090013, 0.465252231525678,
0.465143863657343, 0.52242431063692, 0.480777563607102, 0.509393190572395,
0.0306794102100841, 0.499801210623311, 0.514261273631288, 0.524257222129056,
0.507090829156798, 0.492293988923706, 0.494634579696826, 0.492902890462201,
0.551785598208862, 0.510878424089161, 0.639185175219647, 0.646663818507054,
0.622627268125056, 0.646370988091098, 0.662988587960886, 0.634650836091679,
0.616659986042537, 0.619748241531646, 0.61951439818747, 0.691805427278811,
0.64284460028603, 0.484887769151249, 0.48865031929918, 0.473940218959181,
0.484917825918303, 0.496960554187183, 0.473813537802849, 0.461261792526738,
0.463580335134683, 0.463394052048788, 0.520637868360136, 0.479413561159061,
0.503027081012421, 0.508886440468214, 0.493841291641758, 0.508191035709441,
0.517649011896045, 0.500464428119884, 0.486297995480364, 0.48821271352713,
0.486779215307284, 0.544834557703888, 0.503994168801514, 0.63420530022389,
0.641441566573504, 0.617939244523909, 0.641442341788609, 0.657759058843292,
0.629532366426079, 0.611898297010965, 0.615054071992963, 0.614543613798064,
0.686239403738044, 0.637751280776591, 0.476610456219972, 0.480071378890351,
0.465670604256241, 0.483172142840914, 0.494035003336151, 0.472023145046551,
0.459586167724079, 0.461826826254301, 0.461674647472426, 0.518534655365997,
0.477607992305144)), row.names = c(NA, -198L), class = "data.frame")
I try to sort it based on the column S with:
target <- c(12,15,9)
DATA <- DATA[match(target, DATA$S),]
The result is a 3-row data frame, but I want to keep the same number of rows and just sort them:
> dput(DATA)
structure(list(N_b = c(5, 5, 5), N_l = c(4, 5, 3), S = c(12,
15, 9), Proposed.Girder3 = c(0.52, 0.65, 0.51), Lanes = c(4,
5, 3), UG = c(100, 100, 100), CSi = c(0.498857761911128, 0.634937543079967,
0.477839632977349)), row.names = c(1L, 12L, 23L), class = "data.frame")
Here is a roundabout way:
library(dplyr)
DATA %>%
# convert S to factor and specify order
mutate(S = factor(S, levels = c(12, 15, 9))) %>%
# sort by levels of S factor
arrange(S) %>%
# convert S back to numeric (go through character, because as.numeric() on a factor returns the level codes)
mutate(S = as.numeric(as.character(S)))
The arguments to match() should be swapped, and the result of match() should then be passed to order():
target <- c(12,15,9)
DATA <- DATA[order(match(DATA$S, target)),]
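To see why the argument order matters, here is a small sketch on made-up data (DATA_small and its id column are hypothetical). match(target, DATA_small$S) returns only the first matching row for each target value, whereas match(DATA_small$S, target) returns one rank per row, which order() can then sort on.

```r
# Hypothetical six-row data frame with repeated values of S
DATA_small <- data.frame(S = c(9, 12, 15, 12, 9, 15), id = 1:6)
target <- c(12, 15, 9)

# One index per target value: only the FIRST row matching each value
first_hits <- match(target, DATA_small$S)

# One rank per row: the position of each row's S within target
pos <- match(DATA_small$S, target)

# Sorting on the ranks keeps every row; ties keep their original order
sorted <- DATA_small[order(pos), ]
```

Here first_hits has length 3 (one entry per target value), which is why indexing with it drops rows, while sorted still has all six rows, grouped as 12, 12, 15, 15, 9, 9.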

Can't use survfit on some data.frames

I have a dataset I'm going to use for survival analysis, and it seems to be working fine when I use the whole set. However, once I slice it into smaller dataframes using data[which(data$variable1=="somevalue"),] the thing seems to break down.
Most of the resulting smaller dataframes work fine, but some are a problem. In the problematic ones, I can use summary(survfit(Surv(time, status)~variable2, data=smalldataframe))$surv without a problem, but when I try summary(survfit(Surv(time, status)~variable2, data=smalldataframe), time=5)$surv, it throws Error in array(xx, dim = dd) : negative length vectors are not allowed.
I've tried looking at the data to see if I have any weird values, like negative times, but there aren't any. Besides, if there were a problem with that, the full dataframe should throw an error too, but it doesn't. All the smaller dataframes are created using the same line of code, so I also don't understand why they are acting differently. And mostly, I don't understand why summary(survfit(...))$surv works fine, as does plot(survfit(...)), but when I want to calculate survival at a specific time, it suddenly doesn't like the data anymore.
Here's one of the offending dataframes:
test <-
structure(list(time2 = c(0.15, 2.08, 2.06, 0.32, 39.45, 39.09,
2.57, 3.64, 13.57, 36.57, 36.26, 0.78, 0.1, 33.94, 3.1, NA, 1.77,
28.38, 1.24, NA, 1.87, 25.83, 2.62, 1.57, 1.6, 22.74, 21.03,
20.54, 20.03, 0.97, 19.35, 18.09, 2.61, 17.68, NA, 3.85, 3.52,
11.22, 11.52, 11.04, 10.51, 1.68, 10.4, 10.61, 9.01, 9.05, 7.8,
0.11, 4.83), status = c(1, 1, 1, 1, 1, 0, 1, 1, 0, 0, 0, 1, 1,
0, 1, NA, 1, 1, 1, NA, 1, 0, 1, 1, 1, 0, 0, 0, 0, 1, 0, 0, 1,
0, NA, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0), cas_dg = c(1,
2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 5, 5, 5, 5,
6, 6, 6, 6, 6, 6, 6, 7, 7, 7, 7, 7, 7, 8, 8, 8, 8, 8, 8, 8, 8,
8, 9, 9, 9, 9, 9)), .Names = c("time2", "status", "cas_dg"), row.names = c(NA, -49L), class = "data.frame")
The call that is giving me trouble is summary(survfit(Surv(time2, status)~cas_dg, data=test), time=5)$surv and that only with some of the smaller dataframes.
You need to use argument extend=TRUE in summary; according to ?summary.survfit:
extend: logical value: if TRUE, prints information for all specified
‘times’, even if there are no subjects left at the end of the
specified ‘times’. This is only valid if the ‘times’
argument is present.
So for your sample data, you can do:
library(survival)
fit <- survfit(Surv(time2, status) ~ cas_dg, data = test)
summary(fit, times = 5, extend = TRUE)$surv
#[1] 0.0000000 0.0000000 0.5555556 0.5000000 0.3333333 0.5714286 0.6000000
#[8] 0.6666667 0.8000000
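The quoted help text suggests the error appears when some group has nobody left under observation at the requested time. A minimal base-R sketch, on made-up data (the toy data frame, its column names and the cutoff of 5 are purely illustrative), of how one might spot such groups before calling summary():

```r
# Hypothetical follow-up times for three groups
toy <- data.frame(
  time  = c(0.2, 2.1, 0.3, 6.0, 7.5, 3.0),
  group = c(1, 2, 2, 3, 3, 3)
)

# Longest observed follow-up time within each group
max_time <- tapply(toy$time, toy$group, max, na.rm = TRUE)

# Groups whose follow-up ends before the requested time point
cutoff <- 5
short_groups <- names(max_time)[as.vector(max_time) < cutoff]
```

Applied to the test data frame in the question, the same check would flag cas_dg groups 1 and 2, whose longest times (0.15 and 2.08) fall well short of 5.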

Multiple stat_function on grouped data with ggplot2

I am studying a data set with multiple observations of a parameter over time. The data look like this:
test<-data.frame(t = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.33, 0.33, 0.33, 0.33, 0.33, 0.33, 0.33, 0.33, 0.33, 0.33, 0.33, 0.33, 0.33, 0.33, 0.33, 0.33, 0.33, 0.67, 0.67, 0.67, 0.67, 0.67, 0.67, 0.67, 0.67, 0.67, 0.67, 0.67, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1.33, 1.33, 1.33, 1.33, 1.33, 1.33, 1.33, 1.33, 1.33, 1.33, 1.33, 1.33, 1.33, 1.33, 1.33, 1.33, 1.67, 1.67, 1.67, 1.67, 1.67, 1.67, 1.67, 1.67, 1.67, 1.67, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 4, 4, 4, 4, 4, 4, 6, 6, 6, 6, 6, 6, 6, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10), int = c(76.44609375, 94.6619686800895, 112.148907103825, 75.1003097802036, 74.1037037037037, 76.7526662128432, 74.0734830988873, 87.9052100068855, 81.0525931336742, 92.1907873244038, 84.0708929788684, 88.8232221775814, 98.1323678006063, 115.175322139789, 91.2653104925053, 76.3661620658949, 152.637799717913, 107.054702135631, 83.4693197755961, 91.658991910392, 81.3991787335206, 106.153762268266, 100.919789842382, 67.2119436084271, 137.558914728682, 89.1182608695652, 156.10352233677, 108.180911207183, 87.9794680354643, 77.7501400560224, 80.7675382653061, 95.6662793399954, 92.5649630541872, 88.3301402668491, 84.3891875746714, 76.4318673395818, 111.413893510815, 82.4753828420879, 119.099190283401, 192.539417212559, 208.49203187251, 106.919937512205, 105.370936371214, 180.028767711464, 130.29369773608, 170.193357597816, 172.703180212014, 178.061569518042, 182.097607918614, 227.066976984743, 153.856101031661, 432.991580916745, 299.143735224586, 144.118156808803, 396.36644895153, 334.538796516231, 350.186359610275, 200.781101530882, 279.866079790223, 122.542700519331, 235.199555308505, 204.924140655867, 229.181848967152, 225.542753383955, 468.308974987739, 269.306058221873, 229.969282013323, 255.553846153846, 621.021220159151, 255.017211703959, 396.658265826583, 273.300663227708, 232.449965010497, 303.343894502483, 276.952483801296, 327.419805194805, 
241.136864249474, 457.961489497136, 498.901714285714, 280.9558101473, 322.089588377724, 386.754533152909, 364.356809338521, 340.416035518412, 428.482916666667, 668.447197400487, 387.671341748481, 471.049545829893, 255.8802020688, 361.979536152797, 192.224629418472, 284.088954468803, 170.763997760358, 237.869065100343, 365.08237271854, 294.266488413547, 718.279750479846, 211.599427030671, 294.045375597047, 207.099267015707, 194.209973045822, 251.306358381503, 190.786794766966, 400.396083385976, 183.133240482823, 130.442107867392, 167.231452991453, 345.110896351776, 299.304645622394, 192.078204692282, 121.273544841369, 153.996295438759, 97.6034616378197, 362.80049522462, 130.498551774077, 106.031656035908, 117.682936668011, 90.1247837370242, 140.855475040258, 169.050049067713, 244.290241606527, 120.603356419819, 173.413333333333, 125.896389002872, 206.543873212215, 186.668320340184, 85.0988108720272, 106.57849117175, 102.867232728676, 216.232957110609, 86.6538461538462, 149.459777852575, 212.498573059361, 93.3816390633923, 105.567730417318, 120.095470383275, 137.205696941396, 141.156985871272, 90.578857338351, 84.8457760314342, 127.092660685395, 136.859870967742, 188.406440382942, 86.0879705400982))
class(test)
I managed to plot the density for each time point using:
ggplot(test, aes(int, group = as.factor(t),colour=t))+ geom_density()
I would like to produce the same graph, but with a log-normal fit for each time point instead of the kernel density.
I know how to plot the log-normal fit for one time point, using fitdistr and passing the parameters to stat_function with this code:
library(MASS)
fit <- fitdistr(subset(test, t == 0,select='int')$int, "lognormal")
ggplot(data=subset(test, t == 0,select='int'), aes(x=int)) +stat_function(fun = dlnorm,args = list(meanlog = fit$estimate[1], sdlog = fit$estimate[2]))
But how can I do it for all t, with the colour of each line given by the value of t? Is it possible to provide a function in the args list?
I thought of another naive solution: Predicting the values of every dlnorm().
## Split up the data according to t
tt <- split(test, test$t)
## Fit a lognormal to every dataset
fits <- lapply(tt, function(x) fitdistr(x$int, "lognormal"))
## Predict values
fitted <- lapply(fits, function(x) dlnorm(x = 1:max(test$int),
meanlog = x$estimate[1], sdlog = x$estimate[2]))
## Wrap everything into a data.frame ggplot can handle
plot.data <- data.frame(y = unlist(fitted), int = 1:max(test$int),
t = rep(unique(test$t),
each = length(unlist(fitted))/length(unique(test$t))))
## Plot
ggplot(test, aes(int, group = as.factor(t), colour=t)) +
#geom_density() +
geom_line(data = plot.data, aes(y = y), lwd = 1)
What about a naive solution: adding stat_function() layers iteratively?
library(ggplot2)
library(MASS)
library(RColorBrewer)
#Set1 only has 9 colours, so interpolate to get one colour per value of t
cols <- colorRampPalette(brewer.pal(9,"Set1"))(length(unique(test$t)))
g <- ggplot(data=subset(test, t == 0, select='int'), aes(x=int))
n <- 1
for(i in unique(test$t)){
fit <- fitdistr(subset(test, t == i, select='int')$int, "lognormal")
g <- g+stat_function(fun = dlnorm,
args=list(meanlog=fit$estimate[1],sdlog=fit$estimate[2]),
col=cols[n])
n <- n + 1
}
g
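Both answers fit one log-normal per time point. For the log-normal the maximum-likelihood estimates have a closed form (the mean of log(x), and the standard deviation of log(x) with an n rather than n - 1 denominator), so the per-group curves can also be sketched in base R. The data frame toy and the grid below are made up for illustration.

```r
# Hypothetical data: two time points with four observations each
toy <- data.frame(
  t   = rep(c(0, 1), each = 4),
  int = c(80, 95, 110, 75, 180, 130, 170, 230)
)

# Closed-form log-normal MLE for each value of t
params <- do.call(rbind, lapply(split(toy$int, toy$t), function(x) {
  lx <- log(x)
  c(meanlog = mean(lx), sdlog = sqrt(mean((lx - mean(lx))^2)))
}))

# Evaluate each fitted density on a common grid, in long format
grid <- seq(50, 300, length.out = 200)
curves <- do.call(rbind, lapply(rownames(params), function(g) {
  data.frame(
    t   = as.numeric(g),
    int = grid,
    y   = dlnorm(grid, meanlog = params[g, "meanlog"],
                 sdlog = params[g, "sdlog"])
  )
}))
```

A long-format data frame like curves could then presumably be drawn with a single layer, for example ggplot(curves, aes(int, y, group = t, colour = t)) + geom_line(), which gives one line per time point coloured by t.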

Print factor analysis from factanal() with item labels

EDIT
So it looks like it's something in my call to library(reshape) that's breaking the labeling of factors. This was not included in the minimal example, but will be added now. It's not needed to create the example, but it's needed to recreate the issue. I need the library to get my data in shape to even do factanal(). Any ideas what part of reshape is breaking it and how to fix it?
Original question
I have been running factor analyses on my data and have been having an intermittent issue with the way results are printed.
If I create a data set like the following:
library(reshape)
mock <- data.frame(
sample_name1 = sample(1:100),
sample_name2 = sample(1:100),
sample_name3 = sample(1:100),
s_amplename_4 = sample(1:100),
samplename5 = sample(1:100),
sa_mplen_a_me_6 = sample(1:100),
samplename7 = sample(1:100),
samplename8 = sample(1:100)
)
and run a factor analysis with
factanal(mock, factors = 2)
I get the output to print out very prettily with item names as labels for the rows, e.g.:
# Snip snip
Loadings:
Factor1 Factor2
sample_name1 -0.126 -0.105
sample_name2 -0.414
sample_name3 0.665
s_amplename_4 -0.314
samplename5 0.850
sa_mplen_a_me_6 -0.117
samplename7 0.442
samplename8 -0.139
This kind of output is exactly what I am looking for. However, when I run the same type of analysis on my own data (and I apologize for the length here):
miniset <- structure(list(`clarity1` = c(2, 2, 2, 3, 4.5, 1.5, 1.5, 3.5,
2, 6, 2.5, 4, 1, 1.5, 6, 2, 5.5, 2, 2, 3, 1.5, 5, 3.5, 2, 1.5,
2.5, 3, 3, 2, 1),
`clarity2` = c(1.5, 2, 2, 2, 3.5, 5, 3, 5,
2, 4, 2, 2.5, 1, 1.5, 2, 4, 5, 2, 2, 3.5, 6, 1, 2, 1.5, 1, 2,
2, 3, 6.5, 1),
`clarity3` = c(3, 3.5, 2, 3.5, 5.5, 4, 6, 5.5,
2, 3, 3, 3.5, 1, 2.5, 2, 5, 5, 5, 2, 6.5, 5.5, 5, 5.5, 6, 3,
2, 2, 5, 4.5, 5.5),
`detail1` = c(3, 4, 2, 6, 5, 6.5, 5.5,
4, 3, 6, 2.5, 4, 1, 4, 2, 4.5, 7, 6.5, 2, 6.5, 6, 2, 6, 5, 2.5,
5.5, 4, 5.5, 6, 1.5),
`detail2` = c(3.5, 4, 4, 6.5, 4.5, 6,
4, 4.5, 2, 6, 2.5, 5, 2, 4, 3, 6, 7, 7, 2, 6.5, 6, 3, 6, 6, 2.5,
6, 3, 5, 6.5, 2.5),
`detail3` = c(2.5, 4, 2, 6, 5, 6, 6, 4,
2, 6, 2, 5, 2, 3, 3, 5, 6.5, 6, 2, 6.5, 7, 7, 5.5, 5, 3.5, 2,
3, 5, 6, 2),
`complete1` = c(2, 2.5, 2, 3, 3.5, 5.5, 2.5, 2.5,
2, 3, 3, 3.5, 2, 4, 3, 3, 7, 4, 2, 3, 6, 3, 5.5, 2, 3, 2, 2,
3, 6, 3),
`complete2` = c(3, 4.5, 2, 3, 4.5, 6, 6, 4.5, 3,
3, 3.5, 4, 2, 5, 3, 4, 7, 4, 2, 6, 7, 5, 5, 6, 3, 3, 5, 5, 6,
2),
`complete3` = c(3, 4.5, 2, 2.5, 4.5, 6.5, 5, 5, 2, 6.5,
3.5, 3.5, 1, 3, 3, 2.5, 7, 4, 2, 6, 1.5, 7, 5.5, 6.5, 3.5, 5.5,
3, 3, 2.5, 1),
`truthful1` = c(2.5, 2, 2, 3, 3.5, 2, 2, 2.5,
2, 3, 3, 2.5, 2, 3, 2, 2, 3.5, 3, 2, 3.5, 1.5, 1, 3.5, 2.5, 3,
2, 2, 3, 1.5, 1.5),
`truthful2` = c(2.5, 1.5, 2, 2, 3, 1.5,
2, 1, 1, 5.5, 3, 3.5, 1, 4.5, 2, 2, 5, 2, 2, 1.5, 4.5, 1, 3.5,
2, 3.5, 2.5, 2, 2, 4.5, 1),
`truthful3` = c(2, 1.5, 2, 3.5,
2.5, 2, 2, 2.5, 2, 2, 3.5, 2.5, 1, 1.5, 3, 2, 5, 3, 3, 2, 3.5,
1, 2, 1, 3.5, 2, 2, 2.5, 4.5, 1),
`relevant1` = c(1.5, 1.5,
2, 5, 2.5, 1.5, 2, 3.5, 2, 4.5, 2.5, 3.5, 1, 3.5, 3, 1.5, 5.5,
3.5, 2, 2, 6, 3, 3.5, 3, 1.5, 2, 3, 3, 6, 1),
`relevant2` = c(1.5,
3, 2, 2, 3.5, 1.5, 2.5, 5.5, 1, 2, 3.5, 2, 1, 1.5, 2, 4, 5.5,
2, 3, 5.5, 5.5, 1, 4, 5, 1.5, 2, 3, 2.5, 3, 1),
`relevant3` = c(1.5,
2, 2, 3, 2, 1, 2, 2, 1, 2, 1.5, 2.5, 1, 1.5, 2, 1.5, 5.5, 5,
2, 1, 7, 1, 1, 2, 1, 2, 3, 3, 2.5, 1)),
.Names = c("clarity1",
"clarity2", "clarity3", "detail1", "detail2", "detail3",
"complete1", "complete2", "complete3", "truthful1", "truthful2",
"truthful3", "relevant1", "relevant2", "relevant3"),
row.names = c(NA, 30L), class = c("cast_df", "data.frame"))
factanal(miniset, factors = 3)
the result is much less pretty, e.g.:
Loadings:
Factor1 Factor2 Factor3
[1,] 0.222 0.664
[2,] 0.559 0.524
[3,] 0.824
[4,] 0.740 0.361 0.282
[5,] 0.698 0.374 0.251
[6,] 0.783 0.278 0.265
[7,] 0.498 0.598 0.140
[8,] 0.796 0.227 0.204
[9,] 0.490 -0.240 0.835
[10,] 0.147 0.156 0.348
[11,] 0.697 0.324
[12,] 0.756
[13,] 0.319 0.811 0.204
[14,] 0.567 0.252 0.108
[15,] 0.320 0.690
So rather than having the nice item names as labels for the loadings, I now get indices. While that's fine for me, I'll be working with a professor tomorrow who is less familiar with R and will probably get frustrated by the lack of labels. So what happens to the labels in the second case? And how can I get them back?
The issue is that miniset is a cast_df and factanal calls as.matrix(x). The as.matrix.cast_df method uses rrownames and rcolnames (both reshape functions) to extract "special dimension names".
For miniset these are NULL (hence the item labels are lost). Without knowing how you constructed miniset I can't help further here. (You must have used reshape to construct miniset at some point, as you have created a cast_df object.)
The good news is that
factanal(as.data.frame(miniset), factors = 3)
works as you wish.
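The class mechanics behind this can be checked on a small example. The sketch below uses a hypothetical toy data frame instead of miniset and adds the "cast_df" class by hand (normally reshape would add it); it deliberately does not load reshape, so it only shows what as.data.frame() does to the class, not the label loss itself.

```r
# Hypothetical toy data standing in for `miniset`; the extra "cast_df" class
# is set by hand here purely for illustration.
toy <- structure(
  data.frame(clarity1 = c(2, 3, 4.5), detail1 = c(3, 4, 2)),
  class = c("cast_df", "data.frame")
)

class(toy)    # the leading "cast_df" class is what triggers as.matrix.cast_df

# as.data.frame() drops the classes that sit in front of "data.frame"
plain <- as.data.frame(toy)
class(plain)

# With a plain data frame, the column names survive the matrix conversion
colnames(as.matrix(plain))
```

Checking class(miniset) before calling factanal() is a quick way to tell whether an object still carries the reshape class.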

xyplot not merging plots when more than two conditioning variables

When I run the following code, xyplot produces 4 separate pages of 2 by 3 plots,
whereas I want a single 4 by 6 trellis (to save real estate
on the axis annotation and legends).
Note that my problem is different from this one in that I don't want to
see four sets of axes/legends.
Here is some example data:
B <- structure(list(yval = c(0.88, 4.31, 7.52, 3.21, 3.27, 4.93, 4.21,
0.7, 0.68, 0.92, 3.86, 5.67, 9.08, 1.95, 3.27, 1.44, 2.38, 0.85,
0.79, 0.55, 0.79, 10.52, 0.9, 4, 0.78, 2.46, 0.78, 1.64, 2.47,
0.77, 0.83, 0.86, 3.65, 8.25, 0.65, 0.88, 0.95, 4.05, 4.98, 1.43,
4.43, 2.94, 5.52, 0.9, 3.69, 0.79, 0.74, 1.49, 7.29, 0.58, 8.47,
5.82, 0.84, 0.87, 0.69, 1.38, 0.83, 2.32, 0.86, 7.32, 6.73, 6.7,
3.3, 1.58, 2.74, 0.88, 4.2, 3.79, 4.98, 2.54, 1.84, 1.2, 2.59,
11.99, 0.78, 0.92, 0.59, 3.83, 0.92, 2.6, 0.95, 3.18, 2.75, 9.83,
9.81, 0.55, 0.83, 6.29, 1.64, 1.12, 0.65, 3.96, 4.27, 3.99, 20,
0.83, 6.23, 6.81, 0.86, 0.7), xval = c(0.62, 0.81, 9.01, 3.72,
1.49, 3.92, 6.22, 6.64, 5.56, 6.64, 4, 7.36, 9.6, 1, 1.64, 3.34,
3.47, 3.37, 4.34, 6.63, 7.62, 4.07, 5.69, 3.76, 9.74, 1.58, 1.53,
2.62, 1.64, 1.18, 9.79, 9.9, 2.76, 7.96, 5.11, 4.74, 9.92, 0.49,
9.05, 8.59, 0.7, 5.8, 5.34, 3.14, 6.96, 2.05, 8.29, 0.35, 7.52,
6.56, 2.01, 7.92, 3.89, 6.31, 8.64, 6.18, 4.49, 0.63, 7.52, 7.82,
1.25, 9.54, 4.68, 0.4, 1.38, 8.7, 4.71, 8.27, 5.72, 0.75, 6.08,
0.11, 1.38, 0.37, 4.94, 0.53, 7.53, 3.11, 2.73, 4.93, 9.47, 2.18,
4.54, 7.12, 8.28, 6.62, 5.14, 4.42, 0.21, 9.52, 3.77, 6.43, 6.78,
6.87, 9.47, 6.42, 0.81, 8.88, 7.2, 8.68), gval = c(1, 2, 5, 5,
2, 1, 2, 1, 2, 3, 6, 5, 1, 3, 2, 3, 5, 2, 6, 4, 4, 1, 1, 6, 4,
2, 1, 2, 4, 5, 5, 3, 6, 5, 4, 2, 2, 3, 3, 6, 2, 4, 1, 4, 4, 1,
1, 2, 2, 5, 1, 1, 2, 2, 1, 3, 1, 5, 6, 5, 1, 5, 4, 4, 3, 6, 6,
4, 5, 4, 4, 6, 5, 6, 5, 2, 1, 1, 6, 6, 2, 5, 5, 1, 1, 4, 6, 3,
4, 6, 3, 5, 3, 3, 6, 2, 1, 5, 1, 3), type = c(5, 2, 1, 5, 1,
1, 1, 1, 2, 12, 5, 1, 2, 5, 5, 12, 12, 12, 12, 2, 12, 2, 12,
5, 12, 2, 12, 12, 5, 12, 12, 12, 5, 2, 5, 12, 1, 1, 1, 1, 2,
12, 1, 12, 2, 12, 2, 2, 1, 1, 2, 1, 5, 12, 12, 5, 12, 5, 5, 1,
1, 1, 2, 5, 5, 5, 5, 5, 1, 5, 12, 12, 5, 2, 12, 12, 1, 1, 5,
5, 5, 2, 5, 1, 2, 2, 5, 1, 5, 2, 5, 5, 5, 2, 2, 5, 1, 2, 2, 5
), cr = c(0.2, 0.4, 0.4, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.4, 0.4,
0.4, 0.4, 0.2, 0.4, 0.4, 0.4, 0.2, 0.2, 0.2, 0.2, 0.4, 0.4, 0.4,
0.2, 0.2, 0.2, 0.4, 0.2, 0.2, 0.4, 0.4, 0.4, 0.4, 0.2, 0.4, 0.2,
0.4, 0.2, 0.2, 0.4, 0.4, 0.2, 0.2, 0.4, 0.2, 0.2, 0.2, 0.4, 0.2,
0.4, 0.2, 0.2, 0.4, 0.4, 0.2, 0.2, 0.4, 0.2, 0.4, 0.4, 0.4, 0.4,
0.2, 0.4, 0.4, 0.4, 0.4, 0.2, 0.4, 0.4, 0.2, 0.4, 0.4, 0.2, 0.2,
0.2, 0.2, 0.2, 0.4, 0.4, 0.4, 0.2, 0.4, 0.4, 0.2, 0.2, 0.4, 0.4,
0.2, 0.2, 0.2, 0.4, 0.2, 0.4, 0.4, 0.4, 0.4, 0.2, 0.2), p = c(4,
12, 12, 8, 12, 8, 12, 4, 4, 8, 8, 4, 4, 8, 8, 8, 4, 12, 8, 4,
12, 12, 12, 12, 8, 12, 4, 4, 8, 8, 8, 4, 8, 12, 4, 12, 12, 4,
12, 8, 4, 4, 12, 4, 4, 8, 4, 4, 8, 4, 8, 12, 12, 8, 4, 8, 8,
8, 8, 12, 4, 8, 4, 12, 4, 4, 12, 4, 12, 12, 8, 4, 4, 12, 8, 12,
4, 4, 12, 4, 8, 4, 8, 12, 8, 4, 4, 4, 8, 4, 4, 12, 8, 12, 8,
4, 4, 8, 8, 4), nsamp = c(100, 300, 300, 200, 300, 200, 300,
100, 100, 200, 200, 100, 100, 200, 200, 200, 100, 300, 200, 100,
300, 300, 300, 300, 200, 300, 100, 100, 200, 200, 200, 100, 200,
300, 100, 300, 300, 100, 300, 200, 100, 100, 300, 100, 100, 200,
100, 100, 200, 100, 200, 300, 300, 200, 100, 200, 200, 200, 200,
300, 100, 200, 100, 300, 100, 100, 300, 100, 300, 300, 200, 100,
100, 300, 200, 300, 100, 100, 300, 100, 200, 100, 200, 300, 200,
100, 100, 100, 200, 100, 100, 300, 200, 300, 200, 100, 100, 200,
200, 100)), .Names = c("yval", "xval", "gval", "type", "cr",
"p", "nsamp"), row.names = c(NA, -100L), class = "data.frame")
And here is the code I am running:
library(lattice)
library(latticeExtra)
library(grid)
types<-rep(NA,6)
types[1]<-expression(paste(epsilon,"=",0.2,", p=",4,sep=""))
types[2]<-expression(paste(epsilon,"=",0.2,", p=",8,sep=""))
types[3]<-expression(paste(epsilon,"=",0.2,", p=",12,sep=""))
types[4]<-expression(paste(epsilon,"=",0.4,", p=",4,sep=""))
types[5]<-expression(paste(epsilon,"=",0.4,", p=",8,sep=""))
types[6]<-expression(paste(epsilon,"=",0.4,", p=",12,sep=""))
types<-rep(types,4)
cl<-rainbow(7)[-4]
xyplot(B$yval~B$xval|as.factor(B$p)*as.factor(B$cr)*as.factor(B$type),
group=B$gval, as.table=TRUE,
ylab=expression(kappa(Sigma,S)), col=cl, xlab=expression(nu),
xlim=c(0,10), ylim=c(0,10), type=c("l","g"), lwd=5, cex.lab=2,
strip=function(...){
panel.fill(trellis.par.get("strip.background")$col[1])
type <- types[panel.number()]
grid::grid.text(label=type,x=0.5,y=0.5,gp=gpar(fontsize=20))
grid::grid.rect()
},
key=list(text=list(c("A","B","C","D","E","F"),cex=2),
lines=list(type=rep("l",6), label.cex=2,col=cl,lwd=3),columns=3),
par.settings=list(par.xlab.text=list(cex=2),axis.text=list(cex=2),
par.ylab.text=list(cex=2)))
Three conditioning variables mean that xyplot makes a three-dimensional grid of panels, where the third dimension is spread across multiple pages. One alternative is to condition on only two variables; here I use : to make the first conditioning factor the intersection of the first two original conditioning factors.
xyplot(B$yval~B$xval|as.factor(B$p):as.factor(B$cr)*as.factor(B$type), ...
I think you want layout=c(6,4) somewhere in your call to xyplot. Once you do that you will have to reconfigure many other settings.
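The two suggestions above can be combined. The following is a sketch on hypothetical toy data (the data frame d and its y values are made up; only the column names p, cr and type mirror B), and it uses interaction() in place of the : operator to merge the first two conditioning factors.

```r
library(lattice)

# Hypothetical toy data with the same conditioning columns as B
d <- expand.grid(x = 1:5, p = c(4, 8, 12), cr = c(0.2, 0.4), type = c(1, 2, 5, 12))
d$y <- d$x * d$cr + d$p / 10

# Three conditioning variables: the third one is spread over separate pages
three <- xyplot(y ~ x | factor(p) * factor(cr) * factor(type), data = d, type = "l")

# Two conditioning variables: p and cr are merged into a single 6-level factor,
# and layout = c(columns, rows) asks for one page that is 6 wide and 4 high
two <- xyplot(y ~ x | interaction(p, cr) * factor(type), data = d,
              type = "l", as.table = TRUE, layout = c(6, 4))

dim(three)  # one entry per conditioning variable
dim(two)
```

Printing two draws all 24 panels on one page with shared axes. A custom strip function that indexes labels by panel.number(), as in the question, would probably need its labels reordered to match the new panel order.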

Resources