I am analyzing a data set in which groups of ~10 individuals are exposed to a treatment for a set time (Time) and mortality is recorded (Alive, Dead). glmer was used to model the data because treatments were blocked (Trial).
From the following model I want to predict the Time at which 50% of individuals die.
Trial <- c(1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3)
Time <- c(2, 6, 9, 12, 15, 18, 21, 24, 1, 2, 3, 4, 5, 6, 1.5, 3, 4.5, 6, 39)
Alive <- c(10, 0, 0, 0, 0, 0, 0, 0, 6, 2, 8, 1, 0, 0, 4, 6, 1, 2, 0)
Dead <- c(0, 10, 6, 10, 10, 10, 7, 10, 0, 8, 1, 9, 10, 10, 5, 0, 8, 6, 10)
library(lme4)

ostrinaA.glmm <- glmer(cbind(Alive, Dead) ~ Time + (1 | Trial),
                       family = binomial(link = "logit"))
summary(ostrinaA.glmm)
If I were simply modelling with glm, the dose.p function from MASS could be used. On a different forum I found generalized code for a dose.p.glmm from Bill Pikounis. It is as follows:
dose.p.glmm <- function(obj, cf = 1:2, p = 0.5) {
  eta <- obj$family$linkfun(p)
  b <- fixef(obj)[cf]
  x.p <- (eta - b[1L])/b[2L]
  names(x.p) <- paste("p = ", format(p), ":", sep = "")
  pd <- -cbind(1, x.p)/b[2L]
  SE <- sqrt(((pd %*% vcov(obj)[cf, cf]) * pd) %*% c(1, 1))
  res <- structure(x.p, SE = SE, p = p)
  class(res) <- "glm.dose"
  res
}
I'm new to coding and need help adjusting this code for my model. My attempt is as follows:
dose.p.glmm <- function(ostrinaA.glmm, cf = 1:2, p = 0.5) {
  eta <- ostrinaA.glmm$family$linkfun(p)
  b <- fixef(ostrinaA.glmm)[cf]
  x.p <- (eta - b[1L])/b[2L]
  names(x.p) <- paste("p = ", format(p), ":", sep = "")
  pd <- -cbind(1, x.p)/b[2L]
  SE <- sqrt(((pd %*% vcov(obj)[cf, cf]) * pd) %*% c(1, 1))
  res <- structure(x.p, SE = SE, p = p)
  class(res) <- "glm.dose"
  res
}
dose.p.glmm(ostrinaA.glmm, cf=1:2, p=0.5)
Error in ostrinaA.glmm$family : $ operator not defined for this S4 class
Any assistance adjusting this code for my model would be greatly appreciated.
At a quick glance I would think replacing
eta <- obj$family$linkfun(p)
with
f <- family(obj)
eta <- f$linkfun(p)
should do the trick.
You also need to replace the res <- ... line with
res <- structure(x.p, SE = matrix(SE), p = p)
This is rather obscure, but it is necessary because the print.glm.dose method (from the MASS package) automatically tries to cbind() some stuff together. This fails if SE is a fancy matrix from the Matrix package rather than a vanilla matrix from base R: matrix() does the conversion.
If you are very new to coding, you might not realize that you don't need to change the obj variable name in the code you've copied to ostrinaA.glmm. In other words, Pikounis's code should work perfectly well with only the two modifications suggested above.
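For reference, a sketch of the full function with both of those changes folded in, assuming the fitted ostrinaA.glmm from the question (nothing else in Pikounis's code is altered):

dose.p.glmm <- function(obj, cf = 1:2, p = 0.5) {
  f <- family(obj)                       # family() works for merMod objects, unlike obj$family
  eta <- f$linkfun(p)
  b <- fixef(obj)[cf]
  x.p <- (eta - b[1L])/b[2L]
  names(x.p) <- paste("p = ", format(p), ":", sep = "")
  pd <- -cbind(1, x.p)/b[2L]
  SE <- sqrt(((pd %*% vcov(obj)[cf, cf]) * pd) %*% c(1, 1))
  res <- structure(x.p, SE = matrix(SE), p = p)  # matrix() drops the Matrix-package class
  class(res) <- "glm.dose"
  res
}

dose.p.glmm(ostrinaA.glmm, cf = 1:2, p = 0.5)

At p = 0.5 the estimate is the Time at which the predicted probability is 0.5 (the LT50), with its standard error stored in the SE attribute.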
I have a task to minimize two similar quadratic functions of beta.
For example:
K <- matrix(data = c(1, 2, 1, 2, 1,
2, 16, 2, 1, 2,
1, 2, 8, 2, 1,
2, 1, 2, 16, 2,
1, 2, 1, 2, 32),
nrow = 5, ncol = 5, byrow = TRUE)
k <- c(-2, 4, 12, 0, 2)
First, the unconstrained task: minimize (1/2) * t(beta) %*% K %*% beta - t(k) %*% beta + lambda * abs(sum(beta) - n_s), where lambda is the multiplier for the penalty function.
I am using optim to solve this problem:
n_s <- 5  # known constant (see the second task below)
minimize <- function(beta) {
  value <- (1/2)*(t(beta) %*% K %*% beta) - t(k) %*% beta + 0.1*abs(sum(beta) - n_s)  # lambda = 0.1
  return(value)
}
betta2 <- optim(c(0, 0, 0, 0, 0), minimize)
betta2$par
Which method in optim should I use for better and faster optimization?
Second, the constrained task: minimize the same quadratic objective, (1/2) * t(beta) %*% K %*% beta - t(k) %*% beta, subject to a condition in which n_s and epsilon are known constants. In other words, I have an unconstrained and a constrained minimization task.
I know I can find a solution to the second task using solve.QP, but the constraints there can only be linear.
For example, I can minimize my second task using solve.QP:
betta = solve.QP(K, k, K, k, meq = 0, factorized = FALSE)
But I can't add my special constraints there (and I don't really like this solution):
A <- matrix(1:1, ncol = 5)
n_s = 5
epsilon = 0.1
betta = solve.QP(K, k, A, c(-(n_s*epsilon + n_s)), meq = 0, factorized = FALSE) # gets an error: Amat and dvec are incompatible!
How could I correctly use optim (or maybe another function) to solve my constrained optimization problem (with my special constraint)? Which method should I choose for it?
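As a side note on the error itself: quadprog's solve.QP() expresses constraints as t(Amat) %*% beta >= bvec, so Amat needs one row per variable (5 x 1 here), not one column per variable; that mismatch is what triggers "Amat and dvec are incompatible!". A minimal sketch with the constraint matrix in the expected orientation, using the same bvec as above (this still only covers a single linear constraint, not the special constraint):

library(quadprog)

A <- matrix(1, nrow = 5, ncol = 1)   # coefficients of one linear constraint on sum(beta)
betta <- solve.QP(Dmat = K, dvec = k, Amat = A,
                  bvec = c(-(n_s*epsilon + n_s)), meq = 0, factorized = FALSE)
betta$solution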
I'm trying to do a 10-fold cross-validation and estimate the model performance of a joint model using parallel processing (parLapply). I'm trying to find out why I receive the error message:
"Error in checkForRemoteErrors(val): five nodes produced an error: object 'Week' not found"
The code looks as follows:
# Validation using 10-fold CV
library("parallel")
set.seed(123)
V <- 10
n <- nrow(dfC)
splits <- split(seq_len(n), sample(rep(seq_len(V), length.out = n)))
CrossValJM <- function(i) {
  library("JM")
  library("nlme")
  library("splines")  # for ns()
  trainingData <- dfL[!dfL$ID %in% i, ]
  trainingData_ID <- trainingData[!duplicated(trainingData$ID), ]
  testingData <- dfL[dfL$ID %in% i, ]
  lmeFit <- lme(DA ~ ns(Week, 2), data = trainingData,
                random = ~ ns(Week, 2) | ID)
  coxFit <- coxph(Surv(TT_event, Event) ~ Gender * Age,
                  data = trainingData_ID, x = TRUE)
  jointFit <- jointModel(lmeFit, coxFit, timeVar = "Week")
  pe <- prederrJM(jointFit, newdata = testingData, Tstart = 10, Thoriz = 20)
  auc <- aucJM(jointFit, newdata = testingData, Tstart = 10, Thoriz = 20)
  list(pe = pe, auc = auc)
}
cl <- makeCluster(5)
res <- parLapply(cl, splits, CrossValJM)
stopCluster(cl)
The function itself is accepted, but when I run the cluster commands I get an error saying that objects used inside the function cannot be found. Should they be defined within the function itself? Or am I not using parLapply correctly?
P.S.: the data look as follows (dfL has ~1000 rows and dfC ~200):
dfL <- data.frame(ID = c(1, 1, 1, 2, 2, 3),
                  DA = c(0.4, 1.8, 1.2, 3.2, 3.6, 2.8),
                  Week = c(0, 4, 16, 4, 20, 8),
                  Event = c(1, 1, 1, 0, 0, 1),
                  TT_Event = c(16, 16, 16, 20, 20, 8),
                  Gender = c(0, 0, 0, 1, 1, 0),
                  Age = c(24, 24, 24, 56, 56, 76))
dfC <- data.frame(ID = c(1, 2, 3, 4, 5, 6),
                  DA = c(1.2, 3.6, 2.8, 2.4, 1.9, 3.4),
                  Week = c(16, 20, 8, 36, 24, 32),
                  Event = c(1, 0, 1, 1, 1, 0),
                  TT_Event = c(16, 20, 8, 36, 24, 32),
                  Gender = c(0, 1, 0, 0, 1, 1),
                  Age = c(24, 56, 76, 38, 44, 50))
Thnx :)
Very related questions have already been answered on Stack Overflow.
Basically, you have three solutions:
use clusterExport() to export the variables you need to the workers (the most common method; see the sketch below)
pass all variables as arguments of your function CrossValJM() so that they are automatically exported to the workers (the solution I prefer, and the most programmatically correct one)
use the R package {future}, which should automatically detect the variables to export (the lazy solution, but it seems to work well too)
See for example this.
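For instance, a minimal sketch of the first option, assuming the dfL, splits and CrossValJM objects from the question are defined in the main session:

library("parallel")

cl <- makeCluster(5)
clusterExport(cl, varlist = "dfL")   # export the data frame that CrossValJM() looks up
res <- parLapply(cl, splits, CrossValJM)
stopCluster(cl)

The second option would instead give CrossValJM() a data argument, e.g. function(i, data), and call parLapply(cl, splits, CrossValJM, data = dfL), so nothing has to be exported by hand.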
I've coded my own Poisson likelihood function, but it returns values that are significantly different from glm for a model with an interaction on one specific data set. Note that the function gives exactly the same results as glm for all other data I've tried, as well as for the model without the interaction on this data set.
> # Log likelihood function
> llpoi = function(X, y){
+ # Ensures X is a matrix
+ if(class(X) != "matrix") X = as.matrix(X)
+ # Ensures there's a constant
+ if(sum(X[, 1]) != nrow(X)) X = cbind(1, X)
+ # A useful scalar that I'll need below
+ k = ncol(X)
+ ## Function to be maximized
+ FUN = function(par, X, y){
+ # beta hat -- the parameter we're trying to estimate
+ betahat = par[1:k]
+ # mu hat -- the systematic component
+ muhat = X %*% betahat
+ # Log likelihood function
+ sum(muhat * y - exp(muhat))
+ }
+ # Optimizing
+ opt = optim(rep(0, k), fn = FUN, y = y, X = X, control = list(fnscale = -1), method = "BFGS", hessian = T)
+ # Results, including getting the SEs from the hessian
+ cbind(opt$par, sqrt(diag(solve(-1 * opt$hessian))))
+ }
>
> # Defining inputs
> y = c(2, 2, 1, 1, 1, 1, 1, 2, 2, 1, 2, 2, 2, 1, 1, 3, 1, 1, 3, 2, 2, 2, 3, 1, 2, 4, 3, 3, 3, 1, 3, 0, 2, 1, 2, 4, 1, 2, 0, 2, 1, 2, 1, 4, 1, 2, 0)
> x1 = c(8, 1, 0, 3, 3, 3, 5, 4, 0.4, 1.5, 2, 1, 1, 7, 2, 3, 0, 2, 1.5, 5, 1, 4, 5.5, 6, 3, 3, 2, 0.5, 5, 10, 3, 22, 20, 3, 20, 10, 15, 25, 15, 6, 3.5, 5, 18, 2, 15.0, 16, 24)
> x2 = c(12, 12, 12, 16, 12, 12, 12, 12, 12, 12, 12, 12, 9, 9, 12, 9, 12, 12, 9, 16, 9, 6, 12, 9, 9, 12, 12, 12, 12, 14, 14, 14, 9, 12, 9, 12, 3, 12, 9, 6, 12, 12, 12, 12, 12, 12, 9)
>
> # Results
> withmyfun = llpoi(cbind(x1, x2, x1 * x2), y)
> round(withmyfun, 2)
[,1] [,2]
[1,] 0.96 0.90
[2,] -0.05 0.09
[3,] -0.02 0.08
[4,] 0.00 0.01
> withglm = glm(y ~ x1 + x2 + x1 * x2, family = "poisson")
> round(summary(withglm)$coef[, 1:2], 2)
Estimate Std. Error
(Intercept) 1.08 0.90
x1 -0.07 0.09
x2 -0.03 0.08
x1:x2 0.00 0.01
Is this something data-specific? Is it inherent to the optimization process, which will eventually diverge more significantly from glm, and I just got unlucky with this data? Is it a function of using method = "BFGS" in optim?
By rescaling the right-hand side variables, the outcome improves a lot.
> library(data.table)
> tmp = data.frame(y, x1, x2)
> setDT(tmp)
> tmp[, x1 := scale(x1)][, x2 := scale(x2)]
>
>
> withmyfun = with(tmp, llpoi(cbind(x1, x2, x1 * x2), y))
> withmyfun
[,1] [,2]
[1,] 0.57076392 0.1124637
[2,] -0.19620040 0.1278070
[3,] -0.01509032 0.1169019
[4,] 0.05636459 0.1380611
>
> withglm = glm(y ~ x1 + x2 + x1 * x2, family = "poisson", data = tmp)
> summary(withglm)$coef[, 1:2]
Estimate Std. Error
(Intercept) 0.57075132 0.1124641
x1 -0.19618199 0.1278061
x2 -0.01507467 0.1169034
x1:x2 0.05636934 0.1380621
>
So my recommendation is to have a procedure inside llpoi that normalizes the variables before calling optim and rescales the estimates before the function returns them. Your example data have a very wide range, which results in very small coefficient estimates. The problem is made worse by the relatively flat likelihood surface caused by the insignificant variables.
Note: the modified function below gives output very close to glm, except for the intercept. This is what I meant by standardizing.
llpoi = function(X, y){
  # Ensure X is a matrix
  if (!is.matrix(X)) X = as.matrix(X)
  # Ensure there's a constant
  if (sum(X[, 1]) != nrow(X)) X = cbind(1, X)
  # Column means and SDs of the predictors (intercept column left untouched)
  avgs <- c(0, apply(X[, 2:ncol(X)], 2, mean))
  sds <- c(1, apply(X[, 2:ncol(X)], 2, sd))
  # Standardize the predictors before optimizing
  X <- t((t(X) - avgs)/sds)
  k = ncol(X)
  ## Function to be maximized
  FUN = function(par, X, y){
    # beta hat -- the parameter we're trying to estimate
    betahat = par[1:k]
    # mu hat -- the systematic component
    muhat = X %*% betahat
    # Log likelihood (up to a constant)
    sum(muhat * y - exp(muhat))
  }
  # Optimizing
  opt = optim(rep(0, k), fn = FUN, y = y, X = X, control = list(fnscale = -1),
              method = "BFGS", hessian = T)
  # Results on the original scale: estimates and Hessian-based SEs, divided by sds
  # (the intercept is not fully back-transformed, hence the caveat above)
  cbind(opt$par, sqrt(diag(solve(-1 * opt$hessian))))/sds
}
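For example, calling this standardized version on the data from the question (assuming y, x1 and x2 as defined above) and comparing with glm:

withmyfun = llpoi(cbind(x1, x2, x1 * x2), y)
round(withmyfun, 2)

withglm = glm(y ~ x1 * x2, family = "poisson")
round(summary(withglm)$coef[, 1:2], 2)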
After much research, I learned that the two results differ because glm.fit, the workhorse behind glm, optimizes the function by a Newton-Raphson-type method (iteratively reweighted least squares), while I used BFGS in my llpoi function. BFGS is faster but less precise. The two results will be very similar in most cases, but may differ more significantly when the likelihood surface is too flat or has too many maxima, as correctly pointed out by amatsuo_net, because the climbing algorithm used by BFGS will get stuck.
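One way to probe that explanation on the unscaled data, assuming y, x1 and x2 from the question, is to hand BFGS a much tighter stopping rule and more iterations and see whether its estimates move toward glm's (the log-likelihood below is the same one used inside llpoi, written out inline; whether the gap closes completely is data-dependent):

# Poisson log-likelihood (up to a constant), as inside llpoi()
FUN <- function(par, X, y) sum((X %*% par) * y - exp(X %*% par))

X <- cbind(1, x1, x2, x1 * x2)
opt <- optim(rep(0, ncol(X)), fn = FUN, y = y, X = X, method = "BFGS",
             hessian = TRUE,
             control = list(fnscale = -1, reltol = 1e-14, maxit = 1000))
cbind(BFGS = opt$par,
      IRLS = coef(glm(y ~ x1 + x2 + x1:x2, family = "poisson")))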
Error in data - x : non-numeric argument to binary operator
My code is as follows:
library(fpc)  # dbscan() with this set of arguments comes from the fpc package

x <- as.factor(c(2, 2, 8, 5, 7, 6, 1, 4))
y <- as.factor(c(10, 5, 4, 8, 5, 4, 2, 9))
coordinates <- data.frame(x, y)
colnames(coordinates) <- c("x_coordinate", "y_coordinate")
print(coordinates)
point_clusters <- dbscan(coordinates, 2, MinPts = 2, scale = FALSE,
                         method = c("hybrid", "raw", "dist"), seeds = TRUE,
                         showplot = 1, countmode = NULL)
point_clusters
But I'm getting the following error when executing the above code:
> point_clusters <- dbscan(coordinates, 2, MinPts = 2, scale = FALSE, method = c("hybrid", "r ..." ... [TRUNCATED]
Error in data - x : non-numeric argument to binary operator
I don't know what the problem with the above code is.
I solved the problem for my purposes. I saw somewhere that the data needs to be a numeric matrix, although I'm not sure about that. So here is what I did:
x <- c(2, 2, 8, 5, 7, 6, 1, 4)
y <- c(10, 5, 4, 8, 5, 4, 2, 9)
coordinates <- matrix(c(x, y), nrow = 8, byrow = FALSE)
The remaining code is the same as above. Now it works fine for me.
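For what it's worth, the immediate cause of the error is that the coordinates were factors, which R cannot do arithmetic on (hence data - x failing inside dbscan). A small sketch, assuming the dbscan() with these arguments comes from the fpc package: if the values arrive as factors, convert them through as.character() first, since as.numeric() on a factor returns the level codes rather than the values:

library(fpc)

x <- as.factor(c(2, 2, 8, 5, 7, 6, 1, 4))
y <- as.factor(c(10, 5, 4, 8, 5, 4, 2, 9))
coordinates <- cbind(x_coordinate = as.numeric(as.character(x)),
                     y_coordinate = as.numeric(as.character(y)))
point_clusters <- dbscan(coordinates, eps = 2, MinPts = 2)
point_clusters

This amounts to the same numeric matrix as above, but also covers the common case where the factors come from reading the data in.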
Suppose I have the following equations:
x + 2y + 3z = 20
2x + 5y + 9z = 100
5x + 7y + 8z = 200
How do I solve these equations for x, y and z? I would like to solve them, if possible, using R or any other computer tool.
This should work
A <- matrix(data=c(1, 2, 3, 2, 5, 9, 5, 7, 8), nrow=3, ncol=3, byrow=TRUE)
b <- matrix(data=c(20, 100, 200), nrow=3, ncol=1, byrow=FALSE)
round(solve(A, b), 3)
     [,1]
[1,]  320
[2,] -360
[3,]  140
For clarity, I modified the way the matrices were constructed in the previous answer.
a <- rbind(c(1, 2, 3),
c(2, 5, 9),
c(5, 7, 8))
b <- c(20, 100, 200)
solve(a, b)
In case we need to display fractions:
library(MASS)
fractions(solve(a, b))
Another approach is to model the equations using lm as follows:
lm(b ~ . + 0,
data = data.frame(x = c(1, 2, 5),
y = c(2, 5, 7),
z = c(3, 9, 8),
b = c(20, 100, 200)))
which produces
Coefficients:
x y z
320 -360 140
If you use the tibble package you can even make it read just like the original equations:
lm(b ~ . + 0,
tibble::tribble(
~x, ~y, ~z, ~b,
1, 2, 3, 20,
2, 5, 9, 100,
5, 7, 8, 200))
which produces the same output.
A <- matrix(data = c(1, 2, 3, 2, 5, 9, 5, 7, 8), nrow = 3, ncol = 3, byrow = TRUE)
b <- matrix(data = c(20, 100, 200), nrow = 3, ncol = 1, byrow = FALSE)
solve(A) %*% b
Note that this works because A is a square (and invertible) matrix!
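Either formulation gives the same answer; solve(A, b) solves the system directly without explicitly forming the inverse, and a quick residual check, assuming the A and b defined just above, confirms the solution:

sol <- solve(A, b)
sol
A %*% sol - b   # should be numerically zero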