I initially wanted to run a boxTidwell() analysis (from the "car" package) on my prospective logistic regression model (BinaryOutcomeVar ~ ContinuousPredVar + ContinuousPredVar^2 + ContinuousPredVar^3), but I ran into this error:
Error in x - xbar : non-numeric argument to binary operator
In addition: Warning message:
In mean.default(x) : argument is not numeric or logical: returning NA
So I created a reproducible example to demonstrate the error:
Doesn't work:
boxTidwell(formula = Treatment ~ uptake, other.x = ~ poly(x = colnames(CO2)[c(1,2,4)], degree = 2), data = CO2)
boxTidwell(y = CO2$Treatment, x = CO2$uptake)
Works:
boxTidwell(formula = prestige ~ income + education, other.x = ~ poly(x = women , degree = 2), data = Prestige)
I've been goofing around with the other.x parameter and am guessing that's the issue.
Question
So, does anyone know:
1. whether the boxTidwell() function works with binary outcome variables, and
2. the logic behind other.x? I can't get my dummy example to work either.
After further searching, it looks like car::boxTidwell() can't handle a binary outcome variable in the formula (it is designed for linear models), but the procedure can be coded by hand:
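require(MASS)
require(car)
d1<-read.csv("path for your csv file",sep=',',header=TRUE)
# Replace the placeholder column names below with your own
x<-d1$explanatory_variable   # continuous predictor; must be > 0 because log(x) is taken
y<-d1$dependent_variable     # binary (0/1) outcome
#FIT IS DONE USING THE glm FUNCTION
# Model 1: logit linear in x
m1res <- glm(y ~ x,family=binomial(link = "logit"))
coeff1<- coefficients(summary(m1res))
# Model 2: add the Box-Tidwell term x*log(x)
lnx<-x*log(x)
m2res <- glm(y ~ x+lnx ,family=binomial(link = "logit"))
coeff2<- coefficients(summary(m2res))
alpha0<-1.0
alpha<-alpha0   # stays 1 (no transformation) if the loop never runs
# p-value of the x*log(x) term: a small value means the logit is not linear in x
pvalue<-coeff2[3,4]
pvalue
beta1<-coeff1[2,1]
beta2<-coeff2[3,1]
iter<-0
maxiter<-50     # safeguard so the loop cannot run forever
err<-1
while (pvalue<0.1 && iter<maxiter) {
# Box-Tidwell update of the power
alpha <-(beta2/beta1)+alpha0
err<-abs(alpha-alpha0)
alpha0<-alpha
mx<-x^alpha
m1res <- glm(y ~ mx,family=binomial(link = "logit"))
coeff1<- coefficients(summary(m1res))
mlnx<-mx*log(x)
m2res <- glm(y ~ mx+mlnx ,family=binomial(link = "logit"))
coeff2<- coefficients(summary(m2res))
pvalue<-coeff2[3,4]
beta1<-coeff1[2,1]
beta2<-coeff2[3,1]
iter<- iter+1
}
# PRINT THE POWER TO CONSOLE
alpha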
The above code is taken from:
https://sites.google.com/site/ayyalaprem/box-tidwelltransform
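The core of the hand-coded loop is a single Box-Tidwell step: add x*log(x) to the logistic model, test its coefficient, and use the ratio of coefficients to estimate the power. Here is a base-R sketch of that one step on simulated data; the data and the names m1, m2, xlogx, p_xlogx and alpha1 are hypothetical.

```r
# One-step Box-Tidwell check for a logistic regression, base R only.
# Simulated (hypothetical) data: the true logit is linear in log(x), not in x.
set.seed(1)
n <- 2000
x <- runif(n, 0.5, 10)            # x must be > 0 because log(x) is used
y <- rbinom(n, 1, plogis(-1 + 1.5 * log(x)))

# Model 1: logit linear in x
m1 <- glm(y ~ x, family = binomial(link = "logit"))

# Model 2: add the Box-Tidwell term x * log(x)
xlogx <- x * log(x)
m2 <- glm(y ~ x + xlogx, family = binomial(link = "logit"))

# Wald p-value of the x*log(x) term; a small value suggests non-linearity in the logit
p_xlogx <- coef(summary(m2))["xlogx", "Pr(>|z|)"]

# First-step estimate of the power: alpha = gamma / beta + 1
beta   <- coef(m1)[["x"]]
gamma  <- coef(m2)[["xlogx"]]
alpha1 <- gamma / beta + 1

p_xlogx
alpha1
```

The loop above simply repeats this step with x replaced by x^alpha until the x*log(x) term is no longer significant.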
I fitted a model using the lmer() function (it works well). I have 11 explanatory variables. Three of them, if present in the model, cause the step() function (from package lmerTest) to return the error: "Variables length differ (found on "...")" where "..." is the formula call.
I don't have any NA values in the data: there are 600 rows and all three of the problematic variables (H, I, J) are factors.
My code is:
library(purrr) ## for rdunif()
library(lmerTest)
data2 = as.data.frame(matrix(c(rdunif(600*7,1,5),
rdunif(600*3,0,1),
rdunif(600,1,9),
rep(c("a","b"),300)),
nrow = 600, byrow = FALSE))
names(data2) = c("A","B","C","D", "E","F","G","H","I","J","Z","M")
data2[,7:10] = lapply(data2[,7:10],factor)
data2[,c(1:6,11)] = lapply(data2[,c(1:6,11)],as.numeric)
mod1 = lmer(Z ~ A+B+C+D+E+F+G+
#H+
#I+
#J+
(1|M),data2)
step.mod1 = lmerTest::step(mod1) #it works
#
mod2 = lmer(Z ~ A+B+C+D+E+F+G+H+
#I+
#J+
(1|M),data2)
step.mod2 = lmerTest::step(mod2) #it does not work and returns: Variables length differ (found on "A+B+C+D+E+F+G+")
mod3 = lmer(Z ~ A+B+C+D+E+F+G+H+I+J+
(1|M),data2)
step.mod3 = lmerTest::step(mod3) #it does not work and returns: Variables length differ (found on "A+B+C+D+E+F+G+H+I+")
I know that this error is common when there are NAs, but what causes it in this case? How can I fix it?
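A side note on the data setup: because the "a"/"b" column is bound into matrix(), every entry becomes a character string, which is why the later as.numeric() and factor() conversions are needed. A base-R sketch that builds the same columns directly, with their types set from the start, is below. sample() is swapped in for purrr::rdunif(), and the seed and the names num_cols and bin_cols are arbitrary. This only tidies the data construction; it is not a diagnosis of the step() error.

```r
# Build the simulated data column by column instead of via a character matrix
set.seed(42)
n <- 600
num_cols <- setNames(replicate(7, sample(1:5, n, replace = TRUE), simplify = FALSE),
                     LETTERS[1:7])              # A..G: integers 1..5
bin_cols <- setNames(replicate(3, sample(0:1, n, replace = TRUE), simplify = FALSE),
                     c("H", "I", "J"))          # H, I, J: 0/1
data2 <- data.frame(num_cols, bin_cols,
                    Z = sample(1:9, n, replace = TRUE),
                    M = rep(c("a", "b"), n / 2))

# Same conversion as in the question: columns 7..10 (G, H, I, J) become factors
data2[, 7:10] <- lapply(data2[, 7:10], factor)

str(data2)      # check the type of every column
anyNA(data2)    # confirm there are no missing values
```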
I'm doing some exploratory data analysis with the HMDA data from the AER package. However, the variable I used to fit the model seems to contain observations that perfectly determine the outcome, an issue known as "separation." I tried to remedy this with the solution recommended in this thread, but when I ran the first chunk of code, which is copied from the source of glm.fit(), R returned this error message:
Error in family$family : object of type 'closure' is not subsettable
so I could not proceed to remove the fully determined observations from my data with this code. Could anyone help me fix this?
My current code is below for reference.
# load the AER package and HMDA data
library(AER)
data(HMDA)
# fit a 2nd-degree polynomial probit model
probit.fit <- glm(deny ~ poly(hirat, 2), family = binomial, data = HMDA)
# using the revised source code from that StackExchange thread to find the observations that trigger the warning
library(tidyverse)
library(dplyr)
library(broom)
eps <- 10 * .Machine$double.eps
if (family$family == "binomial") {
if (any(mu > 1 - eps) || any(mu < eps))
warning("glm.fit: fitted probabilities numerically 0 or 1 occurred",
call. = FALSE)
}
# this return the following error message
# Error in family$family : object of type 'closure' is not subsettable
probit.resids <- augment(probit.fit) %>%
mutate(p = 1 / (1 + exp(-.fitted)),
warning = p > 1-eps)
arrange(probit.resids, desc(.fitted)) %>%
select(2:5, p, warning) %>%
slice(1:10)
HMDA.nwarning <- filter(HMDA, !probit.resids$warning)
# using HMDA.nwarning should solve the problem...
probit.fit <- glm(deny ~ poly(hirat, 2), family = binomial, data = HMDA.nwarning)
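This chunk of code
if (family$family == "binomial") {
if (any(mu > 1 - eps) || any(mu < eps))
warning("glm.fit: fitted probabilities numerically 0 or 1 occurred",
call. = FALSE)
}
comes from inside glm.fit(), where family is the family object that glm() passes in. Run on its own at the top level, family instead refers to the function stats::family(), which is a closure, hence the error. When you run glm with family = binomial, the function binomial() is called to build that object. If you look at the source of glm (just type glm), you'll see: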
if (is.character(family))
family <- get(family, mode = "function", envir = parent.frame())
if (is.function(family))
family <- family()
if (is.null(family$family)) {
print(family)
stop("'family' not recognized")
}
During the fit, glm.fit() checks family$family, and if it is "binomial" and any fitted probability lies within eps of 0 or 1, it raises that warning.
You don't need to run that part, but you do need to set eps <- 10 * .Machine$double.eps. So run the code below. Also, if you want a probit model, you need to specify link="probit" in binomial(); otherwise the default is logit:
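The difference between the two meanings of family, and the check glm.fit() performs, can be reproduced in a few lines of base R. The mu values below are made up for illustration.

```r
# At top level, `family` is the generic function stats::family(), a closure
is.function(family)
res <- tryCatch(family$family, error = function(e) conditionMessage(e))
res   # the "closure ... not subsettable" message from the question

# Inside glm(), family is a "family" object, which does have a $family element
fam <- binomial(link = "probit")
fam$family   # "binomial"
fam$link     # "probit"

# glm.fit()'s check applied to a few made-up fitted probabilities
eps <- 10 * .Machine$double.eps
mu  <- c(0.2, 0.7, 1 - 1e-16)
any(mu > 1 - eps) || any(mu < eps)   # TRUE: the last value is within eps of 1
```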
library(AER)
library(tidyverse)
library(dplyr)
library(broom)
data(HMDA)
probit.fit <- glm(deny ~ poly(hirat, 2), family = binomial(link="probit"), data = HMDA)
eps <- 10 * .Machine$double.eps
probit.resids <- augment(probit.fit) %>%
mutate(p = 1 / (1 + exp(-.fitted)),
warning = p > 1-eps)
The column warning indicates whether an observation triggers the warning. In this dataset, there is one:
table(probit.resids$warning)
FALSE TRUE
2379 1
We can then filter it out:
HMDA.nwarning <- filter(HMDA, !probit.resids$warning)
dim(HMDA.nwarning)
[1] 2379 14
And rerun the regression:
probit.fit <- glm(deny ~ poly(hirat, 2), family = binomial(link="probit"), data = HMDA.nwarning)
coefficients(probit.fit)
(Intercept) poly(hirat, 2)1 poly(hirat, 2)2
-1.191292 8.708494 6.884404
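A more link-agnostic way to flag these observations is fitted(), which returns probabilities on the response scale with the family's inverse link already applied. For a probit fit that inverse link is pnorm(), whereas 1 / (1 + exp(-.fitted)) is the inverse of the logit link. A base-R sketch on simulated data with one extreme predictor value (the data and the names dat, flag and fit2 are hypothetical):

```r
# Simulated (hypothetical) data with one extreme predictor value
set.seed(7)
n <- 200
x <- c(rnorm(n), 20)
y <- c(rbinom(n, 1, pnorm(-0.5 + 0.8 * x[1:n])), 1)
dat <- data.frame(x = x, y = y)

fit <- glm(y ~ x, family = binomial(link = "probit"), data = dat)

# Probabilities on the response scale (pnorm of the linear predictor for probit)
p <- fitted(fit)

# Same tolerance glm.fit() uses
eps  <- 10 * .Machine$double.eps
flag <- p > 1 - eps | p < eps
table(flag)

# Refit without the flagged observations
fit2 <- glm(y ~ x, family = binomial(link = "probit"), data = dat[!flag, ])
coef(fit2)
```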
I fitted a lasso logistic model with interaction terms, and then wanted to visualize those interactions with an interaction plot.
I tried to find an R function that plots interactions for glmnet models, but I couldn't find any.
Is there any R package that will plot interactions for a LASSO model?
Since I couldn't find one, I tried to do it manually by plotting the predicted values, but I am getting some errors.
My code is as follows:
require(ISLR)
require(glmnet)
y <- Smarket$Direction
x <- model.matrix(Direction ~ Lag1 + Lag4* Volume, Smarket)[, -1]
lasso.mod <- cv.glmnet(x, y, alpha=1,family="binomial",nfolds = 5, type.measure="class",
lambda = seq(0.001,0.1,by = 0.001))
lasso.mod$lambda.min
pred = expand.grid(Lag1 = median(Smarket$Lag1),
Lag4 = c(-0.64,0.0385,0.596750),
Volume = seq(min(Smarket$Volume), max(Smarket$Volume), length=100))
lasso.mod1 <- glmnet(x, y, alpha=1,family="binomial",
lambda = lasso.mod$lambda.min)
pred$Direction = predict(lasso.mod1, newx=pred,
type="response", s= lasso.mod$lambda.min)
I am getting this error:
Error in cbind2(1, newx) %*% nbeta :
not-yet-implemented method for <data.frame> %*% <dgCMatrix>
Can anyone suggest how to fix this issue?
Thank you
The predict.glmnet() documentation says newx must be a matrix, and you need to supply the interaction values yourself.
library(dplyr)
pred = expand.grid(Lag1 = median(Smarket$Lag1),
Lag4 = c(-0.64,0.0385,0.596750),
Volume = seq(min(Smarket$Volume), max(Smarket$Volume), length=100)) %>%
mutate(`Lag4:Volume` = Lag4 * Volume) # preparing interaction values
pred$Direction = predict(lasso.mod1, newx = as.matrix(pred), # convert to matrix
type = "link", s= lasso.mod$lambda.min)
[EDITED]
Oh, I overlooked a more general and better way:
pred = expand.grid(Lag1 = median(Smarket$Lag1),
Lag4 = c(-0.64,0.0385,0.596750),
Volume = seq(min(Smarket$Volume), max(Smarket$Volume), length=100))
pred$Direction = predict(lasso.mod1,
newx = model.matrix( ~ Lag1 + Lag4* Volume, pred)[, -1],
type="response", s= lasso.mod$lambda.min)
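The model.matrix() approach is more general because the same one-sided formula produces the same columns, in the same order and with the same interaction name, as the matrix used for training. This can be checked in base R without glmnet; the toy training data below is a hypothetical stand-in for Smarket.

```r
# Hypothetical toy data standing in for Smarket
set.seed(3)
train <- data.frame(Lag1 = rnorm(50), Lag4 = rnorm(50), Volume = runif(50, 1, 2))
x_train <- model.matrix(~ Lag1 + Lag4 * Volume, train)[, -1]   # drop the intercept

grid <- expand.grid(Lag1 = median(train$Lag1),
                    Lag4 = c(-0.64, 0.0385, 0.59675),
                    Volume = seq(min(train$Volume), max(train$Volume), length.out = 100))
x_new <- model.matrix(~ Lag1 + Lag4 * Volume, grid)[, -1]

colnames(x_new)                                  # includes the "Lag4:Volume" column
identical(colnames(x_new), colnames(x_train))    # columns line up with the training matrix
```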
I tried running the Theoph model provided in the R documentation for nlmeODE package with my own dataset and it was able to return the parameter estimates. However, when I added an additional parameter that I wanted to estimate, I received an error message as shown below. Could anyone advise what went wrong and how I could fix this?
The example model provided in the R documentation looks like:
data(Theoph)
TheophODE <- Theoph
TheophODE$Dose[TheophODE$Time!=0] <- 0
TheophODE$Cmt <- rep(1,dim(TheophODE)[1])
OneComp <- list(DiffEq=list(
dy1dt = ~ -ka*y1 ,
dy2dt = ~ ka*y1-ke*y2),
ObsEq=list(
c1 = ~ 0,
c2 = ~ y2/CL*ke),
Parms=c("ka","ke","CL"),
States=c("y1","y2"),
Init=list(0,0))
TheophModel <- nlmeODE(OneComp,TheophODE)
Theoph.nlme <- nlme(conc ~ TheophModel(ka,ke,CL,Time,Subject),
data = TheophODE, fixed=ka+ke+CL~1, random = pdDiag(ka+CL~1),
start=c(ka=0.5,ke=-2.5,CL=-3.2),
control=list(returnObject=TRUE,msVerbose=TRUE),
verbose=TRUE)
plot(augPred(Theoph.nlme,level=0:1))
My modified model looks like:
group1 <- groupedData(conc ~ Time | Subject,
data=myowndata)
OneComp2 <- list(DiffEq=list(
dy1dt = ~ -ka*y1 + kr*y2,
dy2dt = ~ ka*y1 - kr*y2 -ke*y2), #I added in a reabsorption rate kr
ObsEq=list(
c1 = ~ 0,
c2 = ~ y2/CL*ke),
Parms=c("ka","ke","kr","CL"),
States=c("y1","y2"),
Init=list(0,0))
Model2 <- nlmeODE(OneComp2,group1)
nlme2 <- nlme(conc ~ Model2(ka,ke,kr,CL,Time,Subject),
data = group1, fixed=ka+ke+kr+CL~1, random = pdDiag(ka+CL~1),
start=c(ka=0.5,ke=-2.5,kr=-2.5,CL=-3.2),
control=list(returnObject=TRUE,msVerbose=TRUE),
verbose=TRUE)
The error message reads:
Error in numericDeriv(form[[3L]], names(ind), env) :
Missing value or an infinity produced when evaluating the model
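One way to narrow down this kind of error is to evaluate the modified system outside nlme at the start values and check that the predicted concentrations are finite. Below is a base-R sketch using a crude fixed-step Euler solver rather than a proper ODE solver. It assumes, as the negative start values in the Theoph example suggest, that the start values are on the log scale; the dose of 4 and the names pars, rhs and conc are hypothetical.

```r
# ASSUMPTION: start values are log-parameters, so exp() gives the rates
pars <- exp(c(ka = 0.5, ke = -2.5, kr = -2.5, CL = -3.2))
dose <- 4   # hypothetical dose placed in compartment 1 at time 0

# Right-hand side of the modified system (reabsorption rate kr added)
rhs <- function(y, p) {
  c(dy1 = -p[["ka"]] * y[1] + p[["kr"]] * y[2],
    dy2 =  p[["ka"]] * y[1] - p[["kr"]] * y[2] - p[["ke"]] * y[2])
}

# Fixed-step explicit Euler integration (crude, but enough for a sanity check)
dt    <- 0.01
times <- seq(0, 24, by = dt)
y <- matrix(NA_real_, nrow = length(times), ncol = 2)
y[1, ] <- c(dose, 0)
for (i in seq_len(length(times) - 1)) {
  y[i + 1, ] <- y[i, ] + dt * rhs(y[i, ], pars)
}

# Observation equation c2 = y2/CL*ke
conc <- y[, 2] / pars[["CL"]] * pars[["ke"]]
all(is.finite(conc))
```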
I'm having some problems with the predict function when using bayesglm. I've read some posts saying this problem may arise when the out-of-sample data has more levels than the in-sample data, but I'm using the same data for the fit and predict functions. Predict works fine with regular glm, but not with bayesglm. Example:
library(arm)   # provides bayesglm()
control <- y ~ x1 + x2
# this works fine:
glmObject <- glm(control, myData, family = binomial())
predicted1 <- predict.glm(glmObject , myData, type = "response")
# this gives an error:
bayesglmObject <- bayesglm(control, myData, family = binomial())
predicted2 <- predict.bayesglm(bayesglmObject , myData, type = "response")
Error in X[, piv, drop = FALSE] : subscript out of bounds
# Edit... I just discovered this works.
# Should I be concerned about using these results?
# Not sure why it fails when I specify the dataset
predicted3 <- predict(bayesglmObject, type = "response")
Can't figure out how to predict with a bayesglm object. Any ideas? Thanks!
One possible cause is the default setting of the "drop.unused.levels" argument of bayesglm(), which is TRUE. Unused factor levels are therefore dropped while the model is built, but the predict function still uses the original data, in which the unused levels are still present in the factor variable. This creates a mismatch in levels between the data used for model building and the data used for prediction (even if it is the same data frame, myData in your case). Here is an example:
library(arm)   # provides bayesglm()
n <- 100
x1 <- rnorm (n)
x2 <- as.factor(sample(c(1,2,3),n,replace = TRUE))
# Replacing 3 with 2 leaves level 3 unused
x2[x2==3] <- 2
y <- as.factor(sample(c(1,2),n,replace = TRUE))
myData <- data.frame(x1 = x1, x2 = x2, y = y)
control <- y ~ x1 + x2
# this works fine:
glmObject <- glm(control, myData, family = binomial())
predicted1 <- predict.glm(glmObject , myData, type = "response")
# this gives an error - this uses default drop.unused.levels = TRUE
bayesglmObject <- bayesglm(control, myData, family = binomial())
predicted2 <- predict.bayesglm(bayesglmObject , myData, type = "response")
Error in X[, piv, drop = FALSE] : subscript out of bounds
# this works fine - value of drop.unused.levels is set to FALSE
bayesglmObject <- bayesglm(control, myData, family = binomial(),drop.unused.levels = FALSE)
predicted2 <- predict.bayesglm(bayesglmObject , myData, type = "response")
I think a better approach is to use droplevels() to drop the unused levels from the data frame beforehand, and then use that data frame for both model building and prediction.
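The droplevels() approach can be sketched in base R. glm() is used here so the example runs without arm; per the answer above, the same cleaned data frame should then also work with bayesglm() and predict(). The seed and the names fit and pred are arbitrary.

```r
set.seed(10)
n  <- 100
x1 <- rnorm(n)
x2 <- factor(sample(c(1, 2, 3), n, replace = TRUE), levels = 1:3)
x2[x2 == 3] <- 2                       # level "3" is now unused
y  <- factor(sample(c(1, 2), n, replace = TRUE))
myData <- data.frame(x1 = x1, x2 = x2, y = y)

levels(myData$x2)                      # "1" "2" "3": the unused level is still there
myData <- droplevels(myData)           # drop unused levels in every factor column
levels(myData$x2)                      # "1" "2"

# Fit and predict on the same cleaned data frame
fit  <- glm(y ~ x1 + x2, data = myData, family = binomial())
pred <- predict(fit, newdata = myData, type = "response")
```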