I'm trying to 'tidy' up a binary regression (using a log link rather than a logit link, so I get RR estimates instead of ORs) with the broom function tidy() on a glm2 object. However, it's giving me an error:
> tidy(model, conf.int=TRUE, exponentiate=TRUE)
Error: no valid set of coefficients has been found: please supply starting values
Here is a reproducible example of what I mean:
library(tidyverse)
library(glm2)
library(broom)
data(iris)
glimpse(iris)
table(iris$Species)
##create an outcome
df <- iris %>%
  mutate(outcome = case_when(Petal.Width > 2 ~ 1,
                             TRUE ~ 0))
# fit standard glm
glm(outcome ~ Sepal.Length + Sepal.Width, data = df,
    family = binomial(link = "log"))
# -> doesn't converge using a log link due to parameter-space issues (common when fitting binary regressions).
# go to glm2 to fit the model instead, but need starting values for this:
p0 <- sum(as.numeric(df$outcome))/length(as.numeric(df$outcome))
start.val <- c(log(p0),rep(0,2))
model <- glm2(outcome ~ Sepal.Length + Sepal.Width, data = df,
              family = binomial(link = "log"),
              start = start.val)
##get warnings, but converges
model$converged
##now tidy up and display model
tidy(model, conf.int=TRUE, exponentiate=TRUE)
# error -> wants starting values again? Also shows the warnings from the previous fit
# (which now say the model hasn't converged?)
tidy(model, conf.int = TRUE, exponentiate = TRUE, start = start.val)
# doesn't recognise the starting values?
Any ideas on how to get tidy to work, or do I just do it manually?
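In case a manual fallback is acceptable: the error seems to come from the profile-likelihood confidence intervals that tidy(conf.int = TRUE) requests, which refit the model without the starting values. A minimal sketch (assuming Wald intervals are good enough for your purposes) that avoids the refit:
# relative risks with Wald 95% CIs, computed directly from the fitted glm2 object;
# confint.default() uses the estimated covariance matrix, so no refitting occurs
exp(cbind(RR = coef(model), confint.default(model)))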
I am trying to extract random intercepts from tidymodels using lme4 and multilevelmod. I am able to do this directly with lme4, as shown below:
Using R and lme4:
library("tidyverse")
library("lme4")
# set up model
mod <- lmer(Reaction ~ Days + (1|Subject),data=sleepstudy)
# create expanded df
expanded_df <- with(sleepstudy,
                    data.frame(
                      expand.grid(Subject = levels(Subject),
                                  Days = seq(min(Days), max(Days), length = 51))))
# create predicted df with **random intercepts**
predicted_df <- data.frame(expanded_df,resp=predict(mod,newdata=expanded_df))
predicted_df
# plot intercepts
ggplot(predicted_df,aes(x=Days,y=resp,colour=Subject))+
geom_line()
Using tidymodels:
# example from
# https://github.com/tidymodels/multilevelmod
library("multilevelmod")
library("tidymodels")
library("tidyverse")
library("lme4")
#> Loading required package: parsnip
data(sleepstudy, package = "lme4")
# set engine to lme4
mixed_model_spec <- linear_reg() %>% set_engine("lmer")
# create model
mixed_model_fit_tidy <-
mixed_model_spec %>%
fit(Reaction ~ Days + (1 | Subject), data = sleepstudy)
expanded_df_tidy <- with(sleepstudy,
                         data.frame(
                           expand.grid(Subject = levels(Subject),
                                       Days = seq(min(Days), max(Days), length = 51))))
# predict() on the parsnip fit returns a tibble with a .pred column
predicted_df_tidy <- data.frame(expanded_df_tidy,
                                predict(mixed_model_fit_tidy, new_data = expanded_df_tidy))
ggplot(predicted_df_tidy,aes(x=Days,y=.pred,colour=Subject))+
geom_line()
Using the predict() function seems to give only the fixed-effect predictions.
Is there a way to extract the random intercepts from tidymodels and multilevelmod? I know the package is still in development so it might not be possible at this stage.
I think you can work around this as follows:
predicted_df_tidy <- mutate(expanded_df_tidy,
                            .pred = predict(mixed_model_fit_tidy,
                                            new_data = expanded_df_tidy,
                                            type = "raw", opts = list(re.form = NULL)))
bind_cols() instead of mutate() might be useful in some circumstances?
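For example, a hedged sketch of the bind_cols() variant (same model and grid as above; the tibble() wrapper just names the prediction column):
# same predictions as the mutate() version, assembled with bind_cols()
predicted_df_tidy <- bind_cols(
  expanded_df_tidy,
  tibble(.pred = predict(mixed_model_fit_tidy,
                         new_data = expanded_df_tidy,
                         type = "raw", opts = list(re.form = NULL))))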
The issue is that multilevelmod internally sets the default for prediction to re.form = NA; the code above resets it to re.form = NULL (which is the lme4 default, i.e. include all random effects in the prediction).
If you actually want the random intercepts only, I guess you could use predicted_df_tidy %>% filter(Days == 0).
PS: If you want to be more 'tidy' about this, I think you can use purrr::cross_df() in place of expand.grid() and pipe the results directly to mutate().
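A hedged sketch of that idea (cross_df() is deprecated in recent purrr releases, with tidyr::expand_grid() the usual replacement, but it still works; the intermediate grid object and the factor conversion for Subject are my additions):
# build the prediction grid the 'tidy' way, then add predictions with mutate()
grid <- purrr::cross_df(list(
    Subject = levels(sleepstudy$Subject),
    Days    = seq(min(sleepstudy$Days), max(sleepstudy$Days), length = 51))) %>%
  mutate(Subject = factor(Subject, levels = levels(sleepstudy$Subject)))
predicted_df_tidy <- grid %>%
  mutate(.pred = predict(mixed_model_fit_tidy, new_data = grid,
                         type = "raw", opts = list(re.form = NULL)))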
I am trying to plot my svm model.
library(foreign)
library(e1071)
x <- read.arff("contact-lenses.arff")
#alt: x <- read.arff("http://storm.cis.fordham.edu/~gweiss/data-mining/weka-data/contact-lenses.arff")
model <- svm(`contact-lenses` ~ . , data = x, type = "C-classification", kernel = "linear")
The contact-lenses ARFF file is one of the built-in data files in Weka.
However, I now run into an error when trying to plot the model.
plot(model, x)
Error in plot.svm(model, x) : missing formula.
The problem is that your model has multiple covariates. plot() will only run automatically if your data= argument has exactly three columns (one of which is the response). For example, from the ?plot.svm help page, you can call
data(cats, package = "MASS")
m1 <- svm(Sex~., data = cats)
plot(m1, cats)
So, since you can only show two dimensions on a plot, you need to specify which variables to use for x and y when there are more than two covariates to choose from:
cplus<-cats
cplus$Oth<-rnorm(nrow(cplus))
m2 <- svm(Sex~., data = cplus)
plot(m2, cplus) #error
plot(m2, cplus, Bwt~Hwt) #Ok
plot(m2, cplus, Hwt~Oth) #Ok
So that's why you're getting the "Missing Formula" error.
There is another catch as well. The plot.svm will only plot continuous variables along the x and y axes. The contact-lenses data.frame has only categorical variables. The plot.svm function simply does not support this as far as I can tell. You'll have to decide how you want to summarize that information in your own visualization.
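If it helps, one purely illustrative alternative (not a replacement for the decision-boundary plot) is to summarise the fit as a table of fitted versus observed classes:
# confusion table of the linear SVM's fitted classes vs. the observed classes
table(predicted = predict(model, x), actual = x$`contact-lenses`)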
I'm using the MuMIn package in R to get an averaged model (http://www.inside-r.org/packages/cran/MuMIn/docs/model.avg) and predict from that. The package also includes a predict function specifically for an object returned by model.avg (http://www.inside-r.org/node/123636). I've tried using the examples listed, with code as follows:
library(MuMIn)
# Example from Burnham and Anderson (2002), page 100:
fm1 <- lm(y ~ X1 + X2 + X3 + X4, data = Cement)
ms1 <- dredge(fm1)
# obtain model average for AIC delta <2
avgm <- model.avg(ms1, subset=delta<2)
# predict from the averaged model
averaged.full <- predict(avgm, full = TRUE)
But I keep getting
Error in predict.averaging(avgm, full = TRUE): can predict only from 'averaging' object containing model list
which I don't understand, because I did follow the examples and used an object returned by model.avg. Am I missing something?
When you create an "averaging" object directly from a "model.selection" object, it does not contain the component models, which are required for predict to work. You can use model.avg(..., fit = TRUE), which will fit the component models again.
To avoid fitting the models twice, you can first create a list of all the candidate models with lapply(dredge(..., evaluate = FALSE), eval) and afterwards use model.avg(..., subset = ...) on that list.
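A hedged sketch of both options, reusing the Cement example from the question (note: current dredge() versions refuse to run unless the global model's na.action is na.fail, hence the options() line):
library(MuMIn)
options(na.action = "na.fail")  # required by dredge()
fm1 <- lm(y ~ X1 + X2 + X3 + X4, data = Cement)
# Option 1: have model.avg() refit the component models for you
avgm <- model.avg(dredge(fm1), subset = delta < 2, fit = TRUE)
predict(avgm, full = TRUE)
# Option 2: evaluate each candidate model once, then average the resulting list
models <- lapply(dredge(fm1, evaluate = FALSE), eval)
avgm2 <- model.avg(models, subset = delta < 2)
predict(avgm2, full = TRUE)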
I am trying to predict fitted values over data containing NAs, based on a model generated by plm. Here's some sample code:
require(plm)
test.data <- data.frame(id = c(1, 1, 2, 2, 3), time = c(1, 2, 1, 2, 1),
                        y = c(1, 3, 5, 10, 8), x = c(1, NA, 3, 4, 5))
model <- plm(y ~ x, data = test.data, index = c("id", "time"),
             model = "pooling", na.action = na.exclude)
yhat <- predict(model, test.data, na.action=na.pass)
test.data$yhat <- yhat
When I run the last line I get an error stating that the replacement has 4 rows while data has 5 rows.
I have no idea how to get predict() to return a vector of length 5...
If instead of running a plm I run an lm (as in the line below) I get the expected result.
model <- lm(y ~ x, data=test.data, na.action=na.exclude)
As of plm version 2.6.2 (2022-08-16), this should work out of the box (see also: Predict out of sample on fixed effects model). From the NEWS file:
prediction implemented for fixed effects models incl. support for argument newdata and out-of-sample prediction. Help page (?predict.plm) added to specifically explain the prediction for fixed effects models and the out-of-sample case.
I think this is something that predict.plm ought to handle for you -- seems like an oversight on the package authors' part -- but you can use ?napredict to implement it for yourself:
pp <- predict(model, test.data)
na.stuff <- attr(model$model,"na.action")
(yhat <- napredict(na.stuff,pp))
## [1] 1.371429 NA 5.485714 7.542857 9.600000
I am using NeweyWest standard errors to correct my lm() / dynlm() output. E.g.:
fit1<-dynlm(depvar~covariate1+covariate2)
coeftest(fit1,vcov=NeweyWest)
Coefficients are displayed the way I'd like, but unfortunately I lose all the other regression output, such as R-squared, the F-test, etc., that is displayed by summary(). So I wonder how I can display the robust SEs and all the other information in the same summary output.
Is there a way to either get everything in one call or to overwrite the 'old' estimates?
I bet I just missed something badly, but that is really relevant when sweaving the output.
Test example, taken from ?dynlm.
require(dynlm)
require(sandwich)
data("UKDriverDeaths", package = "datasets")
uk <- log10(UKDriverDeaths)
dfm <- dynlm(uk ~ L(uk, 1) + L(uk, 12))
#shows R-squared, etc.
summary(dfm)
#no such information
coeftest(dfm, vcov = NeweyWest)
BTW: the same applies to vcovHC.
coefficients is just a matrix in the lm (or dynlm) summary object, so all you need to do is unclass the coeftest() output.
library(dynlm)
library(sandwich)
library(lmtest)
temp.lm <- dynlm(runif(100) ~ rnorm(100))
temp.summ <- summary(temp.lm)
temp.summ$coefficients <- unclass(coeftest(temp.lm, vcov. = NeweyWest))
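Printing the modified summary should then give the familiar layout, with the robust coefficient table alongside the original R-squared (the printed F-statistic is still the classical one; see the comments below):
temp.summ  # usual summary output, but with Newey-West SEs, t- and p-values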
If you specify the covariance matrix, the F-statistic changes and you need to compute it again using waldtest(), right? Because
temp.summ$coefficients <- unclass(coeftest(temp.lm, vcov. = NeweyWest))
only overwrites the coefficients.
The F-statistic changes, but R^2 remains the same.
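For completeness, a hedged sketch of recomputing the overall F-test with waldtest() and the Newey-West covariance; the data frame d, the lm() fit and the intercept-only comparison are mine, chosen so that both fits use exactly the same observations:
library(lmtest)
library(sandwich)
set.seed(1)
d <- data.frame(y = runif(100), x = rnorm(100))
full <- lm(y ~ x, data = d)
# Wald F-test of all slopes against the intercept-only model,
# using the Newey-West covariance for the unrestricted model
waldtest(full, . ~ 1, vcov = NeweyWest, test = "F")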