R: Predict using multiple models

I am new to R and am trying to predict outcomes on a dataset using four different GLMs. I have tried running it as one large model, and while I do get results, the model doesn't converge properly and I end up with NAs. I therefore have four models:
model_team <- glm(mydata$OUT ~ TEAM + OPPONENT, family = "binomial", data = mydata)
model_conf <- glm(mydata$OUT ~ TCONF + OCONF, family = "binomial", data = mydata)
model_tstats <- glm(mydata$OUT ~ TPace + TORtg + TFTr + T3PAr + TTS. + TTRB. + TAST. + TSTL. + TBLK. + TeFG. + TTOV. + TORB. + TFT.FGA, family = "binomial", data = mydata)
model_ostats <- glm(mydata$OUT ~ OPace + OORtg + OFTr + O3PAr + OTS. + OTRB. + OAST. + OSTL. + OBLK. + OeFG. + OTOV. + OORB. + OFT.FGA, family = "binomial", data = mydata)
I then want to predict the outcomes for a different data set, fix, using the four models:
predict(model_team, model_conf, model_tstats, model_ostats, fix, level = 0.95, type = "probs")
Is there a way to use all four models together without joining them into one large model?

I don't really understand why you are trying to do what you are doing, and I don't have example data representative of yours. However, below is an example of how you could combine multiple GLMs into one using their resulting coefficients. Note that this will not work well if there is multicollinearity between the variables in your dataset.
# I used the iris dataset for my example
head(iris)
# Run several models
model1 <- glm(data = iris, Sepal.Length ~ Sepal.Width)
model2 <- glm(data = iris, Sepal.Length ~ Petal.Length)
model3 <- glm(data = iris, Sepal.Length ~ Petal.Width)
# Get the combined intercept (mean() averages a single vector, hence c())
intercept <- mean(c(
  coef(model1)['(Intercept)'],
  coef(model2)['(Intercept)'],
  coef(model3)['(Intercept)']))
# Extract the slope coefficients into a column matrix
coefs <- as.matrix(
  c(coef(model1)[2],
    coef(model2)[2],
    coef(model3)[2]))
# Get the feature values for the predictions
ds <- as.matrix(iris[,c('Sepal.Width', 'Petal.Length', 'Petal.Width')])
# Linear algebra: Matrix-multiply values with coefficients
prediction <- ds %*% coefs + intercept
# Let's look at the results
plot(iris$Petal.Length, prediction)
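If the goal is simply to use all four models from the question together, another route (a hedged sketch, my addition rather than part of the answer above) is to keep them separate and average their predicted probabilities on the new data set fix:
# Probability of OUT from each model on the new data
p_team   <- predict(model_team,   newdata = fix, type = "response")
p_conf   <- predict(model_conf,   newdata = fix, type = "response")
p_tstats <- predict(model_tstats, newdata = fix, type = "response")
p_ostats <- predict(model_ostats, newdata = fix, type = "response")
# Simple ensemble: average the four probability vectors
p_avg <- rowMeans(cbind(p_team, p_conf, p_tstats, p_ostats))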

Related

svyglm - how to code for a logistic regression model across all variables?

In R, to include all variables in a GLM you can simply use a . on the right-hand side of the formula, as shown in How to succinctly write a formula with many variables from a data frame?, for example:
y <- c(1,4,6)
d <- data.frame(y = y, x1 = c(4,-1,3), x2 = c(3,9,8), x3 = c(4,-4,-2))
mod <- lm(y ~ ., data = d)
However, I am struggling to do this with svydesign. I have many explanatory variables plus an ID and a weight variable, so first I create my survey design:
des <- svydesign(ids = ~id, weights = ~wt, data = df)
Then I try creating my binomial model using weights:
binom <- svyglm(y ~ ., design = des, family = "binomial")
But I get the error:
Error in svyglm.survey.design(y ~ ., design = des, family = "binomial") :
all variables must be in design = argument
What am I doing wrong?
You typically wouldn't want to do this, because "all the variables" would include design metadata such as weights, cluster indicators, stratum indicators, etc.
You can use colnames to extract all the variable names from a design object and then reformulate, probably after subsetting the names, e.g. with the api example in the package:
> all_the_names <- colnames(dclus1)
> all_the_actual_variables <- all_the_names[c(2, 11:37)]
> reformulate(all_the_actual_variables,"y")
y ~ stype + pcttest + api00 + api99 + target + growth + sch.wide +
comp.imp + both + awards + meals + ell + yr.rnd + mobility +
acs.k3 + acs.46 + acs.core + pct.resp + not.hsg + hsg + some.col +
col.grad + grad.sch + avg.ed + full + emer + enroll + api.stu
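Putting it together, a minimal sketch using the survey package's api example (the design is built as in the package documentation; using sch.wide as the response is my assumption for illustration):
library(survey)
data(api)
# Cluster design from the package documentation
dclus1 <- svydesign(id = ~dnum, weights = ~pw, data = apiclus1, fpc = ~fpc)
# Substantive variables only, dropping the chosen response
vars <- setdiff(colnames(dclus1)[c(2, 11:37)], "sch.wide")
f <- reformulate(vars, response = "sch.wide")
# With this many correlated predictors, expect convergence warnings
fit <- svyglm(f, design = dclus1, family = quasibinomial())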

Plot the impact for each variable in linear regression?

I want to create a plot like the one below for an lm model calculated in R.
Is there a simple way of doing it?
The plot above comes from this page.
Package {caret} offers a convenient method varImp:
Example:
library(caret)
my_model <- lm(mpg ~ disp + cyl, data = mtcars)
## > varImp(my_model)
##
## Overall
## disp 2.006696
## cyl 2.229809
For different measures of variable importance see ?varImp. Feed values into your plotting library of choice.
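For instance, a minimal sketch (assuming the lm model above) that feeds the scores into ggplot2:
library(caret)
library(ggplot2)
my_model <- lm(mpg ~ disp + cyl, data = mtcars)
imp <- varImp(my_model)       # data frame with one Overall column
imp$variable <- rownames(imp)
ggplot(imp, aes(x = reorder(variable, Overall), y = Overall)) +
  geom_col() +
  coord_flip() +
  labs(x = NULL, y = "importance (absolute t-statistic)")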
Extra: {ggstatsplot} calculates and plots a host of model stats for a plethora of model objects. This includes hypotheses about regression coefficients, for which method ggcoefstats() might serve your purpose (remember to scale predictor variables for meaningful comparison of coefficients though).
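A hedged sketch of that option, scaling the predictors as advised:
library(ggstatsplot)
# Standardize predictors so coefficient magnitudes are comparable
scaled_model <- lm(mpg ~ scale(disp) + scale(cyl), data = mtcars)
ggcoefstats(scaled_model)  # dot-and-whisker plot of the coefficients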
Following the method in the linked article (relative marginal increase in R-squared), you could write your own function that takes a formula and a data frame, then plots the relative importance:
library(ggplot2)
plot_importance <- function(formula, data) {
  lhs <- as.character(as.list(formula)[[2]])
  rhs <- as.list(as.list(formula)[[3]])
  vars <- grep("[+\\*]", rapply(rhs, as.character), invert = TRUE, value = TRUE)
  df <- do.call(rbind, lapply(seq_along(vars), function(i) {
    f1 <- as.formula(paste(lhs, paste(vars[-i], collapse = "+"), sep = "~"))
    f2 <- as.formula(paste(lhs, paste(c(vars[-i], vars[i]), collapse = "+"),
                           sep = "~"))
    r1 <- summary(lm(f1, data = data))$r.squared
    r2 <- summary(lm(f2, data = data))$r.squared
    data.frame(variable = vars[i], importance = r2 - r1)
  }))
  df$importance <- df$importance / sum(df$importance)
  df$variable <- reorder(factor(df$variable), -df$importance)
  ggplot(df, aes(x = variable, y = importance)) +
    geom_col(fill = "deepskyblue4") +
    scale_y_continuous(labels = scales::percent) +
    coord_flip() +
    labs(title = "Relative importance of variables",
         subtitle = deparse(formula)) +
    theme_classic(base_size = 16)
}
We can test this out with the sample data provided in the linked article:
IV <- read.csv(paste0("https://statisticsbyjim.com/wp-content/uploads/",
                      "2017/07/ImportantVariables.csv"))
plot_importance(Strength ~ Time + Pressure + Temperature, data = IV)
And we see that the plot is the same.
We can also test it on some built-in datasets to demonstrate that it generalizes:
plot_importance(mpg ~ disp + wt + gear, data = mtcars)
plot_importance(Petal.Length ~ Species + Petal.Width, data = iris)
I ended up using the relaimpo package and plotting with ggplot, following the answer by Allan Cameron:
library(relaimpo)
library(ggplot2)
relative_importance <- calc.relimp(mymodel, type = "lmg")$lmg
df <- data.frame(
  variable = names(relative_importance),
  importance = round(c(relative_importance) * 100, 2)
)
ggplot(df, aes(x = reorder(variable, -importance), y = importance)) +
  geom_col(fill = "deepskyblue4") +
  geom_text(aes(label = importance), vjust = .3, hjust = 1.2, size = 3, color = "white") +
  coord_flip() +
  labs(title = "Relative importance of variables") +
  theme_classic(base_size = 16)

How to use predict function with my pooled results from mice()?

Hi, I just started using R as part of a module in school. I have a data set with missing values and have used mice() to impute them. I'm now trying to use the predict function with my pooled results. However, I get the following error:
Error in UseMethod("predict") :
no applicable method for 'predict' applied to an object of class "c('mipo', 'data.frame')"
I have included my entire code below and I'd greatly appreciate it if y'all can help a novice out. Thanks!
```{r}
library(magrittr)
library(dplyr)
train = read.csv("Train_Data.csv", na.strings=c("","NA"))
test = read.csv("Test_Data.csv", na.strings=c("","NA"))
cols <- c("naCardiac", "naFoodNutrition", "naGenitourinary", "naGastrointestinal", "naMusculoskeletal", "naNeurological", "naPeripheralVascular", "naPain", "naRespiratory", "naSkin")
train %<>% mutate_each_(funs(factor(.)), cols)
test %<>% mutate_each_(funs(factor(.)), cols)
str(train)
str(test)
```
```{r}
library(mice)
md.pattern(train)
```
```{r}
miTrain = mice(train, m = 5, maxit = 50, meth = "pmm")
```
```{r}
model = with(miTrain, lm(LOS ~ Age + Gender + Race + Temperature + RespirationRate + HeartRate + SystolicBP + DiastolicBP + MeanArterialBP + CVP + Braden + SpO2 + FiO2 + PO2_POCT + Haemoglobin + NumWBC + Haematocrit + NumPlatelets + ProthrombinTime + SerumAlbumin + SerumChloride + SerumPotassium + SerumSodium + SerumLactate + TotalBilirubin + ArterialpH + ArterialpO2 + ArterialpCO2 + ArterialSaO2 + Creatinine + Urea + GCS + naCardiac + naFoodNutrition + naGenitourinary + naGastrointestinal + naMusculoskeletal + naNeurological + naPeripheralVascular + naPain + naRespiratory + naSkin))
model
summary(model)
```
```{r}
modelResults = pool(model)
modelResults
```
```{r}
pred = predict(modelResults, newdata = test)
PredTest = data.frame(test$PatientID, modelResults)
str(PredTest)
summary(PredTest)
```
One slightly hacky way to achieve this may be to take one of the fitted models stored in the with() result and replace its stored coefficients with the final pooled estimates. I haven't done detailed testing, but it seems to work on this simple example:
library(mice)
imp <- mice(nhanes, maxit = 2, m = 2)
fit <- with(data = imp, exp = lm(bmi ~ hyp + chl))
pooled <- pool(fit)
# Copy one of the fitted lm models fit to
# one of the imputed datasets
pooled_lm = fit$analyses[[1]]
# Replace the fitted coefficients with the pooled
# estimates (need to check they are replaced in
# the correct order)
pooled_lm$coefficients = summary(pooled)$estimate
# Predict - predictions seem to match the
# pooled coefficients rather than the original
# lm that was copied
predict(fit$analyses[[1]], newdata = nhanes)
predict(pooled_lm, newdata = nhanes)
As far as I know, predict() for a linear regression depends only on the coefficients, so you shouldn't have to replace other stored values in the fitted model (but you would have to if applying methods other than predict()).
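An alternative sketch (my addition, not part of the answer above): predict from each of the fitted models in fit$analyses and average the point predictions instead of pooling coefficients:
# One column of predictions per imputed-data fit
preds <- sapply(fit$analyses, predict, newdata = nhanes)
# Average across imputations (rows of nhanes with NAs yield NA predictions)
pred_pooled <- rowMeans(preds)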

How to plot 3 models in one Figure in R?

I'm new to R and I have fitted 3 models to my data as follows:
Model 1: y = a(x) + b
lm1 <- lm(CBI ~ dNDVI, data = data)
Model 2: y = a(x)^2 + b(x) + c
lm2 <- lm(CBI ~ dNDVI + I(dNDVI^2), data=data)
Model 3: y = x(a|x| + b) - 1
lm3 <- nls(CBI ~ dNDVI*(a*abs(dNDVI) + b) - 1, start = c(a = 1.5, b = 2.7), data = data)
Now I would like to plot all three models in R, but I could not find a way to do it. Can you please help me? I have tried the first two models as follows, and it works, but I don't know how to add Model 3:
ggplot(data = data, aes(x = dNDVI, y = CBI)) +
geom_point() +
geom_smooth(method = lm, formula = y ~ x, size = 1, se = FALSE) +
geom_smooth(method = lm, formula = y ~ x + I(x^2), size = 1, se = FALSE ) +
theme_bw()
I would also like to add a legend showing three different colours or line types for the three models. Can you please show me how to add it to the figure?
Using iris as a dummy set to represent the three models:
library(dplyr)   # for the %>% pipe
library(tidyr)   # for gather()
library(ggplot2)
# New data frame spanning the x-range, used to predict fitted values for each model
new.dat <- data.frame(Sepal.Length = seq(min(iris$Sepal.Length),
                                         max(iris$Sepal.Length),
                                         length.out = 50))
m1 <- lm(Petal.Length ~ Sepal.Length, iris)
m2 <- lm(Petal.Length ~ Sepal.Length + I(Sepal.Length^2), data = iris)
m3 <- nls(Petal.Length ~ Sepal.Length*(a*abs(Sepal.Length) + b) - 1,
          start = c(a = 1.5, b = 2.7), data = iris)
new.dat$m1.fitted <- predict(m1, new.dat)
new.dat$m2.fitted <- predict(m2, new.dat)
new.dat$m3.fitted <- predict(m3, new.dat)
# Stack the fitted values of the three models so ggplot generates the legend automatically
new.dat <- new.dat %>% gather(var, val, m1.fitted:m3.fitted)
ggplot(new.dat, aes(Sepal.Length, val, colour = var)) +
  geom_line()
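To address the legend directly (an optional tweak, my addition rather than part of the original answer), the automatically generated legend can be relabelled:
ggplot(new.dat, aes(Sepal.Length, val, colour = var)) +
  geom_line() +
  scale_colour_discrete(name = "Model",
                        labels = c("1: linear", "2: quadratic", "3: nonlinear"))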

In R, is there a parsimonious or efficient way to get a regression prediction holding all covariates at their means?

I'm wondering if there is a faster way of getting predictions from a regression model for certain values of the covariates without manually writing out the linear predictor. For example, if I wanted a prediction of the dependent variable at the means of the covariates, I can do something like this:
r_3 <- glm(ins ~ retire + age + hstatusg + qhhinc2 + educyear + married + hisp,
           family = binomial, data = dat)
meanRetire <- mean(dat$retire)
meanAge <- mean(dat$age)
meanHStatusG <- mean(dat$hstatusg)
meanQhhinc2 <- mean(dat$qhhinc2)
meanEducyear <- mean(dat$educyear)
meanMarried <- mean(dat$married)
meanHisp <- mean(dat$hisp)
ins_predict <- coef(r_3)[1] + coef(r_3)[2] * meanRetire + coef(r_3)[3] * meanAge +
  coef(r_3)[4] * meanHStatusG + coef(r_3)[5] * meanQhhinc2 +
  coef(r_3)[6] * meanEducyear + coef(r_3)[7] * meanMarried +
  coef(r_3)[8] * meanHisp
Oh... There is a predict function:
fit <- glm(ins ~ retire + age + hstatusg + qhhinc2 + educyear + married + hisp,
family = binomial, data = dat)
newdat <- lapply(dat, mean) ## column means
lppred <- predict(fit, newdata = newdat) ## prediction of linear predictor
To get predicted response, use:
predict(fit, newdata = newdat, type = "response")
or (more efficiently from lppred):
binomial()$linkinv(lppred)
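With the default logit link, binomial()$linkinv is just the inverse logit, so the same number comes from plogis (a small equivalence check, my addition):
# plogis() is the inverse-logit CDF, identical to binomial()$linkinv here
all.equal(binomial()$linkinv(lppred), plogis(lppred))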
