I have the following dataset, and I want to find the best variables among pred1, pred2, pred3, pred5, and pred6 to build a regression model that predicts resp1 and resp2.
So far, I have only found that pred2 seems to be the one to use for resp1.
library(ggplot2)
testdat <- read.csv("testdat.csv", header = TRUE)
plot(testdat$pred2,testdat$resp1)
m1<-lm(resp1~pred2, data=testdat)
smooth <- smooth.spline(testdat$pred2,testdat$resp1, spar=1)
lines(smooth, col='red', lwd = 2, lty = 2)
I tried squaring, square-root, and log transformations, but whatever I try, summary(m1) reports an R-squared no higher than 53%, and I am getting desperate.
The same happens when I transform variables to predict resp2: the R-squared is no higher than 66%.
plot(testdat$pred3,testdat$resp2)
m1<-lm(resp2~pred3, data=testdat)
smooth <- smooth.spline(testdat$pred3,testdat$resp2, spar=1)
lines(smooth, col='red', lwd = 2, lty = 2)
Sample dataset:
https://www.filehosting.org/file/details/846977/testdat.csv
Using more than one variable is definitely an option. If you were building a regression model to predict resp1, and then resp2, from one or more of the variables pred1, pred2, pred3, pred5, and pred6, what would you do?
If you are looking to use linear regression, you could perform a RESET test on your data and work from there:
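library(lmtest)  # provides resettest()
resettest(resp1 ~ pred2, data = testdat, power = 2:3, type = "regressor")
If the p-value is lower than 0.05, the test rejects the null hypothesis that the functional form is adequate, so you should change the model.
Also, use the adjusted R-squared rather than the R-squared when comparing models with different numbers of predictors.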
Otherwise, more information regarding your data might be necessary (e.g. what is its nature?)
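As a rough illustration of the comparison suggested above, the sketch below fits a few candidate models and ranks them by adjusted R-squared. The data are simulated stand-ins, since testdat.csv is not reproduced here; the coefficients and the candidate formulas are assumptions made only for the example.

```r
# Simulated stand-in for testdat (hypothetical data, not the real file)
set.seed(1)
n <- 200
sim <- data.frame(pred1 = rnorm(n), pred2 = rnorm(n), pred3 = rnorm(n))
sim$resp1 <- 1 + 2 * sim$pred2 + 0.5 * sim$pred2^2 + 1.5 * sim$pred3 + rnorm(n)

# Candidate models: one predictor, two predictors, and an added quadratic term
candidates <- list(
  single    = resp1 ~ pred2,
  two_vars  = resp1 ~ pred2 + pred3,
  quadratic = resp1 ~ pred2 + I(pred2^2) + pred3
)

fits <- lapply(candidates, lm, data = sim)

# Adjusted R-squared penalises extra terms, so it is comparable across models
adj_r2 <- sapply(fits, function(m) summary(m)$adj.r.squared)
print(sort(adj_r2, decreasing = TRUE))
```

With the real data, the same pattern would apply with data = testdat and whichever formulas are worth comparing.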
I have two glmer models with two covariates each that I'm trying to plot into a single figure.
MWE:
## generalized linear mixed model
library(lme4)    # provides glmer() and the cbpp data
library(sjPlot)  # provides plot_model()
library(lattice)
cbpp$response <- sample(c(0,1), replace=TRUE, size=nrow(cbpp))
gm1 <- glmer(response ~ size + incidence + (1 | herd),
data = cbpp, family = binomial)
cbpp$obs <- 1:nrow(cbpp)
gm2 <- glmer(response ~ size + incidence + (1 | herd) + (1|obs),
family = binomial, data = cbpp)
I am trying to plot the predicted values against each covariate for each model. I found the sjPlot library and its plot_model function, which can plot these predictions with type = "pred". Calling this function on each model individually works perfectly and yields two separate figures for each model.
However, I'm not familiar with R and I am having a hard time plotting the 4 plots in the same figure.
The plot_model function has a grid parameter, which only works for models with a Poisson distribution. For gm1 and gm2, I am getting the following error when I call plot_model(gm1, type = "pred", grid = TRUE):
Error in if (attr(x, "logistic", exact = TRUE) == "1" && attr(x, "is.trial", : missing value where TRUE/FALSE needed
Anyway, I would not be able to plot all the models in one figure this way, so I tried three different approaches. First, I saw the plot_models function, which takes multiple models as input. When I pass the two models as arguments, calling plot_models(gm1, gm2), I get the following error:
Error: $ operator not defined for this S4 class
Second, I tried using the par function, setting mfrow, and then calling plot_model again, without success. I don't get any error, but the plots keep showing up as individual figures.
Third, I tried using the gridExtra library. Calling
p1 <- plot_model(gm1, type = "pred")
p2 <- plot_model(gm2, type = "pred")
grid.arrange(p1, p2)
results in the following error:
Error in gList(list(ppt = list(data = list(x = c(-2, -1, 0, 1, 2, 3, 4, : only 'grobs' allowed in "gList"
Does anyone have an insight on this?
EDIT
This seems to work:
pp1 <- plot_model(gm1,type="pred")
pp2 <- plot_model(gm2,type="pred")
plot_grid(c(pp1,pp2))
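The gridExtra error above probably arises because plot_model(type = "pred") returns a list of ggplot objects (one per covariate) rather than a single plot. This is an assumption based on the error message and on the fact that c(pp1, pp2) works. Under that assumption, passing the combined list through the grobs argument should also work. The sketch uses plain ggplot2 plots as stand-ins for the sjPlot output, and the helper make_plots is hypothetical.

```r
library(ggplot2)
library(gridExtra)

# Hypothetical stand-in for plot_model(..., type = "pred"):
# returns a list with one ggplot per covariate
make_plots <- function(label) {
  lapply(c("hp", "wt"), function(v) {
    ggplot(mtcars, aes(x = .data[[v]], y = mpg)) +
      geom_point() +
      ggtitle(paste(label, v))
  })
}

p1 <- make_plots("model 1:")
p2 <- make_plots("model 2:")

# c() joins the two lists into one flat list of four ggplot objects;
# the grobs argument accepts such a list, whereas grid.arrange(p1, p2) does not
combined <- arrangeGrob(grobs = c(p1, p2), ncol = 2)
grid::grid.newpage()
grid::grid.draw(combined)
```

With the real models, the equivalent call would be grid.arrange(grobs = c(p1, p2), ncol = 2).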
I have the following data, which I'm trying to model with a GLM using the Gamma family. The model fits, except that abline won't show any line. What am I doing wrong?
y <- c(0.00904977380111,0.009174311972687,0.022573363475789,0.081632653008122,0.005571030584803,1e-04,0.02375296916921,0.004962779106823,0.013729977117333,0.00904977380111,0.004514672640982,0.016528925619835,1e-04,0.027855153258277,0.011834319585449,0.024999999936719,1e-04,0.026809651528869,0.016348773841071,1e-04,0.009345794439034,0.00457665899303,0.004705882305772,0.023201856194357,1e-04,0.033734939711656,0.014251781472007,0.004662004755245,0.009259259166667,0.056872037917387,0.018518518611111,0.014598540145986,0.009478673032951,0.023529411811211,0.004819277060357,0.018691588737881,0.018957345923721,0.005390835525461,0.056179775223141,0.016348773841071,0.01104972381185,0.010928961639344,1e-04,1e-04,0.010869565271444,0.011363636420778,0.016085790883856,0.016,0.005665722322786,0.01117318441372,0.028818443860841,1e-04,0.022988505862069,0.01010101,1e-04,0.018083182676638,0.00904977380111,0.00961538466323,0.005390835525461,0.005763688703004,1e-04,0.005571030584803,1e-04,0.014388489208633,0.005633802760722,0.005633802760722,1e-04,0.005361930241431,0.005698005811966,0.013986013986014,1e-04,1e-04)
x <- c(600,600,600,600,600,600,600,600,600,600,600,600,600,600,600,600,600,600,600,600,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,744.47,744.47,744.47,744.47,744.47,744.47,744.47,630.42,630.42,630.42,630.42,630.42,630.42,630.42,630.42,630.42)
hist(y,breaks=15)
plot(y~x)
fit <- glm(y ~ x, family = Gamma(link = "log"))
abline(fit)
abline draws straight lines, for example the fit of a simple linear regression. A GLM with a Gamma family and a log link is nonlinear on the original scale. To visualize the fit of such a model, you could use predict (an example is given below). Several R packages (e.g. effects or visreg) provide functions that plot the fit directly on the original scale, including confidence intervals.
Here is an example with visreg, using your data and model:
library(visreg)
y <- c(0.00904977380111,0.009174311972687,0.022573363475789,0.081632653008122,0.005571030584803,1e-04,0.02375296916921,0.004962779106823,0.013729977117333,0.00904977380111,0.004514672640982,0.016528925619835,1e-04,0.027855153258277,0.011834319585449,0.024999999936719,1e-04,0.026809651528869,0.016348773841071,1e-04,0.009345794439034,0.00457665899303,0.004705882305772,0.023201856194357,1e-04,0.033734939711656,0.014251781472007,0.004662004755245,0.009259259166667,0.056872037917387,0.018518518611111,0.014598540145986,0.009478673032951,0.023529411811211,0.004819277060357,0.018691588737881,0.018957345923721,0.005390835525461,0.056179775223141,0.016348773841071,0.01104972381185,0.010928961639344,1e-04,1e-04,0.010869565271444,0.011363636420778,0.016085790883856,0.016,0.005665722322786,0.01117318441372,0.028818443860841,1e-04,0.022988505862069,0.01010101,1e-04,0.018083182676638,0.00904977380111,0.00961538466323,0.005390835525461,0.005763688703004,1e-04,0.005571030584803,1e-04,0.014388489208633,0.005633802760722,0.005633802760722,1e-04,0.005361930241431,0.005698005811966,0.013986013986014,1e-04,1e-04)
x <- c(600,600,600,600,600,600,600,600,600,600,600,600,600,600,600,600,600,600,600,600,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,744.47,744.47,744.47,744.47,744.47,744.47,744.47,630.42,630.42,630.42,630.42,630.42,630.42,630.42,630.42,630.42)
fit <- glm(y ~ x, family = Gamma(link = "log"))
visreg(fit, scale = "response")
And here is the example using base R graphics and predict:
pred_frame <- data.frame(
x = seq(min(x), max(x), length.out = 1000)
)
pred_frame$fit <- predict(fit, newdata = pred_frame, type = "response")
plot(y~x, pch = 16, las = 1, cex = 1.5)
lines(fit~x, data = pred_frame, col = "steelblue", lwd = 3)
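If a confidence band is wanted without an extra package, one common approach is to compute standard errors on the link scale and back-transform the limits. The sketch below uses simulated Gamma data as a stand-in for the data in the question; the sample size, shape, and coefficients are assumptions, and the 1.96 multiplier gives an approximate 95% Wald interval.

```r
# Simulated stand-in data (hypothetical values)
set.seed(42)
x <- runif(80, 600, 3500)
mu <- exp(-4 - 0.0003 * x)              # true mean on the response scale
y <- rgamma(80, shape = 2, rate = 2 / mu)

fit <- glm(y ~ x, family = Gamma(link = "log"))

pred_frame <- data.frame(x = seq(min(x), max(x), length.out = 200))

# Predict on the link (log) scale, where the Wald interval is symmetric
link_pred <- predict(fit, newdata = pred_frame, type = "link", se.fit = TRUE)

# Back-transform the fit and the interval limits to the response scale
pred_frame$fit   <- exp(link_pred$fit)
pred_frame$lower <- exp(link_pred$fit - 1.96 * link_pred$se.fit)
pred_frame$upper <- exp(link_pred$fit + 1.96 * link_pred$se.fit)

plot(y ~ x, pch = 16, las = 1)
lines(fit ~ x, data = pred_frame, col = "steelblue", lwd = 3)
lines(lower ~ x, data = pred_frame, col = "steelblue", lty = 2)
lines(upper ~ x, data = pred_frame, col = "steelblue", lty = 2)
```

Building the interval on the link scale keeps both limits positive after back-transformation, which a symmetric interval on the response scale would not guarantee.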
You are not being consistent here: you chose to model on the log scale, but you are plotting on the raw scale. Mind you, many published plots do the same. abline(fit) draws the line defined by the model coefficients, which are on the log scale, so it falls outside a plot of the raw y values. Either plot the points in log space, or back-transform the fitted values and draw them as a curve.
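To illustrate plotting in log space, the sketch below plots log(y) against x, where the model is a straight line, so abline(fit) becomes visible. It uses simulated Gamma data rather than the data from the question, and all numeric values are assumptions.

```r
# Simulated stand-in data (hypothetical values)
set.seed(42)
x <- runif(80, 600, 3500)
y <- rgamma(80, shape = 2, rate = 2 / exp(-4 - 0.0003 * x))

fit <- glm(y ~ x, family = Gamma(link = "log"))

# coef(fit) holds the intercept and slope of the linear predictor,
# i.e. of log(mean of y), so the line belongs on a log-scale plot
print(coef(fit))

plot(log(y) ~ x, pch = 16, las = 1)
abline(fit, col = "steelblue", lwd = 3)   # uses coef(fit) directly
```

The line shows the log of the fitted mean, so it sits somewhat above the centre of the log-transformed points.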
I am running into an error when I try to compute the predicted values from a model-averaged object using the MuMIn package's predict.averaging. I have been assured that when the full argument is set to FALSE, the function should return predicted values based on the conditional average coefficients. However, it returns an error. See the example below using the mtcars dataset. It is very similar to my actual setup.
library(MuMIn)
options(na.action = "na.fail")
global.model <- glm(mpg ~ hp + drat + wt,
data=mtcars)
dr <- dredge(global.model)
mod.avg <- model.avg(dr, subset = delta < 2, fit = TRUE)
summary(mod.avg)
predict(mod.avg, se.fit = TRUE, full = FALSE)
The error indicates that full is ignored, meaning that the full model coefficients are used for the predicted values (not what I want). I have confirmed this by some simple manual checking of values. It is also evident from examining the predict() output. Notice how the values jump, suggesting that a coefficient is set to zero or something similar. It has also been suggested that changing glm to lm will fix the issue, but it does not, at least for me.
Thanks!
Comparing predictions from the component models to the averaged ones, you can see that the "full averaged" predictions fall within the component predictions (which is as it should be).
On the other hand, the "subset averaged" coefficients produce predictions that are quite biased. This is because the effects are inflated when the zero coefficients are ignored in calculating the mean.
# Full/subset averaged predictions
pyfa <- predict(mod.avg, full = TRUE)
pysa <- predict(mod.avg, full = FALSE)
# Note: full=FALSE works only with se.fit=FALSE
# Predictions from component models
pycm <- do.call("cbind", lapply(get.models(mod.avg, TRUE), predict))
n <- ncol(pycm)
k <- rep(1:3, c(n, 1, 1))
lty <- c(2,1,1); lwd <- c(1,2,2); col <- c(3,1,2)
matplot(cbind(pycm, pyfa, pysa), type = "l",
lty = lty[k], lwd = lwd[k], col = col[k],
ylab = "predicted")
legend("topleft", legend = c("component", "full average", "subset average"),
lty = lty, lwd = lwd, col = col)
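The difference between the two kinds of averaging can also be checked directly on the coefficients. The sketch below rebuilds the model set from the question, compares coef() with full = TRUE and full = FALSE, and then computes predictions by hand from the design matrix. It assumes that coef() on an averaging object accepts the full argument, as documented in MuMIn; the hand-computed predictions are only an illustration and are not guaranteed to match predict() exactly.

```r
library(MuMIn)

old_opts <- options(na.action = "na.fail")   # required by dredge()
global.model <- glm(mpg ~ hp + drat + wt, data = mtcars)
dr <- dredge(global.model)
mod.avg <- model.avg(dr, subset = delta < 2, fit = TRUE)
options(old_opts)

# Full average: a term absent from a model counts as zero.
# Conditional (subset) average: only models containing the term are averaged.
b_full <- coef(mod.avg, full = TRUE)
b_cond <- coef(mod.avg, full = FALSE)
print(rbind(full = b_full, conditional = b_cond))

# Hand-computed predictions, matching design-matrix columns by name
X <- model.matrix(global.model)
py_full <- drop(X[, names(b_full)] %*% b_full)
py_cond <- drop(X[, names(b_cond)] %*% b_cond)
head(cbind(py_full, py_cond))
```

A term that is missing from some models has a conditional coefficient at least as large in magnitude as its full-average coefficient, which is the inflation described above.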