Add raw data points to sjp.int (sjPlot) - r

For my manuscript, I plotted a lme with an interaction of two continuous variables:
Create data
mydata <- data.frame(SID = sample(1:150, 400, replace = TRUE), age = sample(50:70, 400, replace = TRUE),
                     sex = sample(c("Male", "Female"), 400, replace = TRUE),
                     time = seq(0.7, 6.2, length.out = 400), Vol = rnorm(400), HCD = rnorm(400))
mydata$time <- as.numeric(mydata$time)
Run the model:
library(nlme)
model <- lme(HCD ~ age*time + sex*time + Vol*time, random = ~ time | SID, data = mydata)
Make plot:
library(sjPlot)
sjp.int(model, swap.pred = TRUE, show.ci = TRUE, mdrt.values = "meansd")
The reviewer now wants me to add the raw data points to this plot. How can I do this? I tried adding geom_point() with a reference to mydata, but that does not work.
Any ideas?
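One possible workaround, sketched below rather than taken from sjPlot itself, is to compute the predictions manually and rebuild the plot in ggplot2, where raw points can be layered freely. The moderator values (Vol at mean ± SD, to mimic mdrt.values = "meansd"), and holding age at its mean and sex at "Female", are assumptions for illustration, not something the original plot specifies.
library(nlme)
library(ggplot2)
# Prediction grid: time over its observed range, Vol at mean +/- 1 SD,
# the remaining covariates held fixed (assumed values)
grid <- expand.grid(time = seq(min(mydata$time), max(mydata$time), length.out = 50),
                    Vol = mean(mydata$Vol) + c(-1, 1) * sd(mydata$Vol),
                    age = mean(mydata$age),
                    sex = factor("Female", levels = c("Female", "Male")))
grid$HCD <- predict(model, newdata = grid, level = 0)  # population-level predictions
ggplot(grid, aes(time, HCD, colour = factor(round(Vol, 2)))) +
  geom_line() +
  geom_point(data = mydata, aes(time, HCD), inherit.aes = FALSE, alpha = 0.3) +
  labs(colour = "Vol (mean ± SD)")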
Update:
I thought I might simplify things by extracting each subject's random slope for HCD, residualising both HCD and Vol for the covariates, and then plotting those two against each other in an ordinary 2D scatter plot.
So I extracted the slopes and used them to fit a linear regression, but the results differ from the lme: in the reproducible example the effect is merely less significant, but in my own data the interaction that was significant in the lme became non-significant. I am not sure what that means, or whether it simply shows that I should not try to plot the data this way.
get the slopes:
model <- lme(HCD ~ time, random=~time|SID, data=mydata)
slopes <- rbind(row.names(model$coefficients$random$SID), model$coef$random$SID[, 2])
slopes2 <- data.frame(matrix(unlist(slopes), nrow = ncol(slopes), byrow = TRUE))  # one row per SID
names(slopes2)[1] <- "SID"
names(slopes2)[2] <- "slopes"
(The rbind() above coerces everything to character, which R then reads as factors, so the columns have to be converted back to numeric before they can be used; see below.)
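(As an aside, here is a sketch of a more direct way to pull out the same slopes, assuming the model above: ranef() from nlme already returns one row per SID, so the conversion issue does not arise.)
library(nlme)
# ranef() gives the random intercept and slope per SID as a data frame,
# so SID and the slope can be kept numeric from the start
re <- ranef(model)
slopes2 <- data.frame(SID = as.numeric(rownames(re)), slopes = re[["time"]])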
Then create a cross-sectional dataframe and merge the slopes:
mydata$time2 <- round(mydata$time)
new <- reshape(mydata, idvar = "SID", timevar = "time2", direction = "wide")
slopes2$SID <- as.numeric(as.character(slopes2$SID))        # match the type of new$SID
slopes2$slopes <- as.numeric(as.character(slopes2$slopes))  # make the slopes numeric again
newdata <- dplyr::left_join(new, slopes2, by = "SID")
The lm:
modelw <- lm(slopes ~ age.1 + sex.1 + Vol.1, data = newdata)
Vol now has a p-value of 0.8 (previously this was 0.14)

Plotting the predictions of a mixed model as a line in R

I'm trying to plot the predictions (predict()) of my mixed model below so that I end up with my conceptually desired plot: a single fitted line.
I have tried to plot my model's predictions, but I don't get the desired plot. Is there a better way to set up predict() so that I can achieve it?
library(lme4)
dat3 <- read.csv('https://raw.githubusercontent.com/rnorouzian/e/master/dat3.csv')
m4 <- lmer(math~pc1+pc2+discon+(pc1+pc2+discon|id), data=dat3)
newdata <- with(dat3, expand.grid(pc1=unique(pc1), pc2=unique(pc2), discon=unique(discon)))
y <- predict(m4, newdata=newdata, re.form=NA)
plot(newdata$pc1+newdata$pc2, y)
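For reference, here is a sketch of one way to get a single fitted line per predictor: vary pc1 over a grid, hold pc2 and discon fixed (discon is assumed to be numeric in dat3; if it is a factor, fix it at one level instead), and take population-level predictions with re.form = NA.
nd <- data.frame(pc1 = seq(min(dat3$pc1), max(dat3$pc1), length.out = 100),
                 pc2 = mean(dat3$pc2),
                 discon = mean(dat3$discon))
nd$fit <- predict(m4, newdata = nd, re.form = NA)
plot(fit ~ pc1, data = nd, type = "l")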
More sjPlot: see the grid parameter to wrap several predictors into one plot.
library(lme4)
library(sjPlot)
library(patchwork)
dat3 <- read.csv('https://raw.githubusercontent.com/rnorouzian/e/master/dat3.csv')
m4 <- lmer(math~pc1+pc2+discon+(pc1+pc2+discon|id), data=dat3) # Does not converge
m4 <- lmer(math~pc1+pc2+discon+(1|id), data=dat3) # Converges
# Keep only the pc1 and pc2 panels (i.e. drop discon) by indexing the plot list
a <- plot_model(m4, type = 'pred')[[1]]
b <- plot_model(m4, type = 'pred', title = '')[[2]]
a + b
Edit 1: I had some trouble removing the discon term within the sjPlot framework, so I gave up and fell back on patchwork. I'm sure Daniel knows the correct way.
As Magnus Nordmo suggests, this is very simple with sjPlot, which has predefined functions for these types of plot.
library(lme4)
dat3 <- read.csv('https://raw.githubusercontent.com/rnorouzian/e/master/dat3.csv')
m4 <- lmer(math~pc1+pc2+discon+(pc1+pc2+discon|id), data=dat3)
plot_model(m4, type = 'pred', terms = c('pc1', 'pc2'),
ci.lvl = 0)
which gives the following result.
The plot shows the predictions over pc1, with separate lines for different quantiles of the second term in terms (pc2). You could split these plots up and combine them using patchwork, and the value range of a term can be changed with square brackets after the term inside terms (e.g. pc1 [-10:1] for the interval from -10 to 1).
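For example, a sketch combining both ideas (the -10:1 range is arbitrary, chosen only to illustrate the bracket notation):
library(sjPlot)
library(patchwork)
# Restrict pc1 via the bracket notation, show discon in its own panel,
# and stitch the two plots together with patchwork
p1 <- plot_model(m4, type = "pred", terms = c("pc1 [-10:1]", "pc2"))
p2 <- plot_model(m4, type = "pred", terms = "discon", title = "")
p1 + p2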

creating a composite variable in piecewiseSEM for structural equation modelling in R

I am trying to create a composite variable to use in an SEM using piecewiseSEM. I've read this book but I still have two questions that I was hoping someone could help me with:
# create dummy data where z is my response variable and x and y are the indicators I want to combine into a composite variable
set.seed(111)
dat <- data.frame(x = runif(100), group = rep(letters[1:2], each = 50))
dat$y <- dat$x + runif(100)
dat$z <- dat$y + runif(100)
#run a very simple model
model1 <- lm(z ~ x + y, data = dat)
# run a model that fits the data better than model1
library(nlme)
model2 <- lme(z ~ x + y, random = ~ 1 | group, data = dat)
#check coefficients
summary(model1)
1. When extracting the coefficients of each indicator, do I use a full mixed-effects model or a simple quick-and-dirty model (model1)? In other words, should I go through the model selection process as if it were my final regression (model2)? (A small sketch of the composite construction follows below.)
2. From the summary of the model, if an indicator is not significant, do I remove it or keep it? For example, if x did not have a significant effect on z, does that mean it should not be included in the composite variable?
thanks in advance!
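For what it's worth, here is a minimal sketch of how such a composite is usually constructed by hand: the indicators are weighted by their coefficients from the z ~ x + y model and summed into a single score. Which model those coefficients should come from is exactly question 1; the simple lm is used below purely for illustration.
# Weight the indicators by their coefficients and sum them into one composite
beta <- coef(model1)[c("x", "y")]          # or fixef(model2) for the mixed model
dat$comp <- beta["x"] * dat$x + beta["y"] * dat$y
# The composite then replaces x and y in the structural model
model_comp <- lme(z ~ comp, random = ~ 1 | group, data = dat)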

Predict Future values using polynomial regression in R

I was trying to predict future values of a sample using polynomial regression in R. The y values within the sample form a wave pattern.
For example
x = 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16
y= 1,2,3,4,5,4,3,2,1,0,1,2,3,4,5,4
But when the graph is plotted for future values, the resulting y values are completely different from what was expected. Instead of a wave pattern, I was getting a graph where the y values keep increasing.
futurY = 17,18,19,20,21,22
I tried different degrees of polynomial regression, but the predicted results for futurY were drastically different from what was expected.
Following is the sample R code which was used to get the results
dfram <- data.frame('x'=c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16))
dfram$y <- c(1,2,3,4,5,4,3,2,1,0,1,2,3,4,5,4)
plot(dfram$x, dfram$y, type="l", lwd=3)
pred <- data.frame('x'=c(17,18,19,20,21,22))
myFit <- lm(y ~ poly(x,5), data=dfram)
newdata <- predict(myFit, pred)
print(newdata)
plot(pred[,1],data.frame(newdata)[,1],type="l",col="red", lwd=3)
Is this the correct technique to be used for predicting the unknown future y values OR should I be using other techniques like forecasting?
# Reproducing your data frame
dfram <- data.frame("x" = c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16),
"y" = c(1,2,3,4,5,4,3,2,1,0,1,2,3,4,5,4))
From your graph I read off the phase and period of the signal. There are better ways of calculating these automatically.
# Phase and period
fase = 1   # phase
per = 10   # period
In the linear model formula I've put the equations of a triangular signal.
fit <- lm(y ~ I((((trunc((x-fase)/(per/2))%%2)*2)-1) * (x-fase)%%(per/2))
+ I((((trunc((x-fase)/(per/2))%%2)*2)-1) * ((per/2)-((x-fase)%%(per/2))))
,data=dfram)
# Predict the old data
p_olddata <- predict(fit,type="response")
# Predict the new data
newdata <- data.frame('x'=c(17,18,19,20,21,22))
p_newdata <- predict(fit,newdata,type="response")
# Plotting old and new data
plot(x=c(dfram$x,newdata$x),
y=c(p_olddata,p_newdata),
col=c(rep("blue",length(p_olddata)),rep("green",length(p_newdata))),
xlab="x",
ylab="y")
lines(dfram)
Where the black line is the original signal, the blue circles are the prediction for the original points and the green circles are the prediction for the new data.
The graph shows a perfect fit because there is no noise in the data. A real dataset will contain noise, so the fit will not look as clean as this.
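If the goal is mainly to extrapolate the periodic pattern, a harmonic (sine/cosine) regression is another option that avoids hand-coding the triangular shape; the sketch below simply reuses the period of 10 read off the plot.
# Harmonic regression: sine/cosine terms at the base period and its first
# harmonic; this extrapolates periodically by construction
per <- 10
fit_h <- lm(y ~ sin(2*pi*x/per) + cos(2*pi*x/per) +
                sin(4*pi*x/per) + cos(4*pi*x/per), data = dfram)
predict(fit_h, newdata = data.frame(x = 17:22))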

Overlay 2 allEffects graphs

I have the following model
require(effects)
require(nlme)
fit <- lme(x ~ y, data = data, random = ~ 1 | item)
plot(allEffects(fit))
fit2 <- lme(x ~ y, data = data2, random = ~ 1 | item)
plot(allEffects(fit2))
How can I overlay the plots of fit and fit2? I have tried par(new=TRUE), but it does not work. The graphs plot fine individually.
I'm not sure there's a very nice way to do this. I usually extract the information from the effects structure and plot it with ggplot (lattice would be possible too).
Here's an example:
library(effects)
library(nlme)
library(plyr) ## utilities
Fit a model to the first and second half of one of the standard example data sets:
fm1 <- lme(distance ~ age, random = ~1|Subject,
data = Orthodont[1:54,])
fm2 <- update(fm1, data = Orthodont[55:108,])
a1 <- allEffects(fm1)
a2 <- allEffects(fm2)
Extract the information from the efflist object. This is the part that isn't completely general ... the hard part is getting out the predictor variable.
as.data.frame.efflist <- function(x) {
ldply(x,
function(z) {
r <- with(z,data.frame(fit,
var=variables[[1]]$levels,
lower,upper))
return(plyr::rename(r,setNames(z$variables[[1]]$name,"var")))
})
}
For convenience, use ldply to put the results of both models together:
comb <- ldply(list(fm1=a1,fm2=a2),as.data.frame,.id="model")
Now plot:
library(ggplot2); theme_set(theme_bw())
ggplot(comb,aes(age,fit,
ymin=lower,ymax=upper,
colour=model,fill=model))+
geom_line()+
geom_ribbon(alpha=0.2,colour=NA)+
geom_rug(sides="b")
The rug plot component is a little silly here.

Plot each predictor variable from multivariate GLM versus response (other predictors held constant)

I can plot one predictor variable (from a multivariate logistic, binomial GLM) versus the predicted response. I do it like this:
library(ggplot2)
m3 <- mtcars # example with mtcars
model <- glm(vs ~ cyl + mpg + wt + disp + drat, family = binomial, data = m3)
newdata <- m3
newdata$cyl <- mean(m3$cyl)
newdata$mpg <- mean(m3$mpg)
newdata$wt <- mean(m3$wt)
newdata$disp <- mean(m3$disp)
newdata$drat <- m3$drat
newdata$vs <- predict(model, newdata = newdata, type = "response")
ggplot(newdata, aes(x = drat, y = vs)) + geom_line()
Above: drat versus vs, with all other predictors held constant. However, I would like to do this for each of the predictor variables, and repeating the process above each time seems tedious. Is there a smarter way to do this? I'd like to visualize the response for each of the different predictors and eventually, perhaps, with the others held at different constants.
Check the response.plot2 function in the biomod2 package. It was developed to create response curves for species distribution models, but it essentially does what you need: it generates a multi-panel plot with the response for each variable used in your model. It also returns the underlying data in a structure that you can then use to plot however you like.
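If you would rather stay with base R and ggplot2, a generic version of what you did by hand could look like the sketch below (it reuses the mtcars example above; the grid size of 100 and the faceting are just choices):
library(ggplot2)
# Loop over the predictors, hold the others at their means, and collect the
# predicted response for each one
preds <- c("cyl", "mpg", "wt", "disp", "drat")
curves <- do.call(rbind, lapply(preds, function(p) {
  nd <- as.data.frame(lapply(m3[preds], mean))   # one row of covariate means
  nd <- nd[rep(1, 100), ]                        # repeat it 100 times
  nd[[p]] <- seq(min(m3[[p]]), max(m3[[p]]), length.out = 100)
  data.frame(predictor = p, x = nd[[p]],
             vs = predict(model, newdata = nd, type = "response"))
}))
ggplot(curves, aes(x, vs)) +
  geom_line() +
  facet_wrap(~ predictor, scales = "free_x")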
