I am currently running a mixed effects model using lmer in which random slopes and correlated random intercepts are estimated. After fitting the model I would like to plot the result allowing from random slopes and intercepts as well as one overall fixed line. How I currently implement is this way:
library(lmer)
library(sjPlot)
df <- read_csv("anonymized_test.csv")
m1 <- lmer("DV ~ IV + (1 + IV| iso)", df)
plot_model(m1, ,type="pred",
terms=c("IV","iso"),
pred.type="re", ci.lvl = NA)
This is the result:
Which is not what is expected as we would expect some negative and positive slopes in addition to the random intercepts according to the extracted random effects of the model
The problem is that sjPlot seems to only plot the random intercepts. Looking at an older vignette of sjPlot this seems to have been implemented in a deprecated function (see here ). The question is how do I get this functionality back? Thanks for any insight.
This is actually straightforward, even without the sjPlot package. We may extract fixef and ranef as fe and re and combine them in a plot. Both, fe and re have intercept and slope and get added together.
library(lme4)
fm1 <- lmer("Reaction ~ Days + (Days | Subject)", sleepstudy)
fe <- fixef(fm1)
re <- ranef(fm1)$Subject
clr <- rainbow(nrow(re)) ## define n colors
par(mfrow=c(1, 2))
plot(Reaction ~ Days, sleepstudy, col=clr[as.numeric(Subject)], main='Pred w/ points')
lapply(seq_len(nrow(re)), \(x) abline(fe[1] + re[x, 1], fe[2] + re[x, 2], col=clr[x]))
plot(Reaction ~ Days, sleepstudy, col=clr[as.numeric(Subject)], main='Pred w/o points', type='n')
lapply(seq_len(nrow(re)), \(x) abline(fe[1] + re[x, 1], fe[2] + re[x, 2], col=clr[x]))
However, I also get the random slopes using sjPlot. Not sure what went wrong, maybe you are using outdated software?
sjPlot::plot_model(fm1, type="pred", terms=c("Days","Subject"), pred.type="re", ci.lvl=NA)
# Warning message:
# In RColorBrewer::brewer.pal(n, pal) :
# n too large, allowed maximum for palette Set1 is 9
# Returning the palette you asked for with that many colors
Related
I am new to fitting gamm models and ran into two problems with my analysis.
I ran the same model using the gam and the bam function of the package mgcv. The models give me different estimates, and I don't understand why and how to choose which function to use. Can anyone explain to me why these functions give different estimates?
I am estimating a model including an interaction between age and condition (binomial factor with 2 conditions). For some reason one of the interaction terms (age:conditioncomputer or age:conditioncozmo) looks weird. It always gives a EDF and chi square of 0 and a p-value of 0.5, as if it was fixed to that. I tried using sum-to-zero and dummy contrasts, but that did not change the output. What is weird to me that there is a significant age effect, but this effect is not significant in neither condition. So I have the strong feeling that something is going wrong here.
Did anyone ever run into this before and can help me figure out if this is even a problem or normal, and how to solve it if it is a problem?
My model syntax is the following:
`bam(reciprocity ~ s(age,k=8) + condition + s(age, by = condition, k=8) + s(ID, bs="re") + s(class, bs="re") + s(school, bs="re"), data=df, family=binomial(link="logit"))`
This is the model output:
My df looks somewhat like this:
In short, I've used below code:
library(tidyverse)
library(psych)
library(mgcv)
library(ggplot2)
library(cowplot)
library(patchwork)
library(rstatix)
library(car)
library(yarrr)
library(itsadug)
df <- read.csv("/Users/lucaleisten/Desktop/Private/Master/Major_project/Data/test/test.csv", sep=",")
df$ID <- as.factor(as.character(df$ID))
df$condition <- as.factor(df$condition)
df$school <- as.factor(df$school)
df$class <- as.factor(df$class)
df$reciprocity <- as.factor(as.character(df$reciprocity))
summary(df)
model_reciprocity <- bam(reciprocity ~ s(age,k=7) +condition + s(age, by = condition, k=7) + s(ID, bs="re") + s(class, bs="re") + s(school, bs="re"), data=df, family=binomial(link="logit"))
summary(model_reciprocity)
I want to estimate a fixed effects model while using panel-corrected standard errors as well as Prais-Winsten (AR1) transformation in order to solve panel heteroscedasticity, contemporaneous spatial correlation and autocorrelation.
I have time-series cross-section data and want to perform regression analysis. I was able to estimate a fixed effects model, panel corrected standard errors and Prais-winsten estimates individually. And I was able to include panel corrected standard errors in a fixed effects model. But I want them all at once.
# Basic ols model
ols1 <- lm(y ~ x1 + x2, data = data)
summary(ols1)
# Fixed effects model
library('plm')
plm1 <- plm(y ~ x1 + x2, data = data, model = 'within')
summary(plm1)
# Panel Corrected Standard Errors
library(pcse)
lm.pcse1 <- pcse(ols1, groupN = Country, groupT = Time)
summary(lm.pcse1)
# Prais-Winsten estimates
library(prais)
prais1 <- prais_winsten(y ~ x1 + x2, data = data)
summary(prais1)
# Combination of Fixed effects and Panel Corrected Standard Errors
ols.fe <- lm(y ~ x1 + x2 + factor(Country) - 1, data = data)
pcse.fe <- pcse(ols.fe, groupN = Country, groupT = Time)
summary(pcse.fe)
In the Stata command: xtpcse it is possible to include both panel corrected standard errors and Prais-Winsten corrected estimates, with something allong the following code:
xtpcse y x x x i.cc, c(ar1)
I would like to achieve this in R as well.
I am not sure that my answer will completely address your concern, these days I've been trying to deal with the same problem that you mention.
In my case, I ran the Prais-Winsten function from the package prais where I included my model with the fixed effects. Afterwards, I correct for heteroskedasticity using the function vcovHC.prais which is analogous to vcovHC function from the package sandwich.
This basically will give you White's/sandwich heteroskedasticity-consistent covariance matrix which, if you later fit into the function coeftest from the package lmtest, it will give you the table output with the corrected standard errors. Taking your posted example, see below the code that I have used:
# Prais-Winsten estimates with Fixed Effects
library(prais)
prais.fe <- prais_winsten(y ~ x1 + x2 + factor(Country), data = data)
library(lmtest)
prais.fe.w <- coeftest(prais.fe, vcov = vcovHC.prais(prais.fe, "HC1")
h.m1 # run the object to see the output with the corrected standard errors.
Alas, I am aware that the sandwhich heteroskedasticity-consistent standard errors are not exactly the same as the Beck and Katz's PCSEs because PCSE deals with panel heteroskedasticity while sandwhich SEs addresses overall heteroskedasticity. I am not totally sure in how much these two differ in practice, but something is something.
I hope my answer was somehow helpful, this is actually my very first answer :D
I am a complete newbie to R.
I have the following logit equation I am estimating:
allAM <- glm (AM ~ VS + Prom + LS_Exp + Sex + Age + Age2 + Jpart + X2004LS + X2009LS + X2014LS + factor(State), family = binomial(link = "logit"), data = mydata)
AM is a standard binary (happened/didn’t happen). The three “X****LS” variables are dummies indicating different sessions of congress and “factor(State)” is used to generate fixed effects/dummies for each state.
VS is the key independent variable of interest and I want to generate the predicated probability that AM=1 for each value of VS between 0 and 60, holding everything else at its mean.
I am running into trouble, however, generating and plotting the predicted probabilities because “State” is a factor. I want to be able to show the average effects, not 50 different charts/effects for each state.
Per (Hanmer and Kalkan 2013) http://onlinelibrary.wiley.com/doi/10.1111/j.1540-5907.2012.00602.x/abstract I was advised to do the following to plot the predicted probabilities:
pred.seq <- seq(from=0, to=60, by=0.01)
pred.out <- c()
for(i in 1:length(pred.seq)){
mydata.c <- mydata
mydata.c$VS <- pred.seq[i]
pred.out[i] <- mean(predict(allAM, newdata=mydata.c, type="response"))
}
plot(pred.out ~ pred.seq, type="l")
This approach seems to work, though I don’t really understand it.
I want to add the upper and lower 95% confidence intervals to the plot, but when I attempt to do it by hand the way I know how:
lower <- pred.out$fit - (1.96*pred.out$se.fit)
upper <- pred.out$fit + (1.96*pred.out$se.fit)
I get the following error:
Error in pred.outfit:fit: operator is invalid for atomic vectors
Can anyone advise how I can plot the confidence intervals and how I can specify different levels of VS so that I can report some specific predicted probabilities?
Is there a way how I can extract coefficients of globally fitted terms in local regression modeling?
Maybe I do misunderstand the role of globally fitted terms in the function loess, but what I would like to have is the following:
# baseline:
x <- sin(seq(0.2,0.6,length.out=100)*pi)
# noise:
x_noise <- rnorm(length(x),0,0.1)
# known structure:
x_1 <- sin(seq(5,20,length.out=100))
# signal:
y <- x + x_1*0.25 + x_noise
# fit loess model:
x_seq <- seq_along(x)
mod <- loess(y ~ x_seq + x_1,parametric="x_1")
The fit is done perfectly, however, how can I extract the estimated value of the globally fitted term x_1 (i.e. some value near 0.25 for the example above)?
Finally, I found a solution to my problem using the function gam from the package gam:
require(gam)
mod2 <- gam(y ~ lo(x_seq,span=0.75,degree=2) + x_1)
However, the fits from the two models are not exactly the same (which might be due to different control settings?)...
I have been using e1071::svm(...,probability=TRUE) in R to fit a binary SVM classifier and then using predict.svm() to get the probabilities for both the training sample and a test sample. When I converted the probabilities to log(odds) and plotted them against the decision.values, I found there was a discontinuity in the predictions:
Plot of log(odds) = log(prob/(1-prob)) vs. Decision Values
This is happening with other models as well, whenever the probability is below about 0.25%; there is consistently a gap from log(odds)= -5.98 to -10.86. Note that this does not occur at a fixed decision.value (which varies with the model). I believe it may also be happening at high probabilities (>99%) as well.
The red and green lines are linear fits for the predictions with log(odds)<-8 and >-8, respectively. The coefficients of the latter agree with the probA and probB outputs returned with the svm object. I have seen other cases where the gap occurs from from +5.98 to +10.86 (only).
Here is the example using the iris dataset:
require("datasets")
require("e1071")
iris$is.setosa <- as.numeric(iris$Species=="setosa")
set.seed(8675309)
fit <- svm(
is.setosa ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width,
data=iris,probability=T,cost=0.01,kernel="linear",type="C-classification")
preds <- predict(fit,prob=TRUE,newdata=iris,decision=T)
DVs <- attr(preds,"decision.values")[,1]
probs <- attr(preds,"probabilities")[,"1"]
logodds <- log(probs/(1-probs))
plot(DVs,logodds,xlab="decision.values",ylab="log(odds)",main="IRIS dataset")
cat("Coefficents of probability model reported by svm():\n")
print(fit[c("probA","probB")])
fit <- lm(logodds ~ DVs,subset=which(logodds> -8))
cat("fit of logodds ~ DVs when log(odds) greater than -8:\n")
print(summary(fit))
abline(fit,col="green",lty=3)
fit <- lm(logodds ~ DVs,subset=which(logodds< -8))
abline(fit,col="red",lty=3)
Has anyone else seen this behavior? Any idea what might be causing it? Thanks!