Complex coefficient plot in R

I am trying to create a coefficient plot for regression results from two survey experiments. Both experiments asked the same questions (the variables are identical) in two different countries. I am then running 2x2 = 4 models (one per dependent variable and country), each containing only the treatment (an independent variable with three levels) and the same control variables.
Here is sample code with robust standard errors:
library(estimatr)
set.seed(124)
# simulate data for country 1
dv1 <- sample(0:1, 25, replace = TRUE)
dv2 <- sample(0:5, 25, replace = TRUE)
treatment_lvl <- sample(0:3, 25, replace = TRUE)
treatment_lvl <- factor(treatment_lvl, labels = c("baseline", "treatment1", "treatment2", "treatment3"))
control1 <- sample(2:10, 25, replace = TRUE)
control2 <- sample(0:3, 25, replace = TRUE)
df_country1 <- data.frame(dv1, dv2, treatment_lvl, control1, control2)
# simulate data for country 2
set.seed(200)
dv1 <- sample(0:1, 25, replace = TRUE)
dv2 <- sample(0:5, 25, replace = TRUE)
treatment_lvl <- sample(0:3, 25, replace = TRUE)
treatment_lvl <- factor(treatment_lvl, labels = c("baseline", "treatment1", "treatment2", "treatment3"))
control1 <- sample(2:10, 25, replace = TRUE)
control2 <- sample(0:3, 25, replace = TRUE)
df_country2 <- data.frame(dv1, dv2, treatment_lvl, control1, control2)
# one model per dependent variable (dv1, dv2) and country
model1 <- lm(dv1 ~ treatment_lvl + control1 + control2, data = df_country1)
model2 <- lm(dv1 ~ treatment_lvl + control1 + control2, data=df_country2)
model3 <- lm(dv2 ~ treatment_lvl + control1 + control2, data=df_country1)
model4 <- lm(dv2 ~ treatment_lvl + control1 + control2, data=df_country2)
# Stata-style (HC1) robust standard errors
model1_robust <- commarobust(model1, se_type = "stata")
model2_robust <- commarobust(model2, se_type = "stata")
model3_robust <- commarobust(model3, se_type = "stata")
model4_robust <- commarobust(model4, se_type = "stata")
I now want to visualize the regressions (models 1-4) in the following way:
y-axis: dependent variable 1, dependent variable 2
Three plots/facets next to each other: one per treatment level (1-3)
For each dependent variable and treatment level, one coefficient estimate for country 1 and one for country 2, stacked above each other
Control variables should not be shown
Something similar to the picture below, but with two countries instead of three. The y-axis would then show the dependent variable of models 1 and 2 at the first tick and the dependent variable of models 3 and 4 at the second tick; the other ticks would not be needed. Instead of "Average treatment effect", the first facet would be labelled "Treatment 1", with "Treatment 2" and "Treatment 3" next to it.
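One way to build this kind of plot (a minimal sketch only, assuming the models above; it uses ggplot2 and dplyr together with the tidy() method that estimatr provides for lm_robust objects, and the country/DV/facet labels are placeholders):
library(ggplot2)
library(dplyr)
# gather the treatment coefficients from one robust model,
# labelled by country and dependent variable
tidy_one <- function(mod, country_lab, dv_lab) {
  tidy(mod) %>%                                 # estimatr provides tidy() for lm_robust objects
    filter(grepl("^treatment_lvl", term)) %>%   # keep treatment dummies, drop intercept/controls
    mutate(country = country_lab, dv = dv_lab,
           treatment = gsub("treatment_lvltreatment", "Treatment ", term))
}
plot_df <- bind_rows(
  tidy_one(model1_robust, "Country 1", "Dependent variable 1"),
  tidy_one(model2_robust, "Country 2", "Dependent variable 1"),
  tidy_one(model3_robust, "Country 1", "Dependent variable 2"),
  tidy_one(model4_robust, "Country 2", "Dependent variable 2")
)
ggplot(plot_df, aes(x = estimate, y = dv, colour = country)) +
  geom_vline(xintercept = 0, linetype = "dashed") +
  geom_pointrange(aes(xmin = conf.low, xmax = conf.high),
                  position = position_dodge(width = 0.5)) +
  facet_wrap(~ treatment) +
  labs(x = "Estimate (95% CI)", y = NULL, colour = NULL)
Each facet then shows one treatment level, with the two countries' estimates stacked at each dependent-variable tick.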

Related

How can I run a stratified glm.nb on a mids object in R?

I have a mids object output from MICE that I need to run a series of stratified negative binomial models (glm.nb()) on. The stratification variable ("pressure") is itself imputed. The below code subsets the data within the glm.nb() function.
Is this the correct approach to run the stratified models?
library("mice") # For MI
library("MASS") # For glm.nb()
set.seed(09212020)
df <- data.frame("ethnicity" = sample(1:4, 50, replace = TRUE) ,
"education" = sample(1:4, 50, replace = TRUE) ,
"age" = sample(50:90, 50, replace = TRUE) ,
"pressure" = sample(0:12, 50, replace = TRUE),
"outcome_rate" = sample(0:6, 50, replace = TRUE),
"exposure_quart" = sample(1:4, 50, replace=TRUE,
prob=c(0.1, 0.2, 0.65, 0.05)))
# Insert NAs for MICE
df <- as.data.frame(lapply(df, function(cc)
  cc[sample(c(TRUE, NA), prob = c(0.85, 0.15), size = length(cc), replace = TRUE)]))
# Run MICE
imp <- mice(data = df, m = 2, maxit = 1)
# Attempt to run stratified model on the mids object
# This is the portion I'm seeking to have confirmed/updated
m1 <- with(imp, glm.nb(outcome_rate ~ exposure_quart + ethnicity + education + age + pressure,
                       subset = imp$imp$pressure < 9))
m1
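For comparison, a minimal sketch of one commonly suggested pattern (not a confirmation that the subsetting above is correct): because with() on a mids object evaluates its expression inside each completed dataset, the subset argument can refer to pressure directly, rather than to imp$imp$pressure, which contains only the imputed cells; the stratified fits can then be pooled.
# fit the model within each completed dataset, stratifying on the completed pressure column
m1_alt <- with(imp, glm.nb(outcome_rate ~ exposure_quart + ethnicity + education + age + pressure,
                           subset = pressure < 9))
summary(pool(m1_alt))  # pool the stratified fits across imputations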

R GLM: Modify coefficients of an existing glm model

I have been trying to adjust the coefficients of an existing glm model, but the predictions don't seem to change. The idea is to enhance an existing logistic model by incorporating 'qualitative' parameters into the quantitative coefficients (see the 'adjust log-reg model' block below). I have replicated the problem below.
I really appreciate any help. Thank you!
set.seed(100)
#create sim data (correlated)
input_size <- 200
scale <- 10000
y_var = sample(0:1, input_size, replace = TRUE)
input_data <- cbind.data.frame(y_var, x1 = sample(1:1000, input_size, replace = TRUE) + (y_var*200), x2 = sample(1:50, input_size, replace = TRUE) - (y_var*30))
cor(input_data)
#build log-reg model
reg1 <- glm(input_data$y ~ input_data$x1 + input_data$x2, data = input_data, family = "binomial")
reg1$coefficients
#test log-reg model
input_test <- cbind.data.frame(x1 = sample(1:1000, input_size, replace = TRUE) + (y_var*400), x2 = sample(1:50, input_size, replace = TRUE) - (y_var*10))
y_predict <- predict(reg1, input_test, type="response")
#adjust log-reg model
adj_coeff <- round(c(intercept = reg1$coefficients[1], x1 = reg1$coefficients[2] * 3, x2 = -reg1$coefficients[3] * 0.5), 4)
reg2 <- reg1
reg2$coefficients <- as.numeric(adj_coeff)
reg2$coefficients
#visualize predictions of the two log-reg models
y2_predict <- predict(reg1, input_test, type="response")
plot(y_predict, type = "p", lwd = 2)
lines(y2_predict, type = "p", pch = 3, col = "orange")
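For reference, a minimal sketch of how the adjustment can be made to visibly change the predictions (the assumptions here: the model is refit with bare column names so that predict() actually uses newdata, the names of the coefficient vector are preserved when overwriting it, and the second prediction uses reg2 rather than reg1):
# refit with column names only, so predict(..., newdata = ...) works as expected
reg1 <- glm(y_var ~ x1 + x2, data = input_data, family = "binomial")
# copy the model and overwrite the coefficient values in place (names are kept)
reg2 <- reg1
reg2$coefficients[] <- c(reg1$coefficients[1],
                         reg1$coefficients[2] * 3,
                         -reg1$coefficients[3] * 0.5)
y_predict  <- predict(reg1, newdata = input_test, type = "response")
y2_predict <- predict(reg2, newdata = input_test, type = "response")  # note: reg2, not reg1
plot(y_predict, pch = 1)
points(y2_predict, pch = 3, col = "orange")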

Using `ordinal::clmm` model to make predictions on new data

I have some repeated measures, ordinal response data:
dat <- data.frame(
id = factor(sample(letters[1:5], 50, replace = T)),
response = factor(sample(1:7, 50, replace = T), ordered = T),
x1 = runif(n = 50, min = 1, max = 10),
x2 = runif(n = 50, min = 100, max = 1000)
)
I have built the following model:
library(ordinal)
model <- clmm(response ~ x1 + x2 + (1|id), data = dat)
I have some new data:
new_dat <- data.frame(
id = factor(sample(letters[1:5], 5, replace = T)),
x1 = runif(n = 5, min = 1, max = 10),
x2 = runif(n = 5, min = 100, max = 1000)
)
I want to be able to use the model to predict the probability of each level of dat$response occurring for new_dat, whilst still also accounting for id.
Unfortunately predict() does not work for clmm objects. predict() does work for clmm2 objects but it ignores any random effects included.
What I want to achieve is something similar to what has been done in Figure 3 of the following example, produced with this code:
library(ordinal)
fm2 <- clmm2(rating ~ temp + contact, random=judge, data=wine, Hess=TRUE, nAGQ=10)
pred <- function(eta, theta, cat = 1:(length(theta) + 1), inv.link = plogis) {
  Theta <- c(-1e3, theta, 1e3)
  sapply(cat, function(j)
    inv.link(Theta[j + 1] - eta) - inv.link(Theta[j] - eta))
}
mat <- expand.grid(judge   = qnorm(0.95) * c(-1, 0, 1) * fm2$stDev,
                   contact = c(0, fm2$beta[2]),
                   temp    = c(0, fm2$beta[1]))
pred.mat <- pred(eta = rowSums(mat), theta = fm2$Theta)
lab <- paste("contact=", rep(levels(wine$contact), 2), ", ",
             "temp=", rep(levels(wine$temp), each = 2), sep = "")
par(mfrow = c(2, 2))
for (k in c(1, 4, 7, 10)) {
  plot(1:5, pred.mat[k, ], lty = 2, type = "l", ylim = c(0, 1),
       xlab = "Bitterness rating scale", axes = FALSE,
       ylab = "Probability", main = lab[ceiling(k/3)], las = 1)
  axis(1); axis(2)
  lines(1:5, pred.mat[k + 1, ], lty = 1)
  lines(1:5, pred.mat[k + 2, ], lty = 3)
  legend("topright",
         c("avg. judge", "5th %-tile judge", "95th %-tile judge"),
         lty = 1:3, bty = "n")
}
Except, my model contains multiple continuous covariates (as opposed to binary factors).
How can I use the model data to predict the probability of each level of dat$response occurring for new_dat, whilst still also accounting for id?
Many thanks.
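One way to approximate this for new_dat is to reuse the vignette's pred() helper with the estimates from the clmm fit. This is only a sketch; it assumes that coef(model) returns the threshold estimates (names containing "|") together with the slopes for x1 and x2, and it fixes the id random effect at 0, i.e. an "average" id:
# predicted probability of each response category for new_dat, for an average id (random effect = 0)
pred <- function(eta, theta, cat = 1:(length(theta) + 1), inv.link = plogis) {
  Theta <- c(-1e3, theta, 1e3)
  sapply(cat, function(j) inv.link(Theta[j + 1] - eta) - inv.link(Theta[j] - eta))
}
cf    <- coef(model)
theta <- cf[grepl("\\|", names(cf))]           # threshold (cut-point) estimates
beta  <- cf[c("x1", "x2")]                     # fixed-effect slopes for the continuous covariates
eta   <- drop(as.matrix(new_dat[, c("x1", "x2")]) %*% beta)
probs <- pred(eta = eta, theta = theta)        # rows = rows of new_dat, columns = response levels
probs
To show between-id variation, eta could additionally be shifted by, say, plus or minus qnorm(0.95) times the estimated random-effect SD, as in the wine example above.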

Adapting the meansd moderator option in sjPlot interaction

I am using sjPlot, the sjp.int function, to plot an interaction of an lme.
The options for the moderator values are means +/- sd, quartiles, all, max/min. Is there a way to plot the mean +/- 2sd?
Typically it would be like this:
model <- lme(outcome ~ var1+var2*time, random=~1|ID, data=mydata, na.action="na.omit")
sjp.int(model, show.ci=T, mdrt.values="meansd")
Many thanks
Reproducible example:
library(nlme)      # for lme()
library(lme4)      # for lmer()
library(ggeffects) # for ggpredict()
#create data
mydata <- data.frame(SID  = sample(1:150, 400, replace = TRUE),
                     age  = sample(50:70, 400, replace = TRUE),
                     sex  = sample(c("Male", "Female"), 200, replace = TRUE),
                     time = seq(0.7, 6.2, length.out = 400),
                     Vol  = rnorm(400),
                     HCD  = rnorm(400))
mydata$time <- as.numeric(mydata$time)
#insert random NAs
NAins <- NAinsert <- function(df, prop = .1) {
  n <- nrow(df)
  m <- ncol(df)
  num.to.na <- ceiling(prop * n * m)
  id <- sample(0:(m * n - 1), num.to.na, replace = FALSE)
  rows <- id %/% m + 1
  cols <- id %% m + 1
  sapply(seq(num.to.na), function(x) {
    df[rows[x], cols[x]] <<- NA
  })
  return(df)
}
mydata2 <- NAins(mydata,0.1)
#run the lme which gives error message
model <- lme(Vol ~ age + sex*time + time*HCD, random = ~time|SID, na.action = "na.omit", data = mydata2); summary(model)
mydf <- ggpredict(model, terms=c("time","HCD [-2.5, -0.5, 2.0]"))
#lmer works
model2 <- lmer(Vol ~ age + sex*time + time*HCD + (time|SID), control = lmerControl(check.nobs.vs.nlev = "ignore", check.nobs.vs.rankZ = "ignore", check.nobs.vs.nRE = "ignore"), na.action = "na.omit", data = mydata2); summary(model2)
mydf <- ggpredict(model2, terms=c("time","HCD [-2.5, -0.5, 2.0]"))
#plotting gives problems (jittered lines)
plot(mydf)
With sjPlot, it's currently not possible. However, I have written a package especially dedicated to computing and plotting marginal effects: ggeffects. This package is a bit more flexible (for marginal effects plots).
In the ggeffects package there is a ggpredict() function, with which you can compute marginal effects at specific values. Once you know the sd of the model term in question, you can specify these values in the function call to plot your interaction:
library(ggeffects)
# plot interaction for time and var2, for values
# 10, 30 and 50 of var2
mydf <- ggpredict(model, terms = c("time", "var2 [10,30,50]"))
plot(mydf)
There are some examples in the package-vignette, see especially this section.
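For the mean +/- 2 SD case specifically, a minimal sketch (using the HCD term from your reproducible example; the values are computed by hand and passed in the terms specification):
library(ggeffects)
m <- mean(mydata2$HCD, na.rm = TRUE)
s <- sd(mydata2$HCD, na.rm = TRUE)
vals <- round(c(m - 2 * s, m, m + 2 * s), 2)
# pass the three moderator values directly as "HCD [a, b, c]"
mydf <- ggpredict(model, terms = c("time", sprintf("HCD [%s]", paste(vals, collapse = ", "))))
plot(mydf)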
Edit
Here are the results, based on your reproducible example (note that the GitHub version is currently required!):
# requires at least the GitHub version 0.1.0.9000!
library(ggeffects)
library(nlme)
library(lme4)
library(glmmTMB)
#create data
mydata <-
data.frame(
SID = sample(1:150, 400, replace = TRUE),
age = sample(50:70, 400, replace = TRUE),
sex = sample(c("Male", "Female"), 200, replace = TRUE),
time = seq(0.7, 6.2, length.out = 400),
Vol = rnorm(400),
HCD = rnorm(400)
)
mydata$time <- as.numeric(mydata$time)
#insert random NAs
NAins <- NAinsert <- function(df, prop = .1) {
n <- nrow(df)
m <- ncol(df)
num.to.na <- ceiling(prop * n * m)
id <- sample(0:(m * n - 1), num.to.na, replace = FALSE)
rows <- id %/% m + 1
cols <- id %% m + 1
sapply(seq(num.to.na), function(x) {
df[rows[x], cols[x]] <<- NA
})
return(df)
}
mydata2 <- NAins(mydata, 0.1)
# run the lme, works now
model <- lme(
  Vol ~ age + sex * time + time * HCD,
  random = ~ time | SID,
  na.action = "na.omit",
  data = mydata2
)
summary(model)
mydf <- ggpredict(model, terms = c("time", "HCD [-2.5, -0.5, 2.0]"))
plot(mydf)
lme-plot
# lmer also works
model2 <- lmer(
  Vol ~ age + sex * time + time * HCD + (time | SID),
  control = lmerControl(
    check.nobs.vs.nlev = "ignore",
    check.nobs.vs.rankZ = "ignore",
    check.nobs.vs.nRE = "ignore"
  ),
  na.action = "na.omit",
  data = mydata2
)
summary(model2)
mydf <- ggpredict(model2, terms = c("time", "HCD [-2.5, -0.5, 2.0]"), ci.lvl = NA)
# plotting works, but only w/o CI
plot(mydf)
lmer-plot
# glmmTMB also works
model3 <- glmmTMB(
Vol ~ age + sex * time + time * HCD + (time | SID),
data = mydata2
)
summary(model3)
mydf <- ggpredict(model3, terms = c("time", "HCD [-2.5, -0.5, 2.0]"))
plot(mydf)
plot(mydf, facets = T)
glmmTMB-plots

Sampling out of tables depending on other variables (R)

I am a physician who just started working in R and would appreciate any help with this question:
I have 2 tables (A, B) with the variables age (continuous), sex (binary) and test_value (binary). Each table has a different age and sex distribution.
set.seed(10)
AgeA <- round(rnorm(100, mean = 40, sd = 15))
SexA <- sample(c("M","F"), 100, replace = TRUE, prob = c(0.5, 0.5))
Test_ValueA <- rbinom(100, 1, 0.3)
set.seed(20)
AgeB <- round(rnorm(1000, mean = 50, sd = 15))
SexB <- sample(c("M","F"), 1000, replace = TRUE, prob = c(0.5, 0.5))
Test_ValueB <- rbinom(1000, 1, 0.4)
A <- data.frame(Age = AgeA, Sex = SexA, Test = Test_ValueA)
B <- data.frame(Age = AgeB, Sex = SexB, Test = Test_ValueB)
genderA<-(prop.table(table(A[,2])))
TestA<-(prop.table(table(A[,3])))
paste("median age in group A is",median(A[,1]), "percentage female in group A is",genderA[1], "percentage of test positive in A is", TestA[2])
genderB<-(prop.table(table(B[,2])))
TestB<-(prop.table(table(B[,3])))
paste("median age in group A is",median(B[,1]), "percentage female in group B is",genderB[1], "percentage of test positive in A is", TestB[2])
The difference in test proportion is now confounded by age and sex.
Now I would like to match the patients from table A with table B to adjust for age and sex. Because B is the smaller cohort, I would prefer to sample out of A and match to B. Is the match package an option? Any other ideas?
Hopefully I was able to explain my problem.
Any hints as to which functions might help?
Hello, I have a possible answer. I will build two populations of 100 people with the characteristics you described:
set.seed(10)
AgeA <- rnorm(100, mean = 30, sd = 10)
#population A is 50 percent male
SexA <- sample(c("M","F"), 100, replace = TRUE, prob = c(0.5, 0.5))
Test_ValueA <- rbinom(100, 1, 0.5)
set.seed(20)
AgeB <- rnorm(100, mean = 30, sd = 10)
#population B is 80 percent male
SexB <- sample(c("M","F"), 100, replace = TRUE, prob = c(0.8, 0.2))
Test_ValueB <- rbinom(100, 1, 0.3)
A <- data.frame(Age = AgeA, Sex = SexA, Test = Test_ValueA)
B <- data.frame(Age = AgeB, Sex = SexB, Test = Test_ValueB)
Then using dplyr you can summarise population B parameters:
library(dplyr)
Bsummary <- group_by(B,Sex)
Bsummary <- summarise(Bsummary, PercenteagePositive = sum(Test == 1)/length(Test == 1), PercenteageSex = n()/100)
Bsummary
If you look at the results of this, B is 76% male and 24% female; if you sampled 20 people from A, you would have to sample 15 males and 5 females. First you separate the population of A into males and females:
Amale <- filter(A, Sex == "M")
Afemale <- filter(A, Sex == "F")
And from that you sample 15 males and 5 females:
SampleAMale <- Amale[sample(nrow(Amale), 15), ]
SampleAFemale <-Afemale[sample(nrow(Afemale), 5), ]
Then join them and you can summarise them:
sampleA <- rbind(SampleAMale, SampleAFemale)
ASampleSummary <- group_by(sampleA,Sex)
ASampleSummary <- summarise(ASampleSummary, PercenteagePositive = sum(Test == 1)/length(Test == 1), PercenteageSex = n()/100)
OK Fank, I think you will like this answer a little better. The first part is the same, except that the age is rounded:
set.seed(10)
AgeA <- round(rnorm(100, mean = 30, sd = 2))
#population A is 50 percent male
SexA <- sample(c("M","F"), 100, replace = TRUE, prob = c(0.5, 0.5))
Test_ValueA <- rbinom(100, 1, 0.5)
set.seed(20)
AgeB <- round(rnorm(100, mean = 30, sd = 2))
#population B is 80 percent male
SexB <- sample(c("M","F"), 100, replace = TRUE, prob = c(0.8, 0.2))
Test_ValueB <- rbinom(100, 1, 0.3)
A <- data.frame(Age = AgeA, Sex = SexA, Test = Test_ValueA)
B <- data.frame(Age = AgeB, Sex = SexB, Test = Test_ValueB)
Now you just use prop.table to get the proportions of your population. Let's say you want to sample 1000 individuals from B in the same proportions as A in terms of age and sex; you do this:
1000*(prop.table(table(A[,1:2])))
Then by applying filters you can sample within groups. For example, if you want to get only the males aged 30 in group B, you could go:
BMale30 <- filter(B, Sex == "M" & Age == 30)
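To finish the idea, here is a rough sketch of sampling from B within every Age x Sex cell in proportion to A's distribution (n_target, the cell loop and the min() guard are my own assumptions; this is not a formal matching procedure, for which packages such as MatchIt are commonly used):
n_target <- 50  # hypothetical total sample size
target_n <- round(n_target * prop.table(table(A$Age, A$Sex)))
sampled <- list()
for (age in rownames(target_n)) {
  for (sex in colnames(target_n)) {
    n_cell <- target_n[age, sex]
    cell <- filter(B, Sex == sex & Age == as.numeric(age))  # candidates in B for this cell
    if (n_cell > 0 && nrow(cell) > 0) {
      sampled[[paste(age, sex)]] <- cell[sample(nrow(cell), min(n_cell, nrow(cell))), ]
    }
  }
}
B_matched <- do.call(rbind, sampled)  # age/sex distribution now roughly mirrors A
summary(B_matched)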
