Graphing 95% Confidence Intervals for Regression with Linear AND Quadratic terms - r

I’m currently producing graphs that map the parabolic relationship between a continuous predictor and the risk of an outcome, including 95% confidence intervals. My model has linear and quadratic terms for the predictor (in this case, blood pressure).
The issue I’m having is that I have two terms for blood pressure: a quadratic term (SBP2) and a linear term (SBP), both grand-mean centered at around 138 mmHg and measured in increments of 5 mmHg (i.e. I divide by 5).
If I were producing 95% confidence intervals for a single linear term it would be very easy, but with two terms, rather than producing one line below the fitted curve and one line above it, my approach produces two lines that both intersect the curve (see R script below).
The way I produced these graphs: I took the linear term and added and subtracted 1.96 * its standard error to the estimate, and then did the same for the quadratic term. I calculated the upper CI curve by adding the linear upper-CI term to the quadratic upper-CI term, and likewise for the lower curve (see R script below).
I’ve had a read online and quickly discovered that this is clearly the wrong approach.
But I can’t find much at all on this question. I found an interesting article explaining how to compute the CIs for a single linear term (https://rpubs.com/aaronsc32/regression-confidence-prediction-intervals), but nothing on how to do so for a combined linear-plus-quadratic equation.
Do you have any ideas on how I should approach this?
Matt
Using R Version - 4.2.1
R CODE
library('readr')
library('dplyr')
library('data.table')
library("survival")
library('ggplot2')
#AD RISK AND SBP
res.cox_SBPs_AD_m2 <- coxph(Surv(AD_timing, AD_binary) ~ Sex + W1_Age_CENT + W1_Age_CENT_2 +
                              W1_Education + W1_HT_hx + W1_DIAB + W1_SMOKING + APOE4 +
                              W1_SBP_CENT5 + W1_SBP_CENT5_2,
                            data = dat2[dat2$Wave == 1 & dat2$W1_SBP_3SD == TRUE, ])
#Graphing out the risk
#Calculating our parameters
summary_table_AD <- summary(res.cox_SBPs_AD_m2)$coefficients
SBP_linear_term_AD <- summary_table_AD['W1_SBP_CENT5', 'coef']
SBP_linear_term_AD_UL <- summary_table_AD['W1_SBP_CENT5', 'coef'] + 1.96*summary_table_AD['W1_SBP_CENT5', 'se(coef)']
SBP_linear_term_AD_LL <- summary_table_AD['W1_SBP_CENT5', 'coef'] - 1.96*summary_table_AD['W1_SBP_CENT5', 'se(coef)']
SBP_quadratic_term_AD <- summary_table_AD['W1_SBP_CENT5_2', 'coef']
SBP_quadratic_term_AD_UL <- summary_table_AD['W1_SBP_CENT5_2', 'coef'] + 1.96*summary_table_AD['W1_SBP_CENT5_2', 'se(coef)']
SBP_quadratic_term_AD_LL <- summary_table_AD['W1_SBP_CENT5_2', 'coef'] - 1.96*summary_table_AD['W1_SBP_CENT5_2', 'se(coef)']
#Calculating the fitted HR curve and the (naive) CI curves, vectorized
x_vals_AD <- 74:204
sbp_cent5 <- (x_vals_AD - 138.5835)/5   # same centering and scaling as the model terms
y_vals_AD <- exp(SBP_quadratic_term_AD*sbp_cent5^2 + SBP_linear_term_AD*sbp_cent5)
y_vals_AD_UL <- exp(SBP_quadratic_term_AD_UL*sbp_cent5^2 + SBP_linear_term_AD_UL*sbp_cent5)
y_vals_AD_LL <- exp(SBP_quadratic_term_AD_LL*sbp_cent5^2 + SBP_linear_term_AD_LL*sbp_cent5)
total_results <- data.frame(x_vals_AD, y_vals_AD, y_vals_AD_UL, y_vals_AD_LL)
# Basic line plot with points
ggplot(data = total_results) +
  geom_line(aes(x = x_vals_AD, y = y_vals_AD, colour = "AD"), linetype = "solid", size = 1) +
  geom_line(aes(x = x_vals_AD, y = y_vals_AD_UL, colour = "AD UL"), linetype = "dotted", size = 1) +
  geom_line(aes(x = x_vals_AD, y = y_vals_AD_LL, colour = "AD LL"), linetype = "dotted", size = 1) +
  labs(x = 'Baseline SBP (mmHg)', y = 'Hazard Ratios (HR)', title = "SBP and Dementia Risk") +
  theme_light(base_size = 15) +
  theme(axis.line = element_line(size = 1, colour = "black", linetype = 1),
        panel.grid = element_line(color = "grey", size = 0.125, linetype = 1),
        panel.border = element_blank())
ggsave(filename = 'SBP_Coxproportional_HR_AD_ULandLL.pdf')
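For reference, the standard fix is the delta method: the variance of the combined log hazard ratio b1*z + b2*z^2 at a centered value z is z^2*Var(b1) + z^4*Var(b2) + 2*z^3*Cov(b1, b2), so the covariance between the two coefficients is needed, not just their separate standard errors. A minimal sketch against the model above (illustrative only; vcov() supplies the full coefficient covariance matrix):
# Sketch: pointwise 95% CI for the combined SBP effect via the delta method.
# HRs are relative to the centering point (z = 0).
V <- vcov(res.cox_SBPs_AD_m2)[c("W1_SBP_CENT5", "W1_SBP_CENT5_2"),
                              c("W1_SBP_CENT5", "W1_SBP_CENT5_2")]
b <- coef(res.cox_SBPs_AD_m2)[c("W1_SBP_CENT5", "W1_SBP_CENT5_2")]
z <- (74:204 - 138.5835)/5                      # centered, scaled SBP
lp <- b[1]*z + b[2]*z^2                         # log hazard ratio
se <- sqrt(z^2*V[1, 1] + z^4*V[2, 2] + 2*z^3*V[1, 2])
ci_results <- data.frame(SBP = 74:204,
                         HR  = exp(lp),
                         UL  = exp(lp + 1.96*se),
                         LL  = exp(lp - 1.96*se))
This yields a single band that widens with distance from the centering point instead of two curves that cross the fitted line.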

Related

How can I find the optimal point that maximizes profit on a Demand Curve in R?

I'm relatively new to R, so please bear with me. My dataset relates to one product. I have a list of price points per unit, which vary for this one product (the price depends on the salesperson's decision), and a list of quantities (number of units purchased at each price point), with the corresponding profit for each transaction.
I have plotted a demand curve with Quantity on the Y axis and Unit Price on the X axis. I have also plotted a linear regression model, which gave me the coefficient and intercept and allowed me to plot the abline showing the relationship. The code is as follows:
#Model (backticks are needed because the column names contain spaces)
testlm <- lm(`Quantity Sold` ~ `Unit Price`, data = Dataset)
#Plot
plot(Dataset$`Unit Price`, Dataset$`Quantity Sold`,
     xlab = "Unit Price", ylab = "Quantity", pch = 20,
     ylim = c(0, 40))
#abline
abline(a = 11.38, b = -0.24, col = "blue", lwd = 2)
#Here is where I am trying to input two lines to cross an optimal point
abline(v = OptimalUnitPrice, lty = 4, col = "red", lwd = 2)
abline(h = OptimalQty, lty = 4, col = "red", lwd = 2)
I am trying to figure out how to find the values for OptimalUnitPrice and OptimalQty. I am looking for a function that works similarly to Excel's Solver, one that finds the optimal price, factoring in the quantity, that maximizes the profit. I believe I would have to overlay profit onto the relationship and/or the graph somehow, but I cannot figure out how to do it.
Could someone please help? Thank you!
Hope this example can help you.
library(data.table)
library(ggplot2)
# linear demand function
linear <- function(p, alpha, beta) {
  alpha*p + beta
}
# Synthetic data
p <- seq(60, 200)
d <- linear(p, alpha = -1.5, beta = 300) + rnorm(length(p), sd = 5)
c <- 88                                  # unit cost
profit <- d*(p - c)
# Fit of the demand model
lm_model <- lm(d ~ p)
profit_fitted <- lm_model$fitted.values*(p - c)
# Pricing optimization
alpha <- lm_model$coefficients[2]
beta <- lm_model$coefficients[1]
p_max_profit <- (alpha*c - beta)/(2*alpha)
# Plots: Prices X Demand
df_linear <- data.table('Prices' = p, 'Demand' = d,
'profit_fitted' = profit_fitted, 'Profit' = profit)
ggplot(df_linear) + aes(x = Prices, y = Demand) +
geom_point() + geom_smooth(method = lm)
# Plots: Prices X Profit
ggplot(df_linear) +
aes(x = Prices, y = Profit) +
geom_point() + geom_vline(xintercept = p_max_profit, lty = 2) +
geom_line(data = df_linear, aes(x = Prices, y = profit_fitted), color = 'blue')
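For what it's worth, the closed form for p_max_profit follows from maximizing the fitted profit (alpha*p + beta)*(p - c): setting the derivative 2*alpha*p + beta - alpha*c to zero gives p* = (alpha*c - beta)/(2*alpha). A quick numerical cross-check with base R's optimize(), using the objects defined above (a sketch, not part of the original answer):
# Cross-check the closed-form optimum numerically
profit_fn <- function(p) (alpha*p + beta)*(p - c)   # fitted profit curve
opt <- optimize(profit_fn, interval = range(p), maximum = TRUE)
opt$maximum    # should closely match p_max_profit
p_max_profit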

Marginal effects / interaction plots for lfe felm regression object

I need to create an interaction / marginal effects plot for a fixed effects model including clustered standard errors generated using the lfe "felm" command.
I have already created a function that achieves this. However, before I start using it, I wanted to double-check whether this function is correctly specified. Please find the function and a reproducible example below.
library(lfe)
### defining function
felm_marginal_effects <- function(regression_model, data, treatment, moderator, treatment_translation, moderator_translation, dependent_variable_translation, alpha = 0.05, se = NULL) {
library(ggplot2)
library(ggthemes)
library(gridExtra)
### defining function to get average marginal effects
getmfx <- function(betas, data, treatment, moderator) {
betas[treatment] + betas[paste0(treatment, ":", moderator)] * data[, moderator]
}
### defining function to get marginal effects for specific levels of the treatment variable
getmfx_high_low <- function(betas, data, treatment, moderator, treatment_val) {
betas[treatment] * treatment_val + betas[paste0(treatment, ":", moderator)] * data[, moderator] * treatment_val
}
### Defining function to analytically derive standard error for marginal effects
getvarmfx <- function(my_vcov, data, treatment, moderator) {
my_vcov[treatment, treatment] + data[, moderator]^2 * my_vcov[paste0(treatment, ":", moderator), paste0(treatment, ":", moderator)] + 2 * data[, moderator] * my_vcov[treatment, paste0(treatment, ":", moderator)]
}
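### note: getvarmfx is the standard delta-method variance of a marginal effect
### in an interaction model, Var(b_T + b_TM*M) = Var(b_T) + M^2*Var(b_TM) + 2*M*Cov(b_T, b_TM),
### evaluated at each observed value M of the moderator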
### constraining data to relevant variables
data <- data[, c(treatment, moderator)]
### getting marginal effects
data[, "marginal_effects"] <- getmfx(coef(regression_model), data, treatment, moderator)
### getting marginal effects for high and low cases of treatment variable
data[, "marginal_effects_treatment_low"] <- getmfx_high_low(coef(regression_model), data, treatment, moderator, quantile(data[,treatment], 0.05))
data[, "marginal_effects_treatment_high"] <- getmfx_high_low(coef(regression_model), data, treatment, moderator, quantile(data[,treatment], 0.95))
### getting robust SEs
if (is.null(se)) {
data$se <- getvarmfx(regression_model$vcv, data, treatment, moderator)
} else if (se == "clustered") {
data$se <- getvarmfx(regression_model$clustervcv, data, treatment, moderator)
} else if (se == "robust") {
data$se <- getvarmfx(regression_model$robustvcv, data, treatment, moderator)
}
### Getting CI bounds
data[, "ci_lower"] <- data[, "marginal_effects"] - abs(qt(alpha/2, regression_model$df, lower.tail = TRUE)) * sqrt(data$se)
data[, "ci_upper"] <- data[, "marginal_effects"] + abs(qt(alpha/2, regression_model$df, lower.tail = TRUE)) * sqrt(data$se)
### plotting marginal effects plot
p_1 <- ggplot(data, aes_string(x = moderator)) +
geom_ribbon(aes(ymin = ci_lower, ymax = ci_upper), fill = "grey70", alpha = 0.4) +
geom_line(aes(y = marginal_effects)) +
theme_fivethirtyeight() +
theme(plot.title = element_text(size = 11.5, hjust = 0.5), axis.title = element_text(size = 10)) +
geom_rug() +
xlab(moderator_translation) +
ylab(paste("Marginal effect of",treatment_translation,"on",dependent_variable_translation)) +
ggtitle("Average marginal effects")
p_2 <- ggplot(data, aes_string(x = moderator)) +
geom_line(aes(y = marginal_effects_treatment_high, color = paste0("High ",treatment_translation))) +
geom_line(aes(y = marginal_effects_treatment_low, color = paste0("Low ",treatment_translation))) +
theme_fivethirtyeight() +
theme(plot.title = element_text(size = 11.5, hjust = 0.5), axis.title = element_text(size = 10), axis.title.y = element_blank(), legend.justification = c(0.95, 0.95), legend.position = c(1, 1), legend.direction = "vertical") +
geom_rug() +
xlab(moderator_translation) +
ylab(paste("Marginal effect of",treatment_translation,"on",dependent_variable_translation)) +
ggtitle("Marginal effects at high / low levels of treatment") +
scale_color_manual(name = NULL, values = c(rgb(229, 93, 89, maxColorValue = 255), rgb(75, 180, 184, maxColorValue = 255)), labels=c(paste0("High ",treatment_translation), paste0("Low ",treatment_translation)))
### exporting plots as combined grob
return(grid.arrange(p_1, p_2, ncol = 2))
}
### example:
# example model (just for demonstration, fixed effects and cluster variables make little sense here)
model <- felm(mpg ~ cyl + am + cyl:am | carb | 0 | cyl, data = mtcars)
# creating marginal effects plot
felm_marginal_effects(regression_model = model, data = mtcars, treatment = "cyl", moderator = "am", treatment_translation = "Number of cylinders", moderator_translation = "Transmission", dependent_variable_translation = "Miles per (US) gallon")
The example output shows the two panels side by side (figure omitted).
Happy for any advice on how to make this a better, "well-coded", fast function so that it's more useful for others afterwards. However, I'm mostly looking to confirm whether it's "correct" in the first place.
Additionally, I wanted to check back with the community regarding some remaining questions, particularly:
Can I use the standard errors I generated for the average marginal effects for the "high" and "low" treatment cases as well, or do I need to generate different standard errors for these cases? If so, how?
Instead of using the analytically derived standard errors, I could also calculate bootstrapped standard errors by creating many coefficient estimates based on repeated sub-samples of the data. How would I generate bootstrapped standard errors for the high / low case?
Is there something about fixed effects models or fixed effects models with clustered standard errors that make marginal effects plots or anything else I did in the code fundamentally inadmissible?
PS.: The above function and questions are kind of an extension of How to plot marginal effect of an interaction after felm() function
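On the bootstrap question: a minimal sketch of a nonparametric bootstrap for the marginal-effect curve, using the mtcars example above. This resamples observations for brevity; with genuinely clustered data you would resample whole clusters instead, and this has not been validated against the analytical standard errors in the function:
library(lfe)
set.seed(1)
B <- 500
m_grid <- sort(unique(mtcars$am))   # moderator values to evaluate at
boot_mfx <- replicate(B, {
  idx <- sample(nrow(mtcars), replace = TRUE)
  b <- coef(felm(mpg ~ cyl + am + cyl:am | carb | 0 | cyl, data = mtcars[idx, ]))
  b["cyl"] + b["cyl:am"] * m_grid   # marginal effect of cyl at each am value
})
# Pointwise percentile CIs across bootstrap replicates
ci <- apply(boot_mfx, 1, quantile, probs = c(0.025, 0.975), na.rm = TRUE)
The same percentile approach applies to the high/low curves: recompute getmfx_high_low() on each bootstrap refit and take quantiles of those draws.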

How can I get confidence intervals for an nls(broken stick) class object

I am trying to figure out how to obtain and plot confidence bounds for nls objects in R.
For example, here is an nls model:
bstick.lm.mean <- nls(TCTmean ~ cbind("intercept" = 1,
                                      "l2Flow"    = l2Flow,
                                      "l2FlowBr"  = ifelse(l2Flow > Br,
                                                           l2Flow - Br, 0)),
                      start = list(Br = 6),
                      algorithm = "plinear",
                      data = flow.new.sum.dat)
So bstick.lm.mean is an nls-class object.
new.seq4 = seq(min(flow.new.sum.dat$l2Flow), max(flow.new.sum.dat$l2Flow), length = 200)
new.seq4 = data.frame(new.seq4)
names(new.seq4) = 'l2Flow'
pz = predict(bstick.lm.mean, newdata = new.seq4,
             interval = 'confidence', se.fit = TRUE, level = 0.95)
# Note: predict.nls currently ignores the interval, se.fit and level arguments,
# which is why no confidence bounds come out of this call.
test.frame2 = data.frame(new.seq4, pz)
ggplot(data = test.frame2) +
  geom_point(mapping = aes(x = l2Flow, y = pz),
             shape = 1, col = 'red') +
  geom_point(data = flow.new.sum.dat,
             aes(x = l2Flow, y = TCTmean),
             shape = 0) +
  theme(panel.grid.minor = element_blank(),
        panel.background = element_blank(),
        axis.line = element_line(colour = "black")) +
  xlab("Log2 Flow (KL)") +
  ylab("Mean Transformed Ct")
This produces the following plot (figure omitted).
My question is: how can I obtain confidence bands for this nonlinear fit? I am referring to the bands around the regression line.
I extracted data from the scatterplot for analysis, and performed an equation search. I found that several different sigmoidal equations were fitting the data best, and for individual sigmoidal equations it should be simpler to determine the confidence intervals with standard statistical software. As an example, here are my results for a Hyperbolic Logistic sigmoidal equation, "y = (a * pow(x, b)) / (c + pow(x, b))", with fitted parameters a = 1.6177820755100655E+01, b = -1.5270446610701983E+01, and c = 4.2601082365916449E-12 yielding RMSE = 2.58 and R-squared = 0.85. Again, there were several equally "good" sigmoidal equations to choose from.
Estimates of uncertainty around change points (point of break) are notoriously hard to do analytically. Try the R package mcp which takes a computational (Bayesian) approach:
library(mcp)
model = list(
  y ~ 1 + x,  # slope
  ~ 0         # joined plateau
)
fit = mcp(model, df)       # df = your data frame
plot(fit, q_fit = TRUE)    # plot with quantiles
The red lines are the highest-density interval, the grey lines random posterior draws, and the blue curve is the posterior distribution of the change point location. Use plot_pars(fit) and summary(fit) to get parameter-wise summaries and plots, including uncertainty.
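As another option, approximate delta-method confidence bands for the original nls fit can be obtained from the investr package's predFit() (a sketch; predFit may not accept plinear fits directly, in which case refit the model with all parameters written out explicitly):
library(investr)
# Approximate confidence band for the fitted broken-stick curve
pz2 <- predFit(bstick.lm.mean, newdata = new.seq4,
               interval = "confidence", level = 0.95)
band <- data.frame(new.seq4, pz2)   # columns: l2Flow, fit, lwr, upr
ggplot(band, aes(x = l2Flow)) +
  geom_ribbon(aes(ymin = lwr, ymax = upr), fill = "grey70", alpha = 0.4) +
  geom_line(aes(y = fit), colour = "red")
As the mcp answer notes, such analytical bands should be read cautiously near the breakpoint.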

How to find x value for OR = 1 in logit crude and adjusted GAM

I've some data for fitting crude and adjusted logit GAMs:
library(mgcv)
## Simulate some data...
set.seed(3)
n <- 400
dat <- gamSim(1,n=n)
mu <- binomial()$linkinv(dat$f/4-2)
phi <- .5
a <- mu*phi
b <- phi - a
dat$y <- rbeta(n,a,b)
## Fitting GAMs
crude <- gam(y~s(x0),family=binomial(link="logit"),data=dat)
adj <- gam(y~s(x0)+s(x1)+s(x2)+s(x3),family=binomial(link="logit"),data=dat)
Now I would like to find the value of x0 at which the odds ratio (OR) equals 1.00 (i.e. a probability of 0.50). For this purpose I use visreg with the argument plot = FALSE.
## Prepare data for ggplotting
library(visreg)
p.crude <- visreg(crude, "x0", plot = FALSE)
p.adj <- visreg(adj, "x0", plot = FALSE)
library(dplyr)
bind_rows(
mutate(p.crude$fit, Model = "crude"),
mutate(p.adj$fit, Model = "adj")
) -> fits
OK, now I compute the OR from the log OR. Is the following code correct?
# Compute ORs and CI from LogOR
fits$or <- exp(fits$visregFit)
fits$ci.low <- exp(fits$visregLwr)
fits$ci.up <- exp(fits$visregUpr)
Now I use approx to interpolate the x0 value at which OR = 1.00:
## Interpolate x0 which give OR 1.00 (or 50% of probability)
x.crude <- round(approx(x = crude$fitted.values, y=crude$model$x0, xout = .5)$y, 1)
x.adj <- round(approx(x = adj$fitted.values, y=adj$model$x0, xout = .5)$y, 1)
Finally, I'm plotting the two models in a single graph:
## Plotting using ggplot
library(ggplot2)
ggplot(data = fits) +
geom_vline(aes(xintercept = x.crude), size=.2, color="black")+
geom_vline(aes(xintercept = x.adj), size=.2, color="red")+
annotate(geom ="text", x= x.crude - 0.05, y=.5, label = x.crude, size=3.5) +
annotate(geom ="text", x= x.adj - 0.05, y=.5, label = x.adj, size=3.5, color="red") +
geom_ribbon(aes(x0, ymin=ci.low, ymax=ci.up, group=Model, fill=Model), alpha=.05) +
geom_line(aes(x0, or, group=Model, color=Model)) +
labs(x="X0", y="Odds ratio")+
theme_bw(16)
As you can see, only the crude model shows a crossing at an OR almost equal to 1.00 (x0 = 0.9), while this never happens for the adj model.
First, how can I get an interpolation where the OR is exactly 1?
Second, within the limits of my statistical knowledge, my understanding was that I should observe a crossing at OR = 1 for the adj model as well, based on the x0 values observed for that model. Why is the corresponding curve shifted upwards?
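One possible fix (a sketch, not from the original post): interpolate on the visreg fit itself, solving log(OR) = 0 directly, rather than interpolating the fitted probabilities at 0.5. Note that for the adjusted model visreg evaluates the smooth with the other covariates held at reference values, which shifts the whole curve and can keep it away from OR = 1 entirely.
## Sketch: inverse-interpolate where the smooth crosses log(OR) = 0
## (assumes the curve crosses zero at most once over the plotted range)
f.crude <- p.crude$fit
x.crude <- approx(x = f.crude$visregFit, y = f.crude$x0, xout = 0)$y
x.crude  # NA if the curve never reaches log(OR) = 0, as for the adj model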

Line plot of mixed models / lsmeans results (with ggplot?)

I have longitudinal repeated measures on individuals over 4 timepoints. Following a mixed models analysis with time as fixed effect and random slopes I have used lsmeans to estimate the mean values at each time point as well as 95% confidence intervals. I would now like to plot a line graph with time points (x) and mean values of my outcome variable (y) with the CIs. Can I use e.g. ggplot to plot the results that I got from lsmeans? Or is there another smart way to plot this?
The results that I get from lsmeans, and that I would like to plot (lsmean, lower.CL, upper.CL over time), are:
$lsmeans
 time    lsmean        SE df  lower.CL upper.CL
    0 21.967213 0.5374422 60 20.892169 23.04226
    1 16.069586 0.8392904 60 14.390755 17.74842
    2 13.486802 0.8335159 60 11.819522 15.15408
    3  9.495137 0.9854642 60  7.523915 11.46636
Confidence level used: 0.95
Is this what you meant?
# To convert from lsmeans output (d <- lsmeans(parameters));
# keep the time column so ggplot can map it to x
d <- summary(d)$lsmeans[c("time", "lsmean", "lower.CL", "upper.CL")]
library(ggplot2)
ggplot(d, aes(time)) +
  geom_line(aes(y = lsmean, group = 1)) +
  geom_errorbar(aes(ymin = lower.CL, ymax = upper.CL),
                width = 0.2) +
  geom_point(aes(y = lsmean), size = 3,
             shape = 21, fill = "white") +
  labs(x = "Time", y = "ls mean",
       title = "ls mean result over time") +
  theme_bw()
To summarize, the whole code that will give you the estimates and plot of the mixed model is:
## random slope model (lme is in the nlme package)
library(nlme)
summary(model <- lme(outcome ~ time, random = ~ 1 + time | ID, data = data,
                     na.action = na.exclude, method = "ML"))
## pairwise comparisons of timepoints
install.packages("lsmeans")
library(lsmeans)
lsmeans(model, pairwise~time, adjust="tukey")
### Draw the picture
d <- summary(lsmeans(model, ~time))
library(ggplot2)
ggplot(d, aes(time)) +
geom_line(aes(y = lsmean, group = 1)) +
geom_errorbar(aes(ymin = lower.CL, ymax = upper.CL), width = 0.2) +
geom_point(aes(y = lsmean), size = 3, shape = 21, fill = "white") +
labs(x = "Time", y = "ls mean", title = "ls mean result over time") +
theme_bw()
