I'm attempting to use the nlme package to fit the generalized beta of the second kind (GB2) distribution to simulated health-cost data.
Running the following code on a test dataset:
# Package installation (if necessary)
install.packages("withr", dependencies = T)
library(withr)
with_makevars(c(PKG_CFLAGS = "-std=gnu99"),
              install.packages("cubature"), assignment = "+=")
install.packages("GB2", dependencies = T)
install.packages("nlme", dependencies = T)
# load packages
library(cubature)
library(GB2)
library(nlme)
# Binary independent variables
age <- rbinom(n=1000, size=1, prob=.3)
sex <- rbinom(n=1000, size=1, prob=.5)
trmt <- rbinom(n=1000, size=1, prob=.5)
# GB2 parameter equations
shape1 <- exp(rnorm(n=1000, mean=.1 + age/100 - sex/10 + trmt/10, sd=.3))
scale <- exp(rnorm(n=1000, mean=7 + age/50 + sex - trmt, sd=.5))
shape2 <- exp(rnorm(n=1000, mean=1.5 + age/100 + sex/10 - trmt/10, sd=.3))
shape3 <- exp(rnorm(n=1000, mean=.5 + age/100 - sex/10 - trmt/10, sd=.3))
# Outcome
y <- rgb2(1000, shape1, scale, shape2, shape3)
# Create test dataset
df <- data.frame(cbind(y,age,sex,trmt,shape1,scale,shape2,shape3))
# Fit GB2 distribution to data
gb2_fit <- nlme(y ~ scale*beta(shape2 + 1/shape1, shape3 - 1/shape1)/beta(shape2, shape3),
                # data = list(y=df_gb2_test[,1]),
                data = df,
                fixed = list(shape1 ~ age + sex + trmt,
                             scale ~ age + sex + trmt,
                             shape2 ~ age + sex + trmt,
                             shape3 ~ age + sex + trmt),
                start = list(fixed = c(shape1 = 1.00, scale = 100, shape2 = 1.00, shape3 = 1.00)))
I get the error:
Error in parse(text = paste("~", paste(nVal, collapse = "/"))) :
<text>:2:0: unexpected end of input
1: ~
^
Any ideas what I'm doing wrong? I seem to be using the tilde operator correctly.
I think nlme doesn't do what you think it does: it fits nonlinear least-squares mixed models, i.e. the response is assumed to be Gaussian and there is assumed to be a random effect. (Perhaps you're confusing it with SAS PROC NLMIXED, which is more general?)
library(bbmle)
## we need a version of the density function that takes a 'log' argument
dgb2B <- function(..., log=FALSE) {
  r <- GB2::dgb2(...)
  if (!log) r else log(r)
}
## don't include shape1, scale, shape2, shape3 in the data; that confuses things
df2 <- df[,c("y","age","sex", "trmt")]
## fit homogeneous model
m1 <- mle2(y ~ dgb2B(shape1, scale, shape2, shape3),
           method = "Nelder-Mead",
           trace = TRUE,
           data = df2,
           start = list(shape1 = 1.00, scale = 100, shape2 = 1.00, shape3 = 1.00))
## allow parameters to vary by group
mle2(y ~ dgb2B(shape1, scale, shape2, shape3),
     ## parameters need to be in the same order!
     parameters = list(shape1 ~ age + sex + trmt,
                       scale ~ age + sex + trmt,
                       shape2 ~ age + sex + trmt,
                       shape3 ~ age + sex + trmt),
     method = "Nelder-Mead",
     control = list(maxit = 10000,
                    ## set parameter scales equal to magnitude
                    ## of starting values; each top-level parameter
                    ## has 4 associated values (intercept + 3 covariate effects)
                    parscale = rep(abs(coef(m1)), each = 4)),
     trace = TRUE,
     data = df2,
     start = as.list(coef(m1))
)
For what it's worth, for this example you could achieve the same goal by fitting eight separate models, one to each of the age × sex × treatment groups (but I can appreciate that your real application may be more complicated, i.e. you might want only a subset of the parameters to vary across groups, or might want to allow parameters to vary according to a continuous covariate).
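A minimal sketch of that idea, reusing dgb2B and df2 from above (the start values are just the homogeneous-model defaults):
## one homogeneous fit per age x sex x trmt cell (2 x 2 x 2 = 8 cells)
sep_fits <- lapply(split(df2, interaction(df2$age, df2$sex, df2$trmt)),
                   function(d) mle2(y ~ dgb2B(shape1, scale, shape2, shape3),
                                    method = "Nelder-Mead", data = d,
                                    start = list(shape1 = 1.00, scale = 100,
                                                 shape2 = 1.00, shape3 = 1.00)))
lapply(sep_fits, coef)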
If you are going to try much harder problems, you might want to fit the parameters on the log scale.
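A minimal sketch of the log-scale idea; the l-prefixed parameter names are made up here for illustration, and bbmle's formula interface accepts expressions of parameters:
m1_log <- mle2(y ~ dgb2B(exp(lshape1), exp(lscale), exp(lshape2), exp(lshape3)),
               method = "Nelder-Mead",
               data = df2,
               start = list(lshape1 = 0, lscale = log(100), lshape2 = 0, lshape3 = 0))
exp(coef(m1_log))  # back-transform to the original scale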
There is also an error happening earlier on:
y <- rgb2(1, shape1, scale, shape2, shape3)
Error in rgb2(1, shape1, scale, shape2, shape3) :
could not find function "rgb2"
You may need to load the required package for this:
https://www.rdocumentation.org/packages/gamlss.dist/versions/5.3-2/topics/GB2
It appears to be in library(gamlss.dist).
Most of the time when we run a regression with interaction terms, we are interested in a partial derivative. For example, consider the model below:
P(Y) = B0 + B1*X1 + B2*X2 + B3*Z + B4*X1*X2 + B5*X1*Z + B6*X2*Z + B7*X1*X2*Z
If I am interested in the effect of X1 on P(Y), i.e. the partial derivative of P(Y) with respect to X1, I need the following combination of coefficients:
dP(Y)/dX1 = B1 + B4*X2 + B5*Z + B7*X2*Z
Instead of calculating it by hand, I can use, for example, the lincom function in R to calculate a linear combination of regression parameters. But I would like not only to know the numbers from calculations like this; I would like to plot them. The problem is that if I use an R package to plot coefficients (e.g., coefplot), it plots the coefficients from my model, with no option for linear combinations of coefficients. Is there any way to combine the lincom function (or another function that calculates combinations of parameters) with coefplot (or another coefficient-plotting package with this option)?
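For reference, here is a minimal sketch of computing one such combination and its confidence interval by hand from coef() and vcov(); the model, data frame, and evaluation point (X2 = Z = 1) are hypothetical:
mod <- lm(y ~ X1*X2*Z, data = dat)                # hypothetical model and data
L <- setNames(rep(0, length(coef(mod))), names(coef(mod)))
L[c("X1", "X1:X2", "X1:Z", "X1:X2:Z")] <- 1       # dy/dX1 evaluated at X2 = Z = 1
est <- sum(L * coef(mod))
se  <- sqrt(drop(t(L) %*% vcov(mod) %*% L))
c(estimate = est, lower = est - 1.96*se, upper = est + 1.96*se)  # normal-approximation 95% CI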
Of course, in the example above I only consider the derivative of X1, and if I plot it I will have a plot with one dot and its confidence interval only; but I would like the plot to show the coefficients for the partial derivatives of X1, X2, and Z, as in the example below.
Coefficients plot (the one I have):
Combination of parameters or partial derivatives plot (the one I am trying to get):
I discovered that Stata has a function that does what I am looking for, called "plotbeta." Does R have something similar?
Here's a start. This defines a function called plotBeta(); the ... are arguments that get passed down to geom_text() for the estimate text.
plotBeta <- function(mod, confidence_level = .95, include_est=TRUE, which.terms=NULL, plot=TRUE, ...){
  require(glue)
  require(ggplot2)
  b <- coef(mod)
  mains <- grep("^[^:]*$", names(b), value=TRUE)
  mains.ind <- grep("^[^:]*$", names(b))
  if(!is.null(which.terms)){
    if(!(all(which.terms %in% mains))) stop("Not all terms in which.terms are in the model\n")
    ins <- match(which.terms, mains)
    mains <- mains[ins]
    mains.ind <- mains.ind[ins]
  }
  icept <- grep("Intercept", mains)
  if(length(icept) > 0){
    mains <- mains[-icept]
    mains.ind <- mains.ind[-icept]
  }
  if(inherits(mod, "lm") && !inherits(mod, "glm")){
    crit <- qt(1-(1-confidence_level)/2, mod$df.residual)
  }else{
    crit <- qnorm(1-(1-confidence_level)/2)
  }
  out.df <- NULL
  for(i in seq_along(mains)){
    others <- grep(glue("^{mains[i]}:"), names(b))
    others <- c(others, grep(glue(":{mains[i]}:"), names(b)))
    others <- c(others, grep(glue(":{mains[i]}$"), names(b)))
    all.inds <- c(mains.ind[i], others)
    ones <- rep(1, length(all.inds))
    est <- c(b[all.inds] %*% ones)
    se.est <- sqrt(c(ones %*% vcov(mod)[all.inds, all.inds] %*% ones))
    lower <- est - crit*se.est
    upper <- est + crit*se.est
    tmp <- data.frame(var = mains[i],
                      lab = glue("dy/d{mains[i]} = {paste('B', all.inds, sep='', collapse=' + ')}"),
                      labfac = i,
                      est = est,
                      se.est = se.est,
                      lower = lower,
                      upper = upper)
    tmp$est_text <- sprintf("%.2f (%.2f, %.2f)", tmp$est, tmp$lower, tmp$upper)
    out.df <- rbind(out.df, tmp)
  }
  out.df$labfac <- factor(out.df$labfac, labels=out.df$lab)
  if(!plot){
    return(out.df)
  }else{
    g <- ggplot(out.df, aes(x=est, y=labfac, xmin=lower, xmax=upper)) +
      geom_vline(xintercept=0, lty=2, size=.25, col="gray50") +
      geom_errorbarh(height=0) +
      geom_point() +
      ylab("") + xlab("Estimates Combined") +
      theme_classic()
    if(include_est){
      g <- g + geom_text(aes(label=est_text), vjust=0, ...)
    }
    g
  }
}
Here's an example with some made-up data:
set.seed(2101)
library(dplyr)  # for %>% and mutate()
dat <- data.frame(
  X1 = rnorm(500),
  X2 = rnorm(500),
  Z = rnorm(500),
  W = rnorm(500)
)
dat <- dat %>%
  mutate(yhat = X1 - X2 + X1*X2 - X1*Z + .5*X2*Z - .75*X1*X2*Z + W,
         y = yhat + rnorm(500, 0, 1.5))
mod <- lm(y ~ X1*X2*Z + W, data=dat)
plotBeta(mod, position=position_nudge(y=.1), size=3) + xlim(-2.5,2)
EDIT: comparing two models
Using the newly added plot=FALSE argument, we can generate the data frames, then combine and plot them.
mod <- lm(y ~ X1*X2*Z + W, data=dat)
p1 <- plotBeta(mod, plot=FALSE)
mod2 <- lm(y ~ X1*X2 + Z + W, data=dat)
p2 <- plotBeta(mod2, plot=FALSE)
p1 <- p1 %>% mutate(model = factor(1, levels=1:2,
                                   labels=c("Model 1", "Model 2")))
p2 <- p2 %>% mutate(model = factor(2, levels=1:2,
                                   labels=c("Model 1", "Model 2")))
p_both <- bind_rows(p1, p2)
p_both <- p_both %>%
  arrange(var, model) %>%
  mutate(labfac = factor(1:n(), labels=paste("dy/d", var, sep="")))
ggplot(p_both, aes(x=est, y=labfac, xmin=lower, xmax=upper)) +
  geom_vline(xintercept=0, lty=2, size=.25, col="gray50") +
  geom_linerange(position=position_nudge(y=c(-.1, .1))) +
  geom_point(aes(shape=model),
             position=position_nudge(y=c(-.1, .1))) +
  geom_text(aes(label=est_text), vjust=0,
            position=position_nudge(y=c(-.2, .15))) +
  scale_shape_manual(values=c(1,16)) +
  ylab("") + xlab("Estimates Combined") +
  theme_classic()
I have the following data, to which I fit a linear mixed model:
sub <- c(1,1,2,2,3,3,4,4,5,5,6,6,7,7,8,8,9,9,10,10,11,11,12,12,13,13,14,14,15,15,16,16,17,17,18,18,19,19,20,20)
f1 <- c("f","f","f","f","f","f","f","f","f","f","f","f","f","f","f","f","f","f","f","f","m","m","m","m","m","m","m","m","m","m","m","m","m","m","m","m","m","m","m","m")
f2 <- c("c1","c1","c1","c1","c1","c1","c1","c1","c1","c1","c2","c2","c2","c2","c2","c2","c2","c2","c2","c2","c1","c1","c1","c1","c1","c1","c1","c1","c1","c1","c2","c2","c2","c2","c2","c2","c2","c2","c2","c2")
f3 <- c(0.03,0.03,0.49,0.49,0.33,0.33,0.20,0.20,0.13,0.13,0.05,0.05,0.47,0.47,0.30,0.30,0.22,0.22,0.15,0.15, 0.03,0.03,0.49,0.49,0.33,0.33,0.20,0.20,0.13,0.13,0.05,0.05,0.47,0.47,0.30,0.30,0.22,0.22,0.15,0.15)
y <- c(0.9,1,98,96,52,49,44,41,12,19,5,5,89,92,65,56,39,38,35,33, 87,83,5,7,55,58,67,61,70,80,88,90,0.8,0.9,55,52,55,58,70,69)
dat <- data.frame(sub=sub, f1=f1, f2=f2, f3=f3, y=y)
library(lme4)
m <- lmer(y ~ f1*f2*f3 + (1|sub), data=dat)
Only the f1*f3 interaction is significant, so now I'd like to plot this interaction using the predicted values from model m. I tried
X <- with(dat, expand.grid(f1=unique(f1), f3=range(f3)))
X$Predicted <- predict(m, newdata=X, re.form=NA)
but get an error...
If I add f2 and plot the results
X <- with(dat, expand.grid(f1=unique(f1), f3=range(f3), f2=unique(f2)))
X$Predicted <- predict(m, newdata=X, re.form=NA)
ggplot(X, aes(f3, Predicted)) + geom_path(aes(color=f2)) + facet_wrap(~f1)
I get two slopes in each panel corresponding to the levels of f2, but I just want the f1*f3 interaction from model m (without f2). Does anybody know how I can solve this?
The effects package is useful:
library(effects)
fit <- effect('f1:f3', m) # add xlevels = 100 for higher resolution CI's
fit_df <- as.data.frame(fit)
ggplot() +
  geom_point(aes(f3, y, color = f1), dat) +
  geom_ribbon(aes(f3, ymin = lower, ymax = upper, fill = f1), fit_df, alpha = 0.3) +
  geom_line(aes(f3, fit, color = f1), fit_df)
The package prints a NOTE warning you that the requested term is part of a higher-order interaction; proceed at your own risk. I'm pretty sure the confidence intervals here are asymptotic.
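For example, a higher-resolution version of the same effect (following the xlevels hint in the comment above) might look like this:
# evaluate the effect on a fine grid of f3 values for smoother CI ribbons
fit <- effect('f1:f3', m, xlevels = list(f3 = seq(min(dat$f3), max(dat$f3), length.out = 100)))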
I was wondering how I can modify the following code so that all of the fits appear in a single plot.
data(airquality)
library(quantreg)
library(ggplot2)
library(data.table)
library(devtools)
# source Quantile LOESS
source("https://www.r-statistics.com/wp-content/uploads/2010/04/Quantile.loess_.r.txt")
airquality2 <- na.omit(airquality[ , c(1, 4)])
#'' quantreg::rq
rq_fit <- rq(Ozone ~ Temp, 0.95, airquality2)
rq_fit_df <- data.table(t(coef(rq_fit)))
names(rq_fit_df) <- c("intercept", "slope")
#'' quantreg::lprq
lprq_fit <- lapply(1:3, function(bw){
  fit <- lprq(airquality2$Temp, airquality2$Ozone, h = bw, tau = 0.95)
  data.table(x = fit$xx, y = fit$fv, bw = paste0("bw=", bw), fit = "quantreg::lprq")
})
#'' Quantile LOESS
ql_fit <- Quantile.loess(airquality2$Ozone, jitter(airquality2$Temp), window.size = 10,
                         the.quant = .95, window.alignment = c("center"))
ql_fit_df <- data.table(x = ql_fit$x, y = ql_fit$y.loess, bw = "bw=1", fit = "Quantile LOESS")
I want to have all these fits in a plot.
geom_quantile can calculate quantiles using the rq method internally, so we don't need to create the rq_fit_df separately. However, the lprq and Quantile LOESS methods aren't available within geom_quantile, so I've used the data frames you provided and plotted them using geom_line.
In addition, to include the rq line in the color and linetype mappings and in the legend we add aes(colour="rq", linetype="rq") as a sort of "artificial" mapping inside geom_quantile.
library(dplyr) # For bind_rows()
ggplot(airquality2, aes(Temp, Ozone)) +
  geom_point() +
  geom_quantile(quantiles=0.95, formula=y ~ x, aes(colour="rq", linetype="rq")) +
  geom_line(data=bind_rows(lprq_fit, ql_fit_df),
            aes(x, y, colour=paste0(gsub("q.*:","",fit),": ", bw),
                linetype=paste0(gsub("q.*:","",fit),": ", bw))) +
  theme_bw() +
  scale_linetype_manual(values=c(2,4,5,1,1)) +
  labs(colour="Method", linetype="Method",
       title="Different methods of estimating the 95th percentile by quantile regression")
I have tested a large sample of participants on two different tests of visual perception – now, I'd like to see to what extent performance on both tests correlates.
To visualise the correlation, I plot a scatterplot in R using ggplot() and fit a regression line (using stat_smooth()). However, since both my x and y variables are performance measures, I need to take both of them into account when fitting the regression line; thus, I cannot use simple linear regression (stat_smooth(method="lm")), but rather need to fit an orthogonal regression (total least squares). How would I go about doing this?
I know I can specify formula in stat_smooth(), but I wouldn't know what formula to use. From what I understand, none of the preset methods (lm, glm, gam, loess, rlm) are applicable.
It turns out that you can extract the slope and intercept from a principal components analysis on (x,y), as shown here. This is just a little simpler, runs in base R, and gives a result identical to using Deming(...) in MethComp.
# same `x` and `y` as in user20650's answer
df <- data.frame(y, x)
pca <- prcomp(~x+y, df)
slp <- with(pca, rotation[2,1] / rotation[1,1])
int <- with(pca, center[2] - slp*center[1])
ggplot(df, aes(x,y)) +
  geom_point() +
  stat_smooth(method=lm, color="green", se=FALSE) +
  geom_abline(slope=slp, intercept=int, color="blue")
Caveat: not familiar with this method
I think you should be able to just pass the slope and intercept to geom_abline to produce the fitted line. Alternatively, you could define your own method to pass to stat_smooth (as shown in the linked question smooth.Pspline wrapper for stat_smooth (in ggplot2)). I used the Deming function from the MethComp package, as suggested in How to calculate Total least squares in R? (Orthogonal regression).
library(MethComp)
library(ggplot2)
# Sample data and model (from ?Deming example)
set.seed(1)
M <- runif(100,0,5)
# Measurements:
x <- M + rnorm(100)
y <- 2 + 3 * M + rnorm(100,sd=2)
# Deming regression
mod <- Deming(x,y)
# Define functions to pass to stat_smooth - see mnel's answer at the link for details
# The Deming model output is given class "Deming" so that we can define a predict method
# I only used the intercept and slope for predictions - is this correct?
f <- function(formula, data, SDR = 2, ...){
  M <- model.frame(formula, data)
  d <- Deming(x = M[,2], y = M[,1], sdr = SDR)[1:2]
  class(d) <- "Deming"
  d
}
# an S3 method for predictdf (called within stat_smooth)
predictdf.Deming <- function(model, xseq, se, level) {
  pred <- model %*% t(cbind(1, xseq))
  data.frame(x = xseq, y = c(pred))
}
ggplot(data.frame(x,y), aes(x, y)) + geom_point() +
  stat_smooth(method = f, se = FALSE, colour='red', formula=y~x, SDR=1) +
  geom_abline(intercept=mod[1], slope=mod[2], colour='blue') +
  stat_smooth(method = "lm", se = FALSE, colour='green', formula = y~x)
So passing the intercept and slope to geom_abline produces the same fitted line (as expected). So if this is the correct approach, then in my opinion it's easier to go with that.
The MethComp package seems to be no longer maintained (it was removed from CRAN).
Russel88/COEF allows you to use stat_/geom_smooth with method="tls" to add an orthogonal regression line.
Based on this and wikipedia:Deming_regression, I created the following functions, which allow noise ratios other than 1 to be used:
deming.fit <- function(x, y, noise_ratio = sd(y)/sd(x)) {
  if(missing(noise_ratio) || is.null(noise_ratio)) noise_ratio <- eval(formals(sys.function(0))$noise_ratio) # this is just a complicated way to write `sd(y)/sd(x)`
  delta <- noise_ratio^2
  x_name <- deparse(substitute(x))
  s_yy <- var(y)
  s_xx <- var(x)
  s_xy <- cov(x, y)
  beta1 <- (s_yy - delta*s_xx + sqrt((s_yy - delta*s_xx)^2 + 4*delta*s_xy^2)) / (2*s_xy)
  beta0 <- mean(y) - beta1 * mean(x)
  res <- c(beta0 = beta0, beta1 = beta1)
  names(res) <- c("(Intercept)", x_name)
  class(res) <- "Deming"
  res
}
deming <- function(formula, data, R = 100, noise_ratio = NULL, ...){
  ret <- boot::boot(
    data = model.frame(formula, data),
    statistic = function(data, ind) {
      data <- data[ind, ]
      args <- rlang::parse_exprs(colnames(data))
      names(args) <- c("y", "x")
      rlang::eval_tidy(rlang::expr(deming.fit(!!!args, noise_ratio = noise_ratio)), data, env = rlang::current_env())
    },
    R = R
  )
  class(ret) <- c("Deming", class(ret))
  ret
}
predictdf.Deming <- function(model, xseq, se, level) {
  pred <- as.vector(tcrossprod(model$t0, cbind(1, xseq)))
  if(se) {
    preds <- tcrossprod(model$t, cbind(1, xseq))
    data.frame(
      x = xseq,
      y = pred,
      ymin = apply(preds, 2, function(x) quantile(x, probs = (1-level)/2)),
      ymax = apply(preds, 2, function(x) quantile(x, probs = 1-((1-level)/2)))
    )
  } else {
    return(data.frame(x = xseq, y = pred))
  }
}
# unrelated helper function to create a nicer plot:
fix_plot_limits <- function(p) {
  p + coord_cartesian(xlim = ggplot_build(p)$layout$panel_params[[1]]$x.range,
                      ylim = ggplot_build(p)$layout$panel_params[[1]]$y.range)
}
Demonstration:
library(ggplot2)
#devtools::install_github("Russel88/COEF")
library(COEF)
fix_plot_limits(
  ggplot(data.frame(x = (1:5) + rnorm(100), y = (1:5) + rnorm(100)*2), mapping = aes(x=x, y=y)) +
    geom_point()
) +
  geom_smooth(method = deming, aes(color="deming"), method.args = list(noise_ratio = 2)) +
  geom_smooth(method = lm, aes(color="lm")) +
  geom_smooth(method = COEF::tls, aes(color="tls"))
Created on 2019-12-04 by the reprex package (v0.3.0)
For anyone who is interested, I validated jhoward's solution against the deming::deming() function, as I was not familiar with jhoward's method of extracting the slope and intercept using PCA. They do indeed produce identical results. A reprex is:
# Sample data and model (from ?Deming example)
set.seed(1)
M <- runif(100,0,5)
# Measurements:
x <- M + rnorm(100)
y <- 2 + 3 * M + rnorm(100,sd=2)
# Make data.frame()
df <- data.frame(x,y)
# Get intercept and slope using deming::deming()
library(deming)
mod_Dem <- deming::deming(y~x,df)
slp_Dem <- mod_Dem$coefficients[2]
int_Dem <- mod_Dem$coefficients[1]
# Get intercept and slope using jhoward's method
pca <- prcomp(~x+y, df)
slp_jhoward <- with(pca, rotation[2,1] / rotation[1,1])
int_jhoward <- with(pca, center[2] - slp_jhoward*center[1])
# Plot both orthogonal regression lines and simple linear regression line
library(ggplot2)
ggplot(df, aes(x,y)) +
  geom_point() +
  stat_smooth(method=lm, color="green", se=FALSE) +
  geom_abline(slope=slp_jhoward, intercept=int_jhoward, color="blue", lwd = 3) +
  geom_abline(slope=slp_Dem, intercept=int_Dem, color = "white", lwd = 2, linetype = 3)
Interestingly, if you switch the order of x and y in the models (i.e., to mod_Dem <- deming::deming(x~y,df) and pca <- prcomp(~y+x, df)), you get completely different slopes.
My (very superficial) understanding of orthogonal regression was that it does not treat either variable as independent or dependent, and thus that the regression line should be unaffected by how the model is specified, e.g., as y~x vs x~y. Clearly I was very much mistaken, and I would be interested to hear anyone's thoughts about exactly why I was so wrong.
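For the PCA version at least, a quick check (continuing from the code above) suggests the fitted line itself is the same and only its parameterisation flips: the swapped fit returns dx/dy rather than dy/dx, so it has to be inverted before being drawn in the original y-vs-x orientation. If that is right, the comparison below should come out equal:
pca_yx <- prcomp(~y+x, df)
slp_yx <- with(pca_yx, rotation[2,1] / rotation[1,1])  # this is dx/dy
int_yx <- with(pca_yx, center[2] - slp_yx*center[1])   # intercept on the x axis
# convert the line x = int_yx + slp_yx*y back to y-vs-x coordinates:
slp_back <- 1/slp_yx
int_back <- -int_yx/slp_yx
all.equal(c(slp_back, int_back), c(slp_jhoward, int_jhoward), check.attributes = FALSE)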