Changing Color in ggplot2 Scatterplots

I'm attempting to modify some existing code that was originally from the question found here (https://stats.stackexchange.com/questions/76999/simulating-longitudinal-lognormal-data-in-r), and used to demonstrate Scatterplots in R at the following website: https://hopstat.wordpress.com/2014/10/30/my-commonly-done-ggplot2-graphs/
It's a simple and stupid question, but I've been struggling with it all morning. The following code gives a nice black and white scatterplot. I want to modify the code to make the lines a very light gray.
library(MASS)
library(nlme)
library(plyr)
library(ggplot2)
### set number of individuals
n <- 200
### average intercept and slope
beta0 <- 1.0
beta1 <- 6.0
### true autocorrelation
ar.val <- .4
### true error SD, intercept SD, slope SD, and intercept-slope cor
sigma <- 1.5
tau0 <- 2.5
tau1 <- 2.0
tau01 <- 0.3
### maximum number of possible observations
m <- 10
### simulate number of observations for each individual
p <- round(runif(n,4,m))
### simulate observation moments (assume everybody has 1st obs)
obs <- unlist(sapply(p, function(x) c(1, sort(sample(2:m, x-1,
replace=FALSE)))))
### set up data frame
dat <- data.frame(id=rep(1:n, times=p), obs=obs)
### simulate (correlated) random effects for intercepts and slopes
mu <- c(0,0)
S <- matrix(c(1, tau01, tau01, 1), nrow=2)
tau <- c(tau0, tau1)
S <- diag(tau) %*% S %*% diag(tau)
U <- mvrnorm(n, mu=mu, Sigma=S)
### simulate AR(1) errors and then the actual outcomes
dat$eij <- unlist(sapply(p, function(x) arima.sim(model=list(ar=ar.val),
n=x) * sqrt(1-ar.val^2) * sigma))
dat$yij <- (beta0 + rep(U[,1], times=p)) + (beta1 + rep(U[,2], times=p)) *
log(dat$obs) + dat$eij
dat = ddply(dat, .(id), function(x){
x$alpha = ifelse(runif(n = 1) > 0.9, 1, 0.1)
x$grouper = factor(rbinom(n=1, size =3 ,prob=0.5), levels=0:3)
x
})
tspag = ggplot(dat, aes(x=obs, y=yij)) +
geom_line() + guides(colour="none") + xlab("Observation Time Point") +
ylab("Y")
spag = tspag + aes(colour = factor(id))
spag
bwspag = tspag + aes(group=factor(id))
bwspag
I've tried scale_colour_manual, I've tried defining the color within the aes statement on the bwspag line...no luck. I'm relatively inexperienced with R. I appreciate any assistance!

Do you want the lines in grayscale? If so, setting colour inside geom_line() (outside of aes()) should be enough. For example:
ggplot(iris, aes(Sepal.Length, Sepal.Width)) + geom_line(colour = "gray40")
You can pick any shade from "gray0" (black) to "gray100" (white); for a very light gray, try a high value such as "gray85".
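Applied to a spaghetti plot like the one in the question, a minimal sketch might look like the following. The toy data frame and the shade "gray85" are assumptions made for illustration; the idea is that a constant colour goes outside aes(), while the per-individual grouping stays inside it.

```r
library(ggplot2)

# Toy stand-in for the simulated longitudinal data (hypothetical values)
set.seed(1)
toy <- data.frame(id = rep(1:5, each = 4), obs = rep(1:4, times = 5))
toy$yij <- 1 + 6 * log(toy$obs) + rnorm(nrow(toy))

# group = factor(id) draws one line per individual;
# colour = "gray85" is a fixed setting, so it is not wrapped in aes()
light_spag <- ggplot(toy, aes(x = obs, y = yij, group = factor(id))) +
  geom_line(colour = "gray85") +
  xlab("Observation Time Point") +
  ylab("Y")

light_spag
```

With the question's own objects, the same effect should follow from passing colour = "gray85" to the geom_line() call in tspag and then adding aes(group = factor(id)) as before.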

Related

plot lower-level interactions with predicted values in ggplot2

sub <- c(1,1,2,2,3,3,4,4,5,5,6,6,7,7,8,8,9,9,10,10,11,11,12,12,13,13,14,14,15,15,16,16,17,17,18,18,19,19,20,20)
f1 <- c("f","f","f","f","f","f","f","f","f","f","f","f","f","f","f","f","f","f","f","f","m","m","m","m","m","m","m","m","m","m","m","m","m","m","m","m","m","m","m","m")
f2 <- c("c1","c1","c1","c1","c1","c1","c1","c1","c1","c1","c2","c2","c2","c2","c2","c2","c2","c2","c2","c2","c1","c1","c1","c1","c1","c1","c1","c1","c1","c1","c2","c2","c2","c2","c2","c2","c2","c2","c2","c2")
f3 <- c(0.03,0.03,0.49,0.49,0.33,0.33,0.20,0.20,0.13,0.13,0.05,0.05,0.47,0.47,0.30,0.30,0.22,0.22,0.15,0.15, 0.03,0.03,0.49,0.49,0.33,0.33,0.20,0.20,0.13,0.13,0.05,0.05,0.47,0.47,0.30,0.30,0.22,0.22,0.15,0.15)
y <- c(0.9,1,98,96,52,49,44,41,12,19,5,5,89,92,65,56,39,38,35,33, 87,83,5,7,55,58,67,61,70,80,88,90,0.8,0.9,55,52,55,58,70,69)
dat <- data.frame(sub=sub, f1=f1, f2=f2, f3=f3, y=y)
library(lme4)
m <- lmer(y ~ f1*f2*f3 + (1|sub), data=dat)
Only the f1*f3 interaction is significant so now I'd like to plot this interaction using the predicted values from model m. I tried
X <- with(dat, expand.grid(f1=unique(f1), f3=range(f3)))
X$Predicted <- predict(m, newdata=X, re.form=NA)
but get an error...
If I add f2 and plot the results
X <- with(dat, expand.grid(f1=unique(f1), f3=range(f3), f2=unique(f2)))
X$Predicted <- predict(m, newdata=X, re.form=NA)
ggplot(X, aes(f3, Predicted)) + geom_path(aes(color=f2)) + facet_wrap(~f1)
I get two slopes in each panel corresponding to the levels of f2, but I just want the f1*f3 interaction from model m (without f2). Does anybody know how can I solve this?
The effects package is useful:
library(effects)
fit <- effect('f1:f3', m) # add xlevels = 100 for higher resolution CI's
fit_df <- as.data.frame(fit)
ggplot() +
geom_point(aes(f3, y, color = f1), dat) +
geom_ribbon(aes(f3, ymin = lower, ymax = upper, fill = f1), fit_df, alpha = 0.3) +
geom_line(aes(f3, fit, color = f1), fit_df)
The package prints a NOTE warning you that the requested term is part of a higher order interaction. Proceed at own risk. I'm pretty sure the confidence intervals here are asymptotic.

Making surface plot of regression estimates from multiple continuous variables

I have a multi-level model with categorical and continuous variables and splines. Nice and complex. Anyhow I am trying to visualize model fit.
For example, here is some toy data:
library(lme4)
library(rms)
library(gridExtra)
library(ggplot2)
library(lattice) # provides wireframe()
## Make model using sleepstudy data
head(sleepstudy)
# Add some extra vars
sleepstudy$group <- factor( sample(c(1,2), nrow(sleepstudy), replace=TRUE) )
sleepstudy$x1 <- jitter(sleepstudy$Days, factor=5)^2 * jitter(sleepstudy$Reaction)
# Set up a mixed model with spline
fm1 <- lmer(Reaction ~ rcs(Days, 4) * group + (rcs(Days, 4) | Subject), sleepstudy)
# Now add continuous covar
fm2 <- lmer(Reaction ~ rcs(Days, 4) * group + x1 + (rcs(Days, 4) | Subject), sleepstudy)
# Plot fit
new.df <- sleepstudy
new.df$pred1 <- predict(fm1, new.df, allow.new.levels=TRUE, re.form=NA)
new.df$pred2 <- predict(fm2, new.df, allow.new.levels=TRUE, re.form=NA)
g1 <- ggplot(data=new.df, aes(x=Days)) +
geom_line(aes(y=pred1, col=group), size=2) +
ggtitle("Model 1")
g2 <- ggplot(data=new.df, aes(x=Days)) +
geom_line(aes(y=pred2, col=group), size=2) +
ggtitle("Model 2")
grid.arrange(g1, g2, nrow=1)
Plot 1 is smooth, but plot 2 is jagged due to the effect of x1. So I would like to make a surface plot with x = Days, y = x1 and z = pred2 and stratified by group. Not having experience of surface plots I've started out with the wireframe command:
wireframe(pred2 ~ Days * x1, data = new.df[new.df$group==1,],
xlab = "Days", ylab = "x1", zlab="Predicted fit"
)
However although this command does not give an error, my plot is blank:
Questions:
Where am I going wrong with my wireframe?
Is there a better way to visualize my model fit?
I figured out that the data format needed for a wireframe or plot_ly surface is that of a 2D matrix of x rows by y columns of corresponding z values (I got a hint towards this from the question "Plotly 3d surface graph has incorrect x and y axis values"). I also realised I could use expand.grid to make a grid covering the range of possible x and y values and use those to predict z as follows:
days <- 0:9
x1_range <- range(sleepstudy$x1)[2] * c(0.05, 0.1, 0.15, 0.2, 0.25, 0.3)
new.data2 <- expand.grid(Days = days, x1 = x1_range, group = unique(sleepstudy$group) )
new.data2$pred <- predict(fm2, new.data2, allow.new.levels=TRUE, re.form=NA)
I can then stuff those into two different matrices to represent the z-surface for each group in my model:
surf1 <- ( matrix(new.data2[new.data2$group == 1, ]$pred, nrow = length(days), ncol = length(x1_range)) )
surf2 <- ( matrix(new.data2[new.data2$group == 2, ]$pred, nrow = length(days), ncol = length(x1_range)) )
group <- c(rep(1, nrow(surf1)), rep(2, nrow(surf2) ))
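The reshaping step works because expand.grid varies its first argument fastest, which matches the column-wise fill order of matrix(). A small sketch with made-up numbers (the z formula is just a stand-in for model predictions, not anything from the model above) makes the correspondence visible:

```r
# Hypothetical small grid; z stands in for predicted values
days_demo <- 0:3
x1_demo <- c(10, 20, 30)
grid_demo <- expand.grid(Days = days_demo, x1 = x1_demo)
grid_demo$z <- grid_demo$Days + 100 * grid_demo$x1

# Days varies fastest in grid_demo, so filling column by column gives
# one row per Days value and one column per x1 value
surf_demo <- matrix(grid_demo$z,
                    nrow = length(days_demo),
                    ncol = length(x1_demo))

# surf_demo[i, j] holds the value for days_demo[i] and x1_demo[j]
surf_demo[2, 3] == days_demo[2] + 100 * x1_demo[3]
```

If the variables were passed to expand.grid in the other order, the matrix would need nrow and ncol swapped (or a transpose) to keep rows and columns aligned with the axes.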
Finally I can use plot_ly to plot each surface:
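library(plotly)
# mets_range and ages are not defined above; x1_range and days are assumed
# to be the intended vectors (they match the dimensions of surf1 and surf2)
plot_ly (z=surf1, x = x1_range, y = days, type="surface") %>%
add_surface (z = surf2, surfacecolor=surf2,
color=c('red','yellow'))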
The resulting plot:
So the resulting plot is what I wanted (albeit not very useful in this made-up example, but useful with real data). The only thing I can't figure out is how to show two different color scales. I can suppress the scale altogether, but if anyone knows how to show two scales for different surfaces, please let me know and I will edit the answer.

geom_abline for logistic regression (ggplot2)

I am sorry if this question is very simple, but I could not find a solution to my problem. I want to plot logistic regression lines with ggplot2. The problem is that I cannot use geom_abline because I don't have the original model, just the slope and intercept for each regression line. I have used this approach for linear regressions, and it works fine with geom_abline, because you can just give multiple slopes and intercepts to the function.
geom_abline(data = estimates, aes(intercept = inter, slope = slo))
where inter and slo are vectors with more than one value.
If I try the same approach with coefficients from a logistic regression, I get the wrong (linear) regression lines. I am trying to use geom_line, but I cannot use the predict function to generate the predicted values because I don't have the original model object.
Any suggestion?
Thanks in advance,
Gustavo
If the model had a logit link then you could plot the prediction using only the intercept (coefs[1]) and slope (coefs[2]) as:
library(ggplot2)
n <- 100L
x <- rnorm(n, 2.0, 0.5)
y <- factor(rbinom(n, 1L, plogis(-0.6 + 1.0 * x)))
mod <- glm(y ~ x, binomial("logit"))
coefs <- coef(mod)
x_plot <- seq(-5.0, 5.0, by = 0.1)
y_plot <- plogis(coefs[1] + coefs[2] * x_plot)
plot_data <- data.frame(x_plot, y_plot)
ggplot(plot_data) + geom_line(aes(x_plot, y_plot), col = "red") +
xlab("x") + ylab("p(y | x)") +
scale_y_continuous(limits = c(0, 1)) + theme_bw()
Edit
Here is one way of plotting k predicted probability lines on the same graph, following on from the previous code:
library(reshape2)
k <- 5L
intercepts <- rnorm(k, coefs[1], 0.5)
slopes <- rnorm(k, coefs[2], 0.5)
x_plot <- seq(-5.0, 5.0, by = 0.1)
model_predictions <- sapply(1:k, function(idx) {
plogis(intercepts[idx] + slopes[idx] * x_plot)
})
colnames(model_predictions) <- 1:k
plot_data <- as.data.frame(cbind(x_plot, model_predictions))
plot_data_melted <- melt(plot_data, id.vars = "x_plot", variable.name = "model",
value.name = "y_plot")
ggplot(plot_data_melted) + geom_line(aes(x_plot, y_plot, col = model)) +
xlab("x") + ylab("p(y | x)") +
scale_y_continuous(limits = c(0, 1)) + theme_bw()
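If the coefficients already live in a data frame like the estimates object from the question (columns inter and slo), a variant without reshape2 could look roughly like this. The coefficient values and the model label column are invented for illustration, and a logit link is assumed as above.

```r
library(ggplot2)

# Hypothetical coefficient table; only inter and slo come from the question
estimates <- data.frame(
  model = c("a", "b", "c"),
  inter = c(-0.6, 0.2, 1.0),
  slo   = c(1.0, 0.5, -0.8)
)

# merge() with no shared column names returns the Cartesian product,
# i.e. every model paired with every x value
x_grid <- data.frame(x = seq(-5, 5, by = 0.1))
curves <- merge(estimates, x_grid)

# Inverse logit of the linear predictor gives the predicted probability
curves$p <- plogis(curves$inter + curves$slo * curves$x)

p_curves <- ggplot(curves, aes(x, p, colour = model)) +
  geom_line() +
  xlab("x") + ylab("p(y | x)") +
  scale_y_continuous(limits = c(0, 1)) + theme_bw()

p_curves
```

This keeps one row per model and x value, which is the long format geom_line() expects, so no melting step is needed.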

ggplot2: How to plot an orthogonal regression line?

I have tested a large sample of participants on two different tests of visual perception – now, I'd like to see to what extent performance on both tests correlates.
To visualise the correlation, I plot a scatterplot in R using ggplot() and I fit a regression line (using stat_smooth()). However, since both my x and y variable are performance measures, I need to take both of them into account when fitting my regression line – thus, I cannot use a simple linear regression (using stat_smooth(method="lm")), but rather need to fit an orthogonal regression (or Total least squares). How would I go about doing this?
I know I can specify formula in stat_smooth(), but I wouldn't know what formula to use. From what I understand, none of the preset methods (lm, glm, gam, loess, rlm) are applicable.
It turns out that you can extract the slope and intercept from principal components analysis on (x,y), as shown here. This is just a little simpler, runs in base R, and gives the identical result to using Deming(...) in MethComp.
# same `x` and `y` as @user20650's answer
df <- data.frame(y, x)
pca <- prcomp(~x+y, df)
slp <- with(pca, rotation[2,1] / rotation[1,1])
int <- with(pca, center[2] - slp*center[1])
ggplot(df, aes(x,y)) +
geom_point() +
stat_smooth(method=lm, color="green", se=FALSE) +
geom_abline(slope=slp, intercept=int, color="blue")
Caveat: not familiar with this method
I think you should be able to just pass the slope and intercept to geom_abline to produce the fitted line. Alternatively, you could define your own method to pass to stat_smooth (as shown at the link smooth.Pspline wrapper for stat_smooth (in ggplot2)). I used the Deming function from the MethComp package as suggested at link How to calculate Total least squares in R? (Orthogonal regression).
library(MethComp)
library(ggplot2)
# Sample data and model (from ?Deming example)
set.seed(1)
M <- runif(100,0,5)
# Measurements:
x <- M + rnorm(100)
y <- 2 + 3 * M + rnorm(100,sd=2)
# Deming regression
mod <- Deming(x,y)
# Define functions to pass to stat_smooth - see mnel's answer at link for details
# Defined the Deming model output as class Deming to define the predict method
# I only used the intercept and slope for predictions - is this correct?
f <- function(formula,data,SDR=2,...){
M <- model.frame(formula, data)
d <- Deming(x =M[,2],y =M[,1], sdr=SDR)[1:2]
class(d) <- "Deming"
d
}
# an s3 method for predictdf (called within stat_smooth)
predictdf.Deming <- function(model, xseq, se, level) {
pred <- model %*% t(cbind(1, xseq) )
data.frame(x = xseq, y = c(pred))
}
ggplot(data.frame(x,y), aes(x, y)) + geom_point() +
stat_smooth(method = f, se= FALSE, colour='red', formula=y~x, SDR=1) +
geom_abline(intercept=mod[1], slope=mod[2], colour='blue') +
stat_smooth(method = "lm", se= FALSE, colour='green', formula = y~x)
So passing the intercept and slope to geom_abline produces the same fitted line (as expected). So if this is the correct approach then imo it's easier to go with this.
The MethComp package seems to be no longer maintained (it was removed from CRAN).
Russel88/COEF lets you use stat_smooth/geom_smooth with its tls method to add an orthogonal regression line.
Based on this and wikipedia:Deming_regression I created the following functions, which allow noise ratios other than 1:
deming.fit <- function(x, y, noise_ratio = sd(y)/sd(x)) {
if(missing(noise_ratio) || is.null(noise_ratio)) noise_ratio <- eval(formals(sys.function(0))$noise_ratio) # this is just a complicated way to write `sd(y)/sd(x)`
delta <- noise_ratio^2
x_name <- deparse(substitute(x))
s_yy <- var(y)
s_xx <- var(x)
s_xy <- cov(x, y)
beta1 <- (s_yy - delta*s_xx + sqrt((s_yy - delta*s_xx)^2 + 4*delta*s_xy^2)) / (2*s_xy)
beta0 <- mean(y) - beta1 * mean(x)
res <- c(beta0 = beta0, beta1 = beta1)
names(res) <- c("(Intercept)", x_name)
class(res) <- "Deming"
res
}
deming <- function(formula, data, R = 100, noise_ratio = NULL, ...){
ret <- boot::boot(
data = model.frame(formula, data),
statistic = function(data, ind) {
data <- data[ind, ]
args <- rlang::parse_exprs(colnames(data))
names(args) <- c("y", "x")
rlang::eval_tidy(rlang::expr(deming.fit(!!!args, noise_ratio = noise_ratio)), data, env = rlang::current_env())
},
R=R
)
class(ret) <- c("Deming", class(ret))
ret
}
predictdf.Deming <- function(model, xseq, se, level) {
pred <- as.vector(tcrossprod(model$t0, cbind(1, xseq)))
if(se) {
preds <- tcrossprod(model$t, cbind(1, xseq))
data.frame(
x = xseq,
y = pred,
ymin = apply(preds, 2, function(x) quantile(x, probs = (1-level)/2)),
ymax = apply(preds, 2, function(x) quantile(x, probs = 1-((1-level)/2)))
)
} else {
return(data.frame(x = xseq, y = pred))
}
}
# unrelated helper function to create a nicer plot:
fix_plot_limits <- function(p) p + coord_cartesian(xlim=ggplot_build(p)$layout$panel_params[[1]]$x.range, ylim=ggplot_build(p)$layout$panel_params[[1]]$y.range)
Demonstration:
library(ggplot2)
#devtools::install_github("Russel88/COEF")
library(COEF)
fix_plot_limits(
ggplot(data.frame(x = (1:5) + rnorm(100), y = (1:5) + rnorm(100)*2), mapping = aes(x=x, y=y)) +
geom_point()
) +
geom_smooth(method=deming, aes(color="deming"), method.args = list(noise_ratio=2)) +
geom_smooth(method=lm, aes(color="lm")) +
geom_smooth(method = COEF::tls, aes(color="tls"))
Created on 2019-12-04 by the reprex package (v0.3.0)
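As a quick sanity check of the closed-form slope used in deming.fit, the sketch below compares it with the PCA-based slope for the special case delta = 1 (equal error variances), where Deming regression should reduce to orthogonal regression. It uses only base R; the simulated data follow the earlier ?Deming-style example, and the variable names are chosen here for illustration.

```r
# Simulated measurements, as in the earlier example
set.seed(1)
M <- runif(100, 0, 5)
x <- M + rnorm(100)
y <- 2 + 3 * M + rnorm(100, sd = 2)

# Closed-form Deming slope with delta = 1 (i.e. noise_ratio = 1)
delta <- 1
s_xx <- var(x)
s_yy <- var(y)
s_xy <- cov(x, y)
slope_formula <- (s_yy - delta * s_xx +
  sqrt((s_yy - delta * s_xx)^2 + 4 * delta * s_xy^2)) / (2 * s_xy)

# Slope of the first principal component of (x, y)
pca <- prcomp(~ x + y, data.frame(x, y))
slope_pca <- unname(pca$rotation[2, 1] / pca$rotation[1, 1])

# The two should agree up to floating-point error
all.equal(slope_formula, slope_pca)
```

Note that the default noise_ratio in deming.fit is sd(y)/sd(x) rather than 1, so its default fit will generally differ from the PCA line.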
For anyone who is interested, I validated jhoward's solution against the deming::deming() function, as I was not familiar with jhoward's method of extracting the slope and intercept using PCA. They indeed produce identical results. Reprex is:
# Sample data and model (from ?Deming example)
set.seed(1)
M <- runif(100,0,5)
# Measurements:
x <- M + rnorm(100)
y <- 2 + 3 * M + rnorm(100,sd=2)
# Make data.frame()
df <- data.frame(x,y)
# Get intercept and slope using deming::deming()
library(deming)
mod_Dem <- deming::deming(y~x,df)
slp_Dem <- mod_Dem$coefficients[2]
int_Dem <- mod_Dem$coefficients[1]
# Get intercept and slope using jhoward's method
pca <- prcomp(~x+y, df)
slp_jhoward <- with(pca, rotation[2,1] / rotation[1,1])
int_jhoward <- with(pca, center[2] - slp_jhoward*center[1])
# Plot both orthogonal regression lines and simple linear regression line
library(ggplot2)
ggplot(df, aes(x,y)) +
geom_point() +
stat_smooth(method=lm, color="green", se=FALSE) +
geom_abline(slope=slp_jhoward, intercept=int_jhoward, color="blue", lwd = 3) +
geom_abline(slope=slp_Dem, intercept=int_Dem, color = "white", lwd = 2, linetype = 3)
Interestingly, if you switch the order of x and y in the models (i.e., to mod_Dem <- deming::deming(x~y,df) and pca <- prcomp(~y+x, df)) , you get completely different slopes:
My (very superficial) understanding of orthogonal regression was that it does not treat either variable as independent or dependent, and thus that the regression line should be unaffected by how the model is specified, e.g., as y~x vs x~y. Clearly I was very much mistaken, and I would be interested to hear anyone's thoughts about exactly why I was so wrong.
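One possible explanation, offered tentatively: swapping the variables does not change the fitted line, only the coordinates in which its slope is reported. A slope of y on x and a slope of x on y for the same line are reciprocals of each other. The sketch below checks this for the PCA approach using base R only; the names slp_yx and slp_xy are introduced here for illustration.

```r
# Same simulated data as in the reprex above
set.seed(1)
M <- runif(100, 0, 5)
x <- M + rnorm(100)
y <- 2 + 3 * M + rnorm(100, sd = 2)
df <- data.frame(x, y)

# y as a function of x
pca1 <- prcomp(~ x + y, df)
slp_yx <- unname(pca1$rotation[2, 1] / pca1$rotation[1, 1])
int_yx <- unname(pca1$center[2] - slp_yx * pca1$center[1])

# x as a function of y (variables swapped)
pca2 <- prcomp(~ y + x, df)
slp_xy <- unname(pca2$rotation[2, 1] / pca2$rotation[1, 1])
int_xy <- unname(pca2$center[2] - slp_xy * pca2$center[1])

# Reciprocal slopes: the product should be 1
slp_yx * slp_xy

# Rewriting x = int_xy + slp_xy * y as y = a + b * x recovers the first line
c(intercept = -int_xy / slp_xy, slope = 1 / slp_xy)
c(intercept = int_yx, slope = slp_yx)
```

If that reading is right, plotting the swapped fit on the original axes requires inverting it first; drawing it with geom_abline as-is would show a different line. Ordinary least squares, by contrast, gives two genuinely different lines for y~x and x~y.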

Create data with function inside shinyServer to feed ggplot

I have a Shiny app that calculates some power estimates for a type of genetic association study. The ui.R is pretty simple, and the server.R has a function that returns a data frame (I think I can't make this function reactive because it takes some parameters).
The link to the Gist is here. To run it:
library(shiny)
shiny::runGist('5895082')
The app calculates correctly the estimates, but I have two questions regarding it:
Is it possible to have output$powTable show all the values within the range selected in the first sliderInput (n.cases)? It only seems to show the two extreme values of the range. What am I doing wrong?
There's an error when running the app:
Error: Reading objects from shinyoutput object not allowed.
How can I pass the data (reactivity?) from the function f() to feed the ggplot? After much trial and error, I am very lost. Where might the error in my code be? Many thanks in advance!
The original code of the function works well: (EDITED)
f <- function(ncases, p0, OR.cas.ctrl, Nh, sig.level) {
num.cases <- ncases
p0 <- p0
Nh <- Nh
OR.cas.ctrl <- OR.cas.ctrl
sig.level <- sig.level
# Parameters related to sig.level, from [Table 2] of Samuels et al.
# For 90% power and alpha = .05, Nscaled = 8.5
if (sig.level == 0.05){
A <- -28 # Parameter A for alpha=.05
x0 <- 2.6 # Parameter x0 for alpha=.05
d <- 2.4 # Parameter d for alpha=.05
}
if (sig.level == 0.01){
A <- -13 # Parameter A for alpha=.01
x0 <- 5 # Parameter x0 for alpha=.01
d <- 2.5 # Parameter d for alpha=.01
}
if (sig.level == 0.001){
A <- -7 # Parameter A for alpha=.001
x0 <- 7.4 # Parameter x0 for alpha=.001
d <- 2.8 # Parameter d for alpha=.001
}
out.pow <- NULL # initialize vector
for(ncases in ncases){
OR.ctrl.cas <- 1 / OR.cas.ctrl # 1. CALCULATE P1 FROM A PREDEFINED P0, AND A DESIRED OR
OR <- OR.ctrl.cas
bracket.pw <- p0 / (OR - OR*p0) # obtained after isolating p1 in OR equation [3].
p1 <- bracket.pw / (1 + bracket.pw)
Nh037 <- Nh^0.37 # 2. CALCULATE NSCALED
num.n <- num.cases*((p1-p0)^2)
den.n <- (p1*(1-p1) + p0*(1-p0))*Nh037
Nscaled <- num.n/den.n
num.power <- A - 100 # 3. CALCULATE POWER
den.power <- 1 + exp((Nscaled - x0)/d)
power <- 100 + (num.power/den.power) # The power I have to detect a given OR with my data, at a given alpha
}
OR <- OR.cas.ctrl
out.pow <- data.frame(num.cases, Nh, Nscaled, p0, OR, sig.level, power)
out.pow
}
mydata <- f(ncases=seq(50,1000, by=50), 0.4, 2.25, 11, 0.05)
mydata
library(ggplot2)
print(ggplot(data = mydata, aes(num.cases, power)) +
theme_bw() +
theme(text=element_text(family="Helvetica", size=12)) +
labs(title = "Ad-hoc power for haplogroup") +
scale_color_brewer(palette = "Dark2", guide = guide_legend(reverse=TRUE)) +
xlab("number of cases/controls") +
ylab("power") +
scale_x_log10() +
geom_line(alpha=0.8, size=0.2) +
geom_point(aes(shape = factor(OR)), colour="black"))
First of all, you have n.cases named inconsistently I think. It's n.cases sometimes, and ncases other times. Is that a mistake?
Anyway, output$mydata() is incorrect. It isn't an output. It should be just:
mydata <- reactive(f(input$n.cases,
input$p0,
input$OR.cas.ctrl,
input$Nh,
input$sig.level))
And then when executing it in output$powHap() it should be:
output$powHap <- renderPlot(
{
print(ggplot(data = mydata(), aes(num.cases, power)) +
theme_bw() +
theme(text=element_text(family="Helvetica", size=12)) +
labs(title = "Ad-hoc power for haplogroup") +
scale_color_brewer(palette = "Dark2", guide = guide_legend(reverse=TRUE)) +
xlab("number of cases/controls") +
ylab("power") +
scale_x_log10() +
geom_line(alpha=0.8, size=0.2) +
geom_point(aes(shape = factor(OR)), colour="black"))
})
The important part there is that you need to do:
data = mydata()
rather than
data = output$mydata
Because output$mydata is a (reactive) function.
I would suggest reading the documentation on how reactives work. The whole thing should make a lot more sense afterwards. +1 for a very reproducible example by the way. This is how all questions should be posted.
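To illustrate both points outside a full app, here is a minimal sketch. It assumes the range slider delivers its two endpoints as a length-2 vector, which would explain why only the extremes appear in the table; n_cases_range is a hypothetical stand-in for input$n.cases, and the step of 50 is borrowed from the seq(50, 1000, by = 50) call in the question.

```r
library(shiny)

# Hypothetical stand-in for input$n.cases: a range slider yields c(min, max)
n_cases_range <- reactiveVal(c(50, 200))

# Reactive expression that expands the two endpoints into a full sequence
# (the step of 50 is an assumption, not something the slider provides)
all_cases <- reactive({
  rng <- n_cases_range()
  seq(rng[1], rng[2], by = 50)
})

# A reactive is read by calling it like a function; isolate() allows that
# outside a reactive context such as renderPlot() or renderTable()
cases_now <- isolate(all_cases())
cases_now
```

Inside the server function the same pattern would presumably read input$n.cases directly, and the expanded sequence could be passed to f() as its first argument.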
