How to create a different vline for 3 conditions? - r

I am trying to add a vline to a predicted-probability plot made with the sjPlot::plot_model function. The graph is paneled, and for each condition (baseline, autonomous and fairness) I want to draw a different vline at the point of divergence, where the CI bands no longer overlap.
The issue is that I can get the lines onto the plot, but every panel shows all three lines instead of its own single line.
Here is the code I use to create my model:
library(lme4)
library(optimx)  # must be loaded for optimizer = "optimx"
model3 <- glmer(reject ~ (1|sn) + condition*dist*actor.age.years +
                  block + actor.gender,
                data = en.long.ai, family = binomial,
                nAGQ = 1,
                control = glmerControl(optimizer = "optimx", optCtrl = list(method = "bobyqa")))
Then I create my plot:
library(ggplot2)  # for labs()
ai.pred.plot <- sjPlot::plot_model(model3, type = "pred", terms = c("actor.age.years", "dist", "condition"),
                                   title = "AI: Predicted probability of rejections",
                                   se = TRUE,
                                   legend.title = "allocation type",
                                   show.data = FALSE,
                                   colors = "Set2")
print(ai.pred.plot + labs(y = "Probability of rejection", x = "Age (years)"))
Here is what I get.
[Figure: predicted plot of rejections by condition and age]
Then I try to make the vlines. I've tried several approaches:
# create a data frame
# one attempt
Primary <- c(6.6, 7.4, 4.75)
grid <- c("Baseline", "Autonomous", "Fairness")
treat <- data.frame(grid, Primary, stringsAsFactors = FALSE)
# another attempt
vline.data <- data.frame(z = c(6.6, 7.2, 4.5), condition = c("Baseline", "Autonomous", "Fairness"))
# or
vline.data <- data.frame(condition = c("Baseline", "Autonomous", "Fairness"), z = c(6.6, 7.2, 7.2))
# then I try to add it to the plot
ai.pred.plot + geom_vline(data = vline.data, aes(xintercept = z), linetype = "dashed", color = "grey30") +
  guides(fill = "none")
# or
print(ai.pred.plot + geom_vline(data = treat, aes(xintercept = Primary), linetype = "dotdash"))
I always get this.
[Figure: predicted plot with 3 vlines]
But I want one line per panel.
Please help.
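A minimal sketch of the usual fix, on made-up data: the vline data frame needs a column with the same name as the variable the plot is faceted by, holding the panel labels; without it, ggplot2 repeats every line in every panel. The name facet is an assumption about how sjPlot/ggeffects names the panel variable (the third entry of terms); checking names(ai.pred.plot$data) on the real plot should show which name to use. The panel labels must also be spelled exactly as they appear in the plot. The z values are taken from one of the attempts above.

```r
library(ggplot2)

# Made-up predictions standing in for the data behind ai.pred.plot.
# Assumption: the panels come from facet_wrap(~ facet), as in ggeffects-based plots.
set.seed(1)
pred <- expand.grid(
  x          = seq(3, 10, by = 0.5),
  allocation = c("equal", "unequal"),
  facet      = c("Baseline", "Autonomous", "Fairness")
)
pred$predicted <- plogis(-2 + 0.3 * pred$x + rnorm(nrow(pred), sd = 0.1))

p <- ggplot(pred, aes(x, predicted, colour = allocation)) +
  geom_line() +
  facet_wrap(~ facet)

# One row per panel: the `facet` column (same name as the faceting variable)
# tells ggplot2 which panel each line belongs to.
vline.data <- data.frame(
  facet = c("Baseline", "Autonomous", "Fairness"),
  z     = c(6.6, 7.2, 4.5)
)

# geom_vline() does not inherit the plot's aesthetics, so only xintercept is needed
p_vlines <- p +
  geom_vline(data = vline.data, aes(xintercept = z),
             linetype = "dashed", colour = "grey30")
# print(p_vlines) draws three panels, each with its own dashed line
```

If the facet column were missing from vline.data, the same layer would be drawn three times in every panel, which matches the symptom described above.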

Related

How to plot two `ggscatter` correlation plots with confidence intervals on the same graph in R?

I am trying to show Spearman correlations for two scatter plots, with confidence intervals, on the same graph, but it is proving difficult.
Given the following data:
spentWithTool <- sample(1:7, 20, replace = TRUE)
understoodWithTool <- sample(1:5, 20, replace = TRUE)
spentWithoutTool <- sample(1:4, 10, replace = TRUE)
understoodWithoutTool <- sample(1:5, 10, replace = TRUE)
This is the best workaround I came up with but it is not what I want - I want the withTool and the withoutTool plots to be on the same graph and NOT side by side.
library(ggpubr)
plot_with <- ggscatter(data = data.frame(spentWithTool, understoodWithTool),
x = 'spentWithTool',
y = 'understoodWithTool',
color = 'darkred',
cor.method = "spearman",
cor.coef = TRUE,
conf.int = TRUE,
add = "reg.line",
add.params = list(color = 'red')
)
plot_without <- ggscatter(data = data.frame(spentWithoutTool, understoodWithoutTool),
x = 'spentWithoutTool',
y = 'understoodWithoutTool',
color = 'darkblue',
cor.method = "spearman",
cor.coef = TRUE,
conf.int = TRUE,
add = "reg.line",
add.params = list(color = 'blue')
)
ggarrange(plotlist = list(plot_with, plot_without))
The above gives me:
Using ggpar(p = list(plot_with, plot_without)) just created two separate graphs so it was not helpful.
I am trying to get two lines (red and blue), where each line represents withTool and withoutTool respectively, and the x and y axes represent timeSpent and understood respectively. Ideally, the confidence intervals would be the same colour as their line (red for withTool and blue for withoutTool).
Is there a way to get both correlation lines and points on the same graph?
Aside - Is there a way to use ggscatter without creating a data frame from vectors?
After checking the docs and trying several options with the color and ggp arguments of ggscatter, IMHO the easiest and least time-consuming way to achieve your desired result is to build the plot from scratch with ggplot2, using ggpubr only for the Spearman correlation labels (stat_cor) and the theme:
set.seed(1)
spentWithTool <- sample(1:7, 20, replace = TRUE)
understoodWithTool <- sample(1:5, 20, replace = TRUE)
spentWithoutTool <- sample(1:4, 10, replace = TRUE)
understoodWithoutTool <- sample(1:5, 10, replace = TRUE)
library(ggplot2)
library(ggpubr)
df <- rbind.data.frame(
data.frame(x = spentWithTool, y = understoodWithTool, id = "with"),
data.frame(x = spentWithoutTool, y = understoodWithoutTool, id = "without")
)
ggplot(df, aes(x, y, color = id, fill = id)) +
geom_point() +
geom_smooth(method = "lm") +
stat_cor(method = "spearman") +
scale_color_manual(values = c(with = "red", without = "blue"), aesthetics = c("color", "fill")) +
theme_pubr() +
labs(x = "timeSpent", y = "understood")
#> `geom_smooth()` using formula = 'y ~ x'
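To check the numbers that stat_cor(method = "spearman") prints for each colour group, the per-group coefficients can be computed in base R. A minimal sketch that rebuilds the same long data frame (the seed and data are the ones used above); cor.test() also accepts plain vectors, so no data frame is strictly needed for the statistics themselves. That stat_cor() reports exactly these values is an assumption based on it using cor.test().

```r
set.seed(1)
spentWithTool <- sample(1:7, 20, replace = TRUE)
understoodWithTool <- sample(1:5, 20, replace = TRUE)
spentWithoutTool <- sample(1:4, 10, replace = TRUE)
understoodWithoutTool <- sample(1:5, 10, replace = TRUE)

# Same long format as above: one row per observation, `id` marks the group
df <- rbind(
  data.frame(x = spentWithTool, y = understoodWithTool, id = "with"),
  data.frame(x = spentWithoutTool, y = understoodWithoutTool, id = "without")
)

# Spearman's rho and its p-value for each group separately;
# suppressWarnings() hides the "cannot compute exact p-value with ties" warning
rho_by_group <- t(sapply(split(df, df$id), function(d) {
  ct <- suppressWarnings(cor.test(d$x, d$y, method = "spearman"))
  c(rho = unname(ct$estimate), p.value = ct$p.value)
}))
rho_by_group
```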

ggplot2: Add asterisks indicating significance values where significantly different from a set reference value

I am trying to plot electrical conductivity values of water for 10 different geographic districts as 10 separate boxplots on a single plot. I want to add asterisks to each boxplot indicating where values significantly differ from 400 (as opposed to significantly differ from each other, or from the mean of all values). My code currently looks like this:
well.data$ref <- 400
ggboxplot(well.data, x = "District", y = "Electrical_conductivity", color = "District",
add = "jitter", legend = "none") +
geom_boxplot() +
geom_text(aes(label = Sig, y = MaxWidth + 0.2), size = 10,
data = t_tests)+
geom_hline(yintercept=400, linetype="dashed", color = "red")+
stat_compare_means(method = "anova", label.y = 40)+
stat_compare_means(label = "p.signif", method = "t.test",
ref.group = "ref") +
theme(text = element_text(size = 20))
This generates the following warnings:
Warning messages:
1: Computation failed in stat_compare_means():
missing value where TRUE/FALSE needed
2: Removed 10 rows containing missing values (geom_text).
My data looks roughly like this:
set.seed(42) ## for sake of reproducibility
n <- 100
well.data <- data.frame(
District=rep(LETTERS[1:10], n),
Electrical_conductivity=sample(200:500, n, replace=TRUE),
ref=400)
I'm not sure I understand your question correctly, but if you want to compare each group against the base mean (the mean of all values), you can use stat_compare_means(label = "p.signif", method = "t.test", ref.group = ".all."). Note that this tests against the overall mean, not against the fixed value 400.
Sample code:
#t_tests <- with(well.data, pairwise.t.test(Electrical_conductivity, District, p.adjust.method="bonferroni"))$p.value
#t_tests<-data.frame(t_tests) # this is missing from your sample data
library(ggpubr)
ggboxplot(well.data, x = "District", y = "Electrical_conductivity", color = "District", add = "jitter", legend = "none") +
geom_boxplot() +
#geom_text(aes(label = Sig, y = MaxWidth + 0.2), size = 10, data = t_tests)+ # missing the t_tests data
geom_hline(yintercept=400, linetype="dashed", color = "red")+
stat_compare_means(method = "anova", label.y = 40)+
stat_compare_means(method = "t.test", label = "p.signif",
ref.group = ".all.") # .all. for multiple pairwise tests against all (base-mean)
Plot:
Note that ref.group refers to one of the groups on the x-axis, so to compare every district with district A you would use ref.group = "A".
Sample data:
set.seed(42) ## for sake of reproducibility
n <- 100
well.data <- data.frame(
District=rep(LETTERS[1:10], n),
Electrical_conductivity=sample(200:500, n, replace=TRUE)) # I modified your sample data
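Neither ".all." nor a district name compares against the fixed value 400. One option, sketched here without relying on stat_compare_means(), is a one-sample t-test per district with t.test(..., mu = 400), with the p-values turned into asterisks. The data are made up, the Bonferroni correction is optional, and the star cut-offs (0.001, 0.01, 0.05) are a common convention rather than anything required by ggpubr.

```r
# Made-up data in the question's long format (10 districts x 10 measurements)
set.seed(42)
well.data <- data.frame(
  District = rep(LETTERS[1:10], each = 10),
  Electrical_conductivity = sample(200:500, 100, replace = TRUE)
)
ref_value <- 400  # fixed reference value

# One-sample t-test of each district's mean against ref_value
t_tests <- do.call(rbind, lapply(split(well.data, well.data$District), function(d) {
  tt <- t.test(d$Electrical_conductivity, mu = ref_value)
  data.frame(District = d$District[1],
             p = tt$p.value,
             MaxWidth = max(d$Electrical_conductivity))  # y position for the label
}))

# Optional: Bonferroni correction for running 10 tests
t_tests$p.adj <- p.adjust(t_tests$p, method = "bonferroni")

# Stars: *** p <= 0.001, ** p <= 0.01, * p <= 0.05, "" otherwise
t_tests$Sig <- as.character(cut(t_tests$p.adj,
                                breaks = c(0, 0.001, 0.01, 0.05, 1),
                                labels = c("***", "**", "*", ""),
                                include.lowest = TRUE))
t_tests
```

The columns are deliberately called Sig and MaxWidth, so the geom_text(aes(label = Sig, y = MaxWidth + 0.2), size = 10, data = t_tests) layer from the question should find the data it was missing; with values in the hundreds, a larger offset than 0.2 keeps the stars clear of the boxes.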

Is there a way to plot exponential decay curves with different decay constants on one graph?

I've been trying to plot several exponential decay curves on one graph. I thought this would be easy, but it is turning out to be frustrating.
What I want to get:
library(easynls)     # nlsfit(), nlsplot()
library(minpack.lm)  # nlsLM()
nlsplot(k_data_nls, model = 6, start = c(a= 603.3, b= -0.03812), xlab = "hours", ylab = "copies")
nlsplot(r4, model=6, start=c(a=25.5487,b=-0.5723), xlab = "hours", ylab = "copies")
Here is some additional code for the data:
df4 <- data.frame(hours=c(0,1,3,5,12,24,48,96,168,336,504,720), copies=c(603.3,406,588,393.27,458.47,501.67,767.53,444.13,340.6,298.47,61.42,51.6))
nlsfit(df4, model=6, start=c(a=603.3,b=-0.009955831526))
d4plot <- nlsplot(df4, model=6, start=c(a=603.3,b=-0.009955831526))
r4 <- data.frame(hours=c(0,1,3,5,12,24,48,96,168,336,504,720), copies=c(26,13.44,4.57,3.12,6.89,0.71,0.47,0.47,0,0,0.24,0.48))
nlsLM(copies ~ a*exp(b*hours), data=r4, start=list(a=26,b=-0.65986))
r4plot <- nlsplot(r4, model=6, start=c(a=25.5487,b=-0.5723))
Essentially, I want both of these plots on one graph. I'm new to R, so I'm not sure where to go from here. Thank you!
I don't know if this is actually helpful because it's so specific, but this is how I would do it (with ggplot2). First, you need data for the function you want to plot: take all the x values you want to display and apply your function, with your coefficients, to them. ggplot2 plots data points, not functions, so the curve has to be represented as data.
df_simulated <- data.frame("x" = rep(1:100, 2),
"class"= rep(c("DNA", "RNA"), each = 100))
df_simulated$y <- c(1683.7 * exp(-0.103 * 1:100), # DNA
578.7455 * exp(-0.156 * 1:100)) # RNA
However, since I have never used the packages you used, I don't know how to extract the values from the models, so I took the values from your example plot. It's important that the "simulated" values for both groups are in one data frame, with a column that assigns each point to its group (RNA or DNA); at least, it's easier this way. Then you need a data frame with your actual observations for the points. I invented data again:
df_observed <- data.frame("x" = c(12, 13, 25, 26, 50, 51),
"y" = c(500, 50, 250, 25, 0, 5),
"class" = rep(c("DNA", "RNA"), 3))
Then you can create the plot. With color=class you specify that the data points are grouped by "class" and colored accordingly. ("apple" and "banana" are just dummy words to demonstrate line breaks.)
library(ggplot2)
ggplot() +
geom_line(data = df_simulated, aes(x = x, y = y, color = class), linewidth = 1, linetype = "dashed") + # linewidth needs ggplot2 >= 3.4.0; use size = 1 on older versions
geom_point(data = df_observed, aes(x = x, y = y, color = class), size = 4, pch = 1) +
annotate("text", x = 50, y = 1250, label = "DNA\napple", color = "tomato", hjust = 0) +
annotate("text", x = 50, y = 750, label ="RNA\nbanana", color = "steelblue", hjust = 0) +
ggtitle(expression(~italic("Styela clava")~"(isolated)")) +
ylab("COI copies per 1ml") +
xlab("Time since removal of organisms (hours)") +
theme_classic() +
theme(legend.position = "none") +
scale_color_manual(values = c("DNA" = "tomato", "RNA" = "steelblue"))
This is the output:
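Instead of reading the coefficients off a plot, the curve data can come straight from fitted models. A sketch using base R's nls() and predict() on the df4 and r4 data from the question, with the question's own starting values; labelling df4 as DNA and r4 as RNA follows the base-graphics answer below and is otherwise an assumption. The resulting df_simulated and df_observed have the same x, y and class columns as above, so the ggplot() call can be reused unchanged.

```r
# Data from the question
df4 <- data.frame(hours = c(0, 1, 3, 5, 12, 24, 48, 96, 168, 336, 504, 720),
                  copies = c(603.3, 406, 588, 393.27, 458.47, 501.67, 767.53, 444.13,
                             340.6, 298.47, 61.42, 51.6))
r4 <- data.frame(hours = c(0, 1, 3, 5, 12, 24, 48, 96, 168, 336, 504, 720),
                 copies = c(26, 13.44, 4.57, 3.12, 6.89, 0.71, 0.47, 0.47, 0, 0, 0.24, 0.48))

# Fit copies = a * exp(b * hours); starting values are the ones used in the question
fits <- list(
  DNA = nls(copies ~ a * exp(b * hours), data = df4, start = list(a = 603.3, b = -0.009955831526)),
  RNA = nls(copies ~ a * exp(b * hours), data = r4, start = list(a = 25.5487, b = -0.5723))
)

# Evaluate each fitted curve on a fine grid of hours, stacked with a class column
grid <- seq(0, 720, length.out = 200)
df_simulated <- do.call(rbind, lapply(names(fits), function(cl) {
  data.frame(x = grid,
             y = predict(fits[[cl]], newdata = data.frame(hours = grid)),
             class = cl)
}))

# Observed points in the same long format
df_observed <- rbind(data.frame(x = df4$hours, y = df4$copies, class = "DNA"),
                     data.frame(x = r4$hours, y = r4$copies, class = "RNA"))
```

coef(fits$DNA) and coef(fits$RNA) give the fitted a and b if you also want them in the annotations.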
First note that R squared is normally used for linear models and not for nonlinear models, so the use of this statistic is suspect here; however, below we show it anyway since it seems that is what was asked for. A different goodness-of-fit measure that is often used is the residual standard error. If fm is the fitted model from nls then sigma(fm) is the residual standard error. Smaller values are more favorable. summary(fm) also reports this value.
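As a small illustration of the residual standard error, a sketch on the r4 data from the question (starting values taken from the question's nlsplot() call): sigma(fm) is the square root of the residual sum of squares divided by the residual degrees of freedom, and it is the same number summary(fm) reports.

```r
r4 <- data.frame(hours = c(0, 1, 3, 5, 12, 24, 48, 96, 168, 336, 504, 720),
                 copies = c(26, 13.44, 4.57, 3.12, 6.89, 0.71, 0.47, 0.47, 0, 0, 0.24, 0.48))
fm <- nls(copies ~ a * exp(b * hours), data = r4, start = list(a = 25.5487, b = -0.5723))

# Residual standard error three ways: sigma(), by hand, and from summary()
rse <- c(sigma   = sigma(fm),
         by_hand = sqrt(deviance(fm) / df.residual(fm)),
         summary = summary(fm)$sigma)
rse
```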
For each of df4 and r4 we use lm to get starting values (taking log of both sides we get a model that is linear in log(a) and b), run nls fits and get the coefficients.
Now plot the points and add the fitted lines and legend. (Note that in setting up the graph we use rbind which assumes that df4 and r4 have the same column names, which they do.)
Note that the data provided in the question is very different from the data shown in the question's image.
The code below does not need starting values since it uses lm to get them, runs nls and automatically extracts whatever information is needed for the graph.
1) Classic graphics. In this alternative no packages are used.
r2 <- function(fm, digits = 3) {
y <- fitted(fm) + resid(fm)
r2 <- 1 - deviance(fm) / sum((y - mean(y))^2)
if (is.numeric(digits)) r2 <- round(r2, digits)
r2
}
fo <- copies ~ a * exp(b * hours) # formula used in nls
# get nls fitted model and coefficients for df4
co_d0 <- coef(lm(log(copies) ~ hours, df4, subset = copies > 0))
fmd <- nls(fo, df4, start = list(a = exp(co_d0[[1]]), b = co_d0[[2]]))
co_d <- round(coef(fmd), 4)
# get nls fitted model and coefficients for r4
co_r0 <- coef(lm(log(copies) ~ hours, r4, subset = copies > 0))
fmr <- nls(fo, r4, start = list(a = exp(co_r0[[1]]), b = co_r0[[2]]))
co_r <- round(coef(fmr), 4)
both <- rbind(cbind(df4, col = "red"), cbind(r4, col = "blue"))
plot(both[1:2], col = both$col,
xlab = "Time since removal of organisms", ylab = "COI copies per 1ml",
main = "C)" ~ italic("Styela clava") ~ "(isolated)", adj = 0)
lines(fitted(fmd) ~ hours, df4, col = "red", lty = 2)
lines(fitted(fmr) ~ hours, r4, col = "blue", lty = 2)
legend <- c(bquote(DNA),
bquote(y == .(co_d[[1]]) * e ^ {.(co_d[[2]])*x}),
bquote(R^2 == .(r2(fmd))),
bquote(),
bquote(RNA),
bquote(y == .(co_r[[1]]) * e ^ {.(co_r[[2]])*x}),
bquote(R^2 == .(r2(fmr))))
legend("right", legend = as.expression(legend), bty = "n",
text.col = c("red", "red", "red", NA, "blue", "blue", "blue"))
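The r2() helper above is the usual 1 - RSS/TSS formula. A quick sanity check, sketched on R's built-in cars data rather than the question's data: for a linear model with an intercept it reproduces the R-squared that summary() reports, which is the sense in which the number is borrowed for the nls fits.

```r
r2 <- function(fm, digits = 3) {
  y <- fitted(fm) + resid(fm)           # recover the response
  r2 <- 1 - deviance(fm) / sum((y - mean(y))^2)
  if (is.numeric(digits)) r2 <- round(r2, digits)
  r2
}

fit_lm <- lm(dist ~ speed, data = cars)
# digits = NULL skips the rounding so the two values can be compared exactly
c(helper = r2(fit_lm, digits = NULL), summary = summary(fit_lm)$r.squared)
```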
2) ggplot2. This uses ggplot2 and gridtext. r2, fmd, fmr, co_d and co_r are all taken from (1). We use richtext_grob from gridtext to create a custom grob for the legend and pass it to annotation_custom.
library(gridtext)
library(ggplot2)
txt <- sprintf(
"<span style='color:red'>DNA
<br>y = %.3f*e<sup>%.3fx</sup>
<br>R<sup>2</sup> = %.3f</span>
<br><br><span style='color:blue'>RNA
<br>y = %.3f*e<sup>%.3fx</sup>
<br>R<sup>2</sup> = %.3f</span>",
co_d[[1]], co_d[[2]], r2(fmd), co_r[[1]], co_r[[2]], r2(fmr))
both2 <- rbind(cbind(df4, col = "red", fitted = fitted(fmd)),
cbind(r4, col = "blue", fitted = fitted(fmr)))
ggplot(both2, aes(hours, copies, col = I(col))) +
geom_point() +
geom_line(aes(y = fitted), linetype = 2) +
annotation_custom(richtext_grob(txt, hjust = 0)) +
theme(legend.position = "none") +
labs(x = "Time since removal of organisms", y = "COI copies per 1ml") +
ggtitle(("C)" ~ italic("Styela clava") ~ "(isolated)"))
3) lattice
This uses legend from (1) and both2 from (2). First create a plot for the data points. It will also contain the legend, axes and labels. Then add a layer for the fitted lines. main.settings specifies that the main title should be left justified and bold and is adapted from this page.
library(latticeExtra)
main.settings <- list(par.main.text = list(font = 2, just = "left",
x = grid::unit(25, "mm")))
xyplot(copies ~ hours, both2, col = both2$col,
xlab = "Time since removal of organisms", ylab = "COI copies per 1ml",
main = "C)" ~ italic("Styela clava") ~ "(isolated)", adj = 0,
key = list(text = list(as.expression(legend),
col = c("red", "red", "red", NA, "blue", "blue", "blue")),
x = 0.65, y = 0.65, columns = 1),
par.settings = main.settings) +
as.layer(xyplot(fitted ~ hours, both2, groups = col, type = "l", lty = 2))

ggplot: Extend regression line to predicted value with different linetype

Is there a simple way to extend a dotted line from the end of a solid regression line to a predicted value?
Below is my basic attempt at it:
x = rnorm(10)
y = 5 + x + rnorm(10,0,0.4)
my_lm <- lm(y~x)
summary(my_lm)
my_intercept <- my_lm$coef[1]
my_slope <- my_lm$coef[2]
my_pred = predict(my_lm,data.frame(x = (max(x)+1)))
ggdf <- data.frame( x = c(x,max(x)+1), y = c(y,my_pred), obs_Or_Pred = c(rep("Obs",10),"Pred") )
ggplot(ggdf, aes(x = x, y = y, group = obs_Or_Pred ) ) +
geom_point( size = 3, aes(colour = obs_Or_Pred) ) +
geom_abline( intercept = my_intercept, slope = my_slope, aes( linetype = obs_Or_Pred ) )
This doesn't give the output I'd hoped for. I've looked at some other answers on SO and haven't found anything simple. The best I've come up with is:
ggdf2 <- data.frame(x = c(x, max(x), max(x)+1),
                    y = c(y, my_intercept + max(x)*my_slope, my_pred),
                    obs_Or_Pred = c(rep("Obs", 10), "Pred", "Pred"),
                    show_Data_Point = c(rep(TRUE, 10), FALSE, TRUE))
ggplot(ggdf2, aes(x = x, y = y, group = obs_Or_Pred ) ) +
geom_point( data = ggdf2[ggdf2[,"show_Data_Point"],] ,size = 3, aes(colour = obs_Or_Pred) ) +
geom_smooth( method = "lm", se=F, aes(colour = obs_Or_Pred, linetype=obs_Or_Pred) )
This gives output which is correct, but I have had to include an extra column specifying whether or not I want to show the data points. If I don't, I end up with the second of these two plots, which has an extra point at the end of the fitted regression line:
Is there a simpler way to tell ggplot to predict a single point out from the linear model and draw a dashed line to it?
You can plot the points using only your actual data and build a prediction data frame to add the lines. Note that max(x) appears twice so that it can be an endpoint of both the Obs line and the Pred line. We also use a shape aesthetic so that we can remove the point marker that would otherwise appear in the legend key for Pred.
# Build prediction data frame
pred_x = c(min(x),rep(max(x),2),max(x)+1)
pred_lines = data.frame(x=pred_x,
y=predict(my_lm, data.frame(x=pred_x)),
obs_Or_Pred=rep(c("Obs","Pred"), each=2))
ggplot(pred_lines, aes(x, y, colour=obs_Or_Pred, shape=obs_Or_Pred, linetype=obs_Or_Pred)) +
geom_point(data=data.frame(x,y, obs_Or_Pred="Obs"), size=3) +
geom_line(linewidth=1) + # use size=1 on ggplot2 < 3.4.0
scale_shape_manual(values=c(16,NA)) +
theme_bw()
Semi-ugly: you can use scale_x_continuous(limits = ...) to set the range of x values used for prediction. Plot the predicted line first with fullrange = TRUE, then add the 'observed' line on top. Note that the overplotting isn't rendered perfectly, and you may want to make the observed line slightly thicker.
library(ggplot2)
d <- data.frame(x, y) # the x and y vectors from the question
ggplot(d, aes(x, y)) +
geom_point(aes(color = "obs")) +
geom_smooth(aes(color = "pred", linetype = "pred"), se = FALSE, method = "lm",
fullrange = TRUE) +
geom_smooth(aes(color = "obs", linetype = "obs"), linewidth = 1.05, se = FALSE, method = "lm") + # use size = 1.05 on ggplot2 < 3.4.0
scale_linetype_discrete(name = "obs_or_pred") +
scale_color_discrete(name = "obs_or_pred") +
scale_x_continuous(limits = c(NA, max(x) + 1))
However, I tend to agree with Gregor: "ggplot is a plotting package, not a modeling package".
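For comparison, a minimal base-graphics sketch of the same idea, assuming simulated data like the question's (a seed is added so it is reproducible): predict() evaluates the fit at min(x), max(x) and max(x) + 1, a solid segment covers the observed range, and a dashed segment runs on to the predicted point.

```r
set.seed(1)
x <- rnorm(10)
y <- 5 + x + rnorm(10, 0, 0.4)
my_lm <- lm(y ~ x)

pred_x <- c(min(x), max(x), max(x) + 1)   # start of data, end of data, extrapolated point
pred_y <- predict(my_lm, data.frame(x = pred_x))

plot(x, y, pch = 16, xlim = range(pred_x), ylim = range(c(y, pred_y)))
lines(pred_x[1:2], pred_y[1:2], lty = 1)  # fitted line over the observed x range
lines(pred_x[2:3], pred_y[2:3], lty = 2)  # dashed extension to the prediction
points(pred_x[3], pred_y[3], pch = 4)     # the predicted value itself
```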

Draw vegan graph on ggplot

I am fairly new to vegan and ggplot2. I have drawn a species diversity plot in vegan, but ggplot2 produces nicer graphs, so I was wondering whether this code could be converted to ggplot2 code.
Any help would be greatly appreciated. I am using Bray-Curtis distances (vegdist(data, "bray")) in vegan.
library(vegan)
library(MASS) # isoMDS(); package names are case-sensitive
data <- read.table("data.txt", header = TRUE)
rownames(data) <- c("TCI1", "TCI2", "TCI3", "TCII1", "TCII2", "TCII3", "TCIII1", "TCIII2", "TCIII3", "TCIV1", "TCIV2", "TCIV3",
"NCI1", "NCI2", "NCI3", "NCII1", "NCII2", "NCII3", "NCIII1", "NCIII2", "NCIII3", "NCIV1", "NCIV2", "NCIV3","TFI1", "TFI2", "TFI3", "TFII1", "TFII2", "TFII3", "TFIII1", "TFIII2", "TFIII3", "TFIV1", "TFIV2", "TFIV3",
"NFI1", "NFI2", "NFI3", "NFII1", "NFII2", "NFII3", "NFIII1", "NFIII2", "NFIII3", "NFIV1", "NFIV2", "NFIV3")
bcdist <- vegdist(data, "bray")
bcmds <- isoMDS(bcdist, k = 2)
plot(bcmds$points, type = "n", xlab = "", ylab = "")
text(bcmds$points, dimnames(data)[[1]])
You can indeed create a plot that looks like the imgur image. First I created some made-up data for your weeds. Then I called ggplot2 and put the weed names at the points, but made the points transparent.
x <- seq(from = -1, to = 1, .025)
df <- data.frame(valuesX = sample(x, size = 48, replace = TRUE),
valuesY = sample(x, size = 48, replace = TRUE),
seeds = c("TCI1", "TCI2", "TCI3", "TCII1", "TCII2", "TCII3", "TCIII1", "TCIII2", "TCIII3", "TCIV1", "TCIV2", "TCIV3",
"NCI1", "NCI2", "NCI3", "NCII1", "NCII2", "NCII3", "NCIII1", "NCIII2", "NCIII3", "NCIV1", "NCIV2", "NCIV3","TFI1", "TFI2", "TFI3", "TFII1", "TFII2", "TFII3", "TFIII1", "TFIII2", "TFIII3", "TFIV1", "TFIV2", "TFIV3",
"NFI1", "NFI2", "NFI3", "NFII1", "NFII2", "NFII3", "NFIII1", "NFIII2", "NFIII3", "NFIV1", "NFIV2", "NFIV3")
)
ggplot(df, aes(x = valuesX, y = valuesY)) +
geom_point(colour = "transparent") +
geom_text(data = df, aes(label = seeds), hjust = 1.5) +
theme_bw() +
labs(x = "Your axis label", y = "", title = "Weed Distribution") +
theme(axis.ticks= element_blank()) +
theme(plot.title = element_text(face = "bold", size = 12))
You can adjust all the elements of the plot as you see fit.
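The answer above uses made-up coordinates. To plot the actual ordination from the question, the two columns of bcmds$points can be put into a data frame together with the sample names. The sketch below is self-contained: it uses an invented abundance table and a hypothetical helper, bray_curtis(), standing in for vegan::vegdist(data, "bray") so that only MASS is needed; with your own data you would start from bcmds and dimnames(data)[[1]] in the question instead.

```r
library(MASS)  # isoMDS()

# Invented abundance table: 12 samples (rows) x 8 species (columns)
set.seed(1)
comm <- matrix(rpois(12 * 8, lambda = 5), nrow = 12,
               dimnames = list(paste0("S", 1:12), paste0("sp", 1:8)))

# Hypothetical helper: Bray-Curtis dissimilarity sum|xi - xj| / sum(xi + xj)
bray_curtis <- function(m) {
  n <- nrow(m)
  d <- matrix(0, n, n, dimnames = list(rownames(m), rownames(m)))
  for (i in seq_len(n)) for (j in seq_len(n)) {
    d[i, j] <- sum(abs(m[i, ] - m[j, ])) / sum(m[i, ] + m[j, ])
  }
  as.dist(d)
}

bcmds <- isoMDS(bray_curtis(comm), k = 2, trace = FALSE)

# One row per sample with its two NMDS coordinates, ready for ggplot2
nmds_df <- data.frame(NMDS1 = bcmds$points[, 1],
                      NMDS2 = bcmds$points[, 2],
                      sample = rownames(comm))

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(nmds_df, aes(NMDS1, NMDS2, label = sample)) +
    geom_text() +
    theme_bw()
}
```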
