How to plot vectors from capscale analysis in ggplot2? - r

I have generated a capscale model of presence-absence community composition data using capscale in vegan. I am able to use the following code to produce a plot using base R
(SOR.capscale <- capscale(SOR ~ df.meta.clin.week.snake$copy.num + df.meta.clin.week.snake$exp.time, df.otus.clin.week.snake.nonzero)) #specify model
plot(SOR.capscale, main = "Capscale Analysis", xlab = "CAP1", ylab = "CAP2")
points(SOR.capscale, col = expgroup)
ordihull(SOR.capscale, expgroup, col = c("black", "red"), label = FALSE, display = "sites")
This generates the following visualization
I have used the following code to extract the CAP values from the capscale model to plot them in ggplot
x <- as.data.frame(scores(SOR.capscale, display = "sites"))
df.pred.clin.week.snake$CAP1 <- x$CAP1
df.pred.clin.week.snake$CAP2 <- x$CAP2
I have used this data to generate a plot in ggplot with the listed code
ggplot(df.pred.clin.week.snake, aes(x= CAP1, y= CAP2, color = expgroup)) +
stat_ellipse(aes(fill = expgroup), geom = "polygon", alpha = 0.2) +
geom_point() +
theme_classic() +
coord_cartesian(xlim=c(-20, 20), ylim=c(-6.5, 15)) +
geom_hline(yintercept = 0, linetype="dotted") +
geom_vline(xintercept = 0, linetype="dotted") +
labs(color = "Experimental Treatment", fill = "Experimental Treatment") +
ggtitle("Capscale Analysis") +
theme(plot.title = element_text(hjust = 0.5)) +
ylab("CAP2") +
xlab("CAP1")
My question is: how would I extract the vector information from the capscale model so that I can then plot the vectors using ggplot? Thank you greatly!

The capscale object stores the coordinates that tell you where the arrows point, generally in SOR.capscale$CCA$biplot. Whatever those numbers are, you can plug them in as the arrow endpoints (here plot is your existing ggplot object):
plot +
geom_segment(aes(x = 0, y = 0, xend = 10.9, yend = -0.01), arrow = arrow()) +
geom_segment(aes(x = 0, y = 0, xend = 0.01, yend = 10.1), arrow = arrow())
Replace these placeholder endpoints with your actual numbers.
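Rather than typing the endpoints by hand, the biplot scores can be pulled into a data frame and passed to geom_segment directly. The following is only a sketch under stated assumptions: it uses the varespec and varechem example data shipped with vegan in place of the asker's data, a Bray-Curtis distance, and a hypothetical multiplier arrow_mul to stretch the arrows, since biplot scores are usually much shorter than the spread of the site scores.

```r
library(vegan)
library(ggplot2)

# Example data shipped with vegan (stand-ins for the asker's data frames)
data(varespec)
data(varechem)

# Constrained ordination with three numeric constraints
mod <- capscale(varespec ~ N + P + K, data = varechem, distance = "bray")

# Site scores and biplot (arrow) scores on the first two constrained axes
sites  <- as.data.frame(scores(mod, display = "sites", choices = 1:2))
arrows <- as.data.frame(scores(mod, display = "bp", choices = 1:2))
arrows$label <- rownames(arrows)

# Hypothetical multiplier (an assumption): tune it so the arrows are visible
arrow_mul <- 2

p <- ggplot(sites, aes(x = CAP1, y = CAP2)) +
  geom_point() +
  geom_segment(data = arrows,
               aes(x = 0, y = 0, xend = CAP1 * arrow_mul, yend = CAP2 * arrow_mul),
               arrow = arrow(length = unit(0.25, "cm")), colour = "blue") +
  geom_text(data = arrows,
            aes(x = CAP1 * arrow_mul, y = CAP2 * arrow_mul, label = label),
            colour = "blue", vjust = -0.5) +
  theme_classic()
p
```

Each row of arrows is one constraint from the model formula, so the same layers should work for any number of terms. Base plot() rescales the arrows to fit the plot, so the arrow lengths here will not necessarily match the base R figure.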

Related

How to add OR and 95% CI as text into a forest plot?

I want to make a forest plot by subgroup, exactly as 'stupidWolf' suggests in his answer to the question "Forest plot with subgroups in GGPlot2" (if you scroll down to the last answer there, you can see the forest plot by subgroups).
I also want to add the OR and the 95% CI to that plot. Does anyone have code that adds that information to the forest plot?
I am able to make a forest plot by subgroup in a normal ggplot; that is not the problem. I just can't figure out how to add the OR and 95% CI to it. Should I use something other than ggplot? I have no clue where to begin, so I hope someone can help. Thanks in advance!
Using the data in the linked question, we can add the odds ratio and confidence intervals like this:
ggplot(df, aes(x = Outcome, y = OR, ymin = Lower, ymax = Upper,
col = group, fill = group)) +
geom_linerange(linewidth = 5, position = position_dodge(width = 0.5)) +
geom_hline(yintercept = 1, lty = 2) +
geom_point(size = 3, shape = 21, colour = "white", stroke = 0.5,
position = position_dodge(width = 0.5)) +
geom_text(aes(y = 3.75, group = group,
label = paste0("OR ", round(OR, 2), ", (", round(Lower, 2),
" - ", round(Upper, 2), ")")), hjust = 0,
position = position_dodge(width = 0.5), color = "black") +
scale_fill_manual(values = barCOLS) +
scale_color_manual(values = dotCOLS) +
scale_x_discrete(name = "(Post)operative outcomes") +
scale_y_continuous(name = "Odds ratio", limits = c(0.5, 5)) +
coord_flip() +
theme_minimal()

Why is fullrange=TRUE not working for geom_smooth in ggplot2?

I have a plot where I am plotting both the linear regressions for each level of a variable as well as the linear regression for the total sample.
library(ggplot2);library(curl)
df<-read.csv(curl("https://raw.githubusercontent.com/megaraptor1/mydata/main/example.csv"))
df$group<-as.factor(df$group)
ggplot(df,aes(x,y))+
geom_point(size=2.5,shape=21,aes(fill=group),col="black")+
geom_smooth(formula=y~x,aes(col=group,group=group),method="lm",size=1,se=F)+
geom_smooth(formula=y~x,method="lm",col="black",size=1,fullrange=T,se=F)+
theme_classic()+
theme(legend.position = "none")
I am trying to extend the black line (which represents all specimens) to span the full range of the axes using the command fullrange=T. However, I have found the command fullrange=T is not working on this graph regardless of what I try. This is especially strange as I have not called any limits for the graph or set any additional global factors.
This question was the closest I was able to find to my current problem, but it does not appear to be describing the same issue because that issue had to do with how the limits of the graph were called.
This seems a bit heavy-handed, but it allows you to extend your regression line to whatever limits you choose for the x axis.
The argument fullrange is not really documented very helpfully. If you have a look at http://www.mosaic-web.org/ggformula/reference/gf_smooth.html it appears that "fullrange" applies to the points in the dataframe that is used to generate the regression line. So in your case your regression line is extending to the "fullrange". It's just that your definition of "fullrange" is not quite the same as that used by geom_smooth.
library(ggplot2)
library(dplyr)
library(curl)
lm_formula <- lm(formula = y~x, data = df)
f_lm <- function(x){lm_formula$coefficients[1] + lm_formula$coefficients[2] * x}
df_lim <-
data.frame(x = c(0, 5)) %>%
mutate(y = f_lm(x))
ggplot(df,aes(x,y))+
geom_point(size=2.5,shape=21,aes(fill=group),col="black")+
geom_smooth(formula=y~x,aes(col=group,group=group),method="lm",size=1,se=F)+
geom_line(data = df_lim)+
coord_cartesian(xlim = df_lim$x, ylim = df_lim$y, expand = FALSE)+
theme_classic()+
theme(legend.position = "none")
data
df<-read.csv(curl("https://raw.githubusercontent.com/megaraptor1/mydata/main/example.csv"))
df$group<-as.factor(df$group)
Created on 2021-04-05 by the reprex package (v1.0.0)
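To see what fullrange actually changes, it can help to inspect the x values that geom_smooth predicts at. The sketch below is not part of the answer above: it uses the built-in mtcars data as a stand-in for the asker's data, and smooth_range is a hypothetical helper defined only for this illustration. It suggests that fullrange = TRUE extends the fit to the range of the x scale, which is the same as the data range unless limits are set explicitly.

```r
library(ggplot2)

# Built-in mtcars data stands in for the asker's data frame
base <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point()

# Hypothetical helper: x-range covered by the fitted line (second layer)
smooth_range <- function(p) range(layer_data(p, 2)$x)

# No explicit limits: the x scale is trained on the data only
p_default <- base +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, fullrange = TRUE)

# Explicit limits on the x scale: the fitted line now extends to them
p_limits <- p_default + scale_x_continuous(limits = c(0, 10))

smooth_range(p_default)  # matches range(mtcars$wt)
smooth_range(p_limits)   # spans the limits, 0 to 10
```

Limits passed to coord_cartesian() only zoom the view and do not change the scale range, which is consistent with the answer above drawing its own line with geom_line.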
I had the same issue. Despite setting fullrange = TRUE, the line of best fit was only being drawn in the data range.
ggplot(data = df, aes(x = diameter, y = height)) +
geom_point(size = 2) +
geom_smooth(method = lm, se = FALSE, fullrange = TRUE) +
labs(x = "Diameter", y = "Height", title = "Tree Height vs. Diameter") +
theme(plot.title = element_text(hjust = 0.5, size = 15, face = 'bold'))
Bad plot: 1
Using scale_x_continuous() and scale_y_continuous() worked for me (thank you @markus). I added two lines of code, below geom_smooth(), to fix the issue.
ggplot(data = df, aes(x = diameter, y = height)) +
geom_point(size = 2) +
geom_smooth(method = lm, se = FALSE, fullrange = TRUE) +
scale_x_continuous(expand = c(0,0), limits=c(5, 32)) + #expand = c(num1,num2) => line of best fit stops being drawn at x = 32 + (32 - 5)*num1 + num2 = 32 + (32 - 5)*0 + 0 = 32
scale_y_continuous(expand = c(0,0), limits=c(7, 25)) + #expand = c(num1,num2) => line of best fit stops being drawn at y = 25 + (25 - 7)*num1 + num2 = 25 + (25 - 7)*0 + 0 = 25
labs(x = "Diameter", y = "Height", title = "Tree Height vs. Diameter") +
theme(plot.title = element_text(hjust = 0.5, size = 15, face = 'bold'))
Good plot: 2
Source: How does ggplot scale_continuous expand argument work?

Error when using multiple datasets to plot polygon annotation on ggplot2

I am creating a forest plot for a meta-analysis using ggplot2. I want to manually add a skewed diamond shape (asymmetric on the y-scale) to represent an effect size and confidence interval.
I can draw the forest plot and add four segments to create the diamond but this doesn't give a nice clear, sharp diamond. Instead I've used geom_polygon with a set of co-ordinates in a second dataframe. When I try to write to pdf I receive the following error
summarydiamond <- data.frame(
x = c(sleepstress.r.CI.L, sleepstress.r.estimate, sleepstress.r.CI.U, sleepstress.r.estimate, sleepstress.r.CI.L),
y = c(-1, -1.5, -1, -0.5, -1)
)
forest.plot <-
dat.sleepstress %>%
ggplot(aes(x = rev(key.pairing), y = r, ymin = r.CI.lower, ymax = r.CI.upper))+
geom_errorbar(width = 0.5) +
geom_point(aes(size = r.weights)) +
scale_size(range = c(1, 7)) +
geom_hline(yintercept = 0) +
theme_minimal() +
coord_flip() +
theme(legend.position = "none") +
labs(x = "", y = "Correlation coefficient") +
theme(text = element_text(size=14)) +
scale_x_discrete(limits=rev) +
geom_text(aes(label = paste0(format(round(r, 2),nsmall = 2),
" (",
format(round(r.CI.lower, 2),nsmall = 2),
", ",
format(round(r.CI.upper, 2),nsmall = 2),
")"),
y = 0.85),
hjust="inward") +
geom_polygon(aes(x=x, y=y), data = summarydiamond)
pdf(file = 'forestplot.pdf', width = 10, height = 10)
forest.plot
dev.off()
Output:
forest.plot
Error in FUN(X[[i]], ...) : object 'r.CI.lower' not found
I've tried adding the data= argument to all of the geom_ calls but this doesn't fix it.
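One likely cause, offered as a sketch rather than a confirmed diagnosis of the code above: every layer inherits the aesthetics set in ggplot(), so the geom_polygon layer also looks for the ymin and ymax columns in its own data frame, which has only x and y. Setting inherit.aes = FALSE on that layer avoids the lookup. The data below are made up for illustration and the column names are assumptions.

```r
library(ggplot2)

# Made-up effect sizes standing in for dat.sleepstress
studies <- data.frame(
  study = c(1, 2, 3),
  r     = c(0.20, 0.35, 0.10),
  lower = c(0.05, 0.15, -0.10),
  upper = c(0.35, 0.55, 0.30)
)

# Diamond corners in a second data frame that has only x and y columns
diamond <- data.frame(
  x = c(0, -0.5, 0, 0.5),        # position along the study axis
  y = c(0.10, 0.22, 0.34, 0.22)  # lower CI, estimate, upper CI, estimate
)

base <- ggplot(studies, aes(x = study, y = r, ymin = lower, ymax = upper)) +
  geom_errorbar(width = 0.3) +
  geom_point() +
  coord_flip()

# This layer inherits ymin/ymax from ggplot(), so building the plot fails
# with an "object 'lower' not found" error
broken <- base + geom_polygon(aes(x = x, y = y), data = diamond)

# inherit.aes = FALSE makes the layer use only its own mapping
fixed <- base +
  geom_polygon(aes(x = x, y = y), data = diamond, inherit.aes = FALSE)
fixed
```

Note that with coord_flip() the x aesthetic is still the study axis, so in this sketch the diamond's x column holds positions along that axis and its y column holds the correlation values.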

Plotting legend in ggplot2

I am trying to plot several things in a chart. Points colored by ID, the regression line, a modified regression line and an area where I do not want values to fall in.
I would like to have a legend with the names of the two lines. e.g. blue=Fitted model, red=Worst case scenario and the area, red= Suspect values.
This is the code that I used to create the graph:
ggplot(data, aes(x=log_dilution, y=ct)) +
geom_point(aes(color=ID),show.legend = FALSE) +
geom_smooth(aes(linetype ='Fitted model +95%CI'), method = 'lm',show.legend = TRUE) +
geom_segment(aes(x = 0, xend = 1.7, y = model1$coefficients[1], yend
=model1$coefficients[1] + (model1$coefficients[2]+(1.96*1.030535))*1.7),
color='red', lwd=1, lty=2,show.legend = FALSE) +
theme(axis.text.x= element_text(size=14), axis.title= element_text(size=16), axis.text.y = element_text(size=14)) +
ylim(15, 40) +
annotate('rect',xmin=0, xmax=1.7, ymin=35, ymax=40, alpha=0.2, fill="red") +
scale_size_manual( values = c(1.5, 1.5), labels = c("Fitted model +95%CI", "Worst case")) +
guides(color=FALSE)+ylab("Ct")+theme(legend.position = "bottom")
and this is the result so far.
Can anybody give me some directions on how I can plot a legend for the lines and the area (I am not interested in plotting the points)?
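A common way to get such a legend is to map colour and fill to constant labels inside aes() and then assign the actual colours with manual scales. The sketch below shows that approach under stated assumptions: the data are simulated, the "worst case" slope offset of 2 is arbitrary, and the helper data frames worst and band are hypothetical names introduced only for this example.

```r
library(ggplot2)

# Simulated dilution data standing in for the asker's `data`
set.seed(1)
dat <- data.frame(log_dilution = rep(seq(0, 1.7, length.out = 6), each = 3))
dat$ct <- 20 + 8 * dat$log_dilution + rnorm(nrow(dat))

fit <- lm(ct ~ log_dilution, data = dat)
b0 <- unname(coef(fit)[1])
b1 <- unname(coef(fit)[2])

# Hypothetical "worst case" line: same intercept, steeper slope
worst <- data.frame(x = 0, xend = 1.7, y = b0, yend = b0 + (b1 + 2) * 1.7)

# Shaded band as a one-row data frame so it can carry a fill mapping
band <- data.frame(xmin = 0, xmax = 1.7, ymin = 35, ymax = 40)

p <- ggplot(dat, aes(x = log_dilution, y = ct)) +
  geom_point() +
  # Constant strings inside aes() create the legend entries
  geom_smooth(aes(colour = "Fitted model + 95% CI"),
              method = "lm", formula = y ~ x) +
  geom_segment(data = worst,
               aes(x = x, xend = xend, y = y, yend = yend,
                   colour = "Worst case"),
               linetype = 2) +
  geom_rect(data = band,
            aes(xmin = xmin, xmax = xmax, ymin = ymin, ymax = ymax,
                fill = "Suspect values"),
            alpha = 0.2, inherit.aes = FALSE) +
  scale_colour_manual(name = NULL,
                      values = c("Fitted model + 95% CI" = "blue",
                                 "Worst case" = "red")) +
  scale_fill_manual(name = NULL, values = c("Suspect values" = "red")) +
  ylab("Ct") +
  theme(legend.position = "bottom")
p
```

Because the points carry no colour mapping, they stay out of the legend. The band uses geom_rect instead of annotate() since annotations do not produce legend entries.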

ggplot outline jitter datapoints

I'm trying to create a scatterplot where the points are jittered (geom_jitter), but I also want to create a black outline around each point. Currently I'm doing it by adding 2 geom_jitters, one for the fill and one for the outline:
beta <- paste("beta == ", "0.15")
ggplot(aes(x=xVar, y = yVar), data = data) +
geom_jitter(size=3, alpha=0.6, colour=my.cols[2]) +
theme_bw() +
geom_abline(intercept = 0.0, slope = 0.145950, size=1) +
geom_vline(xintercept = 0, linetype = "dashed") +
annotate("text", x = 2.5, y = 0.2, label=beta, parse=TRUE, size=5)+
xlim(-1.5,4) +
ylim(-2,2)+
geom_jitter(shape = 1,size = 3,colour = "black")
However, that results in something like this:
Because jitter randomly offsets the data, the 2 geom_jitters are not in line with each other. How do I ensure the outlines are in the same place as the fill points?
I've seen threads about this (e.g. Is it possible to jitter two ggplot geoms in the same way?), but they're pretty old and I'm not sure whether anything new has been added to ggplot that would solve this issue.
The code above works if, instead of using geom_jitter, I use the regular geom_point, but I have too many overlapping points for that to be useful.
EDIT:
The solution in the posted answer works. However, it doesn't quite cooperate for some of my other graphs where I'm binning by some other variable and using that to plot different colours:
ggplot(aes(x=xVar, y = yVar, color=group), data = data) +
geom_jitter(size=3, alpha=0.6, shape=21, fill="skyblue") +
theme_bw() +
geom_vline(xintercept = 0, linetype = "dashed") +
scale_colour_brewer(name = "Title", direction = -1, palette = "Set1") +
xlim(-1.5,4) +
ylim(-2,2)
My group variable has 3 levels, and I want to colour each group level by a different colour in the brewer Set1 palette. The current solution just colours everything skyblue. What should I fill by to ensure I'm using the correct colour palette?
You don't actually have to use two layers; you can just use the fill aesthetic of a plotting character with a hole in it:
# some random data
set.seed(47)
df <- data.frame(x = rnorm(100), y = runif(100))
ggplot(aes(x = x, y = y), data = df) + geom_jitter(shape = 21, fill = 'skyblue')
The colour, size, and stroke aesthetics let you customize the exact look.
Edit:
For grouped data, set the fill aesthetic to the grouping variable, and use scale_fill_* functions to set color scales:
# more random data
set.seed(47)
df <- data.frame(x = runif(100), y = rnorm(100), group = sample(letters[1:3], 100, replace = TRUE))
ggplot(aes(x=x, y = y, fill=group), data = df) +
geom_jitter(size=3, alpha=0.6, shape=21) +
theme_bw() +
geom_vline(xintercept = 0, linetype = "dashed") +
scale_fill_brewer(name = "Title", direction = -1, palette = "Set1")
