Customizing a vegan PCA plot with ggplot2 - r

I'm trying to make a custom plot of some vegan rda results in ggplot2. I'm essentially modifying directions as seen in Plotting RDA (vegan) in ggplot, so that I am using shape and color labels to convey some information about the sample points.
I set up a PCA with vegan as follows:
library(vegan)
library(dplyr)
library(tibble)
library(ggplot2)
cbPalette <- c("#000000", "#E69F00", "#56B4E9", "#009E73", "#0072B2", "#D55E00", "#CC79A7", "#F0E442")
data(dune)
data(dune.env)
dune.pca <- rda(dune)
uscores <- data.frame(dune.pca$CA$u)
uscores1 <- inner_join(rownames_to_column(dune.env), rownames_to_column(uscores), by = "rowname")
vscores <- data.frame(dune.pca$CA$v)
I can make a simple biplot
biplot(dune.pca)
Now, let's say I want to know more about the management conditions to which these different samples were subject. I'll code them by color and shape and plot them with ggplot.
p1 <- ggplot(uscores1, aes(x = PC1, y = PC2, col = Management,
shape = Management)) +
geom_point() +
scale_color_manual(values=cbPalette) +
scale_fill_manual(values=cbPalette) +
scale_shape_manual(values = c(21:25)) +
theme_bw() +
theme(strip.text.y = element_text(angle = 0))
p1
Next, I'd really like to add some biplot arrows that show us the axes corresponding to species abundance. I can use ggplot to plot just those arrows as follows:
p2 <- ggplot() + geom_text(data = vscores, aes(x = PC1, y = PC2, label = rownames(vscores)), col = 'red') +
geom_segment(data = vscores, aes(x = 0, y = 0, xend = PC1, yend = PC2), arrow=arrow(length=unit(0.2,"cm")),
alpha = 0.75, color = 'darkred')
p2
What I'd really like to do though, is get those arrows and points on the same plot. Currently this is the code that I am trying to use:
p3 <- p1 + geom_text(data = vscores, aes(x = PC1, y = PC2, label = rownames(vscores)), col = 'red') +
geom_segment(data = vscores, aes(x = 0, y = 0, xend = PC1, yend = PC2), arrow=arrow(length=unit(0.2,"cm")),
alpha = 0.75, color = 'darkred')
p3
To my annoyance, this yields only a blank plot (empty window, no error messages). Clearly I am missing something or scaling something incorrectly. Any suggestions about how I can best superimpose the last two plots?

Check out the ggvegan package on GitHub. It is still at version 0.0.x and not actively developed at the moment, but if you run
library(ggvegan)
autoplot(dune.pca) # your result object
you get a graph that you can customize in the usual ggplot2 way with various aesthetics.

Try:
library(cowplot) # not needed; I just had it attached while answering the question, hence the theme
library(ggplot2)
ggplot(uscores1) +
geom_point(aes(x = PC1, y = PC2, col = Management,
shape = Management)) +
scale_color_manual(values=cbPalette) +
scale_fill_manual(values=cbPalette) +
scale_shape_manual(values = c(21:25)) +
geom_text(data = vscores, aes(x = PC1, y = PC2, label = rownames(vscores)), col = 'red') +
geom_segment(data = vscores, aes(x = 0, y = 0, xend = PC1, yend = PC2), arrow=arrow(length=unit(0.2,"cm")),
alpha = 0.75, color = 'darkred')+
theme_bw() +
theme(strip.text.y = element_text(angle = 0))
p1 defines col = Management and shape = Management in ggplot(), so those mappings are inherited by geom_text() and geom_segment(), which do not define them. Those layers use data = vscores, which has no Management column. At least I think so, based on the error:
`Error in eval(expr, envir, enclos) : object 'Management' not found`
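Another way to get the same effect, if you would rather keep p1 as it is, is to switch off inheritance in the species layers with inherit.aes = FALSE. A minimal sketch, assuming small made-up data frames (sites_df and species_df are hypothetical stand-ins for uscores1 and vscores, so the example runs without vegan):

```r
library(ggplot2)

# Hypothetical stand-ins for uscores1 (site scores plus Management)
# and vscores (species scores); all values are invented.
sites_df <- data.frame(
  PC1 = c(-0.3, -0.1, 0.2, 0.4),
  PC2 = c(0.1, -0.2, 0.3, -0.1),
  Management = factor(c("BF", "HF", "NM", "SF"))
)
species_df <- data.frame(
  PC1 = c(0.25, -0.15),
  PC2 = c(0.10, 0.30),
  species = c("Achimill", "Agrostol")
)

# Base plot with plot-level colour and shape mappings, as in p1.
p1 <- ggplot(sites_df, aes(x = PC1, y = PC2, col = Management, shape = Management)) +
  geom_point()

# inherit.aes = FALSE stops these layers from looking up Management in species_df.
p3 <- p1 +
  geom_segment(data = species_df,
               aes(x = 0, y = 0, xend = PC1, yend = PC2),
               arrow = arrow(length = unit(0.2, "cm")),
               alpha = 0.75, color = "darkred", inherit.aes = FALSE) +
  geom_text(data = species_df,
            aes(x = PC1, y = PC2, label = species),
            col = "red", inherit.aes = FALSE)
```

Mapping label to a column of the layer's own data (species here) rather than to rownames() of an outside object also keeps each layer self-contained.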

You should also take a look at ggordiplots on GitHub (https://github.com/jfq3/ggordiplots). It includes the function gg_envfit, which fits environmental vectors to an ordination plot. All of the functions in the package silently return data frames that you may use to modify the plots any way you wish. The package includes a vignette on modifying the plots. You can read the vignettes without having to install the package by going to john-quensen.com and looking at the GitHub page.

Related

ggplot line legend disappears with alpha < 1

When trying to plot some data in ggplot2 using geom_line(), I noticed that the legend items become empty if I use alpha < 1. How can I fix this and why is this happening?
# dummy data
data <- data.frame(
x = rep(1:10, 10),
y = 1:100 + c(runif(50,0,5), runif(50,0,10)),
grp = c(rep("A", 50), rep("B", 50)))
# using the default alpha = 1
ggplot(data, aes(x = x, y = y, col = grp)) +
geom_line()
When I plot the same graph, but with alpha < 1, the lines in the legend completely disappear:
# using alpha < 1
ggplot(data, aes(x = x, y = y, col = grp)) +
geom_line(alpha = 0.9)
(versions: R 4.1.3, ggplot2 3.3.5)
Edit: Updating R and restarting RStudio did not help. This also occurs when using R directly without RStudio.
I ran into the same problem. When saving the plots to PDF/PNG the lines do appear in the legend.
Another workaround I found is adding geom_point() so that way at least you have the colors in the legend:
ggplot(data, aes(x = x, y = y, col = grp)) +
geom_line(alpha = 0.4) +
geom_point(alpha = 0.4, size = 0.1) +
guides(colour = guide_legend(override.aes = list(size=4)))
The legend keys use the same aesthetics as the plot layers; you can override them with override.aes.
This should work:
ggplot(data, aes(x = x, y = y, col = grp)) +
geom_line(alpha = 0.2) + # using alpha = 0.2 to have it more evident
guides(col = guide_legend(override.aes = list(alpha = 1)))
The same approach can be used, for example, to change the shape or colour of the legend keys independently of the aes() mapping in the plot.
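As a rough illustration of that last point, here is a sketch with points instead of lines. The toy data frame and the chosen key size and shape are assumptions for the example only:

```r
library(ggplot2)

# Invented data: two groups of faint, small points.
set.seed(1)
toy <- data.frame(
  x = rnorm(200),
  y = rnorm(200),
  grp = rep(c("A", "B"), each = 100)
)

# The layer draws faint, small points; override.aes changes only the legend keys,
# which become opaque, larger, filled squares (shape 15).
p <- ggplot(toy, aes(x = x, y = y, colour = grp)) +
  geom_point(alpha = 0.2, size = 1) +
  guides(colour = guide_legend(override.aes = list(alpha = 1, size = 4, shape = 15)))
```

The plotted points keep alpha = 0.2 and size = 1; only the legend is affected.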

How to assign colors to multicolor scatter plot with multicolor fitted lines in ggplot2

Problem
I have some data points stored in data.frame with three variables, x, y, and gender. My goal is to draw several generally fitted lines and also lines specifically fitted for male/female over the scatter plot, with points coloured by gender. It sounds easy but some issues just persist.
What I currently do is use a new set of x's to predict y's for every model, combine the fitted lines in a data.frame, and then convert it from wide to long, with the model name as the third variable. From this post: ggplot2: how to add the legend for a line added to a scatter plot? and this one: Add legend to ggplot2 line plot, I learnt that mapping should be used instead of setting colours/legends separately. However, while I can get a multicolor line plot, the points are not coloured by gender (already a factor) as I expected from the posts I referenced.
I also know it might be possible to use aes(y = predict(model)), but I ran into other problems with that. I also tried to colour the points directly in aes and assign colours separately for each line, but then the legend cannot be generated unless I use lty, which gives legend keys that are all the same colour.
I would appreciate any ideas, and I'm also open to changing the whole method.
Code
Note that two pairs of lines overlap. So it only appeared to be two lines. I guess adding some jitter in the data might make it look differently.
slrmen<-lm(tc~x+I(x^2),data=data[data['gender']==0,])
slrwomen<-lm(tc~x+I(x^2),data=data[data['gender']==1,])
prdf <- data.frame(x = seq(from = range(data$x)[1],
to = range(data$x)[2], length.out = 100),
gender = as.factor(rep(1,100)))
prdm <- data.frame(x = seq(from = range(data$x)[1],
to = range(data$x)[2], length.out = 100),
gender = as.factor(rep(0,100)))
prdf$fit <- predict(fullmodel, newdata = prdf)
prdm$fit <- predict(fullmodel, newdata = prdm)
rawplotdata<-data.frame(x=prdf$x, fullf=prdf$fit, fullm=prdm$fit,
linf=predict(slrwomen, newdata = prdf),
linm=predict(slrmen, newdata = prdm))
plotdata<-reshape2::melt(rawplotdata,id.vars="x",
measure.vars=c("fullf","fullm","linf","linm"),
variable.name="fitmethod", value.name="y")
plotdata$fitmethod<-as.factor(plotdata$fitmethod)
plt <- ggplot() +
geom_line(data = plotdata, aes(x = x, y = y, group = fitmethod,
colour=fitmethod)) +
scale_colour_manual(name = "Fit Methods",
values = c("fullf" = "lightskyblue",
"linf" = "cornflowerblue",
"fullm"="darkseagreen", "linm" = "olivedrab")) +
geom_point(data = data, aes(x = x, y = y, fill = gender)) +
scale_fill_manual(values=c("blue","green")) ## This does not work as I expected...
show(plt)
Code for another method (omitted two lines), which generates same-colour legend and multi-color plot:
ggplot(data = prdf, aes(x = x, y = fit)) + # prdf and prdm are just data frames containing the x's and fitted values for different models
geom_line(aes(lty="Female"),colour = "chocolate") +
geom_line(data = prdm, aes(x = x, y = fit, lty="Male"), colour = "darkblue") +
geom_point(data = data, aes(x = x, y = y, colour = gender)) +
scale_colour_discrete(name="Gender", breaks=c(0,1),
labels=c("Male","Female"))
This is related to using the colour aesthetic for lines and the fill aesthetic for points in your own (first) example. In the second example, it works because the colour aesthetic is used for both lines and points.
By default, geom_point() cannot show a variable mapped to fill, because the default point shape (19) has no fill.
For fill to work on points, you have to set shape to one of the fillable shapes (21 to 25) in geom_point(), outside of aes().
Perhaps this small reproducible example helps to illustrate the point:
Simulate data
set.seed(4821)
x1 <- rnorm(100, mean = 5)
set.seed(4821)
x2 <- rnorm(100, mean = 6)
data <- data.frame(x = rep(seq(20,80,length.out = 100),2),
tc = c(x1, x2),
gender = factor(c(rep("Female", 100), rep("Male", 100))))
Fit models
slrmen <-lm(tc~x+I(x^2), data = data[data["gender"]=="Male",])
slrwomen <-lm(tc~x+I(x^2),data = data[data["gender"]=="Female",])
newdat <- data.frame(x = seq(20,80,length.out = 200))
fitted.male <- data.frame(x = newdat,
gender = "Male",
tc = predict(object = slrmen, newdata = newdat))
fitted.female <- data.frame(x = newdat,
gender = "Female",
tc = predict(object = slrwomen, newdata = newdat))
Plot using colour aesthetics
Use the colour aesthetics for both points and lines (specify in ggplot such that it gets inherited throughout). By default, geom_point can map a variable to colour.
library(ggplot2)
ggplot(data, aes(x = x, y = tc, colour = gender)) +
geom_point() +
geom_line(data = fitted.male) +
geom_line(data = fitted.female) +
scale_colour_manual(values = c("tomato","blue")) +
theme_bw()
Plot using colour and fill aesthetics
Use the fill aesthetics for points and the colour aesthetics for lines (specify aesthetics in geom_* to prevent them being inherited). This will reproduce the problem.
ggplot(data, aes(x = x, y = tc)) +
geom_point(aes(fill = gender)) +
geom_line(data = fitted.male, aes(colour = gender)) +
geom_line(data = fitted.female, aes(colour = gender)) +
scale_colour_manual(values = c("tomato","blue")) +
scale_fill_manual(values = c("tomato","blue")) +
theme_bw()
To fix this, change the shape argument in geom_point to a point shape that can be filled (21:25).
ggplot(data, aes(x = x, y = tc)) +
geom_point(aes(fill = gender), shape = 21) +
geom_line(data = fitted.male, aes(colour = gender)) +
geom_line(data = fitted.female, aes(colour = gender)) +
scale_colour_manual(values = c("tomato","blue")) +
scale_fill_manual(values = c("tomato","blue")) +
theme_bw()
Created on 2021-09-19 by the reprex package (v2.0.1)
Note that the scales for colour and fill get merged automatically if the same variable is mapped to both aesthetics.
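A small sketch of that merging behaviour, assuming an invented toy data frame: the same variable is mapped to both colour and fill, and both manual scales use the same values, so the two scales share a title and labels and ggplot2 should draw a single legend.

```r
library(ggplot2)

# Invented data with a two-level factor.
set.seed(1)
toy <- data.frame(
  x = 1:20,
  tc = rnorm(20),
  gender = factor(rep(c("Female", "Male"), each = 10))
)

# gender drives both the outline (colour) and the interior (fill) of shape 21.
p <- ggplot(toy, aes(x = x, y = tc, colour = gender, fill = gender)) +
  geom_point(shape = 21, size = 3, alpha = 0.5) +
  scale_colour_manual(values = c("tomato", "blue")) +
  scale_fill_manual(values = c("tomato", "blue")) +
  theme_bw()
```

If the two scales are given different names or labels, the legends are no longer merged and appear separately.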
It seems to me that what you really want to do is use ggplot2::stat_smooth instead of trying to predict yourself.
Borrowing the data from @scrameri:
ggplot(data, aes(x = x, y = tc, color = gender)) +
geom_point() +
stat_smooth(aes(linetype = "X^2"), method = 'lm',formula = y~x + I(x^2)) +
stat_smooth(aes(linetype = "X^3"), method = 'lm',formula = y~x + I(x^2) + I(x^3)) +
scale_color_manual(values = c("darkseagreen","lightskyblue"))

add strip plot to bottom of geom_density

I would like to add a strip plot to the bottom of a geom_density plot... I could do something like :
ggplot(data = iris, aes(x = Sepal.Length)) +
geom_density(fill = "#2D708EFF", alpha = .2) +
geom_point(aes(y = 0), alpha = .4, shape = 73, size = 6)
But is there a more elegant way of doing this with ggplot2? My keywords might be off, but so far I haven't been able to find another ggplot2 solution.
You must be looking for geom_rug()
library(ggplot2)
ggplot(data = iris, aes(x = Sepal.Length)) +
geom_density(fill = "#2D708EFF", alpha = .2) +
# geom_point(aes(y = 0), alpha = .4, shape = 73, size = 6) +
geom_rug()
A rug plot is a compact visualisation designed to supplement a 2d display with the two 1d marginal distributions. Rug plots display individual cases so are best used with smaller datasets.
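If the default rug looks too heavy, a few geom_rug() arguments may help. The sketch below is only one possible styling: sides = "b" restricts the rug to the bottom, alpha makes tied values show up as darker ticks, and length sets the tick height (the specific values are arbitrary choices).

```r
library(ggplot2)

# Density of Sepal.Length with a semi-transparent rug along the bottom axis.
p <- ggplot(data = iris, aes(x = Sepal.Length)) +
  geom_density(fill = "#2D708EFF", alpha = 0.2) +
  geom_rug(sides = "b", alpha = 0.4, length = unit(0.05, "npc"))
```

iris has many repeated Sepal.Length values, so without transparency several observations collapse into a single tick.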

ggplot outline jitter datapoints

I'm trying to create a scatterplot where the points are jittered (geom_jitter), but I also want to create a black outline around each point. Currently I'm doing it by adding 2 geom_jitters, one for the fill and one for the outline:
beta <- paste("beta == ", "0.15")
ggplot(aes(x=xVar, y = yVar), data = data) +
geom_jitter(size=3, alpha=0.6, colour=my.cols[2]) +
theme_bw() +
geom_abline(intercept = 0.0, slope = 0.145950, size=1) +
geom_vline(xintercept = 0, linetype = "dashed") +
annotate("text", x = 2.5, y = 0.2, label=beta, parse=TRUE, size=5)+
xlim(-1.5,4) +
ylim(-2,2)+
geom_jitter(shape = 1,size = 3,colour = "black")
However, that results in something like this:
Because jitter randomly offsets the data, the 2 geom_jitters are not in line with each other. How do I ensure the outlines are in the same place as the fill points?
I've seen threads about this (e.g. Is it possible to jitter two ggplot geoms in the same way?), but they're pretty old and I'm not sure whether anything new has been added to ggplot that would solve this issue.
The code above works if, instead of using geom_jitter, I use the regular geom_point, but I have too many overlapping points for that to be useful
EDIT:
The solution in the posted answer works. However, it doesn't quite cooperate for some of my other graphs where I'm binning by some other variable and using that to plot different colours:
ggplot(aes(x=xVar, y = yVar, color=group), data = data) +
geom_jitter(size=3, alpha=0.6, shape=21, fill="skyblue") +
theme_bw() +
geom_vline(xintercept = 0, linetype = "dashed") +
scale_colour_brewer(name = "Title", direction = -1, palette = "Set1") +
xlim(-1.5,4) +
ylim(-2,2)
My group variable has 3 levels, and I want to colour each group level by a different colour in the brewer Set1 palette. The current solution just colours everything skyblue. What should I fill by to ensure I'm using the correct colour palette?
You don't actually have to use two layers; you can just use the fill aesthetic of a plotting character with a hole in it:
# some random data
set.seed(47)
df <- data.frame(x = rnorm(100), y = runif(100))
ggplot(aes(x = x, y = y), data = df) + geom_jitter(shape = 21, fill = 'skyblue')
The colour, size, and stroke aesthetics let you customize the exact look.
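If two layers are still needed for some reason, one option is to give both the same seeded position_jitter(), so that the random offsets are identical. A minimal sketch, assuming the same kind of random data as above (the width, height, and seed values are arbitrary):

```r
library(ggplot2)

# Some random data.
set.seed(47)
df <- data.frame(x = rnorm(100), y = runif(100))

# A fixed seed makes the jitter reproducible, so both layers are shifted identically.
pos <- position_jitter(width = 0.2, height = 0.2, seed = 123)

p <- ggplot(df, aes(x = x, y = y)) +
  geom_point(position = pos, size = 3, alpha = 0.6, colour = "skyblue") +
  geom_point(position = pos, size = 3, shape = 1, colour = "black")
```

The single-layer approach with shape = 21 is simpler; the seeded position is mainly useful when the second layer is a different geom, such as a text label.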
Edit:
For grouped data, set the fill aesthetic to the grouping variable, and use scale_fill_* functions to set color scales:
# more random data
set.seed(47)
df <- data.frame(x = runif(100), y = rnorm(100), group = sample(letters[1:3], 100, replace = TRUE))
ggplot(aes(x=x, y = y, fill=group), data = df) +
geom_jitter(size=3, alpha=0.6, shape=21) +
theme_bw() +
geom_vline(xintercept = 0, linetype = "dashed") +
scale_fill_brewer(name = "Title", direction = -1, palette = "Set1")

Sensible and easy way of placing labels in ggplot with use of geom_dl and geom_point

I'm using the code below to generate a simple chart.
# Data import -------------------------------------------------------------
data(mtcars)
mtcars$model <- rownames(mtcars)
# Graph: Income Broadband -------------------------------------------------
# Lib.
require(ggplot2); require(directlabels)
# Graph definition
ggplot(data = mtcars, aes(x = mpg, y = disp)) +
geom_point(shape = 1, colour = "black", size = 3, fill = "black") +
geom_smooth(method = lm, se = TRUE, fullrange = TRUE) +
geom_dl(aes(label = model), list("smart.grid", cex = 0.5, hjust = -.5)) +
xlab("MPG") +
ylab("DISP") +
theme_bw()
As illustrated below, the labels on the chart are placed far away from the points. I would like to amend this and place the point labels closer to the points on the graph. Naturally, for the sake of readability, I would like the labels not to overlap. In addition, I would like the solution to be easy to reproduce, as I will have to apply it across a number of charts. mlabvpos in Stata, as discussed here, provides some of this functionality. I'm looking for a similar solution in R.
Edit
Following the comments, it appears the problem is not associated with the hjust settings. For instance, for the code:
# Graph definition
ggplot(data = mtcars, aes(x = mpg, y = disp)) +
geom_point(shape = 1, colour = "black", size = 3, fill = "black") +
geom_smooth(method = lm, se = TRUE, fullrange = TRUE) +
geom_dl(aes(label = model), list("smart.grid", cex = 0.5, hjust = -.001)) +
xlab("MPG") +
ylab("DISP") +
theme_bw()
The labels are still misplaced:
On the same lines, running the code with no hjust settings does not place the labels in a more sensible manner:
