DCA: Labelling points with autoplot or ggplot2 - r

I find it very difficult to add site labels to a DCA plot made with autoplot or ggplot.
I also want to differentiate the points on the autoplot/ggplot according to their groups.
This is the data and the code I used; it went well until the autoplot/ggplot command:
library(vegan)
library(ggvegan)  # provides the autoplot() and fortify() methods for decorana objects
data(dune)
d <- vegdist(dune)
csin <- hclust(d, method = "single")
cl <- cutree(csin, 3)
dune.dca <- decorana(dune)
autoplot(dune.dca)
This is the autoplot obtained:
I am using simple code, and I tried the following, but none of it led me anywhere:
autoplot(dune.dca, label.size = 3, data = dune, colour = cl)
ggplot(dune.dca(x=DCA1, y=DCA2,colour=cl))
ggplot(dune.dca, display = 'site', pch = 16, col = cl)
ggrepel::geom_text_repel(aes(dune.dca))
If anyone has a simple suggestion, that would be great.

With the added information (package) I was able to go and dig a bit deeper.
The problem, in short, is that autoplot.decorana attaches the data to the individual layer (either geom_point or geom_text) instead of to ggplot(). Layer data is not inherited by other layers, so any layer added afterwards has no data to work with.
Notice that one of the two snippets below results in an error, and note the position of the data argument:
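library(ggplot2)
# Error: the data is attached only to geom_point(), so geom_label() cannot find hp, mpg or cyl
ggplot() +
geom_point(data = mtcars, mapping = aes(x = hp, y = mpg)) +
geom_label(aes(x = hp, y = mpg, label = cyl))
# Works: the data is given to ggplot(), so every layer inherits it
ggplot(data = mtcars) +
geom_point(mapping = aes(x = hp, y = mpg)) +
geom_label(aes(x = hp, y = mpg, label = cyl))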
ggvegan:::autoplot.decorana places the data as in the example that returns an error.
I see 2 ways to get around this problem:
Extract the layers data using ggplot_build or layer_data and create an overall or single layer mapping.
Extract the code for generating the data, and create our plot manually (not using autoplot).
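A minimal sketch of the first option, using mtcars instead of a DCA so that it runs with ggplot2 alone: the data lives only in the point layer (as with autoplot.decorana), layer_data() recovers the built coordinates, and the labels are re-attached by row position. The names pts and p_labelled are illustrative; applying the same layer_data() call to autoplot(dune.dca) is an untested assumption, and the column layout there may differ.

```r
library(ggplot2)

# Data attached only to the point layer, mimicking what autoplot.decorana does
p <- ggplot() + geom_point(data = mtcars, aes(x = hp, y = mpg))

# layer_data() builds the plot and returns layer 1's computed data (x, y, PANEL, group, ...)
pts <- layer_data(p, 1)
# With the identity stat and position, geom_point keeps the input row order,
# so the original labels can be re-attached by position
pts$label <- mtcars$cyl

# The new layer gets its own data argument, so nothing has to be inherited from ggplot()
p_labelled <- p +
  geom_text(data = pts, aes(x = x, y = y, label = label), vjust = -0.6, size = 3)
p_labelled
```

The catch, and the reason the second option is preferred below, is that layer_data() only returns what the layer computed, so anything else you need (such as site names) has to be matched back by hand.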
I honestly think the second is simpler, as with the first we might have to extract more information to make the data sensible. By looking at the source code of ggvegan:::autoplot.decorana (printed to the console by typing its name without parentheses), we can extract the code below, which generates the same data as used in the plot:
ggvegan_data <- function(object, axes = c(1, 2), layers = c("species", "sites"), ...){
obj <- fortify(object, axes = axes, ...)
obj <- obj[obj$Score %in% layers, , drop = FALSE]
want <- obj$Score %in% c("species", "sites")
obj[want, , drop = FALSE]
}
With this we can generate any plot we like, using plot-wide mappings rather than layer-specific ones:
dune.plot.data <- ggvegan_data(dune.dca)
# Use the extracted data frame (not the decorana object) and map y explicitly
p <- ggplot(data = dune.plot.data, aes(x = DCA1, y = DCA2, colour = Score)) +
geom_point() +
geom_text(aes(label = Label), nudge_y = 0.3)
p
Which gives us what I hope is your desired output
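The plot above colours points by score type (sites vs. species), while the question also asked to colour the sites by the clusters in cl. A sketch that avoids ggvegan entirely, assuming vegan's scores() method for decorana objects returns the site scores as a matrix with columns DCA1 and DCA2 and site names as row names (site_df is an illustrative name):

```r
library(vegan)
library(ggplot2)

data(dune)
# Same clustering and DCA as in the question
cl <- cutree(hclust(vegdist(dune), method = "single"), 3)
dune.dca <- decorana(dune)

# Site scores on the first two DCA axes
site_scores <- scores(dune.dca, display = "sites", choices = 1:2)
site_df <- data.frame(site_scores, site = rownames(site_scores))
# cutree() returns a vector named by site, so match groups by name rather than by position
site_df$group <- factor(cl[site_df$site])

# Everything is mapped in ggplot(), so geom_point() and geom_text() both inherit it
p <- ggplot(site_df, aes(x = DCA1, y = DCA2, colour = group, label = site)) +
  geom_point(size = 2) +
  geom_text(vjust = -0.8, show.legend = FALSE) +
  theme_bw()
p
```

If labels overlap, ggrepel::geom_text_repel() can replace geom_text() with the same aesthetics.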

Related

How can I manually add labels to multiple ggplot2 mappings created through a for-loop?

I have been working on plotting several lines according to different probability levels and am stuck adding labels to each line to represent the probability level.
Since each curve plotted has different x and y coordinates, I cannot simply have one large data frame on which to use the usual ggplot2 functions.
The end goal is to have each line with a label next to it according to the p-level.
What I have tried:
To access the data comfortably, I have created a list df with, for example, 5 elements, each containing an n x 2 data frame whose first column holds the x-coordinates and second column the y-coordinates. To plot each curve, I use a for loop where at each iteration (i in 1:5) I extract the x and y coordinates from the list and add the p-level line to the plot by:
plot = plot +
geom_line(data=df[[i]],aes(x=x.coor, y=y.coor),color = vector_of_colors[i])
where vector_of_colors contains varying colors.
I have looked at using ggrepel and its geom_label_repel() or geom_text_repel() functions, but being unfamiliar with ggplot2 I could not get it to work. Below is a simplification of my code so that it may be reproducible. I could not include an image of the actual curves I am trying to add labels to since I do not have 10 reputation.
library(ggplot2)
# CREATION OF DATA (data frames, since ggplot() does not accept plain matrices)
plevel0.5 = data.frame(x = c(0, 1), y = c(0, 1))
plevel0.8 = data.frame(x = c(0.5, 3), y = c(0.5, 1.5))
data = list(data1 = plevel0.5, data2 = plevel0.8)
# CREATION OF PLOT
plot = ggplot()
for (i in 1:2) {
plot = plot + geom_line(data = data[[i]], mapping = aes(x = x, y = y))
}
plot
Thank you in advance and let me know what needs to be clarified.
EDIT:
I have now attempted the following:
Using bind_rows(), I have created a single dataframe with columns x.coor and y.coor as well as a column called "groups" detailing the p-level of each coordinate.
This is what I have tried:
plot = ggplot(data) +
geom_line(aes(coors.x,coors.y,group=groups,color=groups)) +
geom_text_repel(aes(label=groups))
But it gives me the following error:
geom_text_repel requires the following missing aesthetics: x and y
I do not know how to specify x and y in the correct way since I thought it did this automatically. Any tips?
Your approach is probably a bit too complicated. As far as I can tell, you could just as well use one dataset and the group aesthetic to get the result you are trying to achieve with your for loop and multiple geom_line calls. To this end I use dplyr::bind_rows to bind your datasets together. Whether ggrepel is needed depends on your real dataset. In my code below I simply use geom_text to add a label at the rightmost point of each line:
plevel0.5 <- data.frame(x = c(0, 1), y = c(0, 1))
plevel0.8 <- data.frame(x = c(0.5, 3), y = c(0.5, 1.5))
library(dplyr)
library(ggplot2)
data <- list(data1 = plevel0.5, data2 = plevel0.8) |>
bind_rows(.id = "id")
ggplot(data, aes(x = x, y = y, group = id)) +
geom_line(aes(color = id)) +
geom_text(data = ~ group_by(.x, id) |> filter(x %in% max(x)), aes(label = id), vjust = -.5, hjust = .5)
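The error in the EDIT (geom_text_repel requires the following missing aesthetics: x and y) comes from mapping x and y only inside geom_line(): an aes() given to one layer is not inherited by the other layers. A base-R variant of the answer, without dplyr, maps x and y once in ggplot() so every layer inherits them. The names curves_df and labels_df are illustrative, and the data is the question's toy example:

```r
library(ggplot2)

# Toy curves from the question, one data frame per p-level
plevel0.5 <- data.frame(x = c(0, 1), y = c(0, 1))
plevel0.8 <- data.frame(x = c(0.5, 3), y = c(0.5, 1.5))
curves <- list("0.5" = plevel0.5, "0.8" = plevel0.8)

# Stack the curves with an id column (a base-R stand-in for dplyr::bind_rows(.id = "id"))
curves_df <- do.call(rbind, Map(function(d, id) cbind(d, id = id), curves, names(curves)))

# One row per curve: the point with the largest x, where the label will go
labels_df <- do.call(rbind, lapply(split(curves_df, curves_df$id),
                                   function(d) d[which.max(d$x), ]))

# x and y are mapped once in ggplot(), so geom_line() and geom_text() both inherit them
p <- ggplot(curves_df, aes(x = x, y = y, group = id, colour = id)) +
  geom_line() +
  geom_text(data = labels_df, aes(label = id), vjust = -0.5, show.legend = FALSE)
p
```

ggrepel::geom_text_repel() accepts the same data and aes(label = id) arguments as the geom_text() call here.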

Is there a way to pass the data of a ggplot2 call to the scale_* functions that works with .+gg in one pass [duplicate]

I would like to use a variable of the data frame passed to the data parameter of the ggplot function in another ggplot2 function in the same call.
For instance, in the following example I want to refer to the variable x in the dataframe passed to the data parameter in ggplot in another function scale_x_continuous such as in:
library(ggplot2)
set.seed(2017)
samp <- sample(x = 20, size= 1000, replace = T)
ggplot(data = data.frame(x = samp), mapping = aes(x = x)) + geom_bar() +
scale_x_continuous(breaks = seq(min(x), max(x)))
And I get the error :
Error in seq(min(x)) : object 'x' not found
which I understand. Of course I can avoid the problem by doing :
df <- data.frame(x = samp)
ggplot(data = df, mapping = aes(x = x)) + geom_bar() +
scale_x_continuous(breaks = seq(min(df$x), max(df$x)))
but I don't want to be forced to define the object df outside the call to ggplot. I want to be able to directly refer to the variables in the dataframe I passed in data.
Thanks a lot
The scale_x_continuous function does not evaluate its parameters in the data environment. One reason for this is that each layer can have its own data source, so by the time you get to the scales it would no longer be clear which data environment is the "correct" one.
You could write a helper function to initialize the plot with your defaults. For example
helper <- function(df, col) {
# .data[[col]] maps the column named by the string col (aes_string() is deprecated)
ggplot(data = df, mapping = aes(x = .data[[col]])) +
scale_x_continuous(breaks = seq(min(df[[col]]), max(df[[col]])))
}
and then call
helper(data.frame(x = samp), "x") + geom_bar()
Or you could write a wrapper around just the scale part. For example
scale_x_custom <- function(x) {
scale_x_continuous(breaks = seq(min(x) , max(x)))
}
and then you can add your custom scale to your plot
ggplot(data = df, mapping = aes(x = x)) +
geom_bar() +
scale_x_custom(df$x)
Or, since you just want breaks at integer values, you can calculate the breaks from the default limits without needing to specify the data at all. For example
scale_x_custom <- function() {
scale_x_continuous(expand=expansion(0, .3),
breaks = function(x) {
seq(ceiling(min(x)), floor(max(x)))
})
}
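To see what the breaks function does, it can be pulled out and called directly. A small sketch; integer_breaks is a hypothetical name, and the range c(0.25, 20.75) is a made-up stand-in for the numeric axis range ggplot2 passes to the function when it draws the axis:

```r
# Breaks function from the answer, pulled out so it can be called on its own
integer_breaks <- function(x) {
  # x is a numeric range; keep only the whole numbers that fall inside it
  seq(ceiling(min(x)), floor(max(x)))
}

b <- integer_breaks(c(0.25, 20.75))
print(b)  # the integers 1 to 20
```

Because only the range is needed, the same function works for any data set without referring to df.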
ggplot(data = df, mapping = aes(x = x)) +
geom_bar() +
scale_x_custom()
Another, less than ideal, alternative is to use magrittr's . placeholder in combination with {}.
Enclosing the ggplot call in curly braces stops %>% from inserting the data as the first argument, which allows one to reference . multiple times.
library(magrittr)
data.frame(x = samp) %>%
{ggplot(data = ., mapping = aes(x = x)) + geom_bar() +
scale_x_continuous(breaks = seq(min(.$x), max(.$x)))}
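On R 4.1 or later, the native pipe can do the same without magrittr by piping into an anonymous function, which gives the data frame a name (d here, chosen for illustration) usable in both ggplot() and scale_x_continuous(). A sketch:

```r
library(ggplot2)

set.seed(2017)
samp <- sample(x = 20, size = 1000, replace = TRUE)

# The anonymous function \(d) receives the piped data frame, so d can be used
# in ggplot() and again in scale_x_continuous() without defining df beforehand
p <- data.frame(x = samp) |>
  (\(d) ggplot(d, aes(x = x)) +
     geom_bar() +
     scale_x_continuous(breaks = seq(min(d$x), max(d$x))))()
p
```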

ggplot2 problem: pic1 is fine, then pic2 is fine, but when I review pic1 again it is broken

Recently I ran into a ggplot2 problem that confuses me. I first create a plot with ggplot named "pic1" (the result looks fine), and then a second one named "pic2", which also looks fine. But when I then check "pic1" again, I find that its regression line has become a vertical line. For example:
"pic1"
p <- ggplot()
p <- p + geom_line(data = MyData, aes(x = otherCrop, y = eta ))
p <- p+ geom_point(data = dat,aes(x =otherCrop,
y = dat$sumEnemies, colour = YEAR ),position = position_jitter(width = .01),size = 1)
p <- p+labs(colour = "年份\nYear") + theme_classic(base_size=18) +
theme(axis.title.x=element_text( vjust=0))
p=p + theme(text=element_text(family="Times", size=18))
pic1=p
"pic2"
p <- ggplot()
p <- p + geom_line(data = MyData, aes(x = SHDI, y = eta ))
p <- p+ geom_point(data = dat,aes(x = dat$SHDI,
y = eta,colour = YEAR ),position = position_jitter(width = .01),size = 1)
p <- p+labs(colour = "年份\nYear") + theme_classic(base_size=18) +
theme(axis.title.x=element_text( vjust=0))
p=p + theme(text=element_text(family="Times", size=18))
pic2=p
But when I then went back to review "pic1", it looked like this:
It became a strange short vertical line. This is a problem because I cannot put both plots in the same paper. Does anybody know what the problem is?
I think this is a great example of why using the dataframe$column syntax inside an aes call is discouraged: it makes your plot vulnerable to subsequent changes in your data. Here's a simple example. Start with a data frame with columns x and y:
library(ggplot2)
df <- data.frame(x = 1:10, y = 1:10)
Now make a ggplot, but instead of using aes(x = x, y = y), we make the mistake of doing aes(x = df$x, y = df$y):
vulnerable_plot <- ggplot()
vulnerable_plot <- vulnerable_plot + geom_line(data = df, aes(x = df$x, y = df$y))
pic1 <- vulnerable_plot
Now we review our plot. Sure, ggplot nags us to say we shouldn't use this syntax, but the plot looks fine, so who cares, right?
pic1
#> Warning: Use of `df$x` is discouraged. Use `x` instead.
#> Warning: Use of `df$y` is discouraged. Use `y` instead.
Now, let's make pic2 identical to pic1 except we use the correct syntax:
invulnerable_plot <- ggplot()
invulnerable_plot <- invulnerable_plot + geom_line(data = df, aes(x = x, y = y))
pic2 <- invulnerable_plot
Now we don't get any warning, but the plot looks the same.
pic2
So there's no difference between pic1 and pic2. Or is there? What happens when we change our data frame?
df$y <- 10:1
vulnerable_plot
Oh dear. Our first plot has changed because the plot object has a reference to an external variable that it relies on to build the plot. That's not what we wanted.
However, with the version where we used the correct syntax, a copy of the data was taken when the layer was created and is stored in the plot object, so it remains unaffected by subsequent changes to df:
invulnerable_plot
Created on 2020-08-23 by the reprex package (v0.3.0)
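The difference between the two plots can also be checked without looking at them, since layer_data() builds a plot and returns the data of one layer. A sketch along the lines of the reprex above, with vulnerable and safe as illustrative names:

```r
library(ggplot2)

df <- data.frame(x = 1:10, y = 1:10)

# Discouraged: df$x and df$y are looked up in the global df only when the plot is built
vulnerable <- ggplot() + geom_line(data = df, aes(x = df$x, y = df$y))
# Recommended: bare column names are looked up in the copy of df stored in the layer
safe <- ggplot() + geom_line(data = df, aes(x = x, y = y))

df$y <- 10:1  # modify the data frame after both plots were created

# layer_data() builds each plot now, after the change (the first call also warns about df$)
vulnerable_y <- layer_data(vulnerable)$y
safe_y <- layer_data(safe)$y
```

After the change, vulnerable_y follows the modified column (10 down to 1), while safe_y still holds the original values 1 to 10.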

ggplot2 - passing dataframe with column names

I looked at other solutions but cannot get a logical condition within ggplot to work correctly. I have the following function, to which a data frame is passed along with two columns to plot as a scatter plot:
scatter_plot2 <- function(df, xaxis, yaxis){
b <- ggplot(data = df, aes_string(xaxis, yaxis), environment = environment())
gtype <- geom_point(aes(alpha = 0.2, color = yaxis > 0))
sm <- geom_smooth(formula = xaxis ~ yaxis, color="black")
b + gtype + sm + theme_bw()
}
which I call using :
scatter_plot2(train_df, "train_df$signal", "train_df$yhat5")
The color = yaxis > 0 mapping is intended to plot points above 0 (on the y axis) in "green" and those below in "red". While I'm able to get the string names to display correctly on the axes, I can't get the logical condition to work.
Please help.
Since you're creating your own function for this, just calculate the needed color ahead of time. Since you're passing in a data frame and the variables, you'll need to use some standard evaluation (you're already doing this using aes_string).
I cleaned up the code a bit, putting the ggplot statement into a single chain, making some aes calls explicit, and making your smooth formula y ~ x. You also don't want to use $ when passing in the variables; just pass quoted names.
library(ggplot2)
scatter_plot2 <- function(df, xaxis, yaxis){
# Compare the values of the column named by yaxis; yaxis itself is a string,
# so yaxis > 0 would compare text rather than numbers
df$color <- ifelse(df[[yaxis]] > 0, "green", "red")
ggplot(data = df, aes(x = .data[[xaxis]], y = .data[[yaxis]])) +
geom_point(aes(color = color), alpha = 0.2) +
geom_smooth(formula = y ~ x, color = "black") +
scale_color_identity() +
theme_bw()
}
The call would be (using iris for an example):
scatter_plot2(iris, "Sepal.Width", "Sepal.Length")
resulting in a plot where every point is green, since all Sepal.Length values in iris are positive.
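If you prefer to keep the logical mapping from the question instead of precomputing a colour column, the comparison can be written inside aes() with the .data pronoun and translated to colours with scale_color_manual(). A sketch; scatter_plot_logical and the toy data frame are hypothetical:

```r
library(ggplot2)

# Hypothetical variant: map the logical test directly and translate TRUE/FALSE to colours
scatter_plot_logical <- function(df, xaxis, yaxis) {
  ggplot(df, aes(x = .data[[xaxis]], y = .data[[yaxis]])) +
    geom_point(aes(color = .data[[yaxis]] > 0), alpha = 0.5) +
    scale_color_manual(values = c(`TRUE` = "green", `FALSE` = "red"),
                       name = paste(yaxis, "> 0")) +
    theme_bw()
}

# Toy data with both negative and positive y values
toy <- data.frame(signal = c(1, 2, 3, 4), yhat5 = c(-1, 0.5, -0.2, 2))
p <- scatter_plot_logical(toy, "signal", "yhat5")

# Colours actually assigned to the points of layer 1, in row order
point_colours <- layer_data(p, 1)$colour
```

Unlike scale_color_identity(), this keeps a legend explaining what green and red mean.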
