Annotate with ggplot2 when axis is of class 'date' - r

I've been working with geom_line plots a lot lately. What is the easiest way to annotate a plot whose x axis is of class Date, other than converting the date variable to a different class?
Here's my code:
china_trades %>%
filter(type %in% c("Imports")) %>%
ggplot() +
geom_line(aes(x = month, y = dollars, group = 1)) +
theme_minimal()
I would like to annotate the last data point which is at 2017-10 and 48.
Here's my plot:

Maybe somebody can chime in with a pure gg way of doing this but the directlabels package has this functionality:
library(directlabels)
china_trades %>%
filter(type %in% c("Imports")) %>%
ggplot() +
geom_line(aes(x = month, y = dollars, group = 1)) +
theme_minimal() +
geom_dl(aes(label = month), method = list(dl.combine("last.points")))
Edit: Here's a gg way using annotate:
library(ggplot2)
# Toy data with a Date x axis
x <- as.Date(c('2016-1-1','2016-1-2','2016-1-3','2016-1-4'))
y <- c(4,1,2,3)
df <- data.frame(x,y)
# Coordinates of the last observation
lastDate <- max(df$x)
lastDateY <- df$y[df$x == lastDate]
ggplot(df) +
geom_line(aes(x = x, y = y)) +
annotate(geom='text', x=lastDate, y=lastDateY, vjust=-2, label="China")
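Applied to the plot in the question, the same idea might look like the sketch below. It rests on assumptions: `trades` is a hypothetical stand-in for `china_trades`, with `month` stored as a `Date` and made-up values that end at 48 in October 2017. The point it illustrates is that `annotate()` accepts a `Date` for `x` when the axis is a date axis, so no class conversion is needed.

```r
library(ggplot2)

# Hypothetical stand-in for china_trades (values are made up for illustration)
trades <- data.frame(
  month = seq(as.Date("2017-01-01"), as.Date("2017-10-01"), by = "month"),
  dollars = c(41, 33, 34, 38, 42, 39, 44, 45, 45, 48)
)

# Pick out the last observation; its month column is still a Date
last_point <- trades[which.max(trades$month), ]

p <- ggplot(trades, aes(x = month, y = dollars)) +
  geom_line() +
  # x is passed as a Date, matching the class of the axis
  annotate("text", x = last_point$month, y = last_point$dollars,
           label = last_point$dollars, hjust = 1.2) +
  theme_minimal()

print(p)
```

Taking the coordinates from the data rather than typing them in keeps the label attached to the last point when new months are added.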

Related

ggplot two side by side graphs with the same scale

I'm trying to create two side-by-side graphs to compare values (one showing absolute values and one showing proportions). I managed to create some simple graphs, but I cannot figure out whether I have to wrap them or use a grid. I just keep getting errors.
My data looks something like this:
recent_quarter <- c(12, 15, 2, 3)
all_data <- c(218, 323, 34, 12)
recent_perc <- c(38,47,6,9)
all_perc <- c(37,55,4,5)
gender <- factor(c("M", "F", "Unknown", "Other"),
levels = c("M", "F", "Unknown", "Other"))
df <- data.frame(gender, all_data, recent_quarter, all_perc,
recent_perc)
Then I created a simple plot
ggplot(df, aes(x = gender, y = recent_perc)) +
geom_col(fill = "gray70") +
theme_minimal()
For this one, I'd like to add a second plot with the all_perc as the y axis. I'm stumped on how to do this.
You could:
g1 <- ggplot(df, aes(x = gender, y = recent_perc)) +
geom_col(fill = "gray70") +
theme_minimal()
g2 <- g1 + aes(y=all_perc)
cowplot::plot_grid(g1,g2)
gridExtra (as referenced in @Josh's answer) and patchwork are two other ways to do the grid assembly.
Or:
library(tidyverse)
df <- data.frame(gender, all_data, recent_quarter, all_perc, recent_perc)
df_long <- df %>%
select(gender, ends_with("perc")) %>%
pivot_longer(-gender) ## creates 'name', 'value' columns
ggplot(df_long, aes(gender, value)) + geom_col() +
facet_wrap(~name)
Install the gridExtra package and use:
library(gridExtra)
grid.arrange(
ggplot(df, aes(x = gender, y = recent_perc)) +
geom_col(fill = "gray70") +
theme_minimal(),
ggplot(df, aes(x = gender, y = all_perc)) +
geom_col(fill = "gray70") +
theme_minimal(),
ncol = 2)
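None of the grid-based answers above forces the two panels onto the same y scale, which the title asks for. One way to get that, sketched here under the assumption that a common range from 0 to the largest percentage is what is wanted, is to compute the limits once and apply them to both plots. The helper `make_plot` and the variable `y_limits` are names introduced for this illustration only.

```r
library(ggplot2)
library(gridExtra)

df <- data.frame(
  gender = factor(c("M", "F", "Unknown", "Other"),
                  levels = c("M", "F", "Unknown", "Other")),
  recent_perc = c(38, 47, 6, 9),
  all_perc = c(37, 55, 4, 5)
)

# One set of limits covering both columns, so the two y axes match
y_limits <- c(0, max(df$recent_perc, df$all_perc))

# Hypothetical helper: builds the same bar chart for a given column name
make_plot <- function(column) {
  ggplot(df, aes(x = gender, y = .data[[column]])) +
    geom_col(fill = "gray70") +
    coord_cartesian(ylim = y_limits) +
    labs(y = column) +
    theme_minimal()
}

grid.arrange(make_plot("recent_perc"), make_plot("all_perc"), ncol = 2)
```

The facet_wrap version gets a shared scale for free, because facets use fixed scales by default.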

How can you plot `geom_point()` with `facet_wrap()` using per-group row number as x?

Is there a way to plot geom_point() so that it implicitly uses the row number as x in a facet? Just like plot(y) but also for multiple facets.
The following fails with Error: geom_point requires the following missing aesthetics: x:
df = data.frame(y = rnorm(60), group = rep(c("A", "B", "C"), 20))
ggplot(df, aes(y = y)) +
geom_point() +
facet_wrap(~group)
Naturally, you can do it using something like the following, but it is quite cumbersome.
df = df %>%
group_by(group) %>%
mutate(row = row_number())
ggplot(df, aes(x = row, y = y)) +
geom_point() +
facet_wrap(~group)
You can try this:
ggplot(df, aes(x=seq(y),y = y))+geom_point() + facet_wrap(~group)
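This avoids creating an index variable. Note, however, that aesthetics are evaluated on the whole data frame rather than per facet, so seq(y) gives the overall row number (1 to 60 here), not an index that restarts within each group.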
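For an index that restarts at 1 inside each facet, still without adding a column to the data, one option is base R's `ave()` inside `aes()`. This is a minimal sketch rather than the answer's method; `idx` is only computed separately here to show what the expression evaluates to.

```r
library(ggplot2)

set.seed(1)
df <- data.frame(y = rnorm(60), group = rep(c("A", "B", "C"), 20))

# ave() applies seq_along() within each group and returns the results
# in the original row order, giving a 1..n index per group
idx <- ave(seq_along(df$y), df$group, FUN = seq_along)

ggplot(df, aes(x = ave(seq_along(y), group, FUN = seq_along), y = y)) +
  geom_point() +
  labs(x = "row within group") +
  facet_wrap(~group)
```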

How to highlight a column in ggplot2

I have the following graph and I want to highlight both columns for watermelons, as it has the highest juice_content and weight. I know how to change the color of the columns, but I would like the WHOLE columns to be highlighted. Any idea how to achieve this? There doesn't seem to be anything similar online.
library(tidyr)
library(ggplot2)
fruits <- c("apple","orange","watermelons")
juice_content <- c(10,1,1000)
weight <- c(5,2,2000)
df <- data.frame(fruits,juice_content,weight)
# Reshape to long format: one row per fruit and measurement
df <- gather(df,compare,measure,juice_content:weight, factor_key=TRUE)
plot <- ggplot(df, aes(fruits,measure, fill=compare)) +
geom_bar(stat="identity", position=position_dodge()) +
scale_y_log10()
An option is to use gghighlight
library(gghighlight)
ggplot(df, aes(fruits,measure, fill = compare)) +
geom_col(position = position_dodge()) +
scale_y_log10() +
gghighlight(fruits == "watermelons")
In response to your comment, how about working with different alpha values
ggplot(df, aes(fruits,measure)) +
geom_col(data = . %>% filter(fruits == "watermelons"),
mapping = aes(fill = compare),
position = position_dodge()) +
geom_col(data = . %>% filter(fruits != "watermelons"),
mapping = aes(fill = compare),
alpha = 0.2,
position = position_dodge()) +
scale_y_log10()
Or you can achieve the same with one geom_col and a conditional alpha (thanks @Tjebo)
ggplot(df, aes(fruits, measure)) +
geom_col(
mapping = aes(fill = compare, alpha = fruits == 'watermelons'),
position = position_dodge()) +
scale_alpha_manual(values = c(0.2, 1)) +
scale_y_log10()
You could use geom_area to highlight the region behind the bars. The x scale has to be forced to discrete first, which is why I've used geom_blank (see this answer: geom_ribbon overlay when x-axis is discrete). Note that geom_ribbon and geom_area are effectively the same, except that geom_area always has 0 as ymin.
#minor edit so that the level isn't hard coded
#factor() is needed because data.frame() keeps strings as character since R 4.0
watermelon_level <- which(levels(factor(df$fruits)) == "watermelons")
AreaDF <- data.frame(fruits = c(watermelon_level-0.5,watermelon_level+0.5))
plot <- ggplot(df, aes(fruits)) +
geom_blank(aes(y=measure, fill=compare))+
geom_area(data = AreaDF, aes( y = max(df$measure)), fill= "yellow")+
geom_bar(aes(y=measure, fill=compare),stat="identity", position=position_dodge()) + scale_y_log10()
Edit to address comment
If you want to highlight multiple fruits, you could do something like this. You need a data.frame holding the x and y positions of the geom_area, including points that drop it back down between highlighted fruits. I'm sure there are slightly tidier ways of building the data.frame, but this one works.
highlight_level <- which(levels(factor(df$fruits)) %in% c("apple", "watermelons"))
AreaDF <- data.frame(fruits = unlist(lapply(highlight_level, function(x) c(x -0.51,x -0.5,x+0.5,x+0.51))),
yval = rep(c(1,max(df$measure),max(df$measure),1), length(highlight_level)))
AreaDF <- AreaDF %>% mutate(
yval = ifelse(floor(fruits) %in% highlight_level & ceiling(fruits) %in% highlight_level, max(df$measure), yval)) %>%
arrange(fruits) %>% distinct()
plot <- ggplot(df, aes(fruits)) +
geom_blank(aes(y=measure, fill=compare))+
geom_area(data = AreaDF, aes(y = yval ), fill= "yellow")+
geom_bar(aes(y=measure, fill=compare),stat="identity", position=position_dodge()) + scale_y_log10()
plot
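A simpler variation on the same idea, offered as a sketch rather than a replacement for the answers above, is to draw a full-width background bar with geom_col before the real bars. Because the background layer uses the discrete fruit name as x, no numeric level lookup is needed. The data frame `highlight` is a name introduced for this illustration, and the long-format `df` is rebuilt by hand so the block runs on its own.

```r
library(ggplot2)

# Same data as in the question, already in long format
df <- data.frame(
  fruits = rep(c("apple", "orange", "watermelons"), times = 2),
  compare = rep(c("juice_content", "weight"), each = 3),
  measure = c(10, 1, 1000, 5, 2, 2000)
)

# One row per fruit to highlight; the bar reaches the tallest value in the data
highlight <- data.frame(fruits = "watermelons", measure = max(df$measure))

ggplot(df, aes(fruits, measure)) +
  # width = 1 fills the whole slot reserved for this x value
  geom_col(data = highlight, fill = "yellow", alpha = 0.5, width = 1) +
  geom_col(aes(fill = compare), position = position_dodge()) +
  scale_y_log10()
```

To highlight several fruits, add more rows to `highlight`. The background stops at the tallest bar rather than at the top of the panel, which is the trade-off against the geom_area approach.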

How to do an association plot in ggplot2?

I have a table with two categorical values and I want to visualise their association; the number of times that they are found together in the same row.
For instance, let's take this data frame:
d <-data.frame(cbind(sample(1:5,100,replace=T), sample(1:10,100,replace=T)))
How can I generate a heatmap like this:
Where the colour of the squares represents the number of times that X1 and X2 are found in a given combination.
It would be even better to know how to plot this with a dot plot instead, where the size of the dot represents the count of occurrences of each combination of X1 and X2.
If you can guide me how to do this on ggplot2 or any other way in R, it would be really helpful.
Thanks!
Here's how I would do it:
library(ggplot2)
library(dplyr)
set.seed(123)
d <-data.frame(x = sample(1:5,100,replace=T), y = sample(1:10,100,replace=T))
d_sum <- d %>%
group_by(x, y) %>%
summarise(count = n(), .groups = "drop")
For the heatmap:
ggplot(d_sum, aes(x, y)) +
geom_tile(aes(fill = count))
For the dotplot:
ggplot(d_sum, aes(x, y)) +
geom_point(aes(size = count))
Another option, using count() and a custom fill scale:
library(ggplot2)
library(dplyr)
library(scales)
set.seed(123)
d <-data.frame(x = sample(1:20,1000,replace=T), y = sample(1:20,1000,replace=T))
d %>% count(x, y) %>% ggplot(aes(x, y, fill = n)) +
geom_tile() +
scale_x_continuous(breaks=1:20)+
scale_y_continuous(breaks=1:20)+
scale_fill_gradient2(low='white', mid='steelblue', high='red') +
guides(fill=guide_legend("Count")) +
theme_bw() + # apply the complete theme first so the tweak below is not overridden
theme(axis.text.x = element_text(angle = 90, hjust = 1))
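The counts can also be obtained without dplyr. The sketch below uses base R's `table()` and converts the result to a data frame, which has the side effect of keeping combinations that never occur (with a count of 0). The name `counts` is introduced here for illustration; `Freq` is the column name that `as.data.frame()` gives the counts.

```r
library(ggplot2)

set.seed(123)
d <- data.frame(x = sample(1:5, 100, replace = TRUE),
                y = sample(1:10, 100, replace = TRUE))

# Cross-tabulate, then reshape the table to one row per (x, y) combination;
# combinations that never occur are kept with Freq = 0
counts <- as.data.frame(table(x = d$x, y = d$y))

ggplot(counts, aes(x, y)) +
  geom_point(aes(size = Freq)) +
  # scale_size_area() maps a count of 0 to a point of zero area
  scale_size_area()
```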

More compact use of ggplot : grid spaghetti plot

The following code plots the predicted probability of several models against time. Having all the plots on one graph was not readable, so I divided the result into a grid.
I was wondering if it is possible to have only one ggplot with all the models, then somehow specify which goes where with grid.arrange?
Current:
p2.dat1 <- select(ppf, EXPOSURE, predp.glm.gen,predp.glm1, predp.glm2,predp.glm3,predp.glm4 )
mdf1 <- melt(p2.dat1 , id.vars="EXPOSURE")
plm.plot.all1 <- ggplot(data = mdf1,
aes(x = EXPOSURE, y = value, colour = variable)) +
geom_line()
p2.dat2 <- select(ppf, EXPOSURE, predp.glm.gen, predp.glm5,predp.glm.step )
mdf2 <- melt(p2.dat2 , id.vars="EXPOSURE")
plm.plot.all2 <- ggplot(data = mdf2,
aes(x = EXPOSURE, y = value, colour = variable)) +
geom_line()
grid.arrange(plm.plot.all1, plm.plot.all2, nrow=2)
Expected:
p2.dat <- select(ppf, EXPOSURE, predp.glm.gen,predp.glm1, predp.glm2,predp.glm3,predp.glm4,predp.glm5,predp.glm.step)
mdf <- melt(p2.dat , id.vars="EXPOSURE")
plm.plot.all <- ggplot(data = mdf,
aes(x = EXPOSURE, y = value, colour = variable)) +
geom_line()
grid.arrange(plm.plot.all[some_selection_somehow], plm.plot.all[same], nrow=2)
Thanks,
You can do this with grid.arrange by writing some helper functions. It can be done more succinctly, but I prefer small focused functions that can be used with pipes.
library(tidyverse)
library(gridExtra)
# Helper Functions ----
plot_function <- function(x) {
ggplot(x, aes(x = EXPOSURE, y = value, colour = variable)) +
geom_line() +
labs(title = unique(x$variable)) +
theme(legend.position = "none")
}
grid_plot <- function(x, selection) {
order <- c(names(x)[grepl(selection,names(x))], names(x)[!grepl(selection,names(x))])
grid.arrange(grobs = x[order], nrow = 2)
}
# Actually make the plot ----
ppf %>%
select(EXPOSURE, predp.glm.gen,predp.glm1, predp.glm2,predp.glm3,predp.glm4,predp.glm5,predp.glm.step) %>%
gather(variable, value, -EXPOSURE) %>%
split(.$variable) %>%
map(plot_function) %>%
grid_plot("predp.glm3")
Or you could do this within ggplot, using facet_wrap and converting the variable column to a factor with the levels in the desired order. This has the benefit of shared axes across the plots, which makes comparison easy. You can alter the helper functions in the first approach to set the axes explicitly and achieve the same effect, but it's just easier to keep it in ggplot.
library(tidyverse)
selection <- "predp.glm3"
plot_data <- ppf %>%
select(EXPOSURE, predp.glm.gen,predp.glm1, predp.glm2,predp.glm3,predp.glm4,predp.glm5,predp.glm.step) %>%
gather(variable, value, -EXPOSURE) %>%
# gather() returns variable as character, so convert it to a factor and move the selection to the front
mutate(variable = fct_relevel(factor(variable), selection))
ggplot(plot_data, aes(x = EXPOSURE, y = value, colour = variable)) +
geom_line() +
facet_wrap( ~variable, nrow = 2) +
theme(legend.position = "none")
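Since `ppf` is not shown in the question, here is a self-contained sketch of the facet approach on made-up data. Everything about `ppf_toy` is an assumption (three invented prediction columns), and base R's `relevel()` is swapped in for `fct_relevel()` to keep the dependencies down to ggplot2; it moves a single level to the front in the same way.

```r
library(ggplot2)

# Hypothetical stand-in for ppf: three made-up prediction columns
ppf_toy <- data.frame(
  EXPOSURE = 1:10,
  predp.glm1 = seq(0.10, 0.55, length.out = 10),
  predp.glm2 = seq(0.20, 0.40, length.out = 10),
  predp.glm3 = seq(0.05, 0.80, length.out = 10)
)

# Long format: one row per (EXPOSURE, model) pair
long <- data.frame(
  EXPOSURE = rep(ppf_toy$EXPOSURE, times = 3),
  variable = rep(names(ppf_toy)[-1], each = nrow(ppf_toy)),
  value = unlist(ppf_toy[-1], use.names = FALSE)
)

# Move the selected model to the front so it gets the first panel
selection <- "predp.glm3"
long$variable <- relevel(factor(long$variable), ref = selection)

ggplot(long, aes(x = EXPOSURE, y = value, colour = variable)) +
  geom_line() +
  facet_wrap(~variable, nrow = 2) +
  theme(legend.position = "none")
```

Panel order follows the factor levels, so reordering the levels is all that is needed to control which model goes where.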
