Set / Link point and shape options for variables in ggplot2 - r

I would like to link the variables in my data frame ('prop1', 'prop2', 'prop3') to specific colours and shapes in the plot. However, I also want to exclude data (using dplyr::filter) to customise the plot display WITHOUT changing the colours and shapes used for a specific variable. A minimal example is given below.
library(ggplot2)
library(dplyr)
library(magrittr)
obj <- c("cmpd 1","cmpd 1","cmpd 1","cmpd 2","cmpd 2")
x <- c(1, 2, 4, 7, 3)
var <- c("prop1","prop2","prop3","prop2","prop3")
y <- c(1, 2, 3, 2.5, 4)
col <- c("#E69F00","#9E0142","#56B4E9","#9E0142","#56B4E9")
shp <- c(0,1,2,1,2)
df2 <- cbind.data.frame(obj,x,var,y,col,shp)
plot <- ggplot(data = df2 %>%
filter(obj %in% c(
"cmpd 1",
"cmpd 2"
)),
aes(x = x,
y = y,
colour = as.factor(var),
shape = as.factor(var))) +
geom_point(size=2) +
#scale_shape_manual(values=shp) +
#scale_color_manual(values=col) +
facet_grid(.~obj)
plot
However, when I exclude cmpd 1 (by commenting it out in the code), the colour and shape of prop2 and prop3 for cmpd 2 change (please see plot2).
To fix this, I tried adding scale_shape_manual and scale_color_manual to the code (currently commented out) and linking them to specific variables (col and shp) in the data frame (df2), but the same problem arises: both the shape and colour of these variables change when one of the conditions is excluded.
Any and all help appreciated.

Try something like this:
library(tidyverse)
obj <- c("cmpd 1","cmpd 1","cmpd 1","cmpd 2","cmpd 2")
x <- c(1, 2, 4, 7, 3)
var <- c("prop1","prop2","prop3","prop2","prop3")
y <- c(1, 2, 3, 2.5, 4)
df2 <- cbind.data.frame(obj,x,var,y)
col <- c("prop1" = "#E69F00",
"prop2" = "#9E0142",
"prop3" = "#56B4E9")
shp <- c("prop1" = 0,
"prop2" = 1,
"prop3" = 2)
plot <- ggplot(data = df2 %>%
filter(obj %in% c(
"cmpd 1",
"cmpd 2"
)),
aes(x = x,
y = y,
colour = var,
shape = var)) +
geom_point(size=2) +
scale_shape_manual(values=shp) +
scale_color_manual(values=col) +
facet_grid(.~obj)
plot
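As far as I understand the manual scales, an unnamed `values` vector is assigned by position to whichever levels are present, so dropping "prop1" shifts every assignment, whereas a named vector is matched by level name. A minimal sketch to check this, rebuilding `df2`, `col` and `shp` from the answer above (the names `only_cmpd2`, `p2` and `built` are only for illustration):

```r
library(ggplot2)

df2 <- data.frame(
  obj = c("cmpd 1", "cmpd 1", "cmpd 1", "cmpd 2", "cmpd 2"),
  x   = c(1, 2, 4, 7, 3),
  var = c("prop1", "prop2", "prop3", "prop2", "prop3"),
  y   = c(1, 2, 3, 2.5, 4)
)
col <- c(prop1 = "#E69F00", prop2 = "#9E0142", prop3 = "#56B4E9")
shp <- c(prop1 = 0, prop2 = 1, prop3 = 2)

# Keep only "cmpd 2", so "prop1" no longer appears in the data
only_cmpd2 <- subset(df2, obj == "cmpd 2")

p2 <- ggplot(only_cmpd2, aes(x = x, y = y, colour = var, shape = var)) +
  geom_point(size = 2) +
  scale_shape_manual(values = shp) +
  scale_color_manual(values = col) +
  facet_grid(. ~ obj)

# Inspect the colours and shapes ggplot2 actually assigned to the points
built <- layer_data(p2)
unique(built[, c("colour", "shape")])
```

If the named vectors behave as described, prop2 and prop3 keep the colours and shapes they had in the full plot, even though prop1 has been filtered out.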

Related

Adding images to end of geom_line

I have a dataframe that consists of 5 vectors:
name <- c("a", "a", "b", "b", "b")
game <- c(1, 2, 1, 2, 3)
pts <- c(3, 6, 1, 6, 7)
cum_pts <- c(3, 9, 1, 7, 14)
image <- c(image1, image1, image2, image2, image2)  # image1 and image2 hold image paths or URLs
df <- data.frame(name, game, pts, cum_pts, image)
I want to make a line plot of the two different values of "name", with the image associated with each name at the very end of the respective lines.
I can do that with this code, where I use geom_image for each associated image:
df %>%
ggplot(aes(x = game, y = cum_pts, group = name)) +
geom_line() +
geom_image(data = filter(df, name == "a"), aes(x = max(game), y = max(cum_pts), image = image), size = 0.08) +
geom_image(data = filter(df, name == "b"), aes(x = max(game), y = max(cum_pts), image = image), size = 0.08)
and it gives me this, which is what I want for this df:
Ultimately my dataframe will consist of many more names than just 2, and having to use a separate geom_image line for each name seems inefficient. Is there any way I can just use one line of code for all the images that will be placed at the end of their respective lines?
There is no need to filter your df for each name and add the images one by one. Instead you could use a dataframe with one row per name and add the images with one call to geom_image. In my code below I create the df for the images using dplyr::slice_max to pick the row with the max(game) for each name:
library(ggplot2)
library(ggimage)
library(dplyr)
image1 <- "https://www.r-project.org/logo/Rlogo.png"
image2 <- "https://ggplot2.tidyverse.org/logo.png"
df_image <- df |>
group_by(name) |>
slice_max(order_by = game, n = 1)
ggplot(df, aes(x = game, y = cum_pts, group = name)) +
geom_line() +
geom_image(data = df_image, aes(x = game, y = cum_pts, image = image), size = 0.08)
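To see what the per-name data frame looks like, here is a small sketch using only dplyr and the numbers from the question (the `image` column is left out so that nothing needs downloading; `df_end` is a name chosen for illustration). One assumption worth flagging: `slice_max()` keeps ties by default, so `with_ties = FALSE` is added here to guarantee exactly one row per name.

```r
library(dplyr)

df <- data.frame(
  name    = c("a", "a", "b", "b", "b"),
  game    = c(1, 2, 1, 2, 3),
  cum_pts = c(3, 9, 1, 7, 14)
)

# One row per name: the row with the largest `game`, i.e. the end of each line
df_end <- df %>%
  group_by(name) %>%
  slice_max(order_by = game, n = 1, with_ties = FALSE) %>%
  ungroup()

df_end
```

The result has one row for "a" (game 2) and one for "b" (game 3), which is the shape a single geom_image layer needs.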

Bubble chart with round country flag symbols in ggplot/ggimage

I try to make a bubble chart with ggplot/ggimage by including the country flag. Here is a reproducible example:
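library(dplyr)
library(ggplot2)
library(ggimage)
library(stringr)  # provides str_to_lower()
Country <- c("FR", "BE", "IT")  # defined first so it can be reused inside data.frame()
A <- data.frame(X = c(1,4,5), Y = c(10, 1, 5), Z = c(1, 2, 3)/30, Country = Country, CountryFlag = paste0("https://flagcdn.com/h20/", str_to_lower(Country), ".png"))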
A_plot <- ggplot(A, mapping = aes(x = X, y = Y, size = I(Z), image = CountryFlag)) +
geom_image()
X11(); print(A_plot)
Short of downloading rounded flags, is it possible to make the flags round, or to place each flag inside a bubble? I tried the following code, but it doesn't work:
library(dplyr)
library(ggplot2)
library(ggimage)
library(stringr)  # provides str_to_lower()
Country <- c("FR", "BE", "IT")  # defined first so it can be reused inside data.frame()
A <- data.frame(X = c(1,4,5), Y = c(10, 1, 5), Z = c(1, 2, 3), Country = Country, CountryFlag = paste0("https://flagcdn.com/h20/", str_to_lower(Country), ".png"))
A_plot <- ggplot(A, mapping = aes(x = X, y = Y, size = Z, image = CountryFlag)) +
geom_point(alpha = 0.5, col = "lightblue") +
geom_image()
X11(); print(A_plot)
I get the error:
Error in `[<-`(`*tmp*`, !is.na(alpha), 4, value = alpha[!is.na(alpha)]) :
(subscript) logical subscript too long
In addition: Warning message:
In rep(colour, length.out = length(alpha)) :
'x' is NULL so the result will be NULL
Any suggestion is welcome.
Here's a function that adds a circular mask to each flag. If we start with your plot,
A_plot
We get the urls and create some local filenames:
flags <- A_plot$data$CountryFlag
png_files <- sapply(strsplit(flags, "/"), function(x) x[length(x)])
Now we create some images with a circular mask and save them locally:
OK <- Map(function(flag, png) {
im <- magick::image_read(flag)
im <- magick::image_resize(im, magick::geometry_size_percent(500, 2000))
ii <- magick::image_info(im)
width <- ii$width
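fig <- magick::image_draw(magick::image_blank(width, width))  # square canvas; `height` was never defined
symbols(width/2, width/2, circles=(width/2), bg='black', inches=FALSE, add=TRUE)
dev.off()  # close the graphics device opened by image_draw()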
im2 <- magick::image_composite(im, fig, operator='copyopacity')
magick::image_write(im2, png)
}, flag = flags, png = png_files)
Now write these file paths as our image locations in the plot object:
A_plot$data$CountryFlag <- png_files
Which changes our plot to:
A_plot
For completeness, we should tidy up after ourselves once the plot is drawn:
sapply(png_files, unlink)
Funnily enough, someone just asked a question which pointed to a package for an easy way to get exactly what you want. The {ggflags} package (not on CRAN!) introduces geom_flag, which uses readily available round flag icons from the EmojiOne set as points.
library(tidyverse)
# devtools::install_github("jimjam-slam/ggflags")
library(ggflags)
Country <- c("FR", "BE", "IT")
A <- data.frame(
X = c(1, 4, 5), Y = c(10, 1, 5), Z = c(1, 2, 3), Country = tolower(Country))
ggplot(A, mapping = aes(x = X, y = Y, size = Z*3, country = Country)) +
geom_flag()+
# you can then set the scale as usual by using scale_size
scale_size_identity(guide = guide_legend())
Created on 2022-04-03 by the reprex package (v2.0.1)

ggplot from two tibbles; scatterplot with contours in background

I have two tibbles -
tbl1 contains real data : X, Y and choice.
tbl2 is synthetic tibble to calculate contours of predicted choice P.
library(tidyverse)
# tibble1
X <- c(1, 3, 5)
Y <- c(1, 5, 3)
choice <- c(0, 1, 1)
tbl1 <- tibble(X,Y,choice)
# tibble2
X <- seq(0, 5, 0.1)
Y <- seq(0, 5, 0.1)
tbl2 <- crossing(X,Y)
tbl2 <- tbl2 %>%
mutate(V = (X + Y - 4)/2,
P = 1/(1+exp(-V)))
I wish to create a single ggplot with
scatterplot X vs Y from tbl1 (with color = choice)
filled contours of P from tbl2 in the background
Thanks
Perhaps this?
library(ggplot2)
ggplot(tbl2, aes(X, Y)) +
geom_contour_filled(aes(z = P), alpha = 0.3) +
geom_point(aes(color = factor(choice)), size = 5, data = tbl1) +
guides(fill = guide_none()) +
labs(color = "Choice")
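A brief sketch of the same idea with the band edges made explicit, in case that helps: the plot-level data is the synthetic grid, and the point layer swaps in the real observations through its own `data` argument while inheriting the x and y mappings. The names `obs` and `grid` and the choice of `breaks` are mine, not from the question; base `expand.grid()` stands in for `crossing()`.

```r
library(ggplot2)

# Real observations (same values as tbl1 in the question)
obs <- data.frame(X = c(1, 3, 5), Y = c(1, 5, 3), choice = c(0, 1, 1))

# Synthetic grid with the predicted probability P (same formula as tbl2)
grid <- expand.grid(X = seq(0, 5, 0.1), Y = seq(0, 5, 0.1))
grid$P <- 1 / (1 + exp(-(grid$X + grid$Y - 4) / 2))

p <- ggplot(grid, aes(X, Y)) +
  # explicit breaks: one band per 0.1 of predicted probability
  geom_contour_filled(aes(z = P), breaks = seq(0, 1, by = 0.1), alpha = 0.3) +
  # this layer uses its own data but inherits the x and y mappings
  geom_point(aes(colour = factor(choice)), data = obs, size = 5) +
  labs(colour = "Choice", fill = "P")

p
```

Unlike the answer above, this variant keeps the fill legend so the probability bands can be read off; drop `fill = "P"` and add `guides(fill = guide_none())` to hide it again.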

Equivalent of gganimate::transition_events on plotly

In R, using gganimate, one can make an animated plot where events appear and disappear with time. For example:
library(lubridate)
library(gganimate)
df=data.frame(
x=c(1,2,3,4),
y=c(1,2,3,4),
start=c(1,2,3,4),
end=c(5,6,7,8),
en=as_date(1),
ex=as_date(1))
ggplot(data=df, aes(x=x,y=y))+
geom_point()+
gganimate::transition_events(
start=start,
end=end,
enter_length = as.numeric(en),
exit_length = as.numeric(ex))
This produces a plot in which points appear according to column "start" and disappear according to column "end".
I wonder if there is an easy way to achieve the same with plotly (preferably using ggplotly()), with a slider to move along the time axis.
Here is an example using ggplotly. The result however isn't exactly the same:
library(plotly)
library(lubridate)
df = data.frame(
x = c(1, 2, 3, 4),
y = c(1, 2, 3, 4),
start = c(1, 2, 3, 4),
end = c(5, 6, 7, 8),
en = as_date(1),
ex = as_date(1)
)
frame_list <- Map(seq, from = df$start, to = df$end)
DF <- data.frame(x = rep(df$x, times = lengths(frame_list)),
y = rep(df$y, times = lengths(frame_list)),
frame = unlist(frame_list))
p <- ggplot(DF, aes(x, y)) +
geom_point(aes(size = y, frame = frame))
fig <- ggplotly(p)
fig <- fig %>%
animation_opts(
frame = 0,
easing = "linear",
redraw = FALSE,
mode = "immediate"
)
fig
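The key step in the code above is expanding the data so that each point has one row per frame in which it should be visible; plotly then shows, for each slider position, only the rows of that frame. A base R sketch of just that expansion, using the question's start and end values (the `en` and `ex` columns are dropped because this approach does not use them, which is my guess at why the result is not exactly the same as with transition_events):

```r
# Same start/end columns as in the question
df <- data.frame(x = c(1, 2, 3, 4), y = c(1, 2, 3, 4),
                 start = c(1, 2, 3, 4), end = c(5, 6, 7, 8))

# One integer sequence of frames per point: 1:5, 2:6, 3:7, 4:8
frame_list <- Map(seq, from = df$start, to = df$end)

# Repeat each point once for every frame in which it should be visible
DF <- data.frame(x = rep(df$x, times = lengths(frame_list)),
                 y = rep(df$y, times = lengths(frame_list)),
                 frame = unlist(frame_list))

# Number of points visible in each frame
table(DF$frame)
```

Frames here are whole-number steps, so points switch on and off abruptly rather than fading in and out.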

Using a loop for ggplot2 causes issues displaying the legends

I have R code that creates a linear regression, and I am having problems with the legend in a graph. I would like to use the dates specified in the trendDateRange variables as legend entries with different colours. These dates are in YYYY-MM-DD format, but I only need YYYY-MM. So, for example, if trendDateRange1 = c("2015-01-01", "2015-12-31"), I want to display "2015-01 - 2015-12" as a legend entry in any colour. When I run this in a for loop, only one legend entry is displayed, and it uses the last range, i.e. trendDateRange3, which displays "2013-01 - 2013-12". The legend entries for the other two ranges are not displayed. The lines themselves are drawn correctly, although they all use the same colour. I would like a different colour for each legend entry, even though the lines already have different line types.
If I run the code below showing individual graphs, it's working with the proper legend. I get the legend for each graph.
library(ggplot2)
library(plotly)  # provides ggplotly()
Month_Names <- c("2010-11","2010-12",
"2011-01","2011-02","2011-03","2011-04","2011-05","2011-06","2011-07","2011-08","2011-09","2011-10","2011-11","2011-12",
"2012-01","2012-02","2012-03","2012-04","2012-05","2012-06","2012-07","2012-08","2012-09","2012-10","2012-11","2012-12",
"2013-01","2013-02","2013-03","2013-04","2013-05","2013-06","2013-07","2013-08","2013-09","2013-10","2013-11","2013-12",
"2014-01","2014-02","2014-03","2014-04","2014-05","2014-06","2014-07","2014-08","2014-09","2014-10","2014-11","2014-12",
"2015-01","2015-02","2015-03","2015-04","2015-05","2015-06","2015-07","2015-08","2015-09","2015-10","2015-11","2015-12",
"2016-01","2016-02","2016-03","2016-04","2016-05","2016-06","2016-07","2016-08","2016-09","2016-10","2016-11","2016-12",
"2017-01")
Actual_volume <- c(54447,57156,
52033,49547,58718,53109,56488,60095,54683,60863,56692,55283,55504,56633,
53267,52587,54680,55569,60013,56985,59709,61281,54188,59832,56489,55819,
59295,52692,56663,59698,61232,57694,63111,60473,58984,64050,54957,63238,
59460,54430,58901,61088,60496,62984,66895,62720,65591,67815,58289,72002,
61054,60329,69283,68002,63196,72267,71058,69539,71379,70925,68704,76956,
65863,70494,77348,70214,74770,77480,69721,83034,76761,77927,79768,81836,
75381)
df_data <- data.frame(Month_Names, Actual_volume)
trendDateRange1 <- c("2010-11-01", "2017-01-31")
trendDateRange2 <- c("2012-01-01", "2012-12-31")
trendDateRange3 <- c("2013-01-01", "2013-12-31")
numoftrends <- 3
list_of_df <- list()
list_of_df<- lapply(1:numoftrends, function(j) {
trend.period <- get(paste0("trendDateRange", j))
trend1 <- substr(trend.period[1], 1, 7)
trend2 <- substr(trend.period[2], 1, 7)
TRx <- subset(df_data, as.character(Month_Names) >= trend1 &
as.character(Month_Names) <= trend2)
})
i = 1
trend.period <- get(paste0("trendDateRange", i))
trend1 <- substr(trend.period[1], 1, 7)
trend2 <- substr(trend.period[2], 1, 7)
Trend.dates <- paste0(trend1, '-' ,trend2)
plot = ggplot() +
geom_line(data = list_of_df[[i]],
aes(x = Month_Names, y = Actual_volume, group = 1 , colour = Trend.dates),
lty = i + 1)
print(ggplotly(plot))
i = 2
trend.period <- get(paste0("trendDateRange", i))
trend1 <- substr(trend.period[1], 1, 7)
trend2 <- substr(trend.period[2], 1, 7)
Trend.dates <- paste0(trend1, '-' ,trend2)
plot = ggplot() +
geom_line(data = list_of_df[[i]],
aes(x=Month_Names, y = Actual_volume, group = 1 , colour = Trend.dates),
lty = i + 1)
print(ggplotly(plot))
i = 3
trend.period <- get(paste0("trendDateRange", i))
trend1 <- substr(trend.period[1], 1, 7)
trend2 <- substr(trend.period[2], 1, 7)
Trend.dates <- paste0(trend1, '-' ,trend2)
plot = ggplot() +
geom_line(data = list_of_df[[i]],
aes(x = Month_Names, y = Actual_volume, group = 1 , colour = Trend.dates),
lty = i+1)
print(ggplotly(plot))
But when I put this in a loop to make one graph with a legend entry for each range, it does not work:
plot = ggplot()
for (i in seq_along(list_of_df)) {
trend.period = get(paste0("trendDateRange", i))
trend1 = substr(trend.period[1], 1, 7)
trend2 = substr(trend.period[2], 1, 7)
Trend.dates = paste0(trend1, '-' ,trend2)
plot = plot + geom_line(aes(x = Month_Names, y = Actual_volume, group = 1 , colour = Trend.dates),
data = list_of_df[[i]], lty = i + 1)
}
print(ggplotly(plot))
You'll have a much easier time working with ggplot2 if you combine the three datasets into one with an aesthetic that separates them, rather than adding them together in a for loop.
There are a number of ways you could do this, but here's an example using the dplyr and tidyr packages. It would replace everything after your df_data <- line.
library(ggplot2)
library(dplyr)
library(tidyr)
trends <- tibble(Start = c("2010-11", "2012-01", "2013-01"),  # tibble() replaces the deprecated data_frame()
End = c("2017-01", "2012-12", "2013-12"))
combined_data <- df_data %>%
crossing(trends) %>%
mutate(Month_Names = as.character(Month_Names),
TrendName = paste(Start, End, sep = "-")) %>%
filter(Month_Names >= Start,
Month_Names <= End)
# rotated x-axes to make plot slightly more readable
ggplot(combined_data, aes(Month_Names, y = Actual_volume,
group = TrendName,
color = TrendName)) +
geom_line() +
theme(axis.text.x = element_text(angle = 90, hjust = 1))
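As for why the loop in the question ends up with a single legend entry: my reading is that `aes()` stores the expression `Trend.dates` rather than its value, and the expression is only evaluated when the plot is built, by which time the variable holds the last range. A small sketch under that assumption, with toy data and hypothetical names (`label`, `shifted`, `p_lazy`, `p_forced`), comparing the plain mapping with one where `!!` inlines the current value:

```r
library(ggplot2)

d <- data.frame(x = 1:3, y = 1:3)
p_lazy   <- ggplot()
p_forced <- ggplot()

for (i in 1:2) {
  label   <- paste("trend", i)
  shifted <- transform(d, y = y + i)  # a separate line per iteration
  # `label` is looked up when the plot is built, i.e. after the loop has ended
  p_lazy   <- p_lazy   + geom_line(aes(x, y, colour = label), data = shifted)
  # `!!label` inlines the value `label` has right now into the mapping
  p_forced <- p_forced + geom_line(aes(x, y, colour = !!label), data = shifted)
}

# Colour assigned to each layer once the plots are built
lazy_cols   <- c(layer_data(p_lazy, 1)$colour[1],   layer_data(p_lazy, 2)$colour[1])
forced_cols <- c(layer_data(p_forced, 1)$colour[1], layer_data(p_forced, 2)$colour[1])
```

In `p_lazy` both layers share one colour and one legend entry; in `p_forced` each layer gets its own. Combining the data into one data frame, as in the answer above, is still the more idiomatic route.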
If you combine your list into a data.frame with an ID representing which element the observation came from and parse the dates, getting a decent plot is pretty simple:
library(dplyr)
library(ggplot2)
list_of_df %>%
bind_rows(.id = 'id') %>%
mutate(date = as.Date(paste0(Month_Names, '-01'))) %>%
ggplot(aes(date, Actual_volume, color = id)) +
geom_line()
or without dplyr,
df <- do.call(rbind,
Map(function(df, i){df$id <- i; df},
df = list_of_df,
i = as.character(seq_along(list_of_df))))
df$date <- as.Date(paste0(df$Month_Names, '-01'))
ggplot(df, aes(date, Actual_volume, color = id)) + geom_line()
which returns the same thing.
If you'd like more descriptive group labels, set the names of the list elements or define id as a string pasted together from the formatted minimums and maximums of the parsed dates.
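One way the first suggestion could look, sketched on a two-element toy list (the values are the first two months of 2012 and 2013 from the question; the `labelled` name and the " - " separator are my choices): name each list element after the first and last month it contains, and `bind_rows(.id = ...)` will carry those names into the `id` column.

```r
library(dplyr)

# Toy stand-in for list_of_df, using a few rows from the question's data
list_of_df <- list(
  data.frame(Month_Names = c("2012-01", "2012-02"), Actual_volume = c(53267, 52587)),
  data.frame(Month_Names = c("2013-01", "2013-02"), Actual_volume = c(59295, 52692))
)

# Label each element "first month - last month"; YYYY-MM strings sort correctly as text
names(list_of_df) <- vapply(
  list_of_df,
  function(d) paste(min(d$Month_Names), max(d$Month_Names), sep = " - "),
  character(1)
)

# The list names become the values of the id column
labelled <- bind_rows(list_of_df, .id = "id")
unique(labelled$id)
```

Mapping `color = id` in the plot then gives legend entries such as "2012-01 - 2012-02" instead of "1", "2", "3".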
Here is a solution using ggplotly.
nrows <- unlist(lapply(list_of_df,nrow))
df <- data.frame(do.call(rbind,list_of_df), Grp = factor(rep(1:3, nrows)))
plot <- ggplot(aes(x=Month_Names, y=Actual_volume, group = Grp,
colour=Grp), data=df) + geom_line()
print(ggplotly(plot))
