I have a dataframe that consists of 5 vectors:
name <- c("a", "a", "b", "b", "b")
game <- c(1, 2, 1, 2, 3)
pts <- c(3, 6, 1, 6, 7)
cum_pts <- c(3, 9, 1, 7, 14)
image <- c(image1, image1, image2, image2, image2)
df <- data.frame(name, game, pts, cum_pts, image)
I want to make a line plot of the two different values of "name", with the image associated with each name at the very end of the respective lines.
I can do that with this code, where I use geom_image for each associated image:
df %>%
ggplot(aes(x = game, y = cum_pts, group = name)) +
geom_line() +
geom_image(data = filter(df, name == "a"), aes(x = max(game), y = max(cum_pts), image = image), size = 0.08) +
geom_image(data = filter(df, name == "b"), aes(x = max(game), y = max(cum_pts), image = image), size = 0.08)
and it gives me this, which is what I want for this df:
Ultimately my dataframe will consist of many more names than just 2, and having to use a separate geom_image line for each name seems inefficient. Is there any way I can just use one line of code for all the images that will be placed at the end of their respective lines?
There is no need to filter your df for each name and add the images one by one. Instead you could use a dataframe with one row per name and add the images with one call to geom_image. In my code below I create the df for the images using dplyr::slice_max to pick the row with the max(game) for each name:
library(ggplot2)
library(ggimage)
library(dplyr)
image1 <- "https://www.r-project.org/logo/Rlogo.png"
image2 <- "https://ggplot2.tidyverse.org/logo.png"
df_image <- df |>
group_by(name) |>
slice_max(order_by = game, n = 1)
ggplot(df, aes(x = game, y = cum_pts, group = name)) +
geom_line() +
geom_image(data = df_image, aes(x = game, y = cum_pts, image = image), size = 0.08)
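To see why a single geom_image call is enough, it can help to inspect the helper data frame on its own. The following is a minimal sketch using only the non-image columns from the question; with_ties = FALSE and ungroup() are my own additions (not part of the answer above) to guarantee exactly one ungrouped row per name.

```r
library(dplyr)

# Same data as in the question, minus the image column
df <- data.frame(
  name    = c("a", "a", "b", "b", "b"),
  game    = c(1, 2, 1, 2, 3),
  cum_pts = c(3, 9, 1, 7, 14)
)

# Keep only the last game of each name: one row per line end
df_image <- df |>
  group_by(name) |>
  slice_max(order_by = game, n = 1, with_ties = FALSE) |>
  ungroup()

print(df_image)
```

Each row of df_image holds the x and y position of one line end, so any number of names is handled by the same layer.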
I would like to link variables in a dataframe (e.g. 'prop1', 'prop2', 'prop3') to specific colours and shapes in the plot. However, I also want to exclude data (using dplyr::filter) to customise the plot display WITHOUT changing the colours and shapes used for a specific variable. A minimal example is given below.
library(ggplot2)
library(dplyr)
library(magrittr)
obj <- c("cmpd 1","cmpd 1","cmpd 1","cmpd 2","cmpd 2")
x <- c(1, 2, 4, 7, 3)
var <- c("prop1","prop2","prop3","prop2","prop3")
y <- c(1, 2, 3, 2.5, 4)
col <- c("#E69F00","#9E0142","#56B4E9","#9E0142","#56B4E9")
shp <- c(0,1,2,1,2)
df2 <- cbind.data.frame(obj,x,var,y,col,shp)
plot <- ggplot(data = df2 %>%
filter(obj %in% c(
"cmpd 1",
"cmpd 2"
)),
aes(x = x,
y = y,
colour = as.factor(var),
shape = as.factor(var))) +
geom_point(size=2) +
#scale_shape_manual(values=shp) +
#scale_color_manual(values=col) +
facet_grid(.~obj)
plot
However, when I exclude cmpd 1 (by commenting it out in the code), the colour and shape of prop2 and prop3 for cmpd 2 change (please see plot2).
To fix this, I tried adding scale_shape_manual and scale_color_manual to the code (currently commented out) and linking them to the col and shp columns of df2, but the same problem arises: both the shape and the colour of these variables change when one of the conditions is excluded.
Any and all help appreciated.
Try something like this:
library(tidyverse)
obj <- c("cmpd 1","cmpd 1","cmpd 1","cmpd 2","cmpd 2")
x <- c(1, 2, 4, 7, 3)
var <- c("prop1","prop2","prop3","prop2","prop3")
y <- c(1, 2, 3, 2.5, 4)
df2 <- cbind.data.frame(obj,x,var,y)
col <- c("prop1" = "#E69F00",
"prop2" = "#9E0142",
"prop3" = "#56B4E9")
shp <- c("prop1" = 0,
"prop2" = 1,
"prop3" = 2)
plot <- ggplot(data = df2 %>%
filter(obj %in% c(
"cmpd 1",
"cmpd 2"
)),
aes(x = x,
y = y,
colour = var,
shape = var)) +
geom_point(size=2) +
scale_shape_manual(values=shp) +
scale_color_manual(values=col) +
facet_grid(.~obj)
plot
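One way to check that a named vector pins each value to its colour and shape, whatever rows are filtered out, is to build the plot from a subset and read back the computed aesthetics with ggplot2's layer_data(). This is only a sketch: it uses base subset() instead of filter(), and the object names only_cmpd2 and built are illustrative.

```r
library(ggplot2)

df2 <- data.frame(
  obj = c("cmpd 1", "cmpd 1", "cmpd 1", "cmpd 2", "cmpd 2"),
  x   = c(1, 2, 4, 7, 3),
  var = c("prop1", "prop2", "prop3", "prop2", "prop3"),
  y   = c(1, 2, 3, 2.5, 4)
)

# Named vectors: values are looked up by level name, not by position
col <- c(prop1 = "#E69F00", prop2 = "#9E0142", prop3 = "#56B4E9")
shp <- c(prop1 = 0, prop2 = 1, prop3 = 2)

# Drop cmpd 1 entirely, so prop1 never appears in the plotted data
only_cmpd2 <- subset(df2, obj == "cmpd 2")

p <- ggplot(only_cmpd2, aes(x = x, y = y, colour = var, shape = var)) +
  geom_point(size = 2) +
  scale_shape_manual(values = shp) +
  scale_colour_manual(values = col) +
  facet_grid(. ~ obj)

# Inspect the colour and shape actually assigned to each point
built <- layer_data(p)
print(built[, c("x", "y", "colour", "shape")])
```

If the vectors were unnamed, the values would instead be assigned in order to whichever levels are present, which is why the colours shifted in the question.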
I have a dataframe in this format, but with several hundred more rows:
dfex = data.frame(dot = c('A', 'B', 'C', 'D', 'E', 'F'),
group = c('A1', 'A1', 'A1', 'A2', 'A2', 'A2'),
x1 = c(1, 2, 3, 4, 5, 6),
x2 = c(4, 5, 6, 1, 2, 3),
y = c(1, 2, 3, 4, 5, 6))
I want to create different graphs based on the value in group, so one graph will only have group A1 rows and the other graph only has group A2 rows.
On each graph, there should be two different lines for the x1-y pair and the x2-y pair. Preferably I could have the correlation for each of these lines listed as well.
I'm familiar with ggplot2, so using that would be great.
Here is an amazing paint drawing for a better idea of what I mean:
I agree with @camille, it is better to reshape the data to long format before plotting.
library(tidyverse)
dfex %>%
gather(key, value, -c(dot, group, y)) %>%
ggplot() +
aes(value, y, color = key) +
geom_line() +
facet_wrap(.~group)
The code below splits the plot into two panels: facet_wrap divides the graph into two columns, one per group. I have added two separate geom_line layers because the x variables are stored in separate columns.
ggplot(dfex) +
# A string inside aes() is a legend label, not a literal colour, so label each line by its column
geom_line(mapping = aes(x = x1, y = y, color = "x1")) +
geom_line(mapping = aes(x = x2, y = y, color = "x2")) +
facet_wrap(. ~group)
Alternatively, gather the data into a tidier long format first:
gather(dfex, "xVar", "x", 3:4) %>%
ggplot() +
geom_line(mapping = aes(x = x, y = y, color = xVar)) +
facet_wrap(. ~group)
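The question also asks for the correlation of each line, which the plotting code above does not compute. A minimal sketch of one way to get it from the long format follows; it uses tidyr's pivot_longer (which supersedes gather), and the names long, cors and r are my own, not from the answers.

```r
library(dplyr)
library(tidyr)

dfex <- data.frame(dot = c("A", "B", "C", "D", "E", "F"),
                   group = c("A1", "A1", "A1", "A2", "A2", "A2"),
                   x1 = c(1, 2, 3, 4, 5, 6),
                   x2 = c(4, 5, 6, 1, 2, 3),
                   y = c(1, 2, 3, 4, 5, 6))

# Long format: one row per (dot, x variable) pair
long <- dfex |>
  pivot_longer(cols = c(x1, x2), names_to = "key", values_to = "value")

# Pearson correlation between x and y for every group/line combination
cors <- long |>
  group_by(group, key) |>
  summarise(r = cor(value, y), .groups = "drop")

print(cors)
```

The resulting table has one row per panel and line, so it could be passed as the data of a geom_text layer to print the values inside each facet.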
Welcome to Tidyville.
Below is a small df showing the populations of cities in Tidyville. Some cities belong to the A state and some the B state.
I wish to highlight the cities that decreased in population in red. Mission accomplished so far.
But there are many states in Tidyville. Is there a way to use ggplot's faceting to show a plot for each state? I'm uncertain because I'm new, and I do a little calculation outside the ggplot call to identify the cities that decreased in population.
library(ggplot2)
library(tibble)
t1 <- tibble (
y2001 = c(3, 4, 5, 6, 7, 8, 9, 10),
y2016 = c(6, 3, 9, 2, 8, 2, 11, 15),
type = c("A", "A", "B", "B", "A", "A", "B", "B")
)
years <- 15
y2001 <- t1$y2001
y2016 <- t1$y2016
# Places where 2016 pop'n < 2001 pop'n
yd <- y2016 < y2001
decrease <- tibble (
y2001 = t1$y2001[yd],
y2016 = t1$y2016[yd]
)
# Places where 2016 pop'n >= 2001 pop'n
yi <- !yd
increase <- tibble (
y2001 = t1$y2001[yi],
y2016 = t1$y2016[yi]
)
ggplot() +
# Decreasing
geom_segment(data = decrease, aes(x = 0, xend = years, y = y2001, yend = y2016),
color = "red") +
# Increasing or equal
geom_segment(data = increase, aes(x = 0, xend = years, y = y2001, yend = y2016),
color = "black")
I think this would be much easier if you put your data in a tidy format, as ggplot2 expects. Here's a possible solution using tidyverse functions:
library(tidyverse)
t1 %>%
rowid_to_column("city") %>%
mutate(change=if_else(y2016 < y2001, "decrease", "increase")) %>%
gather(year, pop, y2001:y2016) %>%
ggplot() +
geom_line(aes(year, pop, color=change, group=city)) +
facet_wrap(~type) +
scale_color_manual(values=c("red","black"))
This results in
Your intermediary steps are unnecessary and lose some of your data. We'll keep what you created first:
t1 <- tibble (
y2001 = c(3, 4, 5, 6, 7, 8, 9, 10),
y2016 = c(6, 3, 9, 2, 8, 2, 11, 15),
type = c("A", "A", "B", "B", "A", "A", "B", "B")
)
years <- 15
But instead of doing all the separating and subsetting, we'll just create a dummy variable for whether or not y2016 >= y2001.
t1$incr <- as.factor(ifelse(t1$y2016 >= t1$y2001, 1, 0))
Then we can move the data argument into the ggplot() call to make it more efficient. We'll use only one geom_segment() call and map the color aesthetic to the dummy variable we created before. Because the variable is mapped to color, not fill, we then pass a vector of colors to scale_color_manual()'s values argument; naming the vector by factor level makes decreases (0) red and increases (1) black. Finally, add facet_grid(). If you're only faceting on one variable, you put a period on the opposite side of the tilde. Period first means the panels will be side by side; period last means they'll be stacked on top of each other.
ggplot(data = t1) +
geom_segment(aes(x = 0, xend = years, y = y2001, yend = y2016, color = incr)) +
scale_color_manual(values = c("0" = "red", "1" = "black")) +
facet_grid(type ~ .)
I believe you don't need to create two new datasets, you can add a column to t1.
t2 <- t1
t2$decr <- factor(yd + 0L, labels = c("increase", "decrease"))
I have left the original t1 intact and altered a copy, t2.
Now in order to apply ggplot facets, maybe this is what you are looking for.
ggplot() +
geom_segment(data = t2, aes(x = 0, xend = years, y = y2001, yend = y2016), color = "red") +
facet_wrap(~ decr)
If you want to change the colors, map the new column decr to color. Note that this argument changes its position: it now goes inside aes(), as aes(..., color = decr).
ggplot() +
geom_segment(data = t2, aes(x = 0, xend = years, y = y2001, yend = y2016, color = decr)) +
facet_wrap(~ decr)
require(dplyr)
t1 <- mutate(t1, decrease = y2016 < y2001)
ggplot(t1) +
facet_wrap(~ type) +
geom_segment(aes(x = 0, xend = years, y = y2001, yend = y2016, colour = decrease))
Task: I would like to reorder the levels of a factor variable by the difference in a value between the rows where a second variable equals 1 and the rows where it equals 0. Here is a reproducible example to clarify:
# Package
library(tidyverse)
# Create fake data
df1 <- data.frame(place = c("A", "B", "C"),
avg = c(3.4, 4.5, 1.8))
# Plot, but it's not in order of value
ggplot(df1, aes(x = place, y = avg)) +
geom_point(size = 4)
# Now put it in order
df1$place <- factor(df1$place, levels = df1$place[order(df1$avg)])
# Plots in order now
ggplot(df1, aes(x = place, y = avg)) +
geom_point(size = 4)
# Adding second, conditional variable (called: new)
df2 <- data.frame(place = c("A", "A", "B", "B", "C", "C"),
new = rep(0:1, 3),
avg = c(3.4, 2.3, 4.5, 4.2, 2.1, 1.8))
ggplot(df2, aes(x = place, y = avg, col = factor(new))) +
geom_point(size = 3)
Goal: I would like to order and plot the factor variable place by the difference in avg, for each place, between the row where new is 1 and the row where new is 0.
You can create the levels for the place column by:
library(tidyr)
df2$place <- factor(df2$place, levels=with(spread(df2, new, avg), place[order(`1` - `0`)]))
ggplot(df2, aes(x = place, y = avg, col = factor(new))) +
geom_point(size = 3) + labs(color = 'new')
gives:
If I understand the goal correctly, then factor A has the biggest difference:
avg(new = 0) - avg(new = 1) = 1.1
So you can spread the data frame to calculate the difference, then gather, then plot avg versus place, reordered by diff. Or if you want A first, by -diff.
But let me know if I didn't understand correctly :)
df2 %>%
spread(new, avg) %>%
mutate(diff = `0` - `1`) %>%
gather(new, avg, -diff, -place) %>%
ggplot(aes(reorder(place, diff), avg)) +
geom_point(aes(color =factor(new)), size = 3)
Calculate the difference column first using dplyr, and assign the result back so that ggplot can see it:
df2 <- df2 %>% group_by(place) %>% mutate(diff = diff(avg)) %>% ungroup()
ggplot(df2, aes(x = place, y = diff, color = diff)) +
geom_point(size = 3)
I am trying to create a line plot in ggplot2 that has different line sizes based on a condition.
I accomplished this with this code:
library(zoo)
library(ggplot2)
dat_1 <- data.frame(
"date" =seq(as.yearmon(Sys.Date()) - 4, as.yearmon(Sys.Date()), 1/12),
"value" = rnorm(49),
"ID" = "A",
"Condition" = sample(c("AA", "AB"), size = 49, replace = TRUE)
)
dat_2 <- data.frame(
"date" =seq(as.yearmon(Sys.Date()) - 4, as.yearmon(Sys.Date()), 1/12),
"value" = rnorm(49),
"ID" = "B",
"Condition" = (sample(c("AA", "AB"), size = 49, replace = TRUE))
)
dat <- rbind(dat_1, dat_2)
ggplot(dat, aes(x = date, y = value, color = ID, group = ID)) + geom_line(aes(size = Condition)) + scale_size_manual(values = c(0.5, 2.5))
It is what I wanted to get (This is just a dummy, so not optimized sizes).
But if you look closer, you can see that the lines are actually a collection of boxes and not smooth lines:
How can I fix this?
It is also possible that this is due to the fact that my x axis values are yearmon. But they are scaled as continuous (by ggplot) in this case, anyway.
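One thing that may be worth trying, offered as a sketch and not a confirmed fix: when the width varies along a line, ggplot2 draws it as separate segments with flat ("butt") ends, and geom_line accepts a lineend argument that rounds those ends. The example below is an assumption-laden simplification of the question's data: it uses a plain integer x axis instead of yearmon, a fixed seed, and the linewidth aesthetic with scale_linewidth_manual, which assumes ggplot2 3.4.0 or later (on older versions, size and scale_size_manual as in the question play the same role).

```r
library(ggplot2)

set.seed(1)  # fixed seed, only so the sketch is reproducible
n <- 49

# Simplified stand-in for the question's data: integer x instead of yearmon
dat <- rbind(
  data.frame(date = seq_len(n), value = rnorm(n), ID = "A",
             Condition = sample(c("AA", "AB"), size = n, replace = TRUE)),
  data.frame(date = seq_len(n), value = rnorm(n), ID = "B",
             Condition = sample(c("AA", "AB"), size = n, replace = TRUE))
)

# lineend = "round" rounds the end of every segment so neighbours overlap smoothly
p <- ggplot(dat, aes(x = date, y = value, color = ID, group = ID)) +
  geom_line(aes(linewidth = Condition), lineend = "round") +
  scale_linewidth_manual(values = c(AA = 0.5, AB = 2.5))

print(p)
```

Whether this fully removes the boxy look will depend on the widths chosen and on the graphics device, so it is best compared side by side with the original plot.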