Welcome to Tidyville.
Below is a small df showing the populations of cities in Tidyville. Some cities belong to the A state and some the B state.
I wish to highlight the cities that decreased in population in red. Mission accomplished so far.
But there are many states in Tidyville. Is there a way to use ggplot's faceting to show a plot for each state? I'm uncertain because I'm new and I do a little calculation outside the ggplot call to identify the cities that decreased in population.
library(ggplot2)
library(tibble)
t1 <- tibble (
y2001 = c(3, 4, 5, 6, 7, 8, 9, 10),
y2016 = c(6, 3, 9, 2, 8, 2, 11, 15),
type = c("A", "A", "B", "B", "A", "A", "B", "B")
)
years <- 15
y2001 <- t1$y2001
y2016 <- t1$y2016
# Places where 2016 pop'n < 2001 pop'n
yd <- y2016 < y2001
decrease <- tibble (
y2001 = t1$y2001[yd],
y2016 = t1$y2016[yd]
)
# Places where 2016 pop'n >= 2001 pop'n
yi <- !yd
increase <- tibble (
y2001 = t1$y2001[yi],
y2016 = t1$y2016[yi]
)
ggplot() +
# Decreasing
geom_segment(data = decrease, aes(x = 0, xend = years, y = y2001, yend = y2016),
color = "red") +
# Increasing or equal
geom_segment(data = increase, aes(x = 0, xend = years, y = y2001, yend = y2016),
color = "black")
I think this would be much easier if you just put your data in a tidy format like ggplot2 expects. Here's a possible solution using tidyverse functions:
library(tidyverse)
t1 %>%
rowid_to_column("city") %>%
mutate(change=if_else(y2016 < y2001, "decrease", "increase")) %>%
gather(year, pop, y2001:y2016) %>%
ggplot() +
geom_line(aes(year, pop, color=change, group=city)) +
facet_wrap(~type) +
scale_color_manual(values=c("red","black"))
This results in
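In current tidyr releases, gather() is superseded by pivot_longer(). As a sketch, assuming tidyr 1.0.0 or later, the reshaping step above could be written as follows. The object name long is only illustrative and does not appear in the original answer.

```r
library(tibble)
library(dplyr)
library(tidyr)

t1 <- tibble(
  y2001 = c(3, 4, 5, 6, 7, 8, 9, 10),
  y2016 = c(6, 3, 9, 2, 8, 2, 11, 15),
  type  = c("A", "A", "B", "B", "A", "A", "B", "B")
)

# One row per city and year, keeping the state and the change flag
long <- t1 %>%
  rowid_to_column("city") %>%
  mutate(change = if_else(y2016 < y2001, "decrease", "increase")) %>%
  pivot_longer(c(y2001, y2016), names_to = "year", values_to = "pop")
```

The resulting long table can be piped into the same ggplot() call as before, since it has the same city, type, change, year and pop columns.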
Your intermediary steps are unnecessary and lose some of your data: the decrease and increase tibbles no longer carry the type column you need for faceting. We'll keep what you created first:
t1 <- tibble (
y2001 = c(3, 4, 5, 6, 7, 8, 9, 10),
y2016 = c(6, 3, 9, 2, 8, 2, 11, 15),
type = c("A", "A", "B", "B", "A", "A", "B", "B")
)
years <- 15
But instead of doing all the separating and subsetting, we'll just create a dummy variable for whether or not y2016 >= y2001.
t1$incr <- as.factor(ifelse(t1$y2016 >= t1$y2001, 1, 0))
Then we can move the data argument into the ggplot() call so it is stated only once. We'll use a single geom_segment() layer and map the color aesthetic to the dummy variable we created before. We then need to pass a vector of colors to scale_color_manual()'s values argument (the scale has to match the aesthetic, so scale_fill_manual() would have no effect here); naming the vector by factor level keeps the decreases red. Finally, add facet_grid(). If you're only faceting on one variable, you put a period on the opposite side of the tilde. Period first means the panels are placed side by side; period last means they are stacked on top of each other.
ggplot(data = t1) +
geom_segment(aes(x = 0, xend = years, y = y2001, yend = y2016, color = incr)) +
# incr is 0 for a decrease and 1 otherwise
scale_color_manual(values = c("0" = "red", "1" = "black")) +
facet_grid(type ~ .)
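To make the two facet_grid() orientations concrete, here is a minimal sketch that builds both layouts and inspects where the panels end up. The names decreased, base_plot, stacked and side_by_side are illustrative only, and the segment length of 15 years is taken from the question.

```r
library(ggplot2)

t1 <- data.frame(
  y2001 = c(3, 4, 5, 6, 7, 8, 9, 10),
  y2016 = c(6, 3, 9, 2, 8, 2, 11, 15),
  type  = c("A", "A", "B", "B", "A", "A", "B", "B")
)
t1$decreased <- t1$y2016 < t1$y2001

base_plot <- ggplot(t1) +
  geom_segment(aes(x = 0, xend = 15, y = y2001, yend = y2016, colour = decreased)) +
  scale_colour_manual(values = c("FALSE" = "black", "TRUE" = "red"))

stacked      <- base_plot + facet_grid(type ~ .)  # period last: one row per state
side_by_side <- base_plot + facet_grid(. ~ type)  # period first: one column per state

# The built layout records the row and column of every panel
layout_stacked <- ggplot_build(stacked)$layout$layout
layout_side    <- ggplot_build(side_by_side)$layout$layout
```

Printing stacked or side_by_side draws the respective plot; the layout tables are only there to check the panel arrangement without looking at the picture.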
I believe you don't need to create two new datasets, you can add a column to t1.
t2 <- t1
t2$decr <- factor(yd + 0L, labels = c("increase", "decrease"))
I have left the original t1 intact and altered a copy, t2.
Now in order to apply ggplot facets, maybe this is what you are looking for.
ggplot() +
geom_segment(data = t2, aes(x = 0, xend = years, y = y2001, yend = y2016), color = "red") +
facet_wrap(~ decr)
If you want to change the colors, use the new column decr as the value of color. Note that this argument changes its position: it now goes inside aes(), as aes(..., color = decr).
ggplot() +
geom_segment(data = t2, aes(x = 0, xend = years, y = y2001, yend = y2016, color = decr)) +
facet_wrap(~ decr)
require(dplyr)
t1 <- mutate(t1, decrease = y2016 < y2001)
ggplot(t1) +
facet_wrap(~ type) +
geom_segment(aes(x = 0, xend = years, y = y2001, yend = y2016, colour = decrease))
I have a dataframe that consists of 5 vectors:
name <- c("a", "a", "b", "b", "b")
game <- c(1, 2, 1, 2, 3)
pts <- c(3, 6, 1, 6, 7)
cum_pts <- c(3, 9, 1, 7, 14)
image <- c(image1, image1, image2, image2, image2)
df <- data.frame(name, game, pts, cum_pts, image)
I want to make a line plot of the two different values of "name", with the image associated with each name at the very end of the respective lines.
I can do that with this code, where I use geom_image for each associated image:
df %>%
ggplot(aes(x = game, y = cum_pts, group = name)) +
geom_line() +
geom_image(data = filter(df, name == "a"), aes(x = max(game), y = max(cum_pts), image = image), size = 0.08) +
geom_image(data = filter(df, name == "b"), aes(x = max(game), y = max(cum_pts), image = image), size = 0.08)
and it gives me this, which is what I want for this df:
Ultimately my dataframe will consist of many more names than just 2, and having to use a separate geom_image line for each name seems inefficient. Is there any way I can just use one line of code for all the images that will be placed at the end of their respective lines?
There is no need to filter your df for each name and add the images one by one. Instead you could use a dataframe with one row per name and add the images with one call to geom_image. In my code below I create the df for the images using dplyr::slice_max to pick the row with the max(game) for each name:
library(ggplot2)
library(ggimage)
library(dplyr)
image1 <- "https://www.r-project.org/logo/Rlogo.png"
image2 <- "https://ggplot2.tidyverse.org/logo.png"
df_image <- df |>
group_by(name) |>
slice_max(order_by = game, n = 1)
ggplot(df, aes(x = game, y = cum_pts, group = name)) +
geom_line() +
geom_image(data = df_image, aes(x = game, y = cum_pts, image = image), size = 0.08)
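To see what the single geom_image() layer receives, here is a small sketch that rebuilds the example data and the per-name end points without drawing anything. It assumes R 4.1 or later for the |> pipe; the ungroup() call is an addition of this sketch, and the two URLs are simply the placeholder images used above.

```r
library(dplyr)

# Placeholder image locations; any path or URL that geom_image() can read would do
image1 <- "https://www.r-project.org/logo/Rlogo.png"
image2 <- "https://ggplot2.tidyverse.org/logo.png"

df <- data.frame(
  name    = c("a", "a", "b", "b", "b"),
  game    = c(1, 2, 1, 2, 3),
  pts     = c(3, 6, 1, 6, 7),
  cum_pts = c(3, 9, 1, 7, 14),
  image   = c(image1, image1, image2, image2, image2)
)

# Keep only the last game of each name: the end point of each line
df_image <- df |>
  group_by(name) |>
  slice_max(order_by = game, n = 1) |>
  ungroup()
```

Because df_image has exactly one row per name, the same code keeps working however many names the full data contains.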
I'm wondering how to reproduce the following figure using R.
The data used in the figure are sparse functional data of bone mineral density. Basically each participant's bone mineral level is observed a few times during the experiment. But the observation times and number of observations for each participant are different.
The figure is from article 'Principal component models for sparse functional data'.
You could reproduce the figure with made-up data like this:
library(ggplot2)
# Create sample data
set.seed(8) # Makes data reproducible
ages <- runif(40, 8, 24)
df <- do.call(rbind, lapply(seq_along(ages), function(x) {
age <- ages[x] + cumsum(runif(sample(2:5, 1), 1, 2))
y <- (tanh((age - 10)/pi - pi/2) + 2.5)/3
y <- y + rnorm(1, 0, 0.1)
y <- y + cumsum(rnorm(length(y), 0, 0.02))
data.frame(ID = x, age = age, BMD = y)
}))
# Draw plot
ggplot(df, aes(x = age, y = BMD)) +
geom_path(aes(group = ID), color = 'gray70', na.rm = TRUE) +
geom_point(color = 'gray70', na.rm = TRUE) +
geom_smooth(color = 'black', se = FALSE, formula = y ~ s(x, bs = "cs"),
method = 'gam', na.rm = TRUE) +
theme_classic(base_size = 16) +
scale_x_continuous(limits = c(8, 28)) +
labs(y = 'Spinal Bone Density', x = 'Age') +
theme(panel.border = element_rect(fill = NA))
Without knowing your own data structure however, it's difficult to say how applicable you will find this to your own use case.
You can do this in ggplot2 as long as you have data in long format and with a grouping variable such as id in my example:
dat <- tibble::tribble(
~id, ~age, ~bone_dens,
1, 10, 0.6,
1, 15, 0.8,
1, 19, 1.12,
2, 11, 0.7,
2, 18, 1.1,
3, 16, 1.1,
3, 18, 1.2,
3, 25, 1.0)
You first plot the dots with geom_point(), then you add the lines that join dots with the same id with geom_line():
dat |>
ggplot(aes(x = age, y = bone_dens)) +
geom_point() +
geom_line(aes(group = id))
Output will look like this - you'll be able to customise it like any other ggplot.
I'm trying to visualize some data that looks like this
line1 <- data.frame(x = c(4, 24), y = c(0, -0.42864), group = "group1")
line2 <- data.frame(x = c(4, 12 ,24), y = c(0, 2.04538, 3.4135), group = "group2")
line3 <- data.frame(x = c(4, 12, 24), y = c(0, 3.14633, 3.93718), group = "group3")
line4 <- data.frame(x = c(0, 3, 7, 12, 18), y = c(0, -0.50249, 0.11994, -0.68694, -0.98949), group = "group4")
line5 <- data.frame(x = c(0, 3, 7, 12, 18, 24), y = c(0, -0.55753, -0.66006, 0.43796, 1.38723, 3.17906), group = "group5")
df <- do.call(rbind, list(line1, line2, line3, line4, line5))
What I'm trying to do is plot the least squares line (and points) for each group on the same plot. And I'd like the colour of the lines and points to correspond to the group.
All I've been able to do is plot the points according to their group
ggplot(data = df, aes(x, y, colour = group)) + geom_point(aes(size = 10))
But I have no idea how to add in the lines as well and make their colours correspond to the points that they are fitting.
I'd really appreciate any help with this. It's turning out to be so much harder than I thought it would be.
You can simply add a geom_smooth layer to your plot
ggplot(data = df, aes(x, y, colour = group)) + geom_point(aes(size = 10)) +
geom_smooth(method="lm",se=FALSE)
method="lm" specifies that you want a linear model
se=FALSE to avoid plotting confidence intervals
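Because colour = group is set in the top-level aes(), geom_smooth(method = "lm") fits a separate y ~ x regression for each group. As a rough check of what gets drawn, the same fits can be computed by hand; the name slopes is illustrative only.

```r
line1 <- data.frame(x = c(4, 24), y = c(0, -0.42864), group = "group1")
line2 <- data.frame(x = c(4, 12, 24), y = c(0, 2.04538, 3.4135), group = "group2")
line3 <- data.frame(x = c(4, 12, 24), y = c(0, 3.14633, 3.93718), group = "group3")
line4 <- data.frame(x = c(0, 3, 7, 12, 18),
                    y = c(0, -0.50249, 0.11994, -0.68694, -0.98949), group = "group4")
line5 <- data.frame(x = c(0, 3, 7, 12, 18, 24),
                    y = c(0, -0.55753, -0.66006, 0.43796, 1.38723, 3.17906), group = "group5")
df <- do.call(rbind, list(line1, line2, line3, line4, line5))

# One least-squares fit per group; keep only the slope of each line
slopes <- sapply(split(df, df$group), function(d) coef(lm(y ~ x, data = d))[["x"]])
slopes
```

Each element of slopes is the gradient of the corresponding coloured line in the plot.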
I'm starting with animated charts and using the gganimate package. I've found that when generating a column chart animation over time, the values of the variables change from the original ones. Let me show you an example:
Data <- as.data.frame(cbind(c(1,1,1,2,2,2,3,3,3),
c("A","B","C","A","B","C","A","B","C"),
c(20,10,15,20,20,20,30,25,35)))
colnames(Data) <- c("Time","Object","Value")
Data$Time <- as.integer(Data$Time)
Data$Value <- as.numeric(Data$Value)
Data$Object <- as.character(Data$Object)
p <- ggplot(Data,aes(Object,Value)) +
stat_identity() +
geom_col() +
coord_cartesian(ylim = c(0,40)) +
transition_time(Time)
p
The chart obtained looks like this:
Values obtained in the Y-axis are between 1 and 6. It seems that the original value of 10 corresponds to a value of 1 in the Y-axis. 15 is 2, 20 is 3 and so on...
Is there a way for keeping the original values in the chart?
Thanks in advance
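Your data changed when you coerced a factor variable into numeric: as.numeric() on a factor returns the integer level codes (1 to 6 here), not the original values. (See the Data section below for a more direct way to define the data.frame.)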
You were missing a position = "identity" for your bar charts to stay at the same place. I added a fill = Time for illustration.
Code
p <- ggplot(Data, aes(Object, Value, fill = Time)) +
geom_col(position = "identity") +
coord_cartesian(ylim = c(0, 40)) +
transition_time(Time)
p
Data
Data <- data.frame(Time = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
Object = c("A", "B", "C", "A", "B", "C", "A", "B", "C"),
Value = c(20, 10, 15, 20, 20, 20, 30, 25, 35))
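The coercion problem can be reproduced in isolation. This is a minimal sketch in which the factor is created explicitly, which is what as.data.frame() did implicitly with strings on R versions before 4.0.0; the names value_chr, value_fct, codes and restored are illustrative.

```r
# cbind() with a character column turns every column into character
value_chr <- c("20", "10", "15", "20", "20", "20", "30", "25", "35")
value_fct <- factor(value_chr)

# Converting the factor directly gives the level codes, not the numbers
codes <- as.numeric(value_fct)

# Going through character first recovers the original values
restored <- as.numeric(as.character(value_fct))
```

Here codes runs from 1 to 6 because the six distinct values become six factor levels, which matches the Y-axis range seen in the question.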
I'm trying to create a line graph depicting different trajectories over time for two groups/conditions. I have two groups for which the data 'eat' was collected at five time points (1,2,3,4,5).
I'd like the lines to connect the mean point for each group at each of five time points, so I'd have two points at Time 1, two points at Time 2, and so on.
Here's a reproducible example:
#Example data
library(tidyverse)
library(ggplot2)
eat <- sample(1:7, size = 30, replace = TRUE)
df <- data.frame(id = rep(c(1, 2, 3, 4, 5, 6), each = 5),
Condition = rep(c(0, 1), each = 15),
time = c(1, 2, 3, 4, 5),
eat = eat
)
df$time <- as.factor(df$time)
df$Condition <- as.factor(df$Condition)
#Create the plot.
library(ggplot2)
ggplot(df, aes(x = time, y = eat, fill = Condition)) + geom_line() +
geom_point(size = 4, shape = 21) +
stat_summary(fun.y = mean, colour = "red", geom = "line")
The problem is, I need my lines to go horizontally (ie to show two different colored lines moving across the x-axis). But this code just connects the dots vertically:
If I don't convert Time to a factor, but only convert Condition to a factor, I get a mess of lines. The same thing happens in my actual data, as well.
I'd like it to look like this aesthetically, with the transparent error envelopes wrapping each line. However, I don't want it to be curvy, I want the lines to be straight, connecting the means at each point.
Here are the lines running in straight segments through the means at each time, with the ribbon showing +/- one standard error of the points at that time. One stat_summary makes the mean line with the colour aesthetic, the other makes the area using the inherited fill aesthetic. ggplot2::mean_se is a convenient function that takes a vector and returns a data frame with the mean and +/- some number of standard errors. This is the right format for the fun.data argument to stat_summary, which passes these values to the geom specified. Here, geom_ribbon accepts ymin and ymax values to plot a ribbon across the graph.
library(tidyverse)
set.seed(12345)
eat <- sample(1:7, size = 30, replace = TRUE)
df <- data.frame(
Condition = rep(c(0, 1), each = 15),
time = c(1, 2, 3, 4, 5),
eat = eat
)
df$Condition <- as.factor(df$Condition)
ggplot(df, aes(x = time, y = eat, fill = Condition)) +
geom_point(size = 4, shape = 21, colour = "black") +
stat_summary(geom = "ribbon", fun.data = mean_se, alpha = 0.2) +
stat_summary(
mapping = aes(colour = Condition),
geom = "line",
fun = mean, # ggplot2 >= 3.3.0; earlier versions call this argument fun.y
show.legend = FALSE
)
Created on 2018-07-09 by the reprex package (v0.2.0).
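For reference, this is what mean_se() hands to the ribbon. A minimal sketch on a made-up vector; x, summary_1se and summary_2se are illustrative names.

```r
library(ggplot2)

x <- c(1, 2, 3, 4, 5)

# Data frame with columns y (the mean), ymin and ymax (mean -/+ one standard error)
summary_1se <- mean_se(x)

# The mult argument widens the band, here to two standard errors
summary_2se <- mean_se(x, mult = 2)

summary_1se
summary_2se
```

To draw a wider envelope in the plot, the multiplier can be passed through stat_summary() with fun.args = list(mult = 2).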
Here's my best guess at what you want:
# keep time as numeric
df$time = as.numeric(as.character(df$time))
ggplot(df, aes(x = time, y = eat, group = Condition)) +
geom_smooth(
aes(fill = Condition, linetype = Condition),
method = "lm",
level = 0.65,
color = "black",
size = 0.3
) +
geom_point(aes(color = Condition))
Setting level = 0.65 gives a confidence band of roughly +/- 1 standard error around the linear model fit.
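The link between the level and the band width can be checked with a quick calculation. This sketch uses the normal approximation, which is an assumption; the band for an lm fit is based on a t quantile, so it will be slightly wider for small samples. The names z_65 and coverage_1se are illustrative.

```r
# Multiplier of the standard error for a 65% two-sided interval
z_65 <- qnorm((1 + 0.65) / 2)

# Coverage of an exact +/- 1 standard error band under normality
coverage_1se <- pnorm(1) - pnorm(-1)

z_65
coverage_1se
```

So a 65% level corresponds to a little under one standard error, and an exact one standard error band would need a level of about 0.68.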
I think this code will get you most of the way there
library(tidyverse)
eat <- sample(1:7, size = 30, replace = TRUE)
tibble(id = rep(c(1, 2, 3, 4, 5, 6), each = 5),
Condition = factor(rep(c(0, 1), each = 15)),
time = factor(rep(c(1, 2, 3, 4, 5), 6)),
eat = eat) %>%
ggplot(aes(x = time, y = eat, fill = Condition, group = Condition)) +
geom_point(size = 4, shape = 21) +
geom_smooth()
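geom_smooth is what you were looking for, I think. Note that for a data set this small its default is a loess smoother rather than a linear model; pass method = "lm" if you want a straight-line fit. Because group = Condition is set, one curve is drawn per condition even though x is a factor. The curve is not guaranteed to pass exactly through the mean at each time point, so if you need that, use stat_summary() with geom = "line" instead.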
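If the goal is straight segments through the group means while keeping time as a factor, the missing piece in the question's code is the group aesthetic. A minimal sketch, assuming ggplot2 3.3.0 or later (for the fun argument) and a made-up seed; p and line_data are illustrative names.

```r
library(ggplot2)

set.seed(1)
df <- data.frame(
  Condition = factor(rep(c(0, 1), each = 15)),
  time      = factor(rep(1:5, times = 6)),
  eat       = sample(1:7, size = 30, replace = TRUE)
)

# group = Condition tells the line layer which points belong together;
# without it a factor x axis puts every time point in its own group
p <- ggplot(df, aes(x = time, y = eat, colour = Condition, group = Condition)) +
  stat_summary(fun = mean, geom = "line") +
  stat_summary(fun = mean, geom = "point", size = 3)

# Data actually drawn by the line layer: one mean per condition and time
line_data <- layer_data(p, 1)
```

Printing p shows two horizontal trajectories, one per condition, each connecting five means.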