I am using ggplot2 to visualize the gapminder data set. Can someone help me get the legend to show round points sized according to each country's population?
library(gapminder)
library(ggplot2)
library(dplyr)
df1<-gapminder[!(gapminder$country=="Kuwait"),]
blackline <- df1 %>%
group_by(continent, year) %>%
summarise(average = weighted.mean(gdpPercap))
p <- ggplot(data = df1, mapping = aes(x = year, y = gdpPercap)) +
scale_x_continuous(breaks = seq(1960, 2000, by = 20)) +
theme_bw() +
labs(x = "Life Expectancy",
y = "GDP Per Capita",
color = "Continent",
size = "Population (100K)") +
# colored dots
geom_point(df1, mapping = aes(col = continent, size = pop/100000)) +
# colored lines
geom_line(data = df1, aes(color = continent, group = country)) +
facet_grid(cols = vars(continent)) +
# weighted average black line
geom_line(data = blackline, aes(x = year, y = average, size = 1)) +
geom_point(data = blackline, aes(x = year, y = average, size = 1000))
p
This could be achieved by setting show.legend=FALSE in the geoms for the blackline:
library(gapminder)
library(ggplot2)
library(dplyr)
df1<-gapminder[!(gapminder$country=="Kuwait"),]
blackline <- df1 %>%
group_by(continent, year) %>%
summarise(average = weighted.mean(gdpPercap))
#> `summarise()` regrouping output by 'continent' (override with `.groups` argument)
p <- ggplot(data = df1, mapping = aes(x = year, y = gdpPercap)) +
scale_x_continuous(breaks = seq(1960, 2000, by = 20)) +
theme_bw() +
labs(x = "Life Expectancy",
y = "GDP Per Capita",
color = "Continent",
size = "Population (100K)") +
# colored dots
geom_point(aes(col = continent, size = pop/100000)) +
# colored lines
geom_line(aes(color = continent, group = country)) +
facet_grid(cols = vars(continent)) +
# weighted average black line
geom_line(data = blackline, aes(x = year, y = average, size = 1), show.legend = FALSE) +
geom_point(data = blackline, aes(x = year, y = average, size = 1000), show.legend = FALSE)
p
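One thing worth checking in the code above: the black line is labelled a weighted average, but weighted.mean() is called without weights, in which case it returns the plain mean. The sketch below uses made-up numbers (not gapminder values) and assumes that a population-weighted mean was intended; if the unweighted mean is what you want, the existing call is fine.

```r
# Toy values for illustration only; not taken from gapminder
gdpPercap <- c(1000, 2000, 4000)
pop       <- c(10, 30, 60)

# No weights supplied: identical to mean(gdpPercap)
unweighted <- weighted.mean(gdpPercap)

# Weights supplied: each value counts in proportion to its population
weighted <- weighted.mean(gdpPercap, w = pop)

print(c(unweighted = unweighted, weighted = weighted))
```

Under that assumption, the summarise() step would become summarise(average = weighted.mean(gdpPercap, pop)).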
I am trying to plot one column by Date (with different-colored points for each animal category) and, on the same graph, plot a second column by Date as well. The second column has entries for every day but only for one category, Large Dog; there is no adoption_with_discount for small or medium dogs (please see the reproducible example data set, example_data). Each plot renders fine on its own, but not when the two are combined. I thought I could just overlay a separate geom, but that is not working.
I want to combine the two plots into one. My goal is for the points plot to have the line graph on top of it. I am trying to visualize the adoption as points colored by animal and put a line on the same graph of adoption_with_discount.
Thank you for your help!
# Make example -----------------------------------------------------------
# Here is an example data set
# You can see in the `adoption_with_discount` the values I want to add as a line.
library(lubridate)
library(tidyverse)
example_days <- data.frame(Date = c(seq.Date(from = as.Date('2022-03-01'), to = as.Date('2022-04-30'), by = 'days')))
example_small <-
example_days %>%
mutate(animal = "Small Dog")
a <-sample(100:150, nrow(example_small), rep = TRUE)
example_small <-
example_small %>%
mutate(adoption = a,
adoption_with_discount = NA)
example_med <-
example_days %>%
mutate(animal = "Medium Dog")
b <-sample(150:180, nrow(example_med), rep = TRUE)
example_med <-
example_med %>%
mutate(adoption = b,
adoption_with_discount = NA)
example_large <-
example_days %>%
mutate(animal = "Large Dog")
c <-sample(150:200, nrow(example_large), rep = TRUE)
example_large <-
example_large %>%
mutate(adoption = c)
example_large <-
example_large %>%
mutate(adoption_with_discount = adoption - 15)
example_data <- rbind(example_small, example_med, example_large)
# Plot --------------------------------------------------------------------
ggplot(data = example_data) +
geom_point(mapping = aes(x = Date,
y = adoption,
color = animal)) +
ggtitle("Dog Adoption by Size") +
labs(x = "Date", y = "Adoption Fee") +
scale_y_continuous(labels = scales::dollar) +
theme(axis.text.x = element_text(angle = 45))
# Plot with Fee -----------------------------------------------------------
# This is where the problem is occurring
# When I want to add a line that plots the adoption with discount by day
# on top of the points, it does not populate.
ggplot(data = example_data) +
geom_point(mapping = aes(x = Date,
y = adoption,
color = animal)) +
geom_line(mapping = aes(x = Date,
y = adoption_with_discount),
color = "black") +
ggtitle("Dog Adoption by Size with Discount Included") +
labs(x = "Date", y = "Adoption Fee") +
scale_y_continuous(labels = scales::dollar) +
theme(axis.text.x = element_text(angle = 45))
# See if just Discount will Plot -----------------------------------------
#This plots separately
ggplot(data = example_large) +
geom_line(mapping = aes(x = Date,
y = adoption_with_discount),
color = "black") +
ggtitle("Discount") +
labs(x = "Date", y = "Adoption Fee") +
scale_y_continuous(labels = scales::dollar) +
theme(axis.text.x = element_text(angle = 45))
Subsetting is one way to fix the issue, but the reason no line is plotted is simply the missing grouping: in geom_line you are trying to plot the observations for all three dog types as one group, i.e. one line. Because of the NAs for the small and medium dogs, no line shows up. An easy fix is to explicitly map animal to the group aesthetic. Additionally, I added na.rm = TRUE to silence the warning about removed NAs. Finally, I right-aligned your axis labels by adding hjust = 1:
library(ggplot2)
ggplot(data = example_data) +
geom_point(mapping = aes(
x = Date,
y = adoption,
color = animal
)) +
geom_line(
mapping = aes(
x = Date,
y = adoption_with_discount,
group = animal
),
color = "black",
na.rm = TRUE
) +
ggtitle("Dog Adoption by Size with Discount Included") +
labs(x = "Date", y = "Adoption Fee") +
scale_y_continuous(labels = scales::dollar) +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
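The grouping explanation can be checked on a toy data frame without drawing anything. The sketch below is a simplified model, not ggplot2's actual implementation: it assumes a line segment can only be drawn between two consecutive non-missing rows of the same path, and count_segments() is a hypothetical helper written for this illustration.

```r
# Three days, three animals; only Large Dog has discount values
toy <- data.frame(
  Date   = rep(as.Date("2022-03-01") + 0:2, times = 3),
  animal = rep(c("Small Dog", "Medium Dog", "Large Dog"), each = 3),
  adoption_with_discount = c(rep(NA, 6), 150, 160, 170)
)

# A line is drawn along x, so look at the rows in Date order
ordered <- toy[order(toy$Date), ]

# Hypothetical helper: number of adjacent pairs where both values are present
count_segments <- function(y) {
  ok <- !is.na(y)
  sum(ok[-1] & ok[-length(ok)])
}

# One path for everything: every value is surrounded by NAs from the other animals
ungrouped <- count_segments(ordered$adoption_with_discount)

# One path per animal: the Large Dog values are now adjacent to each other
grouped <- sum(tapply(ordered$adoption_with_discount, ordered$animal, count_segments))

print(c(ungrouped = ungrouped, grouped = grouped))
```

In this model the ungrouped path has no drawable segments, while grouping by animal leaves the Large Dog values connected, which matches the behaviour described above.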
Based on the discussion here, I found that you can use a subset argument in the aes of geom_line to select the values that are not NA in the adoption_with_discount column.
ggplot(data = example_data) +
geom_point(mapping = aes(x = Date,
y = adoption,
color = animal)) +
geom_line(mapping = aes(x = Date,
y = adoption_with_discount),
color = "black") +
ggtitle("Dog Adoption by Size with Discount Included") +
labs(x = "Date", y = "Adoption Fee") +
scale_y_continuous(labels = scales::dollar) +
theme(axis.text.x = element_text(angle = 45)) +
geom_line(mapping = aes(x = Date,
y = adoption_with_discount,
subset = !is.na(adoption_with_discount)),
color = "black") +
ggtitle("Discount") +
labs(x = "Date", y = "Adoption Fee") +
scale_y_continuous(labels = scales::dollar) +
theme(axis.text.x = element_text(angle = 45))
It looks like the NAs included in the geom_line layer are creating the issue, so you can filter them out before plotting the line:
ggplot(data = example_data) +
geom_point(mapping = aes(x = Date,
y = adoption,
color = animal)) +
geom_line(data=example_data %>% filter(!is.na(adoption_with_discount)),
mapping = aes(x = Date,
y = adoption_with_discount),
color = "black") +
ggtitle("Dog Adoption by Size with Discount Included") +
labs(x = "Date", y = "Adoption Fee") +
scale_y_continuous(labels = scales::dollar) +
theme(axis.text.x = element_text(angle = 45))
I have a wide dataset that records the blood glucose values from 10 subjects.
library(dplyr)
library(tidyr)
library(ggplot2)
df_wide = data.frame(
ID = seq(1, 10),
gender = sample(0:1, 10, replace = T),
glucose_0 = sample(100:125, 10, replace = T),
glucose_60 = sample(180:200, 10, replace = T),
glucose_120 = sample(130:160, 10, replace = T),
glucose_180 = sample(100:125, 10, replace = T)
)
I then transformed it into a long dataset using gather:
df_long = df_wide %>%
gather("glucose_0", "glucose_60", "glucose_120", "glucose_180", key = Time, value = glucose) %>%
arrange(ID)
To show how the glucose values changed from 0 min to 180 min, I then made the following line chart:
df_long %>%
ggplot(aes(x = Time, y = glucose, group = ID)) +
geom_line(aes(linetype = as.factor(gender))) +
geom_point() +
theme_classic() +
scale_x_discrete(limits = c("glucose_0", "glucose_60", "glucose_120", "glucose_180"),
labels = c("0", "60", "120", "180")) +
theme(legend.position = "bottom") +
labs(
x = "Time",
y = "Glucose",
fill = "Gender"
)
Finally, to show the glucose at each time point, I also made a barplot:
df_long %>%
ggplot(aes(x = Time, y = glucose, fill = as.factor(gender))) +
geom_bar(stat = 'identity', position = position_dodge()) +
theme_classic() +
scale_x_discrete(limits = c("glucose_0", "glucose_60", "glucose_120", "glucose_180"))
My question is: How to combine the line chart and the barplot into one figure that looks like this?
To plot the mean glucose levels as both bars and lines:
df_long %>%
group_by(gender, Time) %>%
mutate(glucose = mean(glucose)) %>%
ggplot(aes(x = Time, y = glucose, fill = as.factor(gender))) +
geom_bar(stat = 'identity', position = position_dodge()) +
geom_line(aes(linetype=as.factor(gender), group=ID)) +
theme_classic() +
scale_x_discrete(limits = c("glucose_0", "glucose_60", "glucose_120", "glucose_180"))
Are you looking for such a solution?
library(tidyverse)
df_wide %>%
pivot_longer(
starts_with("glucose")
) %>%
mutate(gender = fct_inorder(factor(gender))) %>%
arrange(ID) %>%
ggplot(aes(x = name, y = value)) +
geom_col(aes(fill = gender, group=gender), width = 0.5, position = position_dodge())+
stat_summary(aes(group = gender), fun = mean, geom = 'line', size=1, alpha=0.9) +
stat_summary(aes(group = gender), fun = mean, geom = 'point', size=2, alpha=0.9) +
theme_classic() +
scale_x_discrete(limits = c("glucose_0", "glucose_60", "glucose_120", "glucose_180"),
labels = c("0", "60", "120", "180")) +
theme(legend.position = "bottom") +
labs(
x = "Time",
y = "Glucose",
fill = "Gender"
)
I have a line chart with metrics for different years and months. I want to add a median line for each month.
library(dplyr)
library(lubridate)
library(ggplot2)
set.seed(52)
data <- tibble(
date = seq.Date(from = as.Date("2017-01-01"), to = date("2020-12-31"), by = "month")
) %>%
mutate(
metric = rnorm(n(), mean = 5, sd = 2),
month = month(date, label = TRUE),
year = as.factor(year(date))
)
ggplot(data, aes(x = month, y = metric, color = year, group = year)) +
geom_line()
I tried using geom_hline() to do this but it only produces a line for the total median.
ggplot(data, aes(x = month, y = metric, color = year, group = year)) +
geom_line() +
geom_hline(aes(yintercept = median(metric), group = month))
Created on 2021-11-18 by the reprex package (v2.0.1)
What is the easiest way to create a median line for each month? Like this:
Is this what you are looking for...?
ggplot(data, aes(x = month, y = metric, color = year, group = year)) +
geom_line() +
geom_tile(data = function(df) df %>% group_by(month) %>%
summarise(metric = median(metric)),
aes(x = month, y = metric), height = 0.05, inherit.aes = FALSE)
Alternatively, thanks to @user20650's comment below, this also works...
ggplot(data, aes(x = month, y = metric, color = year, group = year)) +
geom_line() +
stat_summary(aes(x = month, y = metric),
fun = median,
geom = "tile",
height = 0.05,
inherit.aes = FALSE)
You could use ave.
I wasn't sure whether you need a median line for each year or across the months, so I will show both.
Median for each year
ggplot(data, aes(x = month, y = metric, color = year, group = year)) +
geom_line() +
geom_line(aes(y = ave(metric, year, FUN = median)))
Median through months
ggplot(data, aes(x = month, y = metric, color = year, group = year)) +
geom_line() +
geom_line(aes(y = ave(metric, month, FUN = median), colour = "median"), size = 2)
EDIT:
since you are interested in a horizontal line, here you have an alternative solution:
ggplot(data, aes(x = month, y = metric, color = year, group = year)) +
geom_line() +
stat_summary(aes(x = month, y=metric), fun = median, geom = "point", shape = "-", size = 15, inherit.aes = FALSE)
Not sure if this is what you're looking for, but you can simply create a new variable in your dataset that holds the median metric. You first group the observations by month, then calculate the median metric, and then ungroup the data.
data <- data %>% group_by(month) %>% mutate(median_metric=median(metric)) %>% ungroup()
Once you have done that, you just map yintercept to median_metric:
ggplot(data ) +
geom_line(aes(x = month, y = metric, color = year, group = year))+
geom_hline(aes(yintercept = median_metric))
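Both the ave() answer and the group-then-mutate answer rely on the same idea: a column with one value per row that holds the median of that row's group. Here is a minimal base R sketch of that idea on a small made-up data frame (toy is an assumption for illustration, not the asker's data), checked against a plain per-month summary.

```r
set.seed(52)

# Small stand-in for the asker's data: 3 months observed in 4 years
toy <- data.frame(
  month  = rep(month.abb[1:3], times = 4),
  year   = rep(2017:2020, each = 3),
  metric = rnorm(12, mean = 5, sd = 2)
)

# ave() returns one value per row: the median of that row's month
toy$median_metric <- ave(toy$metric, toy$month, FUN = median)

# For comparison, one median per month
per_month <- tapply(toy$metric, toy$month, median)

print(toy)
print(per_month)
```

Every row of a given month carries the same median_metric, which is why mapping it to y or yintercept yields one level per month.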
I have a data frame from which I would like to create a bar graph with primary and secondary axes.
ggplot() +
geom_bar(data=top_ten_S, aes(x=Combination, y=AvgTopline), stat="identity",fill="red") +
coord_flip() +
geom_text(
data=top_ten_S,
aes(
x=Combination, y=AvgTopline,
label=paste0("R",round(AvgTopline,0)),
hjust=ifelse(AvgTopline < max(top_ten_S$AvgTopline) / 1.5, -0.1, 1.1) # <- Here lies the magic
)
)
My data frame looks like this:
top_ten_S <- data.frame(Combination = c("a", "b", "c"),
Nrcustomers = c(20, 200, 1900),
AvgTopline = c(1000,3000,1500))
With the above code I am only able to plot one column. I would like to add a secondary axis so that I could plot Combination against both Nrcustomers and AvgTopline in
Method 1
top_ten_S %>%
gather(key, value, -Combination) %>%
ggplot(aes(x = Combination, y = value, fill = key)) +
geom_bar(stat = "identity", position = "dodge") +
labs(fill = "")
Method 2
top_ten_S %>%
gather(key, value, -Combination) %>%
ggplot(aes(x = Combination, y = value, fill = key)) +
geom_bar(stat = "identity", position = "dodge", show.legend = FALSE) +
facet_grid(. ~ key)
Edit: Method 2
top_ten_S %>%
gather(key, value, -Combination) %>%
ggplot(aes(x = Combination, y = value, fill = key)) +
geom_bar(stat = "identity", position = "dodge", show.legend = FALSE) +
facet_grid(. ~ key, space = "free_y", scales = "free_y") +
theme(axis.text.x = element_text(angle = 60))
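Both methods depend on first reshaping the data from wide to long, so that the two measures share a single value column and a key column can drive fill or the facets. The sketch below shows just that reshaping step on the data frame from the question, using tidyr's pivot_longer() in place of gather(); this is a substitution on my part, and the column names key and value are kept only to match the answer's code.

```r
library(tidyr)

top_ten_S <- data.frame(Combination = c("a", "b", "c"),
                        Nrcustomers = c(20, 200, 1900),
                        AvgTopline  = c(1000, 3000, 1500))

# One row per Combination/measure pair instead of one column per measure
long <- pivot_longer(top_ten_S,
                     cols      = -Combination,
                     names_to  = "key",
                     values_to = "value")

print(long)
```

The resulting long table can be piped into the same ggplot() calls as in Method 1 and Method 2.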
What I would like to do is use both the position = "fill" and the position = "dodge" arguments of geom_bar() at the same time somehow. Using some sample data
library(ggplot2)
library(scales)
set.seed(1234)
df <- data.frame(
Id = rep(1:10, each = 12),
Month = rep(1:12, times = 10),
Value = sample(1:2, 10 * 12, replace = TRUE)
)
I'm able to create the following graph
df.plot <- ggplot(df, aes(x = as.factor(Month), fill = as.factor(Value))) +
geom_bar(position = "fill") +
scale_x_discrete(breaks = 1:12) +
scale_y_continuous(labels = percent) +
labs(x = "Month", y = "Value")
I like the scaling and labeling of this graph but I want to be able to unstack it. However when I do the following
df.plot2 <- ggplot(df, aes(x = as.factor(Month), fill = as.factor(Value))) +
geom_bar(position = "dodge", aes(y = (..count..)/sum(..count..))) +
scale_x_discrete(breaks = 1:12) +
scale_y_continuous(labels = percent) +
labs(x = "Month", y = "Value")
The bars are in the position and scaling that I want but the y-axis labels represent the percentage of each bar relative to the total count, not the count within each month.
All in all I want the visuals of the second graph with the labeling of the first graph. Is there a relatively easy way to automate this?
Expanding on my comment:
library(ggplot2)
library(dplyr)
library(tidyr)
library(scales)
df1 <- df %>%
group_by(Month) %>%
summarise(Value1 = sum(Value == 1) / n(),
Value2 = sum(Value == 2) / n()) %>%
gather(key = Group,value = Val,Value1:Value2)
df.plot2 <- ggplot(df1, aes(x = as.factor(Month),
y = Val,
fill = as.factor(Group))) +
geom_bar(position = "dodge",stat = "identity") +
scale_y_continuous(labels = percent_format()) +
scale_x_discrete(breaks = 1:12) +
labs(x = "Month", y = "Value")
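The summarise() step above hard-codes the two levels of Value. As a rough alternative that does not depend on the number of levels, the within-month shares can be computed from a contingency table in base R. This is a sketch under the assumption that the goal is each value's share of its own month; shares and shares_long are names introduced here for illustration.

```r
set.seed(1234)
df <- data.frame(
  Id    = rep(1:10, each = 12),
  Month = rep(1:12, times = 10),
  Value = sample(1:2, 10 * 12, replace = TRUE)
)

# Counts per Month x Value, then divide each row (month) by its own total
shares <- prop.table(table(Month = df$Month, Value = df$Value), margin = 1)

# Long format: one row per Month/Value pair, with the share in column Val
shares_long <- as.data.frame(shares, responseName = "Val")

head(shares_long)
```

shares_long has the same shape as df1 in the answer (a month column, a group column, and Val), so it could be passed to the same dodged geom_bar(stat = "identity") call with fill mapped to Value.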