Special Stacked Bar Chart R ggplot

Can you help me make the following bar chart in R? I have some simplified dummy data that I am using to recreate it, and my plan is then to manipulate the real data in the same way. There is no need to draw the abline; the most important part is the waterfall aspect.
labels value
1 start 100
2 january 120
3 febuary 140
4 march 160
5 april 180
6 may 130
7 june 140
8 july 170
9 august 160
10 september 180
11 october 190
12 november 210
13 december 200
14 end 200

This gets you the waterfall effect:
library(tidyverse)
df <-
tibble::tribble(
~month, ~month_name, ~value,
1, "start", 100,
2, "january", 120,
3, "febuary", 140,
4, "march", 160,
5, "april", 180,
6, "may", 130,
7, "june", 140,
8, "july", 170,
9, "august", 160,
10, "september", 180,
11, "october", 190,
12, "november", 210,
13, "december", 200,
14, "end", 200
) %>%
mutate(
type = case_when(
month == min(month) ~ "Initial",
month == max(month) ~ "Final",
value > lag(value) ~ "Increase",
TRUE ~ "Decrease"
),
finish = value,
start = if_else(month == max(month), 0, replace_na(lag(value), 0))
)
df %>%
ggplot(aes(xmin = month - 0.3, xmax = month + 0.3, ymin = start, ymax = finish, fill = type)) +
geom_rect() +
scale_x_continuous(
breaks = 1:14,
labels = df %>% select(month_name) %>% pull()
) +
theme(
axis.text.x = element_text(angle = 45, hjust = 1),
legend.position = "none"
)
You should be able to take care of the formatting and colors from here ;)
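The waterfall logic in the mutate() call above can also be checked without any plotting. The following is a minimal base-R sketch of the same idea, using the dummy values from the question; the object name waterfall is only illustrative. Each bar starts where the previous one finished, except the first and last bars, which start at zero.

```r
# Dummy values from the question (start, january ... december, end)
value <- c(100, 120, 140, 160, 180, 130, 140, 170, 160, 180, 190, 210, 200, 200)
n <- length(value)

# Previous bar's value (a lag by one position); NA for the first bar
prev <- c(NA, value[-n])

# Each bar starts at the previous value; the first and last bars start at 0
start <- ifelse(is.na(prev), 0, prev)
start[n] <- 0
finish <- value

# Classify the bars the same way as the case_when() in the answer
type <- ifelse(finish > start, "Increase", "Decrease")
type[1] <- "Initial"
type[n] <- "Final"

waterfall <- data.frame(month = seq_len(n), start, finish, type)
print(waterfall)
```

For example, the may bar (row 6) runs from 180 down to 130 and is therefore classified as a decrease.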

Related

Plotting bar plots with R

I am trying to create a dodged geom_bar plot in R with this data, but I am not getting the plot that I need:
Department Male Female
<chr> <int> <int>
1 "Admin Offices" 3 6
2 "Executive Office" 0 1
3 "IT/IS" 28 22
4 "Production " 83 126
5 "Sales" 16 15
6 "Software Engineering" 5 6
What I tried seems pretty wrong, so can anyone help?
I can only guess what you actually need. However, the result could be this:
library(dplyr)
library(tidyr)
library(ggplot2)
df<-data.frame(Department = c("Admin Offices",
"Executive Office" ,
"IT/IS",
"Production",
"Sales",
"Software Engineering" ),
Male = c(3, 0, 28, 83, 16, 5),
Female = c(6, 1, 22, 126, 15, 6))
df %>% pivot_longer(cols = c("Male", "Female")) %>%
transmute(Department, Gender = as.factor(name), Value = value) %>%
ggplot() +
geom_bar(aes(x = Department, y = Value, fill = Gender), stat = "identity", position = position_dodge(0.9))
And the same with barplot from base R:
df<-data.frame(Department = c("Admin Offices",
"Executive Office" ,
"IT/IS",
"Production",
"Sales",
"Software Engineering" ),
Male = c(3, 0, 28, 83, 16, 5),
Female = c(6, 1, 22, 126, 15, 6))
# Transpose so that each column of the matrix is one department
barplot(t(df[, -1]), beside = TRUE,
        names.arg = df$Department,
        legend.text = names(df[, -1]))
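The ggplot2 version relies on the data being in long format, with one row per Department and Gender pair. As a rough illustration of that shape, here is a base-R sketch on two of the departments; it builds the long table by hand rather than with pivot_longer(), so the row order differs from what tidyr would return.

```r
# Two rows of the wide data from the question
df <- data.frame(Department = c("IT/IS", "Sales"),
                 Male = c(28, 16),
                 Female = c(22, 15))

# Long format: one row per Department/Gender combination
long <- data.frame(
  Department = rep(df$Department, times = 2),
  Gender = rep(c("Male", "Female"), each = nrow(df)),
  Value = c(df$Male, df$Female)
)
print(long)
```

With the data in this shape, a single column (Gender) can be mapped to fill, which is what makes the dodged bars possible.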

How do I write a function to plot a line graph for each factor in a dataframe?

I have a dataframe, the head of which looks like this:
|trackName | week| sum|
|:--------------------|----:|---:|
|New Slang | 1| 493|
|You're Somebody Else | 1| 300|
|Mushaboom | 1| 297|
|San Luis | 1| 296|
I am interested in plotting a line graph for each of the 346 unique trackNames in the dataframe, with week on the x-axis and sum on the y-axis. To automate this process, I wrote the following function:
charts <- function(df) {
songs <- df
lim <- nrow(songs)
x <- 1
song_names <- as_tibble(unique(songs$trackName))
while (x <= lim) {
song <- song_names[x, 1]
plot.name <- paste(paste(song), "plot.png", sep = "_")
songs %>% filter(trackName == paste(song[x, 1])) %>%
ggplot(., aes(x = week, y = sum), group = 1) +
geom_line() +
labs(
x = "Week",
y = "Sum of Listens",
title = paste("Week by Week Listening Interest for", song, sep = " "),
subtitle = "Calculated by plotting the sum of percentages of the song listened per week, starting from first listen"
) +
ggsave(plot.name,
width = 20,
height = 15,
units = "cm")
x <- x + 1
}
}
However when I run charts(df), only the following error shows up and then it quits:
> charts(mini)
geom_path: Each group consists of only one observation. Do you need to
adjust the group aesthetic?
>
What am I doing wrong here and what does this error mean?
A sample of the dataframe in dput format:
structure(list(trackName = c("New Slang", "You're Somebody Else",
"Mushaboom", "San Luis", "The Trapeze Swinger", "Flightless Bird, American Mouth",
"tere bina - Acoustic", "Only for a Moment", "Upward Over the Mountain",
"Virginia May", "Never to Be Forgotten Kinda Year", "Little Talks",
"Jhak Maar Ke", "Big Rock Candy Mountain", "Sofia", "Aaoge Tum Kabhi",
"Deathcab", "Dil Mere", "Choke", "Phir Le Aya Dil", "Lucille",
"tere bina - Acoustic", "Dil Mere", "Only for a Moment", "This Is The Life",
"San Luis", "Main Bola Hey!", "Choo Lo", "Yeh Zindagi Hai", "Aaftaab",
"Never to Be Forgotten Kinda Year", "Khudi", "Flightless Bird, American Mouth",
"Mere Bina", "Simple Song", "Dil Haare", "Dil Hi Toh Hai", "You're Somebody Else",
"Sofia", "Who's Laughing Now", "Main Bola Hey!", "Lucille", "Eenie Meenie",
"tere bina - Acoustic", "New Slang", "Aaftaab", "Mamma Mia",
"July", "Yeh Zindagi Hai", "Someone You Loved"), week = c(1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3,
3, 3, 3, 3, 3, 3, 3), sum = c(493, 300, 297, 296, 292, 234, 214,
200, 200, 197, 192, 187, 185, 181, 175, 172, 141, 119, 106, 103,
579, 574, 501, 462, 428, 378, 320, 307, 306, 301, 301, 300, 300,
300, 300, 300, 296, 294, 251, 242, 3534, 724, 696, 512, 479,
400, 302, 300, 300, 300)), row.names = c(NA, -50L), class = c("tbl_df",
"tbl", "data.frame"))
How about using purrr::walk instead?
library(tidyverse)
library(hrbrthemes)
walk(unique(songs$trackName), ~{
  # Build the plot for one track; group = 1 belongs inside aes()
  p <- ggplot(filter(songs, trackName == .x), aes(x = week, y = sum, group = 1)) +
    geom_line(color = ft_cols$yellow) +
    labs(x = "Week", y = "Sum of Listens",
         title = paste("Week by Week Listening Interest for", .x),
         subtitle = "Calculated by plotting the sum of percentages of the song listened per week, starting from first listen") +
    theme_ft_rc()
  # Save the plot under the track's name
  ggsave(filename = paste0(.x, "_plot.png"), plot = p, width = 20, height = 15, units = "cm")
})
Note: the question was subsequently edited to remove the hrbrthemes package requirement.
You can split the dataset for each trackName and create a png file for it.
library(tidyverse)
charts <- function(df) {
df %>%
group_split(trackName) %>%
map(~{
track <- first(.x$trackName)
ggplot(.x, aes(x = factor(week), y = sum, group = 1)) +
geom_line() +
labs(
x = "Week",
y = "Sum of Listens",
title = paste("Week by Week Listening Interest for", track),
subtitle = "Calculated by plotting the sum of percentages of the song listened per week, starting from first listen"
) -> plt
ggsave(paste0(track,'.png'), plt, width = 20, height = 15, units = "cm")
})
}
charts(songs)
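Regarding the message itself: geom_path reports "Each group consists of only one observation" when a group has a single point, because a line cannot be drawn through one point. One way to see which tracks are affected is to count the rows per track before plotting. The sketch below is only an illustration on a few rows taken from the sample data; the names obs_per_track and plottable are made up for this example.

```r
# A few rows from the sample data in the question
songs <- data.frame(
  trackName = c("New Slang", "Mushaboom", "San Luis", "San Luis", "New Slang"),
  week = c(1, 1, 1, 2, 3),
  sum = c(493, 297, 296, 378, 479)
)

# Number of weekly observations per track
obs_per_track <- table(songs$trackName)
print(obs_per_track)

# Tracks with at least two observations can be drawn as a line
plottable <- names(obs_per_track)[as.vector(obs_per_track) >= 2]
print(plottable)
```

Tracks that appear in only one week (here "Mushaboom") would need geom_point() or could be skipped.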

ggplot boxplot with custom X-Axis and grouping and sorting on separate values

I'm trying to create a boxplot based on timeseries data for multiple years. I want to group observations from multiple years by a variable "DAP" (similar to day of year 0-365), order them by day from November to March but only display the Month on the X-Axis.
I can create a custom order and X-Axis by creating a factor with each month, which works:
level_order <- c('November', 'December', 'January', 'February', 'March')
plot <- ggplot(data = df, aes(y = y, x = factor(Month, levels = level_order), group=DAP)) +
geom_boxplot(fill="grey85", width = 2.0) +
scale_x_discrete(limits = level_order)
plot
Now I'm stuck aligning the boxes on the X-Axis according to the day of the month. For example, the first data point, from November 26th, needs to sit further right, closer to December.
Changing the X-Axis to "Date" creates monthly labels for each year and also removes the grouping.
plot <- ggplot(data = df, aes(y = y, x = Date, group=DAP)) +
geom_boxplot(fill="grey85")
plot + scale_x_date(date_breaks = "1 month", date_labels = "%B")
Setting the X-Axis to "DAP" instead of date gives me the correct order and spacing, but I need to display the month on the X-Axis. How can I combine this last graph with the X-Axis labeling of graph 1?
plot <- ggplot(data = df, aes(y = y, x = DAP, group=DAP)) +
geom_boxplot(fill="grey85")
plot
Here is a sample of the dataset:
DAP Date Month y
1 47 2010-11-26 November 0.6872708
21 116 2011-02-03 February 0.7643213
41 68 2011-12-17 December 0.7021531
61 137 2012-02-24 February 0.7178306
81 92 2013-01-10 January 0.7330749
101 44 2013-11-23 November 0.6610618
121 113 2014-01-31 January 0.7961012
141 68 2014-12-17 December 0.7510821
161 137 2015-02-24 February 0.7799938
181 92 2016-01-10 January 0.6861423
201 47 2016-11-26 November 0.7155526
221 116 2017-02-03 February 0.7397810
241 72 2017-12-21 December 0.7259670
261 144 2018-03-03 March 0.6725775
281 106 2019-01-24 January 0.7637322
301 65 2019-12-14 December 0.7184616
321 134 2020-02-21 February 0.6760159
The following approach uses tidyverse. The date is separated into year-month-day and those newly created columns are made numeric. In the ggplot part position_dodge2(preserve = "single") is used which keeps the boxwidth the same. scale_x_discrete helps to redefine x-axis breaks and tick labels. width = 1 controls the distance between the boxes.
library(tidyverse)
df <- tibble::tribble(
~DAP, ~Date, ~Month, ~y,
47, "2010-11-26", "November", 0.6872708,
116, "2011-02-03", "February", 0.7643213,
68, "2011-12-17", "December", 0.7021531,
137, "2012-02-24", "February", 0.7178306,
92, "2013-01-10", "January", 0.7330749,
44, "2013-11-23", "November", 0.6610618,
113, "2014-01-31", "January", 0.7961012,
68, "2014-12-17", "December", 0.7510821,
137, "2015-02-24", "February", 0.7799938,
92, "2016-01-10", "January", 0.6861423,
47, "2016-11-26", "November", 0.7155526,
116, "2017-02-03", "February", 0.7397810,
72, "2017-12-21", "December", 0.7259670,
144, "2018-03-03", "March", 0.6725775,
106, "2019-01-24", "January", 0.7637322,
65, "2019-12-14", "December", 0.7184616,
134, "2020-02-21", "February", 0.6760159
)
df$Date <- as.Date(df$Date)
df %>%
separate(Date, sep = "-", into = c("year", "month", "day")) %>%
mutate_at(vars("year":"day"), as.numeric) %>%
select(-c(year, Month)) %>%
ggplot(aes(
x = factor(month, levels = c(11, 12, 1, 2, 3)), y = y,
group = DAP, color = factor(month)
)) +
geom_boxplot(width = 1, lwd = 0.2, position = position_dodge2(preserve = "single")) +
scale_x_discrete(
breaks = c(11, 12, 1, 2, 3),
labels = c("November", "December", "January", "February", "March")
) +
labs(x = "") +
theme(legend.position = "none")
Try this. To get the right order, spacing and labels, I make a new date. As the year does not seem to be relevant, I set the year for the November and December observations to 2019 and for the other observations to 2020.
df <- structure(list(DAP = c(
47L, 116L, 68L, 137L, 92L, 44L, 113L,
68L, 137L, 92L, 47L, 116L, 72L, 144L, 106L, 65L, 134L
), Date = c(
"2010-11-26",
"2011-02-03", "2011-12-17", "2012-02-24", "2013-01-10", "2013-11-23",
"2014-01-31", "2014-12-17", "2015-02-24", "2016-01-10", "2016-11-26",
"2017-02-03", "2017-12-21", "2018-03-03", "2019-01-24", "2019-12-14",
"2020-02-21"
), Month = c(
"November", "February", "December",
"February", "January", "November", "January", "December", "February",
"January", "November", "February", "December", "March", "January",
"December", "February"
), y = c(
0.6872708, 0.7643213, 0.7021531,
0.7178306, 0.7330749, 0.6610618, 0.7961012, 0.7510821, 0.7799938,
0.6861423, 0.7155526, 0.739781, 0.725967, 0.6725775, 0.7637322,
0.7184616, 0.6760159
)), row.names = c(NA, -17L), class = "data.frame")
library(ggplot2)
# Make a new date to get the same order as with DAP.
# Set the year for the November and December observations to 2019
# and for the other observations to 2020.
df$Date1 <- gsub("20\\d{2}-(1\\d{1})", "2019-\\1", df$Date)
df$Date1 <- gsub("20\\d{2}-(0\\d{1})", "2020-\\1", df$Date1)
df$Date1 <- as.Date(df$Date1)
# Using the new date gives the correct order, spacing and labels.
# The limits are also adjusted.
plot <- ggplot(data = df, aes(y = y, x = Date1, group = DAP)) +
geom_boxplot(fill = "grey85")
plot +
scale_x_date(date_breaks = "1 month", date_labels = "%B", limits = c(as.Date("2019-11-01"), as.Date("2020-03-31")))
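The same re-anchoring of dates onto one season can also be done without regular expressions. This is a small sketch of that alternative, assuming (as in the regex above) that months 10 to 12 belong to 2019 and all other months to 2020; the names month_num, season_year and date1 are illustrative.

```r
# Three dates from the sample data
dates <- as.Date(c("2010-11-26", "2011-02-03", "2018-03-03"))

# Month as an integer, used to decide which side of the new year a date falls on
month_num <- as.integer(format(dates, "%m"))
season_year <- ifelse(month_num >= 10, 2019, 2020)

# Rebuild each date with the common season year but the original month and day
date1 <- as.Date(paste(season_year, format(dates, "%m-%d"), sep = "-"))
print(date1)
```

Because 2020 is a leap year, a February 29 in the original data would still map to a valid date.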

Combining 2 columns with values in R

I'm working on a dataset about football. So I've made some time series analyses. I've calculated the amount of goals and the amount of goals in the previous month. Now I'm trying to plot it into a graph.
I'm trying to make a group bar chart with the goals of a certain month and from the previous month next to it.
This is the code that I'm using:
df_eredivisie %>%
group_by(month= month(date, label = TRUE)) %>%
summarise(goals = sum(FTHG + FTAG)) %>%
mutate(last = lag(goals, 1))
So this is the result (Sorry can't post pictures :/ ):
month goals last
Jan 69 NA
Feb 121 69
Mar 116 121
Apr 155 116
May 78 155
Aug 88 78
Sep 124 88
Oct 91 124
Nov 91 91
Dec 128 91
Could someone help me with the grouped bar chart? I've tried to combine the columns, so I could do fill and have the goals and last with different colours. But I couldn't figure out how to do that.
Your data need to be in long format, then it's simple:
library(ggplot2)
library(tidyverse)
df <- tribble(~month, ~goals, ~last,
"Jan", 69, NA,
"Feb", 121, 69,
"Mar", 116, 121,
"Apr", 155, 116,
"May", 78, 155,
"Aug", 88, 78,
"Sep", 124, 88,
"Oct", 91, 124,
"Nov", 91, 91,
"Dec", 128, 91)
df %>%
pivot_longer(cols = 2:3, names_to = "category") %>%
mutate(month = factor(month, levels = month.abb)) %>%
ggplot(aes(x = month, y = value, fill = category)) +
geom_col(position = "dodge")
#> Warning: Removed 1 rows containing missing values (geom_col).
Created on 2020-06-07 by the reprex package (v0.3.0)
If you reverse the factors, it looks like this:
df %>%
pivot_longer(cols = 2:3, names_to = "category") %>%
mutate(month = factor(month, levels = month.abb)) %>%
ggplot(aes(x = month, y = value, fill = forcats::fct_rev(category))) +
geom_col(position = "dodge")
#> Warning: Removed 1 rows containing missing values (geom_col).
So it works, but the second column does not add any information, as you can see the previous month right next to it...
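The factor(month, levels = month.abb) step in both snippets is what keeps the bars in calendar order; a plain character column would be ordered alphabetically on the axis. A short sketch of the effect, using the month labels from the question in reversed order as input:

```r
# Month labels from the question, deliberately out of order
months <- rev(c("Jan", "Feb", "Mar", "Apr", "May", "Aug", "Sep", "Oct", "Nov", "Dec"))

# month.abb is a built-in constant: "Jan", "Feb", ..., "Dec"
f <- factor(months, levels = month.abb)

# Sorting the factor follows the level order, i.e. the calendar
calendar_order <- as.character(sort(f))
print(calendar_order)

# Jun and Jul are levels without data; droplevels() would remove them
print(nlevels(f))
print(nlevels(droplevels(f)))
```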

sum range rows in data frame r

I have the data frame below and would like to add a column, SumNextMonths, that gives for each month the sum of the SalesForecasts of the next 3 months.
For example:
SumNextMonths in Feb = SalesForecasts of Mar + Apr + May.
d<- data.frame(Period =c("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"),
SalesForecasts = c(100, 100, 200, 100, 300, 300, 100, 100, 200, 300,100, 100, 500, 500, 1000, 500, 1500, 1500, 500, 500, 1000, 1500, 500, 500))
Would anyone know how to proceed?
Thank you,
Cheers
A generic answer could be:
library(dplyr)
library(zoo)
d %>%
mutate(temp_lead = lead(SalesForecasts, n = 1),
SalesforNext3Months = rollapply(data = temp_lead,
width = 3,
FUN = sum,
align = "left",
fill = NA,
na.rm = TRUE)) %>%
select(-temp_lead)
Period SalesForecasts SalesforNext3Months
1 Jan 100 400
2 Feb 100 600
3 Mar 200 700
4 Apr 100 700
5 May 300 500
6 Jun 300 400
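For comparison, the same forward-looking sum can be sketched in base R without zoo. This version uses only the first twelve forecast values from the question and, as an assumption of this sketch, returns NA whenever fewer than three later months are available; the name sum_next3 is illustrative.

```r
# First twelve forecast values from the question (Jan to Dec)
sales <- c(100, 100, 200, 100, 300, 300, 100, 100, 200, 300, 100, 100)
n <- length(sales)

# For month i, add up months i + 1 to i + 3; NA if the window runs past the end
sum_next3 <- sapply(seq_len(n), function(i) {
  idx <- (i + 1):(i + 3)
  if (max(idx) > n) NA_real_ else sum(sales[idx])
})

print(data.frame(Period = month.abb, SalesForecasts = sales, SumNextMonths = sum_next3))
```

The first six values agree with the rollapply() output shown above (400, 600, 700, 700, 500, 400).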
