Sum a range of rows in a data frame in R

I have the data frame below and would like to add a column, SumNextMonths, that holds for each month the sum of SalesForecasts over the next 3 months.
For example:
SumNextMonths in Feb = SalesForecasts of Mar + Apr + May.
d <- data.frame(Period = c("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"),
SalesForecasts = c(100, 100, 200, 100, 300, 300, 100, 100, 200, 300, 100, 100, 500, 500, 1000, 500, 1500, 1500, 500, 500, 1000, 1500, 500, 500))
Would anyone know how to proceed?
Thank you,
Cheers

A generic answer could be:
library(dplyr)
library(zoo)
d %>%
mutate(temp_lead = lead(SalesForecasts, n = 1),
SalesforNext3Months = rollapply(data = temp_lead,
width = 3,
FUN = sum,
align = "left",
fill = NA,
na.rm = TRUE)) %>%
select(-temp_lead)
Period SalesForecasts SalesforNext3Months
1 Jan 100 400
2 Feb 100 600
3 Mar 200 700
4 Apr 100 700
5 May 300 500
6 Jun 300 400
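For comparison, the same forward-looking sum can be sketched in base R without extra packages. This is an illustration only, not the answer's method: the names `sales` and `sum_next3` are made up here, and it uses just the first twelve forecast values from the question. Near the end of the series it returns partial sums over whatever months remain, which may differ from what the `rollapply()` call gives for the final rows.

```r
# First twelve forecast values from the question (Jan..Dec).
sales <- c(100, 100, 200, 100, 300, 300, 100, 100, 200, 300, 100, 100)
n <- length(sales)

# For month i, add up the forecasts of months i + 1 .. i + 3 that exist.
sum_next3 <- sapply(seq_len(n), function(i) {
  idx <- (i + 1):(i + 3)
  idx <- idx[idx <= n]            # drop positions past the end of the series
  if (length(idx) == 0) NA_real_ else sum(sales[idx])
})

head(sum_next3)
```

The first six values agree with the output shown above (400, 600, 700, 700, 500, 400).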

Related

How to calculate the pre and post values each time an event appears in a data frame in R

I have a data set like the example below, and I need to calculate the average, the sum and the standard deviation of the 10 minutes before each event and of the 10 minutes starting at the event. The numbers in the data frame are only illustrative.
df = data.frame(seq(ISOdatetime(2020,5,4,0,0,0), ISOdatetime(2020,5,4,0,19,0), by=(60*1)))
colnames(df) = "date"
df$results = c(18, 25, 35, 96, 100, 50, 48, 67, 36, 98, 46, 75, 67, 36, 98, 46, 75, 67, 36, 98)
df$events = c(NA, NA, NA, NA, NA, NA, "event A", NA, NA, "event B", NA, "event C", NA, NA, NA, NA, NA, NA, "event D", NA)
I would like to collect these calculations in a new data frame containing the event name and date, plus the average, sum and standard deviation for both the period before and the period after the event. Thank you for any help.
Edit:
The "pre" values (sum, average and standard deviation) should cover the 10 minutes immediately preceding the event, and the "post" values should start at the event itself: if the event happened at 10:01, I would like to take minute "01" plus the nine minutes after it. So I need to separate the values into "avg.pre", "sd.pre", "sum.pre" and "avg.pos", "sd.pos", "sum.pos":
new.df = data.frame(c("event A", "event B", "event C", "event D")); colnames(new.df) = "events"
new.df$date.pre = c("2020-05-04 00:06:00", "2020-05-04 00:09:00", " 2020-05-04 00:11:00", " 2020-05-04 00:18:00")
new.df$avg.pre = c(53.14, 52.77...)
new.df$sd.pre = c(32.72, 31.18,...)
new.df$sum.pre = c(372, 475, 68,...)
new.df$avg.pos = c(61.71, 64.4, 25,...)
new.df$sd.pos = c(23.26, 23.06, 46,...)
new.df$sum.pos = c(617, 644, 68,...)
You can try this approach, which filters the data for the 10 minutes before and after each event:
library(dplyr)
purrr::map_df(unique(na.omit(df$events)), function(x) {
i <- df$date[df$events == x & !is.na(df$events)]
bind_cols(data.frame(events = x,
date = i), df %>%
filter(between(date, i - 10*60, i)) %>%
summarise(pre.avg = mean(results, na.rm = TRUE),
pre.sd = sd(results, na.rm = TRUE),
pre.sum = sum(results, na.rm = TRUE)),
df %>%
filter(between(date, i, i + 10*60)) %>%
summarise(post.avg = mean(results, na.rm = TRUE),
post.sd = sd(results, na.rm = TRUE),
post.sum = sum(results, na.rm = TRUE)))
})
# events date pre.avg pre.sd pre.sum post.avg post.sd post.sum
#1 event A 2020-05-04 00:06:00 53.14286 32.72323 372 62.90909 22.42969 692
#2 event B 2020-05-04 00:09:00 57.30000 31.18066 573 67.45455 24.10960 742
#3 event C 2020-05-04 00:11:00 61.45455 27.30701 676 66.44444 23.47931 598
#4 event D 2020-05-04 00:18:00 61.81818 23.49391 680 67.00000 43.84062 134
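The question's edit asks for slightly different windows than the answer above uses: the ten minutes strictly before the event, and the event minute plus the nine minutes after it. A minimal base R sketch under that reading follows. The helper name `window_stats` is hypothetical, and the half-open intervals are an assumption about what the asker meant.

```r
# Rebuild the example data from the question.
df <- data.frame(date = seq(ISOdatetime(2020, 5, 4, 0, 0, 0),
                            ISOdatetime(2020, 5, 4, 0, 19, 0), by = 60))
df$results <- c(18, 25, 35, 96, 100, 50, 48, 67, 36, 98,
                46, 75, 67, 36, 98, 46, 75, 67, 36, 98)
df$events <- c(NA, NA, NA, NA, NA, NA, "event A", NA, NA, "event B",
               NA, "event C", NA, NA, NA, NA, NA, NA, "event D", NA)

# Hypothetical helper: summarise one event row using half-open windows.
window_stats <- function(r) {
  t0   <- df$date[r]
  pre  <- df$results[df$date >= t0 - 600 & df$date < t0]        # 10 min before
  post <- df$results[df$date >= t0       & df$date < t0 + 600]  # event + 9 min
  data.frame(events = df$events[r], date = t0,
             avg.pre = mean(pre),  sd.pre = sd(pre),  sum.pre = sum(pre),
             avg.pos = mean(post), sd.pos = sd(post), sum.pos = sum(post))
}

# One row per event, stacked into a single data frame.
new.df <- do.call(rbind, lapply(which(!is.na(df$events)), window_stats))
new.df
```

With these windows the first two events give sum.pos values of 617 and 644, which match the figures in the question's edit.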

ggplot boxplot with custom X-Axis and grouping and sorting on separate values

I'm trying to create a boxplot based on timeseries data for multiple years. I want to group observations from multiple years by a variable "DAP" (similar to day of year 0-365), order them by day from November to March but only display the Month on the X-Axis.
I can create a custom order and X-Axis by creating a factor with each month, that works
level_order <- c('November', 'December', 'January', 'February', 'March')
plot <- ggplot(data = df, aes(y = y, x = factor(Month, levels = level_order), group = DAP)) +
geom_boxplot(fill="grey85", width = 2.0) +
scale_x_discrete(limits = level_order)
plot
Now I'm stuck aligning the boxes on the X-Axis according to the day of the month. For example, the first data point, from November 26th, needs to sit further right, closer to December.
Changing the X-Axis to "Date" creates monthly labels for each year and also removes the grouping.
plot <- ggplot(data = df, aes(y = y, x = Date, group=DAP)) +
geom_boxplot(fill="grey85")
plot + scale_x_date(date_breaks = "1 month", date_labels = "%B")
Setting the X-Axis to "DAP" instead of the date gives me the correct order and spacing, but I need to display the month on the X-Axis. How can I combine this last graph with the X-Axis labeling of graph 1?
plot <- ggplot(data = df, aes(y = y, x = DAP, group=DAP)) +
geom_boxplot(fill="grey85")
plot
Here is a sample of the dataset:
DAP Date Month y
1 47 2010-11-26 November 0.6872708
21 116 2011-02-03 February 0.7643213
41 68 2011-12-17 December 0.7021531
61 137 2012-02-24 February 0.7178306
81 92 2013-01-10 January 0.7330749
101 44 2013-11-23 November 0.6610618
121 113 2014-01-31 January 0.7961012
141 68 2014-12-17 December 0.7510821
161 137 2015-02-24 February 0.7799938
181 92 2016-01-10 January 0.6861423
201 47 2016-11-26 November 0.7155526
221 116 2017-02-03 February 0.7397810
241 72 2017-12-21 December 0.7259670
261 144 2018-03-03 March 0.6725775
281 106 2019-01-24 January 0.7637322
301 65 2019-12-14 December 0.7184616
321 134 2020-02-21 February 0.6760159
The following approach uses the tidyverse. The date is separated into year, month and day, and those newly created columns are converted to numeric. In the ggplot part, position_dodge2(preserve = "single") keeps the box width the same, scale_x_discrete redefines the x-axis breaks and tick labels, and width = 1 controls the distance between the boxes.
library(tidyverse)
df <- tibble::tribble(
~DAP, ~Date, ~Month, ~y,
47, "2010-11-26", "November", 0.6872708,
116, "2011-02-03", "February", 0.7643213,
68, "2011-12-17", "December", 0.7021531,
137, "2012-02-24", "February", 0.7178306,
92, "2013-01-10", "January", 0.7330749,
44, "2013-11-23", "November", 0.6610618,
113, "2014-01-31", "January", 0.7961012,
68, "2014-12-17", "December", 0.7510821,
137, "2015-02-24", "February", 0.7799938,
92, "2016-01-10", "January", 0.6861423,
47, "2016-11-26", "November", 0.7155526,
116, "2017-02-03", "February", 0.7397810,
72, "2017-12-21", "December", 0.7259670,
144, "2018-03-03", "March", 0.6725775,
106, "2019-01-24", "January", 0.7637322,
65, "2019-12-14", "December", 0.7184616,
134, "2020-02-21", "February", 0.6760159
)
df$Date <- as.Date(df$Date)
df %>%
separate(Date, sep = "-", into = c("year", "month", "day")) %>%
mutate_at(vars("year":"day"), as.numeric) %>%
select(-c(year, Month)) %>%
ggplot(aes(
x = factor(month, levels = c(11, 12, 1, 2, 3)), y = y,
group = DAP, color = factor(month)
)) +
geom_boxplot(width = 1, lwd = 0.2, position = position_dodge2(preserve = "single")) +
scale_x_discrete(
breaks = c(11, 12, 1, 2, 3),
labels = c("November", "December", "January", "February", "March")
) +
labs(x = "") +
theme(legend.position = "none")
Try this. To get the right order, spacing and labels I create a new date. As the year does not seem to be relevant, I set the year to 2019 for the November and December observations and to 2020 for all other observations.
df <- structure(list(DAP = c(
47L, 116L, 68L, 137L, 92L, 44L, 113L,
68L, 137L, 92L, 47L, 116L, 72L, 144L, 106L, 65L, 134L
), Date = c(
"2010-11-26",
"2011-02-03", "2011-12-17", "2012-02-24", "2013-01-10", "2013-11-23",
"2014-01-31", "2014-12-17", "2015-02-24", "2016-01-10", "2016-11-26",
"2017-02-03", "2017-12-21", "2018-03-03", "2019-01-24", "2019-12-14",
"2020-02-21"
), Month = c(
"November", "February", "December",
"February", "January", "November", "January", "December", "February",
"January", "November", "February", "December", "March", "January",
"December", "February"
), y = c(
0.6872708, 0.7643213, 0.7021531,
0.7178306, 0.7330749, 0.6610618, 0.7961012, 0.7510821, 0.7799938,
0.6861423, 0.7155526, 0.739781, 0.725967, 0.6725775, 0.7637322,
0.7184616, 0.6760159
)), row.names = c(NA, -17L), class = "data.frame")
library(ggplot2)
# Make a new date to get the same order as with DAP.
# Set the year to 2019 for the November and December observations
# and to 2020 for all other observations.
df$Date1 <- gsub("20\\d{2}-(1\\d{1})", "2019-\\1", df$Date)
df$Date1 <- gsub("20\\d{2}-(0\\d{1})", "2020-\\1", df$Date1)
df$Date1 <- as.Date(df$Date1)
# Using the new date gives the correct order, spacing and labels.
# The axis limits are also adjusted.
plot <- ggplot(data = df, aes(y = y, x = Date1, group = DAP)) +
geom_boxplot(fill = "grey85")
plot +
scale_x_date(date_breaks = "1 month", date_labels = "%B", limits = c(as.Date("2019-11-01"), as.Date("2020-03-31")))
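The re-dating step does not have to go through regular expressions. As a rough sketch of the same idea, the month can be extracted with `format()` and the new year chosen from it. The names `dates`, `month_num`, `season_year` and `date1` are illustrative, and the example uses only three of the sample dates.

```r
# Three of the sample dates from the question.
dates <- as.Date(c("2010-11-26", "2011-02-03", "2011-12-17"))

# Months 10-12 go to 2019, months 1-9 to 2020, mirroring the gsub() patterns above.
month_num   <- as.integer(format(dates, "%m"))
season_year <- ifelse(month_num >= 10, 2019L, 2020L)

# Glue the new year onto the original month and day.
date1 <- as.Date(paste(season_year, format(dates, "%m-%d"), sep = "-"))
date1
```

Because 2020 is a leap year, a 29 February in the data would still map to a valid date.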

Combining 2 columns with values in R

I'm working on a dataset about football and have done some time series analyses. I've calculated the number of goals per month and the number of goals in the previous month. Now I'm trying to plot this in a graph.
I'm trying to make a grouped bar chart showing the goals of each month with the goals from the previous month next to them.
This is the code that I'm using:
df_eredivisie %>%
group_by(month= month(date, label = TRUE)) %>%
summarise(goals = sum(FTHG + FTAG)) %>%
mutate(last = lag(goals, 1))
So this is the result (Sorry can't post pictures :/ ):
month goals last
Jan 69 NA
Feb 121 69
Mar 116 121
Apr 155 116
May 78 155
Aug 88 78
Sep 124 88
Oct 91 124
Nov 91 91
Dec 128 91
Could someone help me with the grouped bar chart? I've tried to combine the columns, so I could do fill and have the goals and last with different colours. But I couldn't figure out how to do that.
Your data need to be in long format, then it's simple:
library(ggplot2)
library(tidyverse)
df <- tribble(~month, ~goals, ~last,
"Jan", 69, NA,
"Feb", 121, 69,
"Mar", 116, 121,
"Apr", 155, 116,
"May", 78, 155,
"Aug", 88, 78,
"Sep", 124, 88,
"Oct", 91, 124,
"Nov", 91, 91,
"Dec", 128, 91)
df %>%
pivot_longer(cols = 2:3, names_to = "category") %>%
mutate(month = factor(month, levels = month.abb)) %>%
ggplot(aes(x = month, y = value, fill = category)) +
geom_col(position = "dodge")
#> Warning: Removed 1 rows containing missing values (geom_col).
Created on 2020-06-07 by the reprex package (v0.3.0)
If you reverse the factors, it looks like this:
df %>%
pivot_longer(cols = 2:3, names_to = "category") %>%
mutate(month = factor(month, levels = month.abb)) %>%
ggplot(aes(x = month, y = value, fill = forcats::fct_rev(category))) +
geom_col(position = "dodge")
#> Warning: Removed 1 rows containing missing values (geom_col).
So it works, but the second column does not add any information, as you can see the previous month right next to it...
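To make "long format" concrete, here is a small base R sketch of the kind of reshape that `pivot_longer()` performs, using the first three months of the table. The names `wide` and `long` are illustrative, and the rows come out in a different order than `pivot_longer()` would give, which does not matter for plotting. Dropping the `NA` row up front is an optional extra.

```r
# Wide format: one row per month, one column per series.
wide <- data.frame(month = c("Jan", "Feb", "Mar"),
                   goals = c(69, 121, 116),
                   last  = c(NA, 69, 121))

# Long format: one row per month/series pair, with the series name in `category`.
long <- data.frame(month    = rep(wide$month, times = 2),
                   category = rep(c("goals", "last"), each = nrow(wide)),
                   value    = c(wide$goals, wide$last))

# Optionally drop the missing "last" value for the first month.
long <- long[!is.na(long$value), ]
long
```

A single `value` column plus a `category` column is what lets `fill = category` split the bars by series.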

R: Spreading the words in wordcloud

I'm struggling to spread the words out when using the wordcloud function.
data = tibble(Day = c("January", "February", "March" , "April", "May", "June", "July", "August", "September", "October", "November", "December"),
Freq = c(1294, 1073, 1071, 1019, 938, 912, 703, 680, 543, 201, 190, 343))
set.seed(10)
wordcloud(words = data$Day, freq = data$Freq, min.freq = 1,
random.order = TRUE, scale = c(3, .5), rot.per = 0)
I tried to save the output using the ggsave function, but this is what I got:
Desired output:
I couldn't find a way to do this in wordcloud, but wordcloud2 gives rather more flexibility. I managed to cobble this together with help from another SO question about saving as an image file.
#packages enable saving to png or pdf via html, see link at end of answer
library(webshot)
webshot::install_phantomjs()
library("htmlwidgets")
library(tibble)
library(wordcloud2)
data = tibble(Day = c("January", "February", "March" , "April", "May", "June", "July", "August", "September", "October", "November", "December"),
Freq = c(1294, 1073, 1071, 1019, 938, 912, 703, 680, 543, 201, 190, 343))
set.seed(10)
# Control the appearance with wordcloud2 arguments. The padding between words is controlled by `gridSize`.
# You have to play around with `size`, `gridSize` and the image size.
wc <- wordcloud2(data, size = 0.4, rotateRatio = 0, color = "black", gridSize = 75)
# save as html
saveWidget(wc, "wc.html", selfcontained = FALSE)
# and then as image:png
webshot("wc.html","wc.png", delay = 5, vwidth = 480, vheight = 480)
For saving the image to file see: How to Save the wordcloud in R
And you end up with:
Created on 2020-05-18 by the reprex package (v0.3.0)

Special Stacked Bar Chart R ggplot

Can you help me make the following bar chart in R? I have some simplified dummy data that I am using to recreate it, and my plan is then to manipulate the real data in the same way. There is no need to do the abline; the most important part is the waterfall aspect.
labels value
1 start 100
2 january 120
3 febuary 140
4 march 160
5 april 180
6 may 130
7 june 140
8 july 170
9 august 160
10 september 180
11 october 190
12 november 210
13 december 200
14 end 200
This gets you the waterfall effect:
library(tidyverse)
df <-
tibble::tribble(
~month, ~month_name, ~value,
1, "start", 100,
2, "january", 120,
3, "febuary", 140,
4, "march", 160,
5, "april", 180,
6, "may", 130,
7, "june", 140,
8, "july", 170,
9, "august", 160,
10, "september", 180,
11, "october", 190,
12, "november", 210,
13, "december", 200,
14, "end", 200
) %>%
mutate(
type = case_when(
month == min(month) ~ "Initial",
month == max(month) ~ "Final",
value > lag(value) ~ "Increase",
TRUE ~ "Decrease"
),
finish = value,
start = if_else(month == max(month), 0, replace_na(lag(value), 0))
)
df %>%
ggplot(aes(xmin = month - 0.3, xmax = month + 0.3, ymin = start, ymax = finish, fill = type)) +
geom_rect() +
scale_x_continuous(
breaks = 1:14,
labels = df %>% select(month_name) %>% pull()
) +
theme(
axis.text.x = element_text(angle = 45, hjust = 1),
legend.position = "none"
)
You should be able to take care of the formatting and colors from here ;)
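The core of the waterfall is the `start`/`finish` pair for each bar. The sketch below recomputes those columns in base R so the logic can be checked without plotting. The vector names are illustrative, and it assumes (as the answer's code does) that the first and last bars are drawn from zero.

```r
# Values from the question, in order: start, january .. december, end.
value <- c(100, 120, 140, 160, 180, 130, 140, 170, 160, 180, 190, 210, 200, 200)
n     <- length(value)
pos   <- seq_len(n)
prev  <- c(NA, value[-n])                 # previous bar's value (lag by one)

# Each intermediate bar spans from the previous value to the current one;
# the first and last bars are totals that start at zero.
start  <- ifelse(pos %in% c(1, n), 0, prev)
finish <- value
type   <- ifelse(pos == 1, "Initial",
          ifelse(pos == n, "Final",
          ifelse(value > prev, "Increase", "Decrease")))

data.frame(pos, start, finish, type)
```

For instance, May (130 after 180) comes out as a "Decrease" bar running from 180 down to 130.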
