Create dataframe with month start and end in R

I want to create a dataframe from a given start and end date:
start_date <- as.Date("2020-05-17")
end_date <- as.Date("2020-06-23")
For each row in this dataframe, I should have the start day and end day of the month, so the expected output is:
start end month year
2020-05-17 2020-05-31 May 2020
2020-06-01 2020-06-23 June 2020
I have tried to create a sequence, but I'm stuck on what to do next:
day_seq <- seq(start_date, end_date, 1)
Please, a base R or tidyverse solution will be greatly appreciated.

1) yearmon Using start_date and end_date from the question, create a yearmon sequence; each of the desired columns is then a simple one-line computation. The stringsAsFactors line can be omitted from R 4.0 onwards, as FALSE is the default there.
library(zoo)
ym <- seq(as.yearmon(start_date), as.yearmon(end_date), 1/12)
data.frame(start = pmax(start_date, as.Date(ym)),
end = pmin(end_date, as.Date(ym, frac = 1)),
month = month.name[cycle(ym)],
year = as.integer(ym),
stringsAsFactors = FALSE)
giving:
start end month year
1 2020-05-17 2020-05-31 May 2020
2 2020-06-01 2020-06-23 June 2020
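2) Base R This follows similar logic and gives the same answer. We first define a function month1 which, given a Date class vector x, returns a Date vector of the same length holding the first of the month for each element. Adding 31 days to the first of a month always lands in the following month, so month1(months + 31) - 1 is the last day of each month.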
month1 <- function(x) as.Date(cut(x, "month"))
months <- seq(month1(start_date), month1(end_date), "month")
data.frame(start = pmax(start_date, months),
end = pmin(end_date, month1(months + 31) - 1),
month = format(months, "%B"),
year = as.numeric(format(months, "%Y")),
stringsAsFactors = FALSE)
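A minimal sketch that wraps the same base R logic in a reusable function, in case it helps with arbitrary date ranges. The name month_spans is hypothetical (it is not from the answer above), and instead of the + 31 trick it takes each month's last day as the day before the next month's first day. month.name is used so the labels do not depend on the locale.

```r
# Hypothetical helper: one row per calendar month touched by [start_date, end_date]
month_spans <- function(start_date, end_date) {
  stopifnot(start_date <= end_date)
  # First day of every month in the range
  first <- seq(as.Date(format(start_date, "%Y-%m-01")),
               as.Date(format(end_date, "%Y-%m-01")),
               by = "month")
  # One extra month start, so that "next month start minus one day" gives each month end
  ext <- seq(first[1], by = "month", length.out = length(first) + 1)
  last <- ext[-1] - 1
  data.frame(start = pmax(start_date, first),   # clip the first month to start_date
             end   = pmin(end_date, last),      # clip the last month to end_date
             month = month.name[as.integer(format(first, "%m"))],
             year  = as.integer(format(first, "%Y")),
             stringsAsFactors = FALSE)
}

res <- month_spans(as.Date("2020-05-17"), as.Date("2020-06-23"))
print(res)
```

Because the sequence is built from month starts, a range that crosses a year boundary should need no special handling.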

It has been a while since I used the tidyverse, but here is my attempt.
sample data
Different sample data, to tackle some problems where the year changes:
start_date <- as.Date("2020-05-17")
end_date <- as.Date("2021-06-23")
code
library( tidyverse )
library( lubridate )
#create a sequence of days from start to end
tibble( date = seq( start_date, end_date, by = "1 day" ) ) %>%
mutate( month = lubridate::month( date ),
year = lubridate::year( date ),
end = as.Date( paste( year, month, lubridate::days_in_month(date), sep = "-" ) ) ) %>%
#the end of the last group is now always larger than the maximum date... repair!
mutate( end = if_else( end > max(date), max(date), end ) ) %>%
group_by( year, month ) %>%
summarise( start = min( date ),
end = max( end ) ) %>%
select( start, end, month, year )
output
# # A tibble: 14 x 4
# # Groups: year [2]
# start end month year
# <date> <date> <dbl> <dbl>
# 1 2020-05-17 2020-05-31 5 2020
# 2 2020-06-01 2020-06-30 6 2020
# 3 2020-07-01 2020-07-31 7 2020
# 4 2020-08-01 2020-08-31 8 2020
# 5 2020-09-01 2020-09-30 9 2020
# 6 2020-10-01 2020-10-31 10 2020
# 7 2020-11-01 2020-11-30 11 2020
# 8 2020-12-01 2020-12-31 12 2020
# 9 2021-01-01 2021-01-31 1 2021
# 10 2021-02-01 2021-02-28 2 2021
# 11 2021-03-01 2021-03-31 3 2021
# 12 2021-04-01 2021-04-30 4 2021
# 13 2021-05-01 2021-05-31 5 2021
# 14 2021-06-01 2021-06-23 6 2021

For the specific period in your question, you may use:
library(lubridate)
start_date <- as.Date("2020-05-17")
end_date <- as.Date("2020-06-23")
start <- c(start_date, floor_date(end_date, unit = 'months'))
# ceiling_date() returns the first day of the next month, so subtract one day
end <- c(ceiling_date(start_date, unit = 'months') - 1, end_date)
month <- c(as.character(month(start[1], label = TRUE)),
as.character(month(start[2], label = TRUE)))
year <- c(year(start[1]), year(start[2]))
data.frame(start, end, month, year, stringsAsFactors = FALSE)

Here is one approach using intervals with lubridate. You would create a full interval between the 2 dates of interest, and then intersect with monthly ranges for each month (first to last day each month).
library(tidyverse)
library(lubridate)
start_date <- as.Date("2020-05-17")
end_date <- as.Date("2021-08-23")
full_int <- interval(start_date, end_date)
month_seq = seq(start_date, end_date, by = "month")
month_int = interval(floor_date(month_seq, "month"), ceiling_date(month_seq, "month") - days(1))
data.frame(interval = intersect(full_int, month_int)) %>%
mutate(start = int_start(interval),
end = int_end(interval),
month = month.abb[month(start)],
year = year(start)) %>%
select(-interval)
Output
start end month year
1 2020-05-17 2020-05-31 May 2020
2 2020-06-01 2020-06-30 Jun 2020
3 2020-07-01 2020-07-31 Jul 2020
4 2020-08-01 2020-08-31 Aug 2020
5 2020-09-01 2020-09-30 Sep 2020
6 2020-10-01 2020-10-31 Oct 2020
7 2020-11-01 2020-11-30 Nov 2020
8 2020-12-01 2020-12-31 Dec 2020
9 2021-01-01 2021-01-31 Jan 2021
10 2021-02-01 2021-02-28 Feb 2021
11 2021-03-01 2021-03-31 Mar 2021
12 2021-04-01 2021-04-30 Apr 2021
13 2021-05-01 2021-05-31 May 2021
14 2021-06-01 2021-06-30 Jun 2021
15 2021-07-01 2021-07-31 Jul 2021
16 2021-08-01 2021-08-23 Aug 2021

Related

can't convert month number to month date in R [duplicate]

This question already has answers here:
How can I use name of a month in x-axis in ggplot2
I would like to create a ggplot with different tree types in Spain.
I used this code:
library(dplyr)
library(reshape)
set.seed(123)
library(ggplot2)
library(tidyr)
df_long <- pivot_longer(df7,
cols = c(Birch, Palm, Oak),
values_to = "m3",
names_to = "Trees")
# Plot
ggplot(df_long,
aes(
x = Month,
y = Integral,
color = Trees
)) +
geom_line() +
ggtitle("trees in Spain") +
xlab("Month") + scale_x_continuous(breaks = seq(1, 12, by = 1), limits = c(1,12)) +
ylab(" m3")
Unfortunately, the x-axis shows only the month numbers, and I would like it to show the month names instead.
If your months are integers you can use the built in constants month.abb and month.name
library(dplyr)
df <- data.frame(month_nums = 1:12)
df |>
mutate(
month_abb = month.abb[month_nums],
month_full = month.name[month_nums]
)
# month_nums month_abb month_full
# 1 1 Jan January
# 2 2 Feb February
# 3 3 Mar March
# 4 4 Apr April
# 5 5 May May
# 6 6 Jun June
# 7 7 Jul July
# 8 8 Aug August
# 9 9 Sep September
# 10 10 Oct October
# 11 11 Nov November
# 12 12 Dec December
If they are dates you can use format():
df <- data.frame(
month = seq(from = as.Date("2020-01-01"), to = as.Date("2020-12-31"), by = "month")
)
df |>
mutate(
month_abb = format(month, "%b"),
month_full = format(month, "%B")
)
# month month_abb month_full
# 1 2020-01-01 Jan January
# 2 2020-02-01 Feb February
# 3 2020-03-01 Mar March
# 4 2020-04-01 Apr April
# 5 2020-05-01 May May
# 6 2020-06-01 Jun June
# 7 2020-07-01 Jul July
# 8 2020-08-01 Aug August
# 9 2020-09-01 Sep September
# 10 2020-10-01 Oct October
# 11 2020-11-01 Nov November
# 12 2020-12-01 Dec December
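To tie this back to the plot in the question, here is a small base R sketch. The toy data frame and the MonthName column are made up for illustration. Turning the month numbers into a factor whose levels are month.abb keeps the months in calendar order, rather than alphabetical order, when the column is mapped to an axis.

```r
# Toy data, deliberately out of order (values are invented for illustration)
df <- data.frame(Month = c(3, 1, 2), m3 = c(10, 20, 15))

# Month names as a factor with calendar-ordered levels
df$MonthName <- factor(month.abb[df$Month], levels = month.abb)

# Sorting by the factor follows the calendar, not the alphabet
df_sorted <- df[order(df$MonthName), ]
print(df_sorted)
```

If the Month column stays numeric, as in the question's scale_x_continuous() call, passing labels = month.abb alongside breaks = 1:12 should be another option.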

Extract information for date from other columns

I am working with monthly data. The data is in a specific format, in two columns, Month and Year. Below you can see a sample of the data:
df<-data.frame(
Month=c("m1","m2","m3","m4","m5","m6","m7","m8","m9","m10","m11","m12"),
Year=c("2020","2020","2020","2020","2020","2020","2020","2020","2020","2020","2020","2020"))
Now I want to convert this data from that format into a proper date, stored in a new column Date.
Can anybody help me solve this problem?
In base R you can do:
df$Date <- as.Date(paste0(df$Year, gsub("m", "-", df$Month, fixed = TRUE), "-01"))
Here is an option using readr's parse_number() and lubridate's my() function -
library(dplyr)
library(readr)
library(lubridate)
df %>%
mutate(Month = parse_number(Month),
Date = my(paste(Month, Year)))
# Month Year Date
#1 1 2020 2020-01-01
#2 2 2020 2020-02-01
#3 3 2020 2020-03-01
#4 4 2020 2020-04-01
#5 5 2020 2020-05-01
#6 6 2020 2020-06-01
#7 7 2020 2020-07-01
#8 8 2020 2020-08-01
#9 9 2020 2020-09-01
#10 10 2020 2020-10-01
#11 11 2020 2020-11-01
#12 12 2020 2020-12-01
Base R option -
transform(df, Date = as.Date(paste(1, sub('m', '', Month), Year), '%d %m %Y'))
Try
as.Date(paste0(df$Year, '-', gsub('\\D+', '', df$Month), '-01'))
#[1] "2020-01-01" "2020-02-01" "2020-03-01" "2020-04-01" "2020-05-01" "2020-06-01" "2020-07-01" "2020-08-01" "2020-09-01" "2020-10-01" "2020-11-01" "2020-12-01"
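The base R answers above rely on as.Date() accepting a one-digit month. A slightly more defensive sketch, assuming the Month column always has the form m followed by a number: convert the number explicitly, zero-pad it with sprintf(), and stop if anything is out of range. The helper name month_code_to_date is hypothetical.

```r
# Hypothetical helper: "m1".."m12" plus a year -> first day of that month
month_code_to_date <- function(month, year) {
  m <- as.integer(sub("^m", "", month))       # strip the leading "m"
  stopifnot(!is.na(m), m >= 1, m <= 12)       # fail loudly on unexpected codes
  as.Date(sprintf("%s-%02d-01", year, m))     # e.g. "2020-01-01"
}

df <- data.frame(Month = paste0("m", 1:12), Year = "2020",
                 stringsAsFactors = FALSE)
df$Date <- month_code_to_date(df$Month, df$Year)
print(df)
```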

How to convert week numbers into date format using R

I am trying to convert a column in my dataset that contains week numbers into weekly Dates. I was trying to use the lubridate package but could not find a solution. The dataset looks like the one below:
df <- tibble(week = c("202009", "202010", "202011","202012", "202013", "202014"),
Revenue = c(4543, 6764, 2324, 5674, 2232, 2323))
So I would like to create a Date column in a weekly format, e.g. (2020-03-07, 2020-03-14).
Would anyone know how to convert these week numbers into weekly dates?
Maybe there is a more automated way, but try something like this. I think this gets the right days; I looked at a 2020 calendar and counted. If something is off, it's a matter of adjusting the (week - 1) * 7 - 1 component to return what you want.
This takes the first day of the year, adds (week - 1) weeks' worth of days minus one, and then uses ceiling_date() to find the next Sunday.
library(dplyr)
library(tidyr)
library(lubridate)
df %>%
separate(week, c("year", "week"), sep = 4, convert = TRUE) %>%
mutate(date = ceiling_date(ymd(paste(year, "01", "01", sep = "-")) +
(week - 1) * 7 - 1, "week", week_start = 7))
# # A tibble: 6 x 4
# year week Revenue date
# <int> <int> <dbl> <date>
# 1 2020 9 4543 2020-03-01
# 2 2020 10 6764 2020-03-08
# 3 2020 11 2324 2020-03-15
# 4 2020 12 5674 2020-03-22
# 5 2020 13 2232 2020-03-29
# 6 2020 14 2323 2020-04-05
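For comparison, a base R sketch of the same arithmetic without lubridate. It assumes that week 1 starts on the first Sunday of the year; the question does not say which week-numbering convention the data uses, so treat that as an assumption to check against your own calendar. The function name week_to_sunday is hypothetical.

```r
# Hypothetical helper: "YYYYWW" strings -> the Sunday that starts that week,
# assuming week 1 begins on the first Sunday of the year
week_to_sunday <- function(yyyyww) {
  year <- as.integer(substr(yyyyww, 1, 4))
  week <- as.integer(substr(yyyyww, 5, 6))
  jan1 <- as.Date(sprintf("%d-01-01", year))
  # Days from 1 January to the first Sunday ("%w": 0 = Sunday, ..., 6 = Saturday)
  offset <- (7 - as.integer(format(jan1, "%w"))) %% 7
  jan1 + offset + (week - 1) * 7
}

weeks <- c("202009", "202010", "202011", "202012", "202013", "202014")
dates <- week_to_sunday(weeks)
print(dates)
```

For the 2020 sample data this should give the same Sundays as the lubridate version above, starting at 2020-03-01.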

Second to last Wednesday of month in R

In R, how can I produce a list of dates of all 2nd to last Wednesdays of the month in a specified date range? I've tried a few things but have gotten inconsistent results for months with five Wednesdays.
To generate a regular sequence of dates you can use seq with dates for parameter from and to. See the seq.Date documentation for more options.
Create a data frame with the date, the month and the weekday, then obtain the second to last Wednesday for each month with the help of aggregate.
day_sequence = seq(as.Date("2020/1/1"), as.Date("2020/12/31"), "day")
df = data.frame(day = day_sequence,
month = months(day_sequence),
weekday = weekdays(day_sequence))
#Filter only Wednesdays
df = df[df$weekday == "Wednesday",]
result = aggregate(day ~ month, df, function(x){head(tail(x,2),1)})
tail(x, 2) returns the last two Wednesdays of the month, and head(..., 1) then takes the first of those two.
Result:
month day
1 April 2020-04-22
2 August 2020-08-19
3 December 2020-12-23
4 February 2020-02-19
5 January 2020-01-22
6 July 2020-07-22
7 June 2020-06-17
8 March 2020-03-18
9 May 2020-05-20
10 November 2020-11-18
11 October 2020-10-21
12 September 2020-09-23
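Two things to keep in mind with the result above: aggregate sorts the months alphabetically by name, and grouping on the month name alone would mix the same month from different years. A minimal sketch of a variation that groups on a year-month key instead, assuming the date range covers whole months (the variable names are illustrative):

```r
days <- seq(as.Date("2019-11-01"), as.Date("2020-02-29"), by = "day")

# "%u" is the ISO weekday number (1 = Monday), so "3" is Wednesday in any locale
weds <- days[format(days, "%u") == "3"]

# Group by "YYYY-MM" so the result is chronological and years stay separate
by_month <- split(weds, format(weds, "%Y-%m"))

# Second to last element of each month's Wednesdays, as text
second_last <- vapply(by_month,
                      function(x) format(x[length(x) - 1]),
                      character(1))
print(second_last)
```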
There are probably simpler ways of doing this, but the function below does what the question asks for. It returns a named vector of dates such that:
They are between from and to.
They fall on weekday day, where 1 is Monday.
They are the n-th to last such weekday of the month.
By n-th to last I mean the n-th counting from the end of the month.
whichWeekday <- function(from, to, day, n, format = "%Y-%m-%d"){
from <- as.Date(from, format = format)
to <- as.Date(to, format = format)
day <- as.character(day)
d <- seq(from, to, by = "days")
m <- format(d, "%Y-%m")
f <- c(TRUE, m[-1] != m[-length(m)])
f <- cumsum(f)
wed <- tapply(d, f, function(x){
i <- which(format(x, "%u") == day)
x[ tail(i, n)[1] ]
})
y <- as.Date(wed, origin = "1970-01-01")
setNames(y, format(y, "%Y-%m"))
}
whichWeekday("2019-01-01", "2020-03-31", 3, 2) # day = 3 is Wednesday when 1 is Monday
# 2019-01 2019-02 2019-03 2019-04 2019-05
#"2019-01-23" "2019-02-20" "2019-03-20" "2019-04-17" "2019-05-22"
# 2019-06 2019-07 2019-08 2019-09 2019-10
#"2019-06-19" "2019-07-24" "2019-08-21" "2019-09-18" "2019-10-23"
# 2019-11 2019-12 2020-01 2020-02 2020-03
#"2019-11-20" "2019-12-18" "2020-01-22" "2020-02-19" "2020-03-18"

How to divide monthly totals by the seasonal monthly ratio in R

I am trying to de-seasonalize my data by dividing my monthly totals by the average seasonality ratio for that month. I have two data frames: avgseasonality, which has 12 rows holding the average seasonality ratio per month, and df1, which holds the order totals. The problem is that avgseasonality has only 12 rows, while the order total data frame has 157 rows.
deseasonlize <- transform(avgseasonalityratio, deseasonlizedtotal =
df1$OrderTotal / avgseasonality$seasonalityratio)
This runs, but it does not pair the months correctly: it applies the first ratio (April) to the first order total (December).
> avgseasonality
Month seasonalityratio
1 April 1.0132557
2 August 1.0054602
3 December 0.8316988
4 February 0.9813396
5 January 0.8357475
6 July 1.1181648
7 June 1.0439899
8 March 1.1772450
9 May 1.0430667
10 November 0.9841149
11 October 0.9595041
12 September 0.8312318
> df1
# A tibble: 157 x 3
DateEntLabel OrderTotal `d$Month`
<dttm> <dbl> <chr>
1 2005-12-01 00:00:00 512758. December
2 2006-01-01 00:00:00 227449. January
3 2006-02-01 00:00:00 155652. February
4 2006-03-01 00:00:00 172923. March
5 2006-04-01 00:00:00 183854. April
6 2006-05-01 00:00:00 239689. May
7 2006-06-01 00:00:00 237638. June
8 2006-07-01 00:00:00 538688. July
9 2006-08-01 00:00:00 197673. August
10 2006-09-01 00:00:00 144534. September
# ... with 147 more rows
I need to divide each order total by the ratio for its month. For example, for December the calculation would be 512758 / 0.8316988 = 616518.864762. The results should go in a new column alongside the corresponding month and order total. Any help is greatly appreciated!
The easiest way is to merge your data first, then do the operation. You can use base R's merge() function, though I will show it here using the tidyverse left_join() function. I see that one of your columns has a strange name, d$Month; renaming it to Month will simplify the merge!
Reproducible example:
library(tidyverse)
df_1 <- data.frame(Month = c("Jan", "Feb"), seasonalityratio = c(1,2))
df_2 <- data.frame(Month = rep(c("Jan", "Feb"),each=2), OrderTotal = 1:4)
df_1 %>%
left_join(df_2, by = "Month") %>%
mutate(deseasonalizedtotal = OrderTotal / seasonalityratio)
#> Month seasonalityratio OrderTotal deseasonalizedtotal
#> 1 Jan 1 1 1.0
#> 2 Jan 1 2 2.0
#> 3 Feb 2 3 1.5
#> 4 Feb 2 4 2.0
Created on 2019-01-30 by the reprex package (v0.2.1)
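If you would rather not join at all, a lookup with match() in base R is another way to pair each order with its month's ratio while keeping the original row order. This is only a sketch: the three ratios and the first three order totals are copied from the question, the last order row is invented to show a repeated month, and the column name deseasonalized is made up.

```r
avgseasonality <- data.frame(
  Month = c("April", "December", "January"),
  seasonalityratio = c(1.0132557, 0.8316988, 0.8357475),
  stringsAsFactors = FALSE
)

orders <- data.frame(
  Month = c("December", "January", "April", "December"),
  OrderTotal = c(512758, 227449, 183854, 100000),  # last value is invented
  stringsAsFactors = FALSE
)

# Position of each order's month in the ratio table
idx <- match(orders$Month, avgseasonality$Month)
stopifnot(!anyNA(idx))  # every month must have a ratio

orders$deseasonalized <- orders$OrderTotal / avgseasonality$seasonalityratio[idx]
print(orders)
```

The anyNA() check guards against months that are spelled differently in the two tables, which would otherwise produce NA silently.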
