R: Turn Months into Quarters

I have a dataset that looks like this:
> ex
# A tibble: 10 × 2
tenor delivery_window
<chr> <chr>
1 month Nov 22
2 quarter Jan 22
3 year Cal 24
4 year Cal 22
5 month Feb 22
6 quarter Jan 21
7 month Sep 22
8 quarter Jan 21
9 month Jun 21
10 month Aug 21
And which I want to turn into something like this:
> ex
# A tibble: 10 × 3
tenor delivery_window new_tenor
<chr> <chr> <chr>
1 month Nov 22 Nov 22
2 quarter Jan 22 Q1 22
3 year Cal 24 Cal 24
4 year Cal 22 Cal 22
5 month Feb 22 Feb 22
6 quarter Jan 21 Q1 21
7 month Sep 22 Sep 22
8 quarter Jan 21 Q1 21
9 month Jun 21 Jun 21
10 month Aug 21 Aug 21
That is, if the tenor is quarter, I want to show only the quarter corresponding to the delivery window, not the month. Monthly and Yearly tenors can remain as they are.
Can someone please give me a hint as to how to achieve this? Thank you in advance.
EDIT
The new_tenor should be Q1 YY for months from Jan YY to Mar YY, Q2 YY for months from Apr YY to Jun YY, Q3 YY for months from Jul YY to Sep YY, and Q4 YY for months from Oct YY to Dec YY.

We can convert to yearqtr with as.yearqtr (from zoo) and use case_when to replace the elements of 'delivery_window' with the converted value when tenor is 'quarter':
library(dplyr)
library(stringr)
library(zoo)
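ex <- ex %>%
  mutate(new_tenor = case_when(
    # parse "1 <Mon> <YY>" as a date, convert it to a year-quarter, then
    # swap the "year quarter" text into "quarter year" order
    tenor == 'quarter' ~ str_replace(format(as.yearqtr(paste('1', delivery_window),
                                                       '%d %b %Y')),
                                     "(\\d+) (\\w+)", "\\2 \\1"),
    # monthly and yearly tenors are kept as they are
    TRUE ~ delivery_window))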
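If you would rather avoid zoo and stringr, a base R sketch can compute the quarter directly from the month abbreviation. It assumes delivery_window is always a three-letter English month abbreviation, a space, and a two-digit year (as in the example); month.abb is R's built-in English abbreviation constant, so the lookup does not depend on the locale.

```r
ex <- data.frame(
  tenor = c("month", "quarter", "year", "year", "month",
            "quarter", "month", "quarter", "month", "month"),
  delivery_window = c("Nov 22", "Jan 22", "Cal 24", "Cal 22", "Feb 22",
                      "Jan 21", "Sep 22", "Jan 21", "Jun 21", "Aug 21"),
  stringsAsFactors = FALSE
)

# Month number 1..12 from the leading abbreviation ("Cal" gives NA)
mon <- match(substr(ex$delivery_window, 1, 3), month.abb)

# Months 1-3 -> Q1, 4-6 -> Q2, 7-9 -> Q3, 10-12 -> Q4; keep " YY" from the input
qtr_label <- paste0("Q", (mon - 1) %/% 3 + 1, substring(ex$delivery_window, 4))

# Only quarterly tenors are relabelled; the NA labels for "Cal" rows are never used
ex$new_tenor <- ifelse(ex$tenor == "quarter", qtr_label, ex$delivery_window)
ex
```

Because ifelse() picks element by element, the unused labels for the yearly rows are simply discarded.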


Conditional rolling counting function

I would like to implement a rolling count function for the working days in a month. Weekends (Saturday and Sunday) should be assigned NA.
A reproducible example:
# Change the locale if you are in a non-English location like me
Sys.setlocale("LC_TIME", "C")
workdays <- c("Mon","Tue","Wed","Thu","Fri")
dataset <- data.frame(Date = seq(as.Date("2020-03-01"),as.Date("2020-04-01")-1,"days"))
dataset$Day <- format(dataset$Date,format="%d")
dataset$WeekDay <- format(dataset$Date,format="%a")
dataset$Month <- format(dataset$Date,format="%m")
dataset$Year <- format(dataset$Date,format="%y")
dataset$Workday <- dataset$WeekDay %in% workdays
I wanted to use dplyr, grouped by the respective month and year, to sum conditionally over the working days.
dataset %>%
group_by(Month,Year) %>%
mutate(WorkdayNo = ???)
In my example, the first ten rows should then look like this:
[1] NA 1 2 3 4 5 NA NA 6 7 (...)
cumsum combined with if_else should help:
library(dplyr)
dataset %>%
group_by(Month,Year) %>%
mutate(WorkdayNo = if_else(Workday, cumsum(Workday), NA_integer_)) %>%
ungroup
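For comparison, here is a base R sketch of the same grouped running count using ave(). It is an alternative to the dplyr answer, not part of it: it assumes weekends are Saturday and Sunday and takes them from the numeric POSIXlt wday field (0 = Sunday, 6 = Saturday), which avoids the locale-dependent "%a" names. The date range is extended into April to show the count restarting for a new month.

```r
dates <- seq(as.Date("2020-03-01"), as.Date("2020-04-30"), by = "day")
dataset <- data.frame(Date = dates,
                      Month = format(dates, "%m"),
                      Year = format(dates, "%y"))

# POSIXlt wday: 0 = Sunday, ..., 6 = Saturday (independent of LC_TIME)
dataset$Workday <- !(as.POSIXlt(dates)$wday %in% c(0, 6))

# Running count of workdays, restarted within every Month/Year group
dataset$WorkdayNo <- ave(as.integer(dataset$Workday),
                         dataset$Month, dataset$Year, FUN = cumsum)

# Weekends get NA instead of the carried-over count
dataset$WorkdayNo[!dataset$Workday] <- NA

head(dataset$WorkdayNo, 10)
# [1] NA  1  2  3  4  5 NA NA  6  7
```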
# Date Day WeekDay Month Year Workday WorkdayNo
# <date> <chr> <chr> <chr> <chr> <lgl> <int>
# 1 2020-03-01 01 Sun 03 20 FALSE NA
# 2 2020-03-02 02 Mon 03 20 TRUE 1
# 3 2020-03-03 03 Tue 03 20 TRUE 2
# 4 2020-03-04 04 Wed 03 20 TRUE 3
# 5 2020-03-05 05 Thu 03 20 TRUE 4
# 6 2020-03-06 06 Fri 03 20 TRUE 5
# 7 2020-03-07 07 Sat 03 20 FALSE NA
# 8 2020-03-08 08 Sun 03 20 FALSE NA
# 9 2020-03-09 09 Mon 03 20 TRUE 6
#10 2020-03-10 10 Tue 03 20 TRUE 7
# … with 21 more rows

Creating Calendar df in R

I am currently creating a Calendar df to join to my other dfs, and I originally coded it in the following way:
library(lubridate)  # for quarter()
Date <- seq(as.Date("2020-01-01"), as.Date("2021-12-31"), by="days")
Calendar <- data.frame(Date)
Calendar$DateNo <- format(Calendar$Date, format = "%d")
Calendar$NameDay <- format(Calendar$Date, format = "%A")
Calendar$MonthNo <- format(Calendar$Date, format = "%m")
Calendar$NameMonth <- format(Calendar$Date, format = "%B")
Calendar$NameMonthShort <- format(Calendar$Date, format = "%b")
Calendar$Week <- format(Calendar$Date, format = "%V")
Calendar$Year <- format(Calendar$Date, format = "%Y")
Calendar$Quarter <- quarter(Calendar$Date, with_year = F, fiscal_start = 7)
Calendar$Month_Year <-paste(Calendar$NameMonthShort,Calendar$Year,sep="-")
Calendar$Quarter_Year <-paste(Calendar$Quarter,Calendar$Year,sep="-")
After some issues plotting my data with ggplot, I came across an alternative way of creating it using the lubridate package with mutate. My new code is as follows:
library(dplyr)
library(lubridate)
Date <- seq(as.Date("2020-01-01"), as.Date("2021-12-31"), by="days")
Calendar <- data.frame(Date)
Calendar <- Calendar %>%
mutate(
DateNo = day(Date),
NameDay = wday(Date,label = TRUE),
MonthNo = month(Date),
NameMonth = month(Date, label = TRUE),
NameMonthShort = month(Date, label = TRUE),
Week = week(Date),
Year = year(Date),
Quarter = quarter(Date, with_year = F, fiscal_start = 7))
The issues I am encountering are that I can't get the unabbreviated day/month names, and I'm not sure whether I can add Month_Year/Quarter_Year inside the mutate so that the values are stored as factors. Is it possible to add those values there, or do I have to add them the way I did previously? Thanks!
You might find it easier to use the built-in as.POSIXlt; no lubridate is needed. Apply it to your sequence and you get a list-like object,
Date <- as.POSIXlt(seq(as.Date("2020-01-01"), as.Date("2020-06-30"), by="7 days"))
## Note: shortened for sake of brevity
that already stores the desired information in components that can be accessed with $.
attr(Date, "names")
# [1] "sec" "min" "hour" "mday" "mon" "year" "wday" "yday" "isdst"
A few minor conversions are needed because of the storage format (months are counted from 0 and years from 1900), and helper functions such as weekdays, quarters, and strftime cover the rest. In addition, we can use the built-in constants month.name and month.abb.
Calendar <- data.frame(Date,
DateNo=Date$mday,
NameDay=weekdays(Date),
MonthNo=Date$mon + 1,
NameMonth=month.name[Date$mon + 1],
NameMonthShort=month.abb[Date$mon + 1],
Week=strftime(Date, "%V"),
Year=1900 + Date$year,
Quarter=quarters(Date)
)
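The same base R approach can also produce the Month_Year and Quarter_Year columns from the question as factors. This is a sketch, not part of the answer above: it uses calendar quarters from quarters(), not the fiscal_start = 7 quarters of the question's lubridate code, and it relies on the dates being sorted so that unique() returns the labels in chronological order for use as factor levels.

```r
# Full two-year daily sequence from the question, as POSIXlt
Date <- as.POSIXlt(seq(as.Date("2020-01-01"), as.Date("2021-12-31"), by = "days"))

Year <- 1900 + Date$year                                       # years count from 1900
Month_Year <- paste(month.abb[Date$mon + 1], Year, sep = "-")  # months are 0-based
Quarter_Year <- paste(quarters(Date), Year, sep = "-")         # calendar quarters Q1..Q4

Calendar <- data.frame(
  Date = as.Date(Date),
  # dates are sorted, so unique() lists the labels in chronological order
  Month_Year = factor(Month_Year, levels = unique(Month_Year)),
  Quarter_Year = factor(Quarter_Year, levels = unique(Quarter_Year))
)

head(levels(Calendar$Quarter_Year), 3)
# [1] "Q1-2020" "Q2-2020" "Q3-2020"
```

Deriving the levels from the data means they do not have to be typed out again if the date range changes.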
Result
Calendar
# Date DateNo NameDay MonthNo NameMonth NameMonthShort Week Year Quarter
# 1 2020-01-01 1 Wednesday 1 January Jan 01 2020 Q1
# 2 2020-01-08 8 Wednesday 1 January Jan 02 2020 Q1
# 3 2020-01-15 15 Wednesday 1 January Jan 03 2020 Q1
# 4 2020-01-22 22 Wednesday 1 January Jan 04 2020 Q1
# 5 2020-01-29 29 Wednesday 1 January Jan 05 2020 Q1
# 6 2020-02-05 5 Wednesday 2 February Feb 06 2020 Q1
# 7 2020-02-12 12 Wednesday 2 February Feb 07 2020 Q1
# 8 2020-02-19 19 Wednesday 2 February Feb 08 2020 Q1
# 9 2020-02-26 26 Wednesday 2 February Feb 09 2020 Q1
# 10 2020-03-04 4 Wednesday 3 March Mar 10 2020 Q1
# 11 2020-03-11 11 Wednesday 3 March Mar 11 2020 Q1
# 12 2020-03-18 18 Wednesday 3 March Mar 12 2020 Q1
# 13 2020-03-25 25 Wednesday 3 March Mar 13 2020 Q1
# 14 2020-04-01 1 Wednesday 4 April Apr 14 2020 Q2
# 15 2020-04-08 8 Wednesday 4 April Apr 15 2020 Q2
# 16 2020-04-15 15 Wednesday 4 April Apr 16 2020 Q2
# 17 2020-04-22 22 Wednesday 4 April Apr 17 2020 Q2
# 18 2020-04-29 29 Wednesday 4 April Apr 18 2020 Q2
# 19 2020-05-06 6 Wednesday 5 May May 19 2020 Q2
# 20 2020-05-13 13 Wednesday 5 May May 20 2020 Q2
# 21 2020-05-20 20 Wednesday 5 May May 21 2020 Q2
# 22 2020-05-27 27 Wednesday 5 May May 22 2020 Q2
# 23 2020-06-03 3 Wednesday 6 June Jun 23 2020 Q2
# 24 2020-06-10 10 Wednesday 6 June Jun 24 2020 Q2
# 25 2020-06-17 17 Wednesday 6 June Jun 25 2020 Q2
# 26 2020-06-24 24 Wednesday 6 June Jun 26 2020 Q2
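Long month names are easy to add by passing abbr = FALSE to month().
Pasting quarters or months to years is done here in a second mutate, as below; a single mutate would also work, because dplyr lets later arguments refer to columns created earlier in the same call.
Edit: Since paste creates character vectors rather than factors, you need to specify the factor levels manually: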
monthlevels = c(
'Jan-2020','Feb-2020','Mar-2020','Apr-2020','May-2020','Jun-2020',
'Jul-2020','Aug-2020','Sep-2020','Oct-2020','Nov-2020','Dec-2020',
'Jan-2021','Feb-2021','Mar-2021','Apr-2021','May-2021','Jun-2021',
'Jul-2021','Aug-2021','Sep-2021','Oct-2021','Nov-2021','Dec-2021')
quarterlevels = c('1-2020','2-2020','3-2020','4-2020','1-2021','2-2021','3-2021','4-2021')
library(dplyr)
library(lubridate)

Calendar %>%
  mutate(
    DateNo = day(Date),
    NameDay = wday(Date, label = TRUE),
    MonthNo = month(Date),
    NameMonth = month(Date, label = TRUE, abbr = FALSE), ## added abbr = FALSE
    NameMonthShort = month(Date, label = TRUE),
    Week = week(Date),
    Year = year(Date),
    Quarter = quarter(Date, with_year = FALSE, fiscal_start = 7)) %>%
  ## added second mutate() to paste fields created by the first mutate
  mutate(
    QuarterYear = factor(paste(Quarter, Year, sep = '-'), levels = quarterlevels),
    MonthYear = factor(paste(NameMonthShort, Year, sep = "-"), levels = monthlevels)
  ) %>%
  head()
Returns:
Date DateNo NameDay MonthNo NameMonth NameMonthShort Week Year Quarter
1 2020-01-01 1 Wed 1 January Jan 1 2020 3
2 2020-01-02 2 Thu 1 January Jan 1 2020 3
3 2020-01-03 3 Fri 1 January Jan 1 2020 3
4 2020-01-04 4 Sat 1 January Jan 1 2020 3
5 2020-01-05 5 Sun 1 January Jan 1 2020 3
6 2020-01-06 6 Mon 1 January Jan 1 2020 3
QuarterYear MonthYear
1 3-2020 Jan-2020
2 3-2020 Jan-2020
3 3-2020 Jan-2020
4 3-2020 Jan-2020
5 3-2020 Jan-2020
6 3-2020 Jan-2020
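Rather than typing the level vectors by hand, they can be built with paste() and rep(). This sketch assumes the same 2020 to 2021 range and the same "Mon-YYYY" and "Q-YYYY" label formats as above; month.abb is R's built-in vector of English month abbreviations.

```r
years <- 2020:2021

# "Jan-2020", ..., "Dec-2020", "Jan-2021", ..., "Dec-2021"
monthlevels <- paste(rep(month.abb, times = length(years)),
                     rep(years, each = 12), sep = "-")

# "1-2020", ..., "4-2020", "1-2021", ..., "4-2021"
quarterlevels <- paste(rep(1:4, times = length(years)),
                       rep(years, each = 4), sep = "-")
```

Extending the calendar then only requires changing years.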

Unpivot or transpose columns into rows R [duplicate]

This question already has answers here:
Reshaping data.frame from wide to long format
Given the dataframe below:
dt <- data.frame("Year"=2020,
"Month"=c("Jan","Jan","Feb"),
"Location"=c("Store_1","Store_1","Store_2"),
"Apples"=c(100, 150, 120),
"Oranges"=c(50, 70, 50))
Year Month Location Apples Oranges
1 2020 Jan Store_1 100 50
2 2020 Jan Store_1 150 70
3 2020 Feb Store_2 120 50
How can I turn this table into the following one, keeping the first three columns and unpivoting the next two?
Year Month Location Type Values
1 2020 Jan Store_1 Apple 100
2 2020 Jan Store_1 Apple 150
3 2020 Feb Store_2 Apple 120
4 2020 Jan Store_1 Orange 50
5 2020 Jan Store_1 Orange 70
6 2020 Feb Store_2 Orange 50
Any hints or tips on this?
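We can use pivot_longer from tidyr. The Type column takes the original column names (Apples, Oranges); adding names_pattern = "(.*)s$" would give the singular Apple/Orange shown in the desired output.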
library(dplyr)
library(tidyr)
dt %>%
pivot_longer(cols = Apples:Oranges, names_to = 'Type',
values_to = 'Values') %>%
arrange(Year, Type)
Output:
# A tibble: 6 x 5
# Year Month Location Type Values
# <dbl> <chr> <chr> <chr> <dbl>
#1 2020 Jan Store_1 Apples 100
#2 2020 Jan Store_1 Apples 150
#3 2020 Feb Store_2 Apples 120
#4 2020 Jan Store_1 Oranges 50
#5 2020 Jan Store_1 Oranges 70
#6 2020 Feb Store_2 Oranges 50
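Without tidyr, the same long table can be assembled in base R. This is a sketch of a swapped-in technique, not the tidyr answer: it repeats the identifier rows once per value column and stacks the value columns with unlist(), which takes them in column order, so the result comes out already ordered like the arrange() output above.

```r
dt <- data.frame(Year = 2020,
                 Month = c("Jan", "Jan", "Feb"),
                 Location = c("Store_1", "Store_1", "Store_2"),
                 Apples = c(100, 150, 120),
                 Oranges = c(50, 70, 50),
                 stringsAsFactors = FALSE)

id_cols <- c("Year", "Month", "Location")
value_cols <- c("Apples", "Oranges")

long <- data.frame(
  # identifier rows repeated once per value column
  dt[rep(seq_len(nrow(dt)), times = length(value_cols)), id_cols],
  # name of the column each value came from, in the same order
  Type = rep(value_cols, each = nrow(dt)),
  # all Apples values first, then all Oranges values
  Values = unlist(dt[value_cols], use.names = FALSE),
  stringsAsFactors = FALSE
)
rownames(long) <- NULL  # drop the "1.1", "2.1", ... row names created by rep()
long
```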

How to find maximum value from dataframe with specific condition? [duplicate]

This question already has answers here:
How to sum a variable by group
I have a dataframe named employee with 100 rows like this:
Date Name ride food income bonus sallary
1 01 Jan 2020 Ludociel 10 6 330000 0 330000
2 01 Jan 2020 Estarossa 15 8 465000 100000 565000
3 01 Jan 2020 Tarmiel 8 10 420000 100000 520000
4 01 Jan 2020 Sariel 5 8 315000 0 315000
5 01 Jan 2020 Escanor 15 7 435000 100000 535000
6 01 Jan 2020 Ban 13 9 465000 100000 565000
7 01 Jan 2020 Meliodas 6 15 540000 100000 640000
8 01 Jan 2020 King 15 12 585000 100000 685000
9 01 Jan 2020 Zeldris 15 11 555000 100000 655000
10 01 Jan 2020 Rugal 15 6 405000 100000 505000
11 02 Jan 2020 Ludociel 14 6 390000 100000 490000
12 02 Jan 2020 Estarossa 12 14 600000 100000 700000
...
100 10 Jan 2020 Rugal 13 10 495000 100000 595000
The problem is that I want to find which employee has the highest total salary from 1 Jan to 10 Jan. My expected output is just a character string like this:
[1] "varName" is the highest with total sallary "varTotal_sallary"
I have tried using a for loop with an if clause, but it only returns the total for one name, so every name would need its own function:
function_ludociel<-function(name, date, sallary){
total=integer()
for(i in 100){
if(date[i]=="01 Jan 2020" & name[i]=="Ludociel"){
total=sum(sallary)
}
}
return(total)
}
ludociel=function_ludociel(employee$name,employee$date,employee$sallary)
After that I planned to combine them into one variable and use max(), but I know this is a clumsy way to code it.
Does anyone have a solution for this? Thank you very much.
1. Convert Date to the actual Date class.
2. Use aggregate to calculate the total salary per name from 1 Jan to 10 Jan.
3. Select the row with the maximum salary.
4. Print the result.
employee$Date <- as.Date(employee$Date, '%d %b %Y')
sub_data <- aggregate(sallary~Name, employee,
subset = Date >= as.Date('2020-01-01') &
Date <= as.Date('2020-01-10'), sum)
max_data <- sub_data[which.max(sub_data$sallary), ]
sprintf('%s has the highest salary %d', max_data$Name, max_data$sallary)
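The same steps can be sketched with tapply() instead of aggregate(). For illustration only, the data below is just the first 12 rows shown in the question, with Date already converted as in step 1, so the printed result is not the answer for the full 100-row dataset.

```r
employee <- data.frame(
  Date = as.Date(c(rep("2020-01-01", 10), rep("2020-01-02", 2))),
  Name = c("Ludociel", "Estarossa", "Tarmiel", "Sariel", "Escanor", "Ban",
           "Meliodas", "King", "Zeldris", "Rugal", "Ludociel", "Estarossa"),
  sallary = c(330000, 565000, 520000, 315000, 535000, 565000,
              640000, 685000, 655000, 505000, 490000, 700000),
  stringsAsFactors = FALSE
)

# Keep only rows between 1 Jan and 10 Jan (inclusive)
in_range <- employee$Date >= as.Date("2020-01-01") &
            employee$Date <= as.Date("2020-01-10")

# Total sallary per name, as a named vector
totals <- tapply(employee$sallary[in_range], employee$Name[in_range], sum)

best <- names(which.max(totals))
msg <- sprintf("%s is the highest with total sallary %.0f", best, totals[[best]])
msg
# [1] "Estarossa is the highest with total sallary 1265000"
```

Note that which.max() returns only the first name if several employees tie for the maximum.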

Apply automation in R to change rows of numbers into date

I have created a simple data.frame of 1 column:
x <- as.data.frame(replicate(1, sample(1:27, 1250, replace = TRUE)))
So x will be a column with repeated values from 1 to 27.
I wish to change these values into dates, eg.
x[x==1]<-"31 June 2018"
x[x==2]<-"1 July 2018"
x[x==3]<-"2 July 2018"
Is there a faster way to do this?
I believe I could do this using apply, but I don't have much experience with apply.
Thank you for your suggestions.
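Here's one way with as.Date(), treating each number as a day offset from 30 June 2018 (so 1 becomes 1 July 2018). Note that June has only 30 days, so "31 June 2018" is not a valid date; if 2 should map to 1 July 2018 as in the question, use origin = "2018-06-29" instead.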
x$date <- as.Date(x$V1, origin = "2018-06-30")
head(x)
V1 date
1 5 2018-07-05
2 19 2018-07-19
3 13 2018-07-13
4 9 2018-07-09
5 10 2018-07-10
6 21 2018-07-21
If you want the format to match your post:
x$date <- format(as.Date(x$V1, origin = "2018-06-30"), "%d %B %Y")
head(x)
V1 date
1 5 05 July 2018
2 19 19 July 2018
3 13 13 July 2018
4 9 09 July 2018
5 10 10 July 2018
6 21 21 July 2018
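A more general pattern, which also covers mappings that are not consecutive dates, is to build a lookup vector with one entry per code and index it with the codes. This sketch reproduces the as.Date() answer (code 1 maps to 1 July 2018); the seed is arbitrary and only makes the sample reproducible.

```r
set.seed(42)  # arbitrary seed for a reproducible sample
x <- data.frame(V1 = sample(1:27, 1250, replace = TRUE))

# Element k of the lookup vector is the date for code k
lookup <- seq(as.Date("2018-07-01"), by = "day", length.out = 27)

# Integer indexing replaces every code in one vectorised step; no apply() needed
x$date <- lookup[x$V1]
head(x)
```

The lookup vector could just as well be a character vector such as format(lookup, "%d %B %Y"), or any hand-written set of 27 labels.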
