Calculate average of last 3 non holiday weekdays - r

I have a data frame of profile hits with date, week, and weekday across various categories.
Sample data is shown below (Input data). I want to output a data frame that gives, for each category and each weekday from Sunday to Saturday, the average of the last 3 occurrences of that weekday that fall in non-holiday weeks.
As the required output below shows, no data from the holiday week is used. Is there an easy way to achieve this without loops? If so, how?
required output:
CAT Day Avg
A SUN =(1 + 3+99) /3
A MON =(6+67+ 45) /3
A TUE = (2+ 53+ 68)/3
A WED
A THU
A FRI
A SAT
Input data:
CAT DATE WEEK DAY Hits Holiday Week
A 9/3/2016 2016-35 SAT 58 No
A 9/2/2016 2016-35 FRI 9 No
A 9/1/2016 2016-35 THU 20 No
A 8/31/2016 2016-35 WED 92 No
A 8/30/2016 2016-35 TUE 2 No
A 8/29/2016 2016-35 MON 6 No
A 8/28/2016 2016-35 SUN 1 No
A 8/27/2016 2016-34 SAT 58 Yes
A 8/26/2016 2016-34 FRI 56 Yes
A 8/25/2016 2016-34 THU 40 Yes
A 8/24/2016 2016-34 WED 42 Yes
A 8/23/2016 2016-34 TUE 59 Yes
A 8/22/2016 2016-34 MON 21 Yes
A 8/21/2016 2016-34 SUN 98 Yes
A 8/20/2016 2016-33 Sat 2 No
A 8/19/2016 2016-33 FRI 85 No
A 8/18/2016 2016-33 THU 29 No
A 8/17/2016 2016-33 WED 37 No
A 8/16/2016 2016-33 TUE 53 No
A 8/15/2016 2016-33 MON 67 No
A 8/14/2016 2016-33 SUN 3 No
A 8/13/2016 2016-32 SAT 35 No
A 8/12/2016 2016-32 FRI 24 No
A 8/11/2016 2016-32 THU 94 No
A 8/10/2016 2016-32 WED 81 No
A 8/9/2016 2016-32 TUE 68 No
A 8/8/2016 2016-32 MON 45 No
A 8/7/2016 2016-32 SUN 99 No
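To try the answers below, the sample has to be loaded first. Here is a minimal sketch of one way to do that in base R. The object name df1 and the column name HolidayWeek are assumptions chosen to match the first answer, and only the four Saturday rows from the table above are included to keep it short.

```r
# Read a few rows of the sample data (Saturdays only) from inline text
df1 <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
CAT DATE WEEK DAY Hits HolidayWeek
A 9/3/2016 2016-35 SAT 58 No
A 8/27/2016 2016-34 SAT 58 Yes
A 8/20/2016 2016-33 Sat 2 No
A 8/13/2016 2016-32 SAT 35 No
")

# Parse the month/day/year strings and normalise the mixed-case weekday labels
df1$DATE <- as.Date(df1$DATE, format = "%m/%d/%Y")
df1$DAY  <- toupper(df1$DAY)

# Average of the non-holiday Saturdays
sat_avg <- mean(df1$Hits[df1$HolidayWeek == "No"])
```

Here sat_avg is (58 + 2 + 35) / 3, which agrees with the SAT value reported in the answers.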

We can use data.table
library(data.table)
setDT(df1)[order(-as.IDate(DATE, "%m/%d/%Y"), toupper(DAY))
][HolidayWeek=="No",.(Ave = sum(Hits[1:3])/.N) , by = .(DAY=toupper(DAY))]
# DAY Ave
#1: SAT 31.66667
#2: FRI 39.33333
#3: THU 47.66667
#4: WED 70.00000
#5: TUE 41.00000
#6: MON 39.33333
#7: SUN 34.33333
If it is the average of the 3 'Hits'
setDT(df1)[order(-as.IDate(DATE, "%m/%d/%Y"), toupper(DAY))
][HolidayWeek=="No",.(Ave = mean(Hits[1:3])) , by = .(DAY=toupper(DAY))]

Here's a solution with dplyr:
library(dplyr)
answer <- x %>% filter(Holiday=="No") %>% group_by(Day = toupper(Day)) %>%
top_n(3, Date) %>% summarise(Avg = sum(Hits)/n())  # Date must be of class Date
It removes all holiday rows, then for every 'Day' takes the three most recent dates, and finally sums the hits and divides by the number of those days, giving you the average.
Please note your 'days' of week aren't all Uppercase.

library(data.table)
setDT(df)[Holiday_Week == 'No', .(Avg = sum(head(Hits, 3))/.N), by = .(CAT, DAY = tolower(DAY))]
# CAT DAY Avg
#1: A sat 31.66667
#2: A fri 39.33333
#3: A thu 47.66667
#4: A wed 70.00000
#5: A tue 41.00000
#6: A mon 39.33333
#7: A sun 34.33333

A base R solution
do.call("rbind",
lapply(split(df,df[,c("Holiday","CAT","DAY")]),
function(x) if (x$Holiday[1]=="Yes") {
NULL
} else {
data.frame(CAT=x$CAT[1],
DAY=x$DAY[1],
MN=mean(tail(x[order(x$DATE),],3)$Hits))}))
# CAT DAY MN
#No.A.FRI A FRI 39.33333
#No.A.MON A MON 39.33333
#No.A.SAT A SAT 31.66667
#No.A.SUN A SUN 34.33333
#No.A.THU A THU 47.66667
#No.A.TUE A TUE 41.00000
#No.A.WED A WED 70.00000
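The sample has exactly three non-holiday weeks, so any answer that averages a whole group gives the right number here. A minimal sketch of a variant that explicitly keeps only the three most recent non-holiday rows per group, using ave() and aggregate(). The toy data reuses the Sunday rows from the question plus one invented older row (2016-07-31) to show the cut-off; that row is an assumption, not part of the original data.

```r
# Toy data: the question's Sunday rows plus one hypothetical older row
df <- data.frame(
  CAT     = "A",
  DATE    = as.Date(c("2016-08-28", "2016-08-21", "2016-08-14",
                      "2016-08-07", "2016-07-31")),
  DAY     = "SUN",
  Hits    = c(1, 98, 3, 99, 50),          # 50 is invented for illustration
  Holiday = c("No", "Yes", "No", "No", "No"),
  stringsAsFactors = FALSE
)

keep <- df[df$Holiday == "No", ]                   # drop holiday weeks
keep$DAY <- toupper(keep$DAY)                      # guard against mixed case
keep <- keep[order(keep$DATE, decreasing = TRUE), ]

# Rank rows within each CAT/DAY group, newest first
keep$rank <- ave(seq_len(nrow(keep)), keep$CAT, keep$DAY, FUN = seq_along)

# Average only the three most recent rows of each group
res <- aggregate(Hits ~ CAT + DAY, data = keep[keep$rank <= 3, ], FUN = mean)
```

The invented 2016-07-31 row is ranked fourth and is left out, so the result is still (1 + 3 + 99) / 3.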

Average by day split for non holidays and holidays
library(data.table)
# 'data' is the input data from the question
setDT(data)[, mean(Hits), by = .(DAY, Holiday) ]
Perhaps use tolower(DAY) as there are some naming differences in your data.
For just no holiday:
setDT(data)[Holiday == "No", mean(Hits), by = tolower(DAY) ]

Related

Is it possible to convert year-week date format to the first day of the week?

I have dates in Year-Week format. Is it possible to convert them to the first day of the week, i.e. 201553 becomes 2015-12-28 and 201601 becomes 2016-01-04?
I found an approach for this, but it does not work correctly on my dates. Could you help me do it without the ISOweek package?
date<-c(201553L, 201601L, 201602L, 201603L, 201604L, 201605L, 201606L,
201607L, 201608L, 201609L)
as.POSIXct(paste(date, "0"),format="%Y%u %w")
Here's a way,
date<-data.frame(first = c(201553L, 201601L, 201602L, 201603L, 201604L, 201605L, 201606L,
201607L, 201608L, 201609L))
First separate the week and year from integer,
library(stringr)
library(dplyr)
date = date %>% mutate(week = str_sub(date$first,5,6))
date = date %>% mutate(year = str_sub(date$first,1,4))
Then use the aweek package to find the date,
library(aweek)
date = date %>% mutate(actual_date = get_date(week = date$week, year = date$year))
first week year actual_date
1 201553 53 2015 2015-12-28
2 201601 01 2016 2016-01-04
3 201602 02 2016 2016-01-11
4 201603 03 2016 2016-01-18
5 201604 04 2016 2016-01-25
6 201605 05 2016 2016-02-01
7 201606 06 2016 2016-02-08
8 201607 07 2016 2016-02-15
9 201608 08 2016 2016-02-22
10 201609 09 2016 2016-02-29
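If no extra package is wanted, the same result can probably be reached with date arithmetic alone. A minimal base R sketch, assuming the input follows ISO 8601 week numbering (week 1 is the week containing January 4, weeks start on Monday); the function name iso_week_start is hypothetical.

```r
# Monday of a given ISO year-week, passed as an integer such as 201553
iso_week_start <- function(yw) {
  year <- yw %/% 100
  week <- yw %% 100
  jan4 <- as.Date(paste0(year, "-01-04"))       # always lies in ISO week 1
  wday <- as.integer(format(jan4, "%u"))        # 1 = Monday ... 7 = Sunday
  jan4 - (wday - 1) + (week - 1) * 7            # Monday of week 1, then step forward
}

starts <- iso_week_start(c(201553L, 201601L, 201609L))
```

For these three inputs the sketch gives 2015-12-28, 2016-01-04 and 2016-02-29, the same dates as the table above.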

Convert date with Time Zone formats in R

My dates come in the following formats: Wed Apr 25 2018 00:00:00 GMT-0700 (Pacific Standard Time), 43167, or Fri May 18 2018 00:00:00 GMT-0700 (PDT), all mixed in one column. What would be the easiest way to convert all of these to a simple YYYY-mm-dd (2018-04-13) format? Here is the column:
dates <- c('Fri May 18 2018 00:00:00 GMT-0700 (PDT)',
'43203',
'Wed Apr 25 2018 00:00:00 GMT-0700 (Pacific Standard Time)',
'43167','43201',
'Fri May 18 2018 00:00:00 GMT-0700 (PDT)',
'Tue May 29 2018 00:00:00 GMT-0700 (Pacific Standard Time)',
'Tue May 01 2018 00:00:00 GMT-0700 (PDT)',
'Fri May 25 2018 00:00:00 GMT-0700 (Pacific Standard Time)',
'Fri Apr 06 2018 00:00:00 GMT-0700 (PDT)','43173')
Expected format:2018-05-18, 2018-04-13, 2018-04-25, ...
I believe similar questions have been asked several times before. However, there is a crucial point which needs special attention:
What is the origin for the dates given as integer (or as character string which can be converted to integer to be exact)?
If the data is imported from the Windows version of Excel, origin = "1899-12-30" has to be used. For details, see the Example section in help(as.Date) and the Other Applications section of the R Help Desk article by Gabor Grothendieck and Thomas Petzoldt.
For conversion of the date time strings, the mdy_hms() function from the lubridate package is used. In addition, I am using data.table syntax for its conciseness:
library(data.table)
data.table(dates)[!dates %like% "^\\d+$", new_date := as.Date(lubridate::mdy_hms(dates))][
is.na(new_date), new_date := as.Date(as.integer(dates), origin = "1899-12-30")][]
dates new_date
1: Fri May 18 2018 00:00:00 GMT-0700 (PDT) 2018-05-18
2: 43203 2018-04-13
3: Wed Apr 25 2018 00:00:00 GMT-0700 (Pacific Standard Time) 2018-04-25
4: 43167 2018-03-08
5: 43201 2018-04-11
6: Fri May 18 2018 00:00:00 GMT-0700 (PDT) 2018-05-18
7: Tue May 29 2018 00:00:00 GMT-0700 (Pacific Standard Time) 2018-05-29
8: Tue May 01 2018 00:00:00 GMT-0700 (PDT) 2018-05-01
9: Fri May 25 2018 00:00:00 GMT-0700 (Pacific Standard Time) 2018-05-25
10: Fri Apr 06 2018 00:00:00 GMT-0700 (PDT) 2018-04-06
11: 43173 2018-03-14
The result for 43203 matches the expected 2018-04-13, so the assumption that the origin is the one used by the Windows version of Excel appears to hold.
If only a vector of Date values is required:
data.table(dates)[!dates %like% "^\\d+$", new_date := as.Date(lubridate::mdy_hms(dates))][
is.na(new_date), new_date := as.Date(as.integer(dates), origin = "1899-12-30")][, new_date]
[1] "2018-05-18" "2018-04-13" "2018-04-25" "2018-03-08" "2018-04-11" "2018-05-18"
[7] "2018-05-29" "2018-05-01" "2018-05-25" "2018-04-06" "2018-03-14"
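For comparison, here is a minimal sketch of the same two-step conversion in base R only, without lubridate or data.table. It assumes the date-time strings always have the fixed layout shown in the question (weekday, month abbreviation, two-digit day, four-digit year), takes the date as written without applying the GMT offset, and uses the same Windows Excel origin as above. Only three of the values are used.

```r
dates <- c("Fri May 18 2018 00:00:00 GMT-0700 (PDT)",
           "43203",
           "Wed Apr 25 2018 00:00:00 GMT-0700 (Pacific Standard Time)")

is_serial <- grepl("^[0-9]+$", dates)          # pure digits = Excel serial number
out <- rep(as.Date(NA), length(dates))

# Excel serial numbers (assumed Windows Excel origin)
out[is_serial] <- as.Date(as.integer(dates[is_serial]), origin = "1899-12-30")

# Date-time strings: pick month, day and year by position;
# match() against month.abb avoids any dependence on the locale
txt <- dates[!is_serial]
mon <- match(substr(txt, 5, 7), month.abb)
out[!is_serial] <- as.Date(sprintf("%s-%02d-%s",
                                   substr(txt, 12, 15), mon, substr(txt, 9, 10)))
```

Looking the month up in month.abb instead of parsing with %b is a deliberate choice here, because %b depends on the LC_TIME setting of the session.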

Convert character to Date (Thu Jun 14 *** 2018-05-14) in r [duplicate]

I have a data frame of World Cup matches that includes date, location, match_name, etc.
In this data frame I want to convert the date column to a date in the format "2018-05-06".
Here is my file:
date match_name price
1 Thu Jun 14 Russia v Saudi Arabia €453.92
2 Fri Jun 15 Egypt v Uruguay €90.00
3 Tue Jun 19 Russia v Egypt €297.45
4 Wed Jun 20 Uruguay v Saudi Arabia €95.00
and here is my expectation;
date match_name price
1 2018-05-14 Russia v Saudi Arabia €453.92
2 2018-05-15 Egypt v Uruguay €90.00
3 2018-05-19 Russia v Egypt €297.45
4 2018-05-20 Uruguay v Saudi Arabia €95.00
This is surely not the easiest way to do it, but I just wanted you to have a quick answer.
library(stringr)
library(dplyr)
Data=data.frame(date=c("Thu Jun 14","Fri Jun 15","Tue Jun 19","Wed Jun 20"),match_name=c("a","b","c","d"),price=c(1,2,3,4))
Data$date=as.character(Data$date)
regexp <- "[[:digit:]]+"
# day of the month: the first run of digits in the string
Data=mutate(Data,datenum=str_extract(date, regexp))
# month: extract the abbreviation and look up its position in month.abb;
# match() is vectorised, whereas if/else only accepts a single condition
Data=mutate(Data,
monthnum=sprintf("%02d", match(str_extract(date, paste(month.abb, collapse="|")), month.abb)),
monthname=month.abb[as.integer(monthnum)])
mutate(Data,Final_Date=paste0("2018-",monthnum,"-",datenum))
Resulting in
date match_name price datenum monthnum monthname Final_Date
1 Thu Jun 14 a 1 14 06 Jun 2018-06-14
2 Fri Jun 15 b 2 15 06 Jun 2018-06-15
3 Tue Jun 19 c 3 19 06 Jun 2018-06-19
4 Wed Jun 20 d 4 20 06 Jun 2018-06-20
OK, let's say you have this data.frame:
myDF <-as.data.frame(x=list(date=c("Thu Jun 14","Fri Jun 15","Tue Jun 19","Wed Jun 20")))
Which constructs the following data.frame:
date
1 Thu Jun 14
2 Fri Jun 15
3 Tue Jun 19
4 Wed Jun 20
Assuming that each game is in 2018:
#for handling month abbreviations in English:
Sys.setlocale("LC_TIME", "en_US.UTF-8")
myDF$date <- as.Date(paste0(substr(myDF$date,5,10),", 2018"),format="%b %d, %Y")
The resulting myDF:
date
1 2018-06-14
2 2018-06-15
3 2018-06-19
4 2018-06-20
You can change 2018 to any year you like where necessary.
To convert the variable "date" to the format '2018-05-14', you can use the following function (note that %a and %b depend on the locale, so it assumes English day and month names):
conv_date <- function(var, year){
var <- as.Date(paste0(var, " ", year), '%a %b %d %Y')
return(var)
}
where:
var - variable in your data table (i.e 'date')
year - the year you need
Example:
yours_df$date <- conv_date(yours_df$date, 2018)
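Since the year is not in the data, all of the answers above have to assume it. One way to sanity-check that assumption is to compare the weekday implied by the assumed year with the weekday given in the string. A minimal sketch, with 2018 taken from the answers above; the month and weekday lookups are written to be independent of the locale.

```r
raw  <- c("Thu Jun 14", "Fri Jun 15", "Tue Jun 19", "Wed Jun 20")
year <- 2018                                   # assumed, as in the answers above

# Build ISO date strings from month abbreviation and day
mon <- match(substr(raw, 5, 7), month.abb)
d   <- as.Date(sprintf("%d-%02d-%s", year, mon, substr(raw, 9, 10)))

# %u is the ISO weekday number (1 = Monday ... 7 = Sunday)
dow <- c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")[as.integer(format(d, "%u"))]
ok  <- dow == substr(raw, 1, 3)                # TRUE where weekday and year agree
```

If any element of ok is FALSE, the assumed year is probably wrong for that row.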

No of monthly days between two dates

diff(seq(as.Date("2016-12-21"), as.Date("2017-04-05"), by="month"))
Time differences in days
[1] 31 31 28
The above code gives the number of days in the months Dec, Jan and Feb.
However, my requirement is as follows
#Results that I need
#monthly days from date 2016-12-21 to 2017-04-05
11, 31, 28, 31, 5
#i.e 11 days of Dec, 31 of Jan, 28 of Feb, 31 of Mar and 5 days of Apr.
I even tried days_in_month from lubridate but was not able to get the result
library(lubridate)
days_in_month(c(as.Date("2016-12-21"), as.Date("2017-04-05")))
Dec Apr
31 30
Try this:
x = rle(format(seq(as.Date("2016-12-21"), as.Date("2017-04-05"), by=1), '%b'))
setNames(x$lengths, x$values)
# Dec Jan Feb Mar Apr
# 11 31 28 31 5
Although we have seen a clever replacement of table by rle and a pure table solution, I want to add two approaches using grouping. All approaches have in common that they create a sequence of days between the two given dates and aggregate by month but in different ways.
aggregate()
This one uses base R:
# create sequence of days
days <- seq(as.Date("2016-12-21"), as.Date("2017-04-05"), by = 1)
# aggregate by month
aggregate(days, list(month = format(days, "%b")), length)
# month x
#1 Apr 5
#2 Dez 11
#3 Feb 28
#4 Jan 31
#5 Mrz 31
Unfortunately, the months are ordered alphabetically as it happened with the simple table() approach. In these situations, I do prefer the ISO8601 way of unambiguously naming the months:
aggregate(days, list(month = format(days, "%Y-%m")), length)
# month x
#1 2016-12 11
#2 2017-01 31
#3 2017-02 28
#4 2017-03 31
#5 2017-04 5
data.table
Now that I've got used to the data.table syntax, this is my preferred approach:
library(data.table)
data.table(days)[, .N, .(month = format(days, "%b"))]
# month N
#1: Dez 11
#2: Jan 31
#3: Feb 28
#4: Mrz 31
#5: Apr 5
The order of months is kept as they have appeared in the input vector.
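The table() variant referred to above is not shown in this thread. A minimal sketch of what it might look like, keyed by year-month so that the result comes out in chronological order and does not depend on the locale; the function name days_per_month is hypothetical.

```r
# Count the days falling into each calendar month between two dates (inclusive)
days_per_month <- function(from, to) {
  days   <- seq(as.Date(from), as.Date(to), by = "day")
  counts <- table(format(days, "%Y-%m"))       # sorted keys are chronological
  setNames(as.integer(counts), names(counts))
}

res <- days_per_month("2016-12-21", "2017-04-05")
```

The counts are 11, 31, 28, 31 and 5 for 2016-12 through 2017-04, and they add up to the 106 days in the range.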

R Split series into month

I have the below data with date and count. Please help in transforming it into one row per year, with the months as columns.
Date count
=================
2011-01-01 10578
2011-02-01 9330
2011-03-01 10686
2011-04-01 10260
2011-05-01 10032
2011-06-01 9762
2011-07-01 10308
2011-08-01 9966
2011-09-01 10146
2011-10-01 10218
2011-11-01 8826
2011-12-01 9504
to
JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC
------------------------------------------------------------------------------
2011 10578 9330 10686 10260 10032 9762 10308 9966 10146 10218 8826 9504
2012 ....
This is a perfect task for ts in base R. Suppose your data.frame is x; then using ts will produce the output you want.
> ts(x$count, start=c(2011,1), end=c(2011,12), frequency=12)
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
2011 10578 9330 10686 10260 10032 9762 10308 9966 10146 10218 8826 9504
If your data is in x try something like this:
library(reshape2)
res <- dcast(transform(x, month = format(Date, format="%m"),
year = format(Date, "%Y")),
year ~ month, value.var="count")
rownames(res) <- res$year
res <- res[,-1]
names(res) <- toupper(month.abb[as.numeric(names(res))])
res
This assumes that x$Date is already a date. If not, you will need to first convert it to a date:
x$Date <- as.Date(x$Date)
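A third option that needs no package and does not require a complete, regular series is tapply() with two grouping factors. A minimal sketch, assuming x has the columns Date and count as in the question; the data frame is rebuilt here from the values shown above.

```r
x <- data.frame(
  Date  = seq(as.Date("2011-01-01"), by = "month", length.out = 12),
  count = c(10578, 9330, 10686, 10260, 10032, 9762,
            10308, 9966, 10146, 10218, 8826, 9504)
)

# Rows = year, columns = month number; sum() also covers several rows per month
wide <- tapply(x$count,
               list(format(x$Date, "%Y"), as.integer(format(x$Date, "%m"))),
               sum)

# Replace month numbers by upper-case abbreviations (JAN, FEB, ...)
colnames(wide) <- toupper(month.abb[as.integer(colnames(wide))])
```

The result is a matrix with one row per year; months missing from the data simply do not get a column, and year/month combinations without data show up as NA.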

Resources