How to find all dates from the present and previous month in R?

I have a table
EmployeeSalary:
Date | Salary
01.12.2016 | 2000
01.02.2016 | 3000
03.02.2016 | 5000
01.03.2017 | 1000
30.01.2017 | 5000
10.03.2017 | 1300
If the system date is 13.03.2017, how do I get the dates of the present month and the previous month (i.e., from February 1 to the system date)?
My code is:
start= format(Sys.Date() - 30, '%Y-%m-01')
end=Sys.time()
while (start<end)
{
print(EmployeeSalary)
EmployeeSalary$"Date" = EmployeeSalary$"Date"+1
}
The error I get:
Error: non-numeric argument to binary operator
Expected output:
EmployeeSalary:
Date | Salary
01.02.2016 | 3000
03.02.2016 | 5000
01.03.2017 | 1000
10.03.2017 | 1300

Here is one way:
R> dates <- seq(Sys.Date(), length=62, by=-1)
R> mon <- function(d) as.integer(format(d, "%m")) %% 12
R> dates[mon(dates) >= mon(Sys.Date())-1]
[1] "2017-03-13" "2017-03-12" "2017-03-11" "2017-03-10" "2017-03-09"
[6] "2017-03-08" "2017-03-07" "2017-03-06" "2017-03-05" "2017-03-04"
[11] "2017-03-03" "2017-03-02" "2017-03-01" "2017-02-28" "2017-02-27"
[16] "2017-02-26" "2017-02-25" "2017-02-24" "2017-02-23" "2017-02-22"
[21] "2017-02-21" "2017-02-20" "2017-02-19" "2017-02-18" "2017-02-17"
[26] "2017-02-16" "2017-02-15" "2017-02-14" "2017-02-13" "2017-02-12"
[31] "2017-02-11" "2017-02-10" "2017-02-09" "2017-02-08" "2017-02-07"
[36] "2017-02-06" "2017-02-05" "2017-02-04" "2017-02-03" "2017-02-02"
[41] "2017-02-01"
R>
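We create a sequence of dates going backwards, then a helper function that returns the (integer-valued) month of a date.
Given those two, we index the date sequence down to the dates matching your criteria: those from this month or the preceding month.
Taking the month modulo 12 also handles January, whose preceding month is December (12 %% 12 is 0). One caveat: on any January day before the 31st, the 62-day window also reaches back into November (11), which still satisfies the >= 0 comparison, so those dates need an extra check.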
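A different way to express "this month or the previous one" is to compute the first day of the previous month and keep every date from there up to today. Stepping back one day from the first of the current month always lands in the previous month, so January wraps to December without any modulo arithmetic. The sketch below is in Python with only the standard library, not R; the function names and the hard-coded "today" of 13.03.2017 are illustrative assumptions.

```python
from datetime import date, datetime, timedelta


def previous_month_start(today: date) -> date:
    """Return the first day of the month before today's month (January wraps to December)."""
    last_of_prev_month = today.replace(day=1) - timedelta(days=1)
    return last_of_prev_month.replace(day=1)


def current_and_previous_month(rows, today: date):
    """Keep (dd.mm.yyyy, salary) rows dated between the previous month's first day and today."""
    start = previous_month_start(today)
    kept = []
    for date_str, salary in rows:
        d = datetime.strptime(date_str, "%d.%m.%Y").date()
        if start <= d <= today:
            kept.append((date_str, salary))
    return kept


employee_salary = [
    ("01.12.2016", 2000),
    ("01.02.2016", 3000),
    ("03.02.2016", 5000),
    ("01.03.2017", 1000),
    ("30.01.2017", 5000),
    ("10.03.2017", 1300),
]

print(current_and_previous_month(employee_salary, date(2017, 3, 13)))
```

With the sample table exactly as posted, only the March 2017 rows qualify, because the February rows are dated 2016. For a January date such as 10.01.2017 the window starts at 01.12.2016, so November dates are excluded.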

Related

Combine different date fields to a single Timestamp column in pyspark

If the DataFrame looks like this:
year | month | day | weekday | hour
2017 | January | 1 | Sunday | 0
2018 | September | 22 | Saturday | 11
Then I need to add another column with values of type timestamp like the following:
2017-01-01 00:00:00
2018-09-22 11:00:00
I tried unix_timestamp after concatenating the fields into a string, but it isn't working.
You can concatenate the fields into a string and use to_timestamp (or from_unixtime(unix_timestamp())) with the appropriate datetime pattern.
Here's an example:
from pyspark.sql import functions as func

data_sdf. \
    withColumn('ts',
               func.to_timestamp(func.concat_ws(' ', 'year', 'month', 'day', 'hour'),
                                 'yyyy MMMM d H'
                                 )
               ). \
    show(truncate=False)
# +----+---------+---+--------+----+-------------------+
# |year|month |day|weekday |hour|ts |
# +----+---------+---+--------+----+-------------------+
# |2017|January |1 |Sunday |0 |2017-01-01 00:00:00|
# |2018|September|22 |Saturday|11 |2018-09-22 11:00:00|
# +----+---------+---+--------+----+-------------------+
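To check the concatenate-then-parse idea without a Spark session, the sketch below joins the same four fields with spaces and parses the result with Python's datetime.strptime. It assumes that strptime's "%Y %B %d %H" plays the role of the Spark pattern 'yyyy MMMM d H' for these inputs (English month names, default C locale). It illustrates the idea and is not how Spark parses internally.

```python
from datetime import datetime

rows = [
    {"year": 2017, "month": "January", "day": 1, "weekday": "Sunday", "hour": 0},
    {"year": 2018, "month": "September", "day": 22, "weekday": "Saturday", "hour": 11},
]


def to_ts(row):
    # Like concat_ws(' ', 'year', 'month', 'day', 'hour'): join the fields with spaces...
    text = " ".join(str(row[k]) for k in ("year", "month", "day", "hour"))
    # ...then parse with a pattern matching that layout (%d and %H accept one digit too).
    return datetime.strptime(text, "%Y %B %d %H")


for row in rows:
    print(to_ts(row))
```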

Create time series in R starting 10/1/2019 to 12/30/2019 not including weekends or holidays

I am trying to create a time series using the ts() function. My data set has 63 values with the starting date as 10-01-2019 and the last date as 12-31-2019. This data set skips weekends and holidays. I am trying this:
ts(data, start = c(2019,10), end = c(2019, 12), frequency = 260)
since there are 260 days a year not including weekends, but that isn't working. I keep getting a time series with the wrong number of observations (there should still be 63 values, right?). I am confused about how to set this up. If anyone could help me, that would be greatly appreciated.
Thank you!!
ts is normally used for monthly, quarterly and annual data, not daily data. If you want to use it anyway, ts(data) will use an index of 1, 2, etc. Using data from the Note at the end:
ts(data)
## Time Series:
## Start = 1
## End = 3
## Frequency = 1
## [1] 1 2 3
If you have a Date-class vector d of the same length as data, you can use zoo or xts and either work with that series directly or convert it to ts with as.ts, as below (the ts index is then the number of days since the UNIX epoch). To specify a frequency, add a frequency= argument to the zoo call.
library(zoo)
z <- zoo(data, d)
z
## 2019-10-01 2019-10-03 2019-10-04
## 1 2 3
as.ts(z)
## Time Series:
## Start = 18170
## End = 18173
## Frequency = 1
## [1] 1 NA 2 3
Note
data <- 1:3
d <- as.Date(c("2019-10-01", "2019-10-03", "2019-10-04"))
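The as.ts(z) output above can be reproduced by hand: index each observation by the number of days since 1970-01-01, span the smallest to the largest index, and fill days without an observation with a missing value. A minimal Python sketch of that idea follows; to_daily_grid is a hypothetical helper, and None stands in for R's NA.

```python
from datetime import date

EPOCH = date(1970, 1, 1)


def to_daily_grid(values, dates):
    """Spread values onto a gap-free daily grid indexed by days since 1970-01-01.

    Days with no observation get None, similar to the NA that as.ts() inserts.
    """
    by_day = {(d - EPOCH).days: v for d, v in zip(dates, values)}
    start, end = min(by_day), max(by_day)
    return start, end, [by_day.get(i) for i in range(start, end + 1)]


data = [1, 2, 3]
d = [date(2019, 10, 1), date(2019, 10, 3), date(2019, 10, 4)]
print(to_daily_grid(data, d))  # (18170, 18173, [1, None, 2, 3])
```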
You could use the bizdays and timeDate packages as follows:
library(bizdays)
library(timeDate)
create.calendar(name='America/New_York', holidays = as.Date(holidayNYSE(2019)), weekdays = c('saturday', 'sunday'), start.date=as.Date('2019-01-01'), end.date = as.Date('2019-12-31'))
bizseq(as.Date('2019-10-01'), as.Date('2019-12-31'), 'America/New_York')
[1] "2019-10-01" "2019-10-02" "2019-10-03" "2019-10-04" "2019-10-07" "2019-10-08" "2019-10-09"
[8] "2019-10-10" "2019-10-11" "2019-10-14" "2019-10-15" "2019-10-16" "2019-10-17" "2019-10-18"
[15] "2019-10-21" "2019-10-22" "2019-10-23" "2019-10-24" "2019-10-25" "2019-10-28" "2019-10-29"
[22] "2019-10-30" "2019-10-31" "2019-11-01" "2019-11-04" "2019-11-05" "2019-11-06" "2019-11-07"
[29] "2019-11-08" "2019-11-11" "2019-11-12" "2019-11-13" "2019-11-14" "2019-11-15" "2019-11-18"
[36] "2019-11-19" "2019-11-20" "2019-11-21" "2019-11-22" "2019-11-25" "2019-11-26" "2019-11-27"
[43] "2019-11-29" "2019-12-02" "2019-12-03" "2019-12-04" "2019-12-05" "2019-12-06" "2019-12-09"
[50] "2019-12-10" "2019-12-11" "2019-12-12" "2019-12-13" "2019-12-16" "2019-12-17" "2019-12-18"
[57] "2019-12-19" "2019-12-20" "2019-12-23" "2019-12-24" "2019-12-26" "2019-12-27" "2019-12-30"
[64] "2019-12-31"
There are 64 days in this sequence; your 63 days may be because of differences in the holiday calendar.
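The same kind of sequence can be generated without a calendar package by walking through the days and skipping weekends and a holiday set. In this Python sketch the holiday set is an assumption: it hard-codes only the two NYSE holidays that fall between October and December 2019 (Thanksgiving on 2019-11-28 and Christmas on 2019-12-25) instead of a full calendar.

```python
from datetime import date, timedelta

# Assumption: only the two NYSE holidays in Oct-Dec 2019 are listed here;
# a real calendar (as bizdays/timeDate provide) would cover every year.
HOLIDAYS_2019_Q4 = {date(2019, 11, 28), date(2019, 12, 25)}


def business_days(start: date, end: date, holidays=frozenset()):
    """All dates from start to end inclusive that fall Mon-Fri and are not holidays."""
    days = []
    d = start
    while d <= end:
        if d.weekday() < 5 and d not in holidays:  # weekday(): Mon=0 ... Sun=6
            days.append(d)
        d += timedelta(days=1)
    return days


q4 = business_days(date(2019, 10, 1), date(2019, 12, 31), HOLIDAYS_2019_Q4)
print(len(q4))  # 64
```

Without the holiday set the same range has 66 weekdays; removing the two holidays gives the 64 dates listed above.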

How can I create a sequence of year-week string values based on existing dates?

I am plotting weekly figures that cross over from 2018 into 2019, and the tick marks on my x-axis show the year, then the week.
For example:
2018-50, 2018-51, 2018-52, 2018-53, 2019-01, 2019-02, 2019-03
I have two data frames and the dates in either aren't always going to be the same. As such, one solution I have thought of that might work is to find the lowest yearWeek value in either data frame, and the maximum yearWeek value in either data frame, and to then create a sequence using those two values. Note that both values could either exist within a single data frame or one data frame could have the lowest/earliest value and the other the highest/latest value.
Both data frames look like this:
week yearWeek month day date
1 31 2018-31 2018-08-01 Wed 2018-08-01
2 31 2018-31 2018-08-01 Thu 2018-08-02
3 31 2018-31 2018-08-01 Fri 2018-08-03
4 31 2018-31 2018-08-01 Sat 2018-08-04
5 32 2018-32 2018-08-01 Sun 2018-08-05
6 32 2018-32 2018-08-01 Mon 2018-08-06
I have looked for a solution and this answer is almost there, but not quite.
The problems with this solution are:
The single-digit week numbers don't have a leading 0; and
Despite specifying seq(31:53), for example, the output starts from 1 (I know why this happens); and
There doesn't seem to be a way to stop the count at 53 using this method (2018 had a (short) 53rd week which I would like to include) and resume from 2019-01 onwards.
I want to be able to set the X-axis range from 2018-31 (31st week of 2018) to 2019-13 (13th week of 2019).
Something like this:
In short, how can I create a sequence of year-week values ranging from the minimum date value to the maximum date value (in this case, 2018-31 to 2019-13)?
I think this would work for you:
x1 <- c(31:53)
x2 <- sprintf("%02d", c(1:13))
paste(c(rep(2018, length(x1)), rep(2019, length(x2))), c(x1, x2), sep = "-")
# [1] "2018-31" "2018-32" "2018-33" "2018-34" "2018-35" "2018-36" "2018-37"
# "2018-38" "2018-39" "2018-40" "2018-41" "2018-42" "2018-43" "2018-44"
# "2018-45" "2018-46" "2018-47" "2018-48" "2018-49" "2018-50" "2018-51"
# "2018-52" "2018-53" "2019-01" "2019-02" "2019-03" "2019-04" "2019-05"
# "2019-06" "2019-07" "2019-08" "2019-09" "2019-10" "2019-11" "2019-12" "2019-13"
For the updated question, we can do:
#rbind both the dataset
df <- rbind(df1, df2)
#convert them to date
df$Date <- as.Date(df$date)
#Generate a sequence from min date to maximum date, format them
# to year-week combination and select only the unique ones
unique(format(seq(min(df$Date), max(df$Date), by = "day"), "%Y-%W"))
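One thing to watch with "%Y-%W" is that it numbers weeks from the first Monday of each year, so the days of 2019 before Monday 7 January are labelled week 00 and the sequence goes 2018-53, 2019-00, 2019-01 rather than straight from 2018-53 to 2019-01. R's %W is documented with the same Monday-based convention as C's strftime; the Python sketch below (standard library only, with a hypothetical year_week_labels helper) shows the labels produced for 2018-08-01 to 2019-04-01.

```python
from datetime import date, timedelta


def year_week_labels(start: date, end: date):
    """Unique '%Y-%W' labels (Monday-based weeks) for every day from start to end, in order."""
    labels = []
    d = start
    while d <= end:
        label = d.strftime("%Y-%W")
        # Consecutive days share a label, so dropping repeats keeps each week once.
        if not labels or labels[-1] != label:
            labels.append(label)
        d += timedelta(days=1)
    return labels


labels = year_week_labels(date(2018, 8, 1), date(2019, 4, 1))
print(labels[21:25])  # ['2018-52', '2018-53', '2019-00', '2019-01']
```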
Define two sequences, and then restrict to the range you want:
years <- c("2018", "2019")
weeks <- sprintf("%02d", 1:53)
result <- apply(expand.grid(years, weeks), 1, function(x) paste(x, collapse = "-"))
result <- sort(result[result >= "2018-31" & result <= "2019-13"])
result
[1] "2018-31" "2018-32" "2018-33" "2018-34" "2018-35" "2018-36" "2018-37"
[8] "2018-38" "2018-39" "2018-40" "2018-41" "2018-42" "2018-43" "2018-44"
[15] "2018-45" "2018-46" "2018-47" "2018-48" "2018-49" "2018-50" "2018-51"
[22] "2018-52" "2018-53" "2019-01" "2019-02" "2019-03" "2019-04" "2019-05"
[29] "2019-06" "2019-07" "2019-08" "2019-09" "2019-10" "2019-11" "2019-12"
[36] "2019-13"
Note that pruning the unwanted values works even on text strings, because every value is a fixed-width string, zero-padded where necessary, so string comparison and sorting order them just as they would order the underlying numbers.
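The zero-padding point can be checked directly. The Python sketch below builds every "YYYY-WW" label year by year (so the result is already in chronological order), allows week 53, and filters with plain string comparison. year_week_range is a hypothetical helper; the final test line also shows that an unpadded label such as "2018-9" compares greater than "2018-10".

```python
def year_week_range(first: str, last: str, max_week: int = 53):
    """All zero-padded 'YYYY-WW' labels between first and last inclusive.

    Plain string comparison is enough because every label has the same width.
    """
    years = range(int(first[:4]), int(last[:4]) + 1)
    # Year-major order: 2018-01 ... 2018-53, 2019-01 ..., i.e. chronological.
    candidates = [f"{y}-{w:02d}" for y in years for w in range(1, max_week + 1)]
    return [c for c in candidates if first <= c <= last]


print(year_week_range("2018-31", "2019-13"))
```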
Here is a possibility using the str_pad function from the stringr package:
library(stringr)
weeks <- str_pad(41:65 %% 53 + 1, 2, "left", "0")
years <- ifelse(41:65 <= 52, "2018", "2019")
paste(years, weeks, sep = "-")
[1] "2018-42" "2018-43" "2018-44" "2018-45" "2018-46" "2018-47" "2018-48" "2018-49" "2018-50" "2018-51" "2018-52" "2018-53" "2019-01" "2019-02" "2019-03" "2019-04" "2019-05" "2019-06" "2019-07" "2019-08" "2019-09"
[22] "2019-10" "2019-11" "2019-12" "2019-13"
As I just learned from the other two answers, sprintf provides a base-R alternative to str_pad, so you can also use:
weeks <- sprintf("%02d", 41:65 %% 53 + 1)
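The modulo trick generalizes to any start and end week: count upward, let i % 53 + 1 wrap back to week 1 after week 53, and switch the year once the counter passes the last week of the first year. A Python sketch follows; wrapped_year_weeks and its parameters are illustrative, the f-string format {week:02d} does the job of sprintf("%02d"), and weeks_in_year=53 reflects the asker's 53-week 2018.

```python
def wrapped_year_weeks(first_week: int, last_week: int, year: int, weeks_in_year: int = 53):
    """Labels from week first_week of `year` through week last_week of `year` + 1."""
    labels = []
    # Same counter idea as 41:65 in the R code (which starts at week 42).
    for i in range(first_week - 1, weeks_in_year + last_week):
        week = i % weeks_in_year + 1               # wraps 53 -> 1
        y = year if i < weeks_in_year else year + 1
        labels.append(f"{y}-{week:02d}")
    return labels


print(wrapped_year_weeks(31, 13, 2018))
```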
Here is a possibility using strftime:
weeks <- seq(from = ISOdate(2018,12,10), to = ISOdate(2019,4,1), by="week")
strftime(weeks,format="%Y-%W")

How to Get the Same Weekday Last Year Given Any Date?

I would like to get the same weekday last year for any given date. How can I best do this in R? For example, given Sunday 2010/01/03, I would like to obtain the Sunday of the same week the year before.
# "Sunday"
weekdays(as.Date("2010/01/03", format="%Y/%m/%d"))
# "Saturday"
weekdays(as.Date("2009/01/03", format="%Y/%m/%d"))
To find the same weekday one year ago, simply subtract 52 weeks or 364 days from the given date:
d <- as.Date("2010-01-03")
weekdays(d)
#[1] "Sunday"
d - 52L * 7L
#[1] "2009-01-04"
weekdays(d - 52L * 7L)
#[1] "Sunday"
Please note that a calendar year has 365 days (or 366 in a leap year), which is one or two days more than 52 weeks. So the calendar date of the same weekday one year earlier shifts by one or two days, which is also why New Year's Eve falls on a different weekday each year.
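Because 52 weeks is exactly 364 days, subtracting it always lands on the same weekday, whatever the year. A minimal Python equivalent of the R code above, using only the standard library (same_weekday_last_year is an illustrative name):

```python
from datetime import date, timedelta


def same_weekday_last_year(d: date) -> date:
    """Go back exactly 52 weeks (364 days), which always preserves the weekday."""
    return d - timedelta(weeks=52)


d = date(2010, 1, 3)
prev = same_weekday_last_year(d)
print(prev, prev.strftime("%A"))  # 2009-01-04 Sunday
```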
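The following formula (base R; it does not actually need lubridate) is intended to give the corresponding weekday in the same week of the previous year:
as.Date(dDate - 364 - ifelse(weekdays( dDate - 363) == weekdays( dDate ), 1, 0))
where dDate is some date, e.g. dDate <- as.Date("2016-02-29"). The ifelse is meant to account for leap years, but because 363 days is not a whole number of weeks, weekdays(dDate - 363) never equals weekdays(dDate), so the expression always reduces to dDate - 364.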
Here's a simple algorithm. Subtract 365 days from the day of interest, then adjust that date to the closest matching day of the week using the Tableau code below (easily translatable into other languages). This is equivalent to the rule in the table below (with 1 = Monday and 7 = Sunday). In short, you move day - 365 onto the correct weekday within the same week if that shifts it by at most 3 days; otherwise you use the matching weekday from the previous or next week. This picks whichever candidate is closest in number of days.
[day prior year raw] = [day] - 365
[matching day prior year] =
if abs(datepart('weekday',[day]) - datepart('weekday',[day prior year raw])) <= 3
then [day prior year raw] + datepart('weekday',[day]) - datepart('weekday',[day prior year raw])
else [day prior year raw] + (if datepart('weekday',[day]) > datepart('weekday',[day prior year raw])
    then -7 + (datepart('weekday',[day]) - datepart('weekday',[day prior year raw]))
    else 7 + (datepart('weekday',[day]) - datepart('weekday',[day prior year raw])) end
    )
end
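Translating the Tableau rule into Python makes it easy to test. Since 365 = 52 * 7 + 1, the date 365 days back always falls one weekday earlier, so the "nearest matching weekday" step always adds one day and the rule returns the same date as simply subtracting 364 days. The function name below is illustrative, and the translation is a sketch rather than the author's code.

```python
from datetime import date, timedelta


def matching_day_prior_year(day: date) -> date:
    """Step back 365 days, then move to the nearest date with the same weekday
    (within 3 days, otherwise the matching weekday of the adjacent week)."""
    raw = day - timedelta(days=365)
    # The rule only seeks the nearest date with the right weekday, so the result
    # does not depend on whether weeks are numbered from Sunday or Monday.
    diff = day.weekday() - raw.weekday()
    if abs(diff) <= 3:
        return raw + timedelta(days=diff)
    if diff > 0:
        return raw + timedelta(days=diff - 7)
    return raw + timedelta(days=diff + 7)


print(matching_day_prior_year(date(2010, 1, 3)))  # 2009-01-04
```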
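Look at ?years in package lubridate. years() creates a period object, so subtracting it moves a date back by whole calendar years, across leap years, while keeping the month and day. The weekday generally changes, as shown below, and a date that does not exist in the target year (Feb 29) gives NA.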
> library(lubridate)
> # set the reference date
> d1 = as.Date("2017/01/03", format="%Y/%m/%d")
>
> # verify across years and leap years
> d1 - years(1)
[1] "2016-01-03"
> d1 - years(2)
[1] "2015-01-03"
> d1 - years(3)
[1] "2014-01-03"
> d1 - years(4)
[1] "2013-01-03"
> d1 - years(5)
[1] "2012-01-03"
>
> weekdays(d1 - years(1))
[1] "Sunday"
> weekdays(d1 - years(2))
[1] "Saturday"
>
> # Feb 29 minus a one-year period yields NA (2015-02-29 does not exist)
> ymd("2016/02/29") - years(1)
[1] NA
>
> # Feb 29 in a non-leap year fails to parse, so the result is NA
> ymd("2015/02/29") - years(1)
[1] NA
Warning message:
All formats failed to parse. No formats found.
>
> # Feb 29 minus a four-year period works (2012 is also a leap year)
> ymd("2016/02/29") - years(4)
[1] "2012-02-29"
>
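The same calendar-year arithmetic can be sketched in Python with date.replace, returning None where lubridate returns NA because the target date (29 February in a non-leap year) does not exist. minus_years is a hypothetical helper; like years(), it keeps month and day, not the weekday.

```python
from datetime import date
from typing import Optional


def minus_years(d: date, n: int) -> Optional[date]:
    """Same month and day n years earlier, or None if that date does not exist."""
    try:
        return d.replace(year=d.year - n)
    except ValueError:  # e.g. Feb 29 mapped onto a non-leap year
        return None


d1 = date(2017, 1, 3)
for n in (1, 2):
    prev = minus_years(d1, n)
    print(prev, prev.strftime("%A"))
print(minus_years(date(2016, 2, 29), 1), minus_years(date(2016, 2, 29), 4))
```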

Finding the first day of specific months in R

I currently have a column "Month" and a column "DayWeek" with the month and day of the week written out. Using the code below I can get a column with a 1 for each Wednesday in Feb, May, Aug and Nov. I'm struggling to find a way to get a column with 1s just for the first Wednesday of each of those 4 months. Any ideas, or do I have to create a loop for it?
testPrices$Rebalance <- ifelse((testPrices$Month=="February" & testPrices$DayWeek == "Wednesday"),1,
  ifelse((testPrices$Month=="May" & testPrices$DayWeek == "Wednesday"),1,
  ifelse((testPrices$Month=="August" & testPrices$DayWeek == "Wednesday"),1,
  ifelse((testPrices$Month=="November" & testPrices$DayWeek == "Wednesday"),1,0))))
Well, without a reproducible example I couldn't come up with a complete solution, but here is a way to generate the date of the first Wednesday of each month. In this example I start at 1 January 2013 and go out 36 months, but you can choose whatever range is appropriate for you. Then you can check whether each of your dates is in the resulting vector of first Wednesdays and, if so, assign a 1.
# I chose this as an origin
orig <- "2013-01-01"
# generate vector of 1st date of the month for 36 months
d <- seq(as.Date(orig), length=36, by="1 month")
# Use that to make a list of the first 7 dates of each month
d <- lapply(d, function(x) x + 0:6)
# Look through the list for Wednesdays only,
# and concatenate them into a vector
do.call('c', lapply(d, function(x) x[strftime(x,"%A")=="Wednesday"]))
Output:
[1] "2013-01-02" "2013-02-06" "2013-03-06" "2013-04-03" "2013-05-01" "2013-06-05" "2013-07-03"
[8] "2013-08-07" "2013-09-04" "2013-10-02" "2013-11-06" "2013-12-04" "2014-01-01" "2014-02-05"
[15] "2014-03-05" "2014-04-02" "2014-05-07" "2014-06-04" "2014-07-02" "2014-08-06" "2014-09-03"
[22] "2014-10-01" "2014-11-05" "2014-12-03" "2015-01-07" "2015-02-04" "2015-03-04" "2015-04-01"
[29] "2015-05-06" "2015-06-03" "2015-07-01" "2015-08-05" "2015-09-02" "2015-10-07" "2015-11-04"
[36] "2015-12-02"
Note: I adapted this code from answers found here and here.
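An alternative to generating the first seven days of each month and filtering by weekday name is to compute the offset directly: the first Wednesday is day 1 plus (Wednesday minus the weekday of day 1) modulo 7. The Python sketch below uses weekday numbers rather than names, so it does not depend on the locale; first_weekday_of_month is a hypothetical helper and not a translation of the R code.

```python
from datetime import date

WEDNESDAY = 2  # date.weekday(): Monday == 0 ... Sunday == 6


def first_weekday_of_month(year: int, month: int, weekday: int = WEDNESDAY) -> date:
    """First date in the given month that falls on `weekday` (no 7-day scan needed)."""
    first = date(year, month, 1)
    offset = (weekday - first.weekday()) % 7  # 0..6 days after the 1st
    return first.replace(day=1 + offset)


print([first_weekday_of_month(2013, m).isoformat() for m in (1, 2, 3)])
# ['2013-01-02', '2013-02-06', '2013-03-06']
```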
I created a sample dataset to work with like this (thanks @Frank!):
orig <- "2013-01-01"
d <- data.frame(date=seq(as.Date(orig), length=1000, by='1 day'))
d$Month <- months(d$date)
d$DayWeek <- weekdays(d$date)
d$DayMonth <- as.numeric(format(d$date, '%d'))
From a data frame like this, you can extract the first Wednesday of specific months using subset, like this:
subset(d, Month %in% c('January', 'February') & DayWeek == 'Wednesday' & DayMonth < 8)
This takes advantage of the fact that the day number (1..31) of a month's first Wednesday is always between 1 and 7, and there is exactly one such Wednesday per month. You could do the same for the 2nd, 3rd or 4th Wednesday by changing the condition accordingly, for example DayMonth > 7 & DayMonth < 15 for the 2nd.
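Applied to the asker's four months, the day-of-month condition gives the Rebalance flag directly: a date is a rebalancing day when it is a Wednesday in February, May, August or November with day-of-month 1 to 7. A Python sketch of that test follows (is_rebalance_day and REBALANCE_MONTHS are illustrative names); int(is_rebalance_day(d)) would give the 1/0 column.

```python
from datetime import date, timedelta

REBALANCE_MONTHS = {2, 5, 8, 11}  # February, May, August, November


def is_rebalance_day(d: date) -> bool:
    """True for the first Wednesday of Feb/May/Aug/Nov: a Wednesday with day-of-month 1-7."""
    return d.month in REBALANCE_MONTHS and d.weekday() == 2 and d.day <= 7


days_2013 = [date(2013, 1, 1) + timedelta(days=i) for i in range(365)]
print([d.isoformat() for d in days_2013 if is_rebalance_day(d)])
# ['2013-02-06', '2013-05-01', '2013-08-07', '2013-11-06']
```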
