Checking if dates are between a range [duplicate] - r

This question already has answers here:
R - check if string contains dates within specific date range
(2 answers)
Closed 7 years ago.
I have two columns, one with start dates and one with end dates (both including times).
Then I'd have 31 separate columns, one for each day of the month, each containing a 1 or 0 depending on whether the start and end dates encompass that day.
I have converted the values into date-times using strptime, and I know how to use difftime.
The bit I'm stuck on is actually doing the comparison and checking whether the start date is before or after the date of the column. For example, I want to know whether the start and end dates include the 1st of Jan, then the 2nd of Jan, and so on. If the start date is the 5th, I should return 0 for those two columns, but I don't know how to make the comparison.
Added some sample data
Col 1 Start Date: 01/01/2015 17:00:00
Col 2 End Date: 14/01/2015 10:55:00
Col 3 Jan-01: 1
Col 3 Jan-02: 1
So in column 3, I'd want to check whether the start and end dates encompass the 1st of Jan.
The start date can be at some point during the 1st of Jan, e.g. 4pm. If this is the case, I'd like column 3 to return 0.5 days.
Col 1 Start Date: 01/01/2015 00:00:00
Col 2 End Date: 01/01/2015 16:00:00
Col 3 Jan-01: 0.6667
Col 3 Jan-02: 0
Hopefully this is now more clear. I think the complexity of having time and not just returning a Boolean result means this is not a duplicate question.

Since you haven't provided any reproducible data, I have created some just to illustrate how to compare two dates and return 1 if the comparison is TRUE and 0 otherwise.
Since you mentioned strptime, I am using the same here.
Syntax: ifelse(date1 < date2, 1,0)
> ifelse(strptime(as.Date("2015-12-16"), format = "%Y-%m-%d") < strptime(as.Date("2015-12-17"), format = "%Y-%m-%d"),1,0)
[1] 1
> ifelse(strptime(as.Date("2015-12-18"), format = "%Y-%m-%d") < strptime(as.Date("2015-12-17"), format = "%Y-%m-%d"),1,0)
[1] 0
You can use the same logic to compare two dates.
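For the fractional case described in the question, one possible approach is to measure how much of each calendar day overlaps the start/end interval. The following is only a sketch: the helper name day_fraction is hypothetical, the sample values are taken from the question's second example, and it assumes all times are in UTC so that every day is exactly 86400 seconds long.

```r
# Fraction of each day (given by its midnight, day_start) covered by [start, end].
# day_fraction is a hypothetical helper, not part of base R.
day_fraction <- function(start, end, day_start) {
  s  <- as.numeric(start)      # seconds since the epoch
  e  <- as.numeric(end)
  d0 <- as.numeric(day_start)
  # Overlap between [s, e] and [d0, d0 + 86400], clamped at zero
  overlap <- pmin(e, d0 + 86400) - pmax(s, d0)
  pmax(overlap, 0) / 86400
}

fmt   <- "%d/%m/%Y %H:%M:%S"
start <- as.POSIXct("01/01/2015 00:00:00", format = fmt, tz = "UTC")
end   <- as.POSIXct("01/01/2015 16:00:00", format = fmt, tz = "UTC")

# Midnights of Jan-01 and Jan-02; use length.out = 31 for a full month
days <- seq(as.POSIXct("2015-01-01", tz = "UTC"), by = "day", length.out = 2)

frac <- day_fraction(start, end, days)
round(frac, 4)
```

Because the helper is vectorized over day_start, the same call can fill all 31 day columns for one row; a plain 1/0 flag is then just as.integer(frac > 0).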

Related

R; lubridate addition of months gives NA [duplicate]

This question already has answers here:
Add a month to a Date [duplicate]
(8 answers)
Closed 1 year ago.
I am calculating with dates in a For loop. I combine data from two dataframes. Tibble 1 contains variable A, tibble 2 contains variable B and C.
A is a numerical variable, B and C are both dates.
I want to assign variable A a new value if date B falls before date C + 16 months.
I used the following:
if (B < C + months(16)) { Df1$A = A+1 }
For some dates this does not work. For example, October 30th + 16 months = February 30th. The conditional expression fails because the comparison yields neither TRUE nor FALSE, and the for loop stops.
Is there a way to change C + months(16) to the last day of the month if the specific date (February 30th in the example above) does not exist?
You can use %m+% to add 16 months. It rolls back to the last day of the month when the resulting date does not exist.
library(lubridate)
ymd('2000-10-30') %m+% months(16)
#[1] "2002-02-28"
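To make the difference concrete, the sketch below compares plain addition with %m+% on the date from the example. It assumes the lubridate package is installed; the variable names c_date, plain and rolled are made up for illustration.

```r
library(lubridate)

c_date <- ymd("2000-10-30")        # stand-in for date C

# Plain addition lands on the non-existent 2002-02-30 and yields NA,
# which is why if () cannot evaluate the comparison
plain <- c_date + months(16)

# %m+% rolls back to the last valid day of the target month
rolled <- c_date %m+% months(16)

is.na(plain)
format(rolled)
```

With the rolled-back date the condition in the loop always evaluates to TRUE or FALSE, provided B and C themselves are not NA.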

Using lubridate with multiple date formats

I have a column of dates that was stored in the format 8/7/2001, 10/21/1990, etc. Two values are just four-digit years. I converted the entire column to class Date using the following code.
lubridate::parse_date_time(eventDate, orders = c('mdy', 'Y'))
It works great, except the values that were just years are converted to yyyy-01-01 and I want them to just be yyyy. Is there a way to keep lubridate from adding on any information that wasn't already there?
Edit: Code to create data frame
id = (1:5)
eventDate = c("10/7/2001", "1989", NA, "5/5/2016", "9/18/2011")
df <- data.frame(id, eventDate)
I do not think it is possible to convert your values to Dates and keep the "yyyy" values intact. By transforming your "yyyy" values into "yyyy-01-01", lubridate is doing the right thing: dates have an order, and if other values in your column have days and months defined, all the values need to have these components too.
For example, take the data.frame below. If I ask R to order the table by the dates column, does the value in the first row ("2020") come before or after the value in the second row ("2020-02-28")? Since "2020" stands for the whole year, it could mean any day in that year, so how should R treat it? By adding the first day of the year, lubridate defines the missing components and removes the ambiguity.
dates <- c("2020", "2020-02-28", "2020-02-20", "2020-01-10", "2020-05-12")
id <- 1:5
df <- data.frame(
id,
dates
)
  id      dates
1  1       2020
2  2 2020-02-28
3  3 2020-02-20
4  4 2020-01-10
5  5 2020-05-12
So if you want to keep the "yyyy" values intact, they probably should not sit in your eventDate column alongside values that have a different structure ("mm/dd/yyyy"). If it is really necessary to keep these values intact, I think it is best to leave the eventDate column as character and store the parsed Dates in another column, like this:
df$as_dates <- lubridate::parse_date_time(df$eventDate, orders = c('mdy', 'Y'))
  id eventDate   as_dates
1  1 10/7/2001 2001-10-07
2  2      1989 1989-01-01
3  3      <NA>       <NA>
4  4  5/5/2016 2016-05-05
5  5 9/18/2011 2011-09-18
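If you also need to remember which rows were year-only, one option is an extra column recording the precision of each value. This is just a sketch in base R (so it does not depend on lubridate); the column name precision and the helper variable year_only are assumptions made for illustration.

```r
id <- 1:5
eventDate <- c("10/7/2001", "1989", NA, "5/5/2016", "9/18/2011")
df <- data.frame(id, eventDate, stringsAsFactors = FALSE)

# TRUE where the original string is just a four-digit year (NA gives FALSE)
year_only <- grepl("^[0-9]{4}$", df$eventDate)

# Parse full m/d/Y dates; year-only strings and NA become NA here
df$as_dates <- as.Date(df$eventDate, format = "%m/%d/%Y")

# Fill year-only rows with January 1st of that year
df$as_dates[year_only] <- as.Date(paste0(df$eventDate[year_only], "-01-01"))

# Record how much of each date was actually known
df$precision <- ifelse(is.na(df$eventDate), NA_character_,
                       ifelse(year_only, "year", "day"))
df
```

Sorting and date arithmetic can then use as_dates, while precision tells you which values should be displayed as a bare year.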

In R, finding the start and end dates for each interval after using diff()

I am using diff() to find the difference in variables down a column. However, I would also like to display the dates the difference is found over.
For example:
Dates <- c("2017-06-07","2017-06-10","2017-06-15","2017-07-07","2017-07-12","2017-07-18")
Variable<-c(5,6,7,8,9,3)
dd<-diff(Dates)
dv<-diff(Variable)
I'd like to find a way to add columns for the start and end date for each interval, so "06-07" as the start and "06-10" as the end date for the difference between the first 2 variables. Any ideas?
The OP has requested to add columns for the start and end date for each interval.
This can be accomplished by using the head() and tail() functions:
# data provided by OP
Dates <- c("2017-06-07","2017-06-10","2017-06-15","2017-07-07","2017-07-12","2017-07-18")
Variable<-c(5,6,7,8,9,3)
start <- head(Dates, -1) # take all Dates except the last one
end <- tail(Dates, -1L) # take all Dates except the first one
dd <- diff(as.Date(Dates)) # coercion to class Date required for date arithmetic
dv <- diff(Variable)
# create data.frame of intervals
intervals <- data.frame(start, end, dd, dv)
intervals
       start        end      dd dv
1 2017-06-07 2017-06-10  3 days  1
2 2017-06-10 2017-06-15  5 days  1
3 2017-06-15 2017-07-07 22 days  1
4 2017-07-07 2017-07-12  5 days  1
5 2017-07-12 2017-07-18  6 days -6
Note that intervals has 5 rows while the vector of breakpoints Dates it was constructed from has a length of 6.
Are you after the difference in dates? The strings need to be converted to class Date first:
diff(as.Date(Dates, format = "%Y-%m-%d"))

Subset dataframe in r for a specific month and date

I have a dataframe that looks like this:
V1 V2 V3 Month_nr       Date
 1  2  3        1 2017-01-01
 3  5  6        1 2017-01-02
 6  8  9        2 2017-02-01
 6  8  9        8 2017-08-01
and I want to take all rows from the data set that have Month_nr = 1 (January) and a date from 2017-01-01 until 2017-01-31 (the end of January), keeping the dates as well. I could create a column with days, but I have multiple observations per day and this would be even more confusing. I tried it with this:
df<- filter(df,df$Month_nr == 1, df$Date > 2017-01-01 && df$Date < 2017-01-31)
but it did not work. I would really appreciate your help. My dataset has measurements for an entire year (months 1 to 12), which is why I filter by month.
The problem is that you didn't put quotation marks around 2017-01-01. Writing 2017-01-01 directly computes the subtraction and returns a number (2015), so you end up comparing a string to a number. You can compare string to string: "2" is still greater than "1", so dates in yyyy-mm-dd format compare correctly as strings. Also use & (or separate the conditions with a comma) rather than &&, because && is not vectorized. BTW, you don't need to write df$ when using filter; in the tidyverse you can write the column names directly, without quoting.
Why do you need the month as well as the dates in the filter? Filtering on the dates alone works fine. However, you will have to convert the Date column into a Date object first. You can do that as follows:
df$Date <- as.Date(df$Date, format = "%Y-%m-%d")
df_new <- subset(df, Date >= as.Date("2017-01-01") & Date <= as.Date("2017-01-31"))
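As a self-contained check of the date filter, the sketch below rebuilds the four sample rows from the question and keeps only those in January 2017. The variable name jan is made up for illustration.

```r
# Sample rows from the question
df <- data.frame(
  V1 = c(1, 3, 6, 6),
  V2 = c(2, 5, 8, 8),
  V3 = c(3, 6, 9, 9),
  Month_nr = c(1, 1, 2, 8),
  Date = c("2017-01-01", "2017-01-02", "2017-02-01", "2017-08-01"),
  stringsAsFactors = FALSE
)

# Convert the character column to class Date before comparing
df$Date <- as.Date(df$Date, format = "%Y-%m-%d")

# Vectorized & (not &&) so every row is tested
in_january <- df$Date >= as.Date("2017-01-01") & df$Date <= as.Date("2017-01-31")
jan <- df[in_january, ]
jan
```

Only the two January rows remain, so the separate Month_nr condition is redundant once the date range is applied.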

Calculate the week number (0-53) in year

I have a dataset with locations and dates. I would like to calculate week of the year as number (00–53) but using Thursday as the first day of the week. The data looks like this:
location <- c("a","b","a","b","a","b")
date <- c("04-01-2013","26-01-2013","03-02-2013","09-02-2013","20-02-2013","03-03-2013")
mydf <- data.frame(location, date)
mydf
I know that there is strftime function for calculating week of year but it is only possible to use Monday or Sunday as the first day of the week.
Any help would be highly appreciated.
Just add 3 to the Date-formatted values:
> mydf$Dt <- as.Date(mydf$date, format="%d-%m-%Y")
> weeknum <- as.numeric( format(mydf$Dt+3, "%U"))
> weeknum
[1] 1 4 5 6 7 9
This uses a 0-based counting convention, since that is what strftime provides and we are just piggybacking on that code base, so the first Friday in a year that begins on a Tuesday, as was the case in 2013, gives week 1. Add 1 to the value if you want a 1-based convention. (Fundamentally, Date-formatted values are an integer sequence counted from the "origin", so they don't really recognize years or weeks. Adding 3 just shifts the reference frame of the underlying Date integer.)
Edit note: changed to an add-three strategy per Gabor's advice, which still does not address the question of how to deal with the last week of the prior year.
Since the question stated that week goes from 00-53 we assume that the week number is the number of Thursdays in the year on or before the date in question. Thus, the first Thursday in the year begins week 1 and week 0 is assigned to any days prior to that.
(There were comments that if the first day of the year were Tuesday then that would be week 1 but if that were the case there could never be a week 0 as seems to be required in the subject so some clarification on precisely what the definition of week number is may be required. Here we are going to use the definition in the preceding paragraph but it would not be hard to change it if we knew what the definition was. For example, if we always wanted the first week in the year to be 1 even if it were a short week then we could add !is.thu(jan1(d)) to the result.)
Both of the solutions below are short enough that they could be expressed in one statement; however, we have factored each into several short functions for clarity. The first is particularly straightforward, but the second is automatically vectorized without the need for sapply and would likely be more efficient.
1. sum Thursdays in year This solution assumes the input d is of class "Date" and just sums the number of Thursdays in the year before or on it:
is.thu <- function(x) weekdays(x) == "Thursday"
jan1 <- function(x) as.Date(cut(x, "year"))
week4 <- function(d) {
sapply(d, function(d) sum(is.thu(seq(jan1(d), d, by = "day"))))
}
We can test it like this:
d <- as.Date(c("2013-01-04", "2013-01-26", "2013-02-03", "2013-02-09",
"2013-02-20", "2013-03-03"))
week4(d) # 1 4 5 6 7 9
2. nextthu
Based on the nextfri function in the zoo quickref vignette, the number of days since the Epoch (1970-01-01, which was a Thursday) of the next Thursday (or of the day in question if it's already a Thursday) is given by nextthu in the first line below. Applying this to the first day of the year, we derive the result, where d is as before:
nextthu <- function(d) 7 * ceiling(as.numeric(d) / 7)
week4a <- function(d) (as.numeric(d) - nextthu(jan1(d))) %/% 7 + 1
and here is a test
week4a(d) # 1 4 5 6 7 9
ADDED: fixed bug in second solution.
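As a sanity check, the two approaches can be compared over every day of a year. The sketch below re-implements both in a locale-independent way (weekdays() returns localized names, so the test for Thursday here uses the "%u" weekday number instead); the function names is_thu, week_count and week_arith are made up for this illustration and are not from the answers above.

```r
# "%u" is the weekday as a number, 1 = Monday ... 4 = Thursday
is_thu <- function(x) format(x, "%u") == "4"
jan1   <- function(x) as.Date(format(x, "%Y-01-01"))

# Approach 1: count Thursdays from January 1st up to and including d
week_count <- function(d) {
  vapply(seq_along(d), function(i) {
    sum(is_thu(seq(jan1(d[i]), d[i], by = "day")))
  }, integer(1))
}

# Approach 2: arithmetic on days since the Epoch (1970-01-01 was a Thursday)
next_thu   <- function(d) 7 * ceiling(as.numeric(d) / 7)
week_arith <- function(d) (as.numeric(d) - next_thu(jan1(d))) %/% 7 + 1

# Compare both on every day of 2013
days <- seq(as.Date("2013-01-01"), as.Date("2013-12-31"), by = "day")
all(week_count(days) == week_arith(days))

# Days before the first Thursday (2013-01-03) fall in week 0
week_arith(as.Date(c("2013-01-01", "2013-01-03", "2013-01-04")))
```

The arithmetic version avoids building a day sequence for each element, which is why it scales better on long vectors.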
