I am given a data frame where one of the columns holds a year and month value (e.g. "2019-05"). I need to display only the rows where that value is later than a certain year-month, for instance only data later than "2018-11".
You can convert them to dates, but in R < and > work on character strings too, so you could just do something like this (assuming months with only one digit have a leading 0):
examp <- c('2011-01', '2013-08', '2018-04', '2018-12', '2019-05')
examp[examp > '2018-11']
#[1] "2018-12" "2019-05"
If you want to convert to dates, append a day to the end and use as.Date:
examp <- as.Date(paste0(examp, '-01'))
examp
# [1] "2011-01-01" "2013-08-01" "2018-04-01" "2018-12-01" "2019-05-01"
examp[examp > as.Date('2018-11-01')]
# [1] "2018-12-01" "2019-05-01"
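Since the question is about a data frame rather than a bare vector, here is a minimal sketch of both approaches applied to rows. The data frame df and its columns ym and value are made up for illustration; only base R is used.

```r
# Hypothetical data frame; the column names ym and value are assumptions
df <- data.frame(
  ym = c("2011-01", "2013-08", "2018-04", "2018-12", "2019-05"),
  value = c(10, 20, 30, 40, 50),
  stringsAsFactors = FALSE
)

# Option 1: compare the zero-padded strings directly
later_chr <- df[df$ym > "2018-11", ]

# Option 2: convert to Date (first of the month) and compare dates
df$date <- as.Date(paste0(df$ym, "-01"))
later_date <- df[df$date > as.Date("2018-11-01"), ]

later_chr$ym
later_date$ym
```

Both options should keep the same two rows here. The string comparison relies on every value having the same zero-padded YYYY-MM layout.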
I have a vector of date strings in the form month_name-2_digit_year i.e.
a = rbind("April-21", "March-21", "February-21", "January-21")
I'm trying to convert that vector into a vector of date objects. I'm aware this question is very similar to this: Convert non-standard date format to date in R posted some years ago, but unfortunately, it has not answered my question.
I have tried the following as.Date() calls, but each one just returns a vector of NA:
b = as.Date(a, format = "%B-%y")
b = as.Date(a, format = "%B%y")
b = as.Date(a, "%B-%y")
b = as.Date(a, "%B%y")
I also attempted to do it using the convertToDate function from the openxlsx package:
b = convertToDate(a, format = "%B-%y")
I have also tried all the above but using a single character string rather than a vector, but that produced the same issue.
I'm a little lost as to why this isn't working, as this format has worked in reverse earlier in my script (that is, I had a date object already in dd-mm-yyyy format and converted it to month_name-yy using %B-%y). Is there another way to go from string to date when the string is in a non-standard date format (anything other than dd-mm-yyyy, or mm-dd-yy if you're in the US)?
For the record, my R locales are all UK and English.
Thanks in advance.
A Date must have all three of day, month and year. Convert to yearmon class which requires only month and year and then to Date as in (1) and (2) below or add the day as in (3).
(1) and (3) give first of month and (2) gives the end of the month.
(3) uses only functions from base R.
Also consider not converting to Date at all but just use yearmon objects instead since they directly represent a year and month which is what the input represents.
library(zoo)
# test input
a <- c("April-21", "March-21", "February-21", "January-21")
# 1
as.Date(as.yearmon(a, "%B-%y"))
## [1] "2021-04-01" "2021-03-01" "2021-02-01" "2021-01-01"
# 2
as.Date(as.yearmon(a, "%B-%y"), frac = 1)
## [1] "2021-04-30" "2021-03-31" "2021-02-28" "2021-01-31"
# 3
as.Date(paste(1, a), "%d %B-%y")
## [1] "2021-04-01" "2021-03-01" "2021-02-01" "2021-01-01"
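Because %B depends on the current locale, a locale-independent variant of (3) may be useful. The following is only a sketch in base R: it looks the month up in the built-in month.name constant (English names) and assumes every two-digit year belongs to the 2000s. The variable names are illustrative.

```r
a <- c("April-21", "March-21", "February-21", "January-21")

# Split "April-21" into the month name and the two-digit year
parts <- strsplit(a, "-", fixed = TRUE)
mon_name <- vapply(parts, `[`, character(1), 1)
yy <- vapply(parts, `[`, character(1), 2)

# month.name is a base R constant with the English month names
mon <- match(mon_name, month.name)

# Assumption: all years are 20xx
yr <- 2000L + as.integer(yy)

# A Date needs a day, so use the first of the month
first_of_month <- as.Date(sprintf("%04d-%02d-01", yr, mon))
first_of_month
```

Any month name that is not in month.name yields NA rather than an error, so it is worth checking the result with anyNA().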
In addition to zoo, which @G. Grothendieck mentioned, you can also use clock or lubridate.
clock supports a variable precision calendar type called year_month_day. In this case you'd want "month" precision, then you can set the day to whatever you'd like and convert back to Date.
library(clock)
x <- c("April-21", "March-21", "February-21", "January-21")
ymd <- year_month_day_parse(x, format = "%B-%y", precision = "month")
ymd
#> <year_month_day<month>[4]>
#> [1] "2021-04" "2021-03" "2021-02" "2021-01"
# First of month
as.Date(set_day(ymd, 1))
#> [1] "2021-04-01" "2021-03-01" "2021-02-01" "2021-01-01"
# End of month
as.Date(set_day(ymd, "last"))
#> [1] "2021-04-30" "2021-03-31" "2021-02-28" "2021-01-31"
The simplest solution may be to use lubridate::my(), which parses strings in the order of "month then year". That assumes that you want the first day of the month, which may or may not be correct for you.
library(lubridate)
x <- c("April-21", "March-21", "February-21", "January-21")
# Assumes first of month
my(x)
#> [1] "2021-04-01" "2021-03-01" "2021-02-01" "2021-01-01"
Hey, I have some data aggregated at the quarter level, and one column contains data like this:
> unique(data$fiscalyearquarter)
[1] "2012Q3" "2010Q3" "2012Q1" "2011Q4" "2012Q4" "2008Q1" "2008Q2" "2010Q4" "2010Q1"
[10] "2009Q2" "2012Q2" "2011Q3" "2013Q2" "2013Q1" "2011Q2" "2013Q4" "2009Q4" "2009Q3"
[19] "2011Q1" "2010Q2" "2013Q3" "2008Q4" "2009Q1" "2014Q1" "2008Q3" "2014Q2"
I am thinking about writing a function that turns such a string into a timestamp.
Something like this: split the string into year and quarter, and then convert the quarter to a month (the middle month of the quarter).
convert <- function(myinput = "2008Q2"){
  year <- substr(myinput, 1, 4)
  quarter <- substr(myinput, 6, 6)
  month <- 3 * as.numeric(quarter) - 1
  date <- as.Date(paste0(year, sprintf("%02d", month), '01'), '%Y%m%d')
  return(date)
}
I have to convert those strings to date format and then analyze it from there.
> convert("2010Q3")
[1] "2010-08-01"
Is there any way, beyond my hard-coded solution, to analyze a time series problem at the quarterly level?
If x is your vector and you're okay having the date be the first day of the quarter:
library(zoo)
as.Date(as.yearqtr(x))
If you want the date to be the first day of the second month of the quarter like your example, you could hack together something like this:
as.Date(format(as.Date(as.yearqtr(x))+40, "%Y-%m-01"))
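If you would rather stay in base R, the same idea can be written as a small vectorised helper. This is a sketch only: quarter_to_date and its month_in_quarter argument are hypothetical names, and the input is assumed to always look like "2010Q3".

```r
# Hypothetical helper: map "YYYYQn" strings to the first day of a chosen
# month within the quarter (1 = first month, 2 = middle, 3 = last)
quarter_to_date <- function(x, month_in_quarter = 2L) {
  year <- as.integer(substr(x, 1, 4))
  quarter <- as.integer(substr(x, 6, 6))
  # Q1 starts in month 1, Q2 in month 4, and so on
  month <- 3L * (quarter - 1L) + as.integer(month_in_quarter)
  as.Date(sprintf("%04d-%02d-01", year, month))
}

mid_quarter <- quarter_to_date(c("2008Q2", "2010Q3"))
start_quarter <- quarter_to_date(c("2008Q2", "2010Q3"), month_in_quarter = 1L)
mid_quarter
start_quarter
```

With the default, "2010Q3" maps to 2010-08-01, matching the convert() example in the question.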
I have a data frame 'rta' with a date variable (date of death) with data entered in multiple formats like DD/MM/YY, D/M/YY, DD/M/YY, D/MM/YY, DD/MM, D/MM, D/M, DD/M.
rta$date.of.death<-c('12/12/08' ,'1/10/08','4/3/08','24/5/08','23/4','11/11','1/12')
Luckily all the dates belong to the year 2008.
I want to make this variable into a uniform format of DD/MM/YYYY, for example 12/12/2008. How to get it this way?
You could use this quick'n'dirty way:
rta <- data.frame(date.of.death=c('12/12/08' ,'1/10/08', '4/3/08',
                                  '24/5/08','23/4','11/11','1/12'),
                  stringsAsFactors=FALSE)
# append '/08' to the dates without year
noYear <- grep('.+/.+/.+',rta$date.of.death,invert=TRUE)
rta$date.of.death[noYear] <- paste(rta$date.of.death[noYear],'08',sep='/')
# convert the strings into POSIXct dates
dates <- as.POSIXct(rta$date.of.death, format='%d/%m/%y')
# turn the dates into strings having format: DD/MM/YYYY
rta$date.of.death <- format(dates,format='%d/%m/%Y')
> rta$date.of.death
[1] "12/12/2008" "01/10/2008" "04/03/2008" "24/05/2008" "23/04/2008" "11/11/2008" "01/12/2008"
Note:
this code assumes that no date has a four-digit year e.g. 01/01/2008
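Try this. It appends "/08" to every entry; as.Date ignores any trailing characters beyond what the format needs, so entries that already have a year are unaffected: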
as.Date(paste0(rta$date.of.death, "/08"), "%d/%m/%y")
giving
[1] "2008-12-12" "2008-10-01" "2008-03-04" "2008-05-24" "2008-04-23"
[6] "2008-11-11" "2008-12-01"
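Both answers assume two-digit years. As a sketch of how the four-digit case mentioned in the note could be covered as well, the helper below counts the slashes to find entries without a year and expands two-digit years before parsing. normalise_dates is a hypothetical name, and the assumption that two-digit years belong to the 2000s is mine, not from the question.

```r
# Hypothetical helper; default_year and the "20" century prefix are assumptions
normalise_dates <- function(x, default_year = "2008") {
  # Entries with a single slash have no year: append the default year
  n_slash <- lengths(regmatches(x, gregexpr("/", x, fixed = TRUE)))
  no_year <- n_slash == 1
  x[no_year] <- paste(x[no_year], default_year, sep = "/")
  # Expand a trailing two-digit year to four digits
  x <- sub("/([0-9]{2})$", "/20\\1", x)
  # Parse, then print back in DD/MM/YYYY
  format(as.Date(x, format = "%d/%m/%Y"), "%d/%m/%Y")
}

input <- c('12/12/08', '1/10/08', '4/3/08', '24/5/08',
           '23/4', '11/11', '1/12', '01/01/2008')
uniform <- normalise_dates(input)
uniform
```

The last input element is an extra four-digit-year case added for illustration; it is not part of the original data.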
I have this list of dates:
library(lubridate)
my.dates = ymd(c("2013-12-14", "2014-01-18", "2014-01-27", "2013-12-13", "2013-12-29", "2013-12-06"))
The lubridate::week function outputs a numeric vector when I convert these dates to week numbers:
week(my.dates)
[1] 50 3 4 50 52 49
Can I get lubridate to output a date ("POSIXct" "POSIXt") object that converts my.dates to a week number and year number? The output should be a date object (not a character or numeric vector) formatted something like this:
[1] "50-2013" "3-2014" "4-2014" "50-2013" "52-2013" "49-2013"
I'm specifically interested in a solution that uses lubridate.
To convert my.dates to a week-year character vector try the following where week and year are lubridate functions:
> paste(week(my.dates), year(my.dates), sep = "-")
[1] "50-2013" "3-2014" "4-2014" "50-2013" "52-2013" "49-2013"
The sample output in the question did not use leading zeros for the week but if leading zeros were desired for the week then:
> sprintf("%02d-%d", week(my.dates), year(my.dates))
[1] "50-2013" "03-2014" "04-2014" "50-2013" "52-2013" "49-2013"
The above are character representations of week-year and do not uniquely identify a date nor can such a format represent a POSIXt object.
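One way to live with that limitation is to keep the real dates and carry the week-year text alongside them as a display label. The sketch below uses base R only and assumes that week() counts complete seven-day periods since 1 January, plus one, which is how lubridate documents it. The data frame week_year is a made-up name.

```r
my.dates <- as.Date(c("2013-12-14", "2014-01-18", "2014-01-27",
                      "2013-12-13", "2013-12-29", "2013-12-06"))

lt <- as.POSIXlt(my.dates)
# yday is zero-based, so integer division by 7 counts complete weeks
week_no <- lt$yday %/% 7L + 1L
# POSIXlt stores years since 1900
year_no <- lt$year + 1900L

# Keep the Date for sorting and arithmetic; use the label only for display
week_year <- data.frame(date = my.dates,
                        label = paste(week_no, year_no, sep = "-"),
                        stringsAsFactors = FALSE)
week_year[order(week_year$date), ]
```

Sorting on the date column gives chronological order, which sorting on the label would not.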
I have a large data frame with date variables, which reflect first day of the month. Is there an easy way to create a new data frame date variable that represents the last day of the month?
Below is some sample data:
date.start.month=seq(as.Date("2012-01-01"),length=4,by="months")
df=data.frame(date.start.month)
df$date.start.month
"2012-01-01" "2012-02-01" "2012-03-01" "2012-04-01"
I would like to return a new variable with:
"2012-01-31" "2012-02-29" "2012-03-31" "2012-04-30"
I've tried the following but it was unsuccessful:
df$date.end.month=seq(df$date.start.month,length=1,by="+1 months")
To get the end of months you could just create a Date vector containing the 1st of all the subsequent months and subtract 1 day.
date.end.month <- seq(as.Date("2012-02-01"),length=4,by="months")-1
date.end.month
[1] "2012-01-31" "2012-02-29" "2012-03-31" "2012-04-30"
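The same "first of the next month minus one day" idea can be applied to an existing Date vector without any packages. This is a sketch, with month_end_of as a hypothetical helper name; it does not require the inputs to fall on the first of the month.

```r
# Hypothetical helper: last day of the month for each element of a Date vector
month_end_of <- function(dates) {
  # Snap every date to the first day of its month
  first <- as.Date(format(dates, "%Y-%m-01"))
  # 31 days after the 1st always falls in the following month
  next_first <- as.Date(format(first + 31, "%Y-%m-01"))
  # The day before the next month's 1st is the month end
  next_first - 1
}

date.start.month <- seq(as.Date("2012-01-01"), length = 4, by = "months")
df <- data.frame(date.start.month)
df$date.end.month <- month_end_of(df$date.start.month)
df$date.end.month

# Also works mid-month and across a year boundary
other <- month_end_of(as.Date(c("2012-12-15", "2013-02-10")))
other
```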
Here is another solution using the lubridate package:
date.start.month=seq(as.Date("2012-01-01"),length=4,by="months")
df=data.frame(date.start.month)
library(lubridate)
df$date.end.month <- ceiling_date(df$date.start.month, "month") - days(1)
df$date.end.month
[1] "2012-01-31" "2012-02-29" "2012-03-31" "2012-04-30"
This uses the same concept given by James above, in that it gets the first day of the next month and subtracts one day.
By the way, this will work even when the input date is not necessarily the first day of the month. So for example, today is the 27th of the month and it still returns the correct last day of the month:
ceiling_date(Sys.Date(), "month") - days(1)
[1] "2017-07-31"
Use timeLastDayInMonth from the timeDate package:
df$eom <- timeLastDayInMonth(df$somedate)
library(lubridate)
as.Date("2019-09-01") - days(1)
[1] "2019-08-31"
or
library(lubridate)
as.Date("2019-09-01") + months(1) - days(1)
[1] "2019-09-30"
A straightforward solution would be to use the as.yearmon function from the zoo package (which xts loads) and convert back with as.Date and the argument frac=1. frac is a number between 0 and 1 that indicates the fraction of the way through the period that the result represents.
as.Date(as.yearmon(seq.Date(as.Date('2017-02-01'),by='month',length.out = 6)),frac=1)
[1] "2017-02-28" "2017-03-31" "2017-04-30" "2017-05-31" "2017-06-30" "2017-07-31"
Or if you prefer “piping” using magrittr:
seq.Date(as.Date('2017-02-01'),by='month',length.out = 6) %>%
  as.yearmon() %>% as.Date(frac=1)
[1] "2017-02-28" "2017-03-31" "2017-04-30" "2017-05-31" "2017-06-30" "2017-07-31"
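A function like the one below would do the work (assuming dt is a scalar Date):
month_end <- function(dt) {
  # candidate days: dt and the 31 days after it
  d <- seq(dt, dt+31, by="days")
  # keep the latest candidate that is still in dt's month
  max(d[format(d,"%m")==format(dt,"%m")])
}
If you have a vector of Dates, apply it element-wise. Note that sapply would drop the Date class and return plain numbers, so combine the results with c instead:
do.call(c, lapply(dates, month_end))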
You can use timeperiodsR:
date.start.month=seq(as.Date("2012-01-01"),length=4,by="months")
df=data.frame(date.start.month)
df$date.start.month
# install.packages("timeperiodsR")
library(timeperiodsR)
pm <- previous_month(df$date.start.month[1]) # get previous month
start(pm) # first day of previous month
end(pm) # last day of previous month
seq(pm) # vector with all days of previous month
We can also use bsts::LastDayInMonth:
transform(df, date.end.month = bsts::LastDayInMonth(df$date.start.month))
# date.start.month date.end.month
# 1 2012-01-01 2012-01-31
# 2 2012-02-01 2012-02-29
# 3 2012-03-01 2012-03-31
# 4 2012-04-01 2012-04-30
In addition to lubridate, the tidyverse team has added the clock package, which has nice functionality for this:
library(clock)
date_build(2012, 1:12, 31, invalid = "previous")
# [1] "2012-01-31" "2012-02-29" "2012-03-31" "2012-04-30" "2012-05-31" "2012-06-30"
# [7] "2012-07-31" "2012-08-31" "2012-09-30" "2012-10-31" "2012-11-30" "2012-12-31"
The invalid argument specifies what to do with an invalid date (e.g. 2012-02-31). From the documentation:
"previous": The previous valid instant in time.
"previous-day": The previous valid day in time, keeping the time of day.
"next": The next valid instant in time.
"next-day": The next valid day in time, keeping the time of day.
"overflow": Overflow by the number of days that the input is invalid by. Time of day is dropped.
"overflow-day": Overflow by the number of days that the input is invalid by. Time of day is kept.
"NA": Replace invalid dates with NA.
"error": Error on invalid dates.