I have dates stored as text, each representing one half of a year:
date_by_half <- c("2016 H1", "2017 H2", "2018 H1")
I'd like to extract the date from the text and store it as the first day of each half (or "semester"), so something like:
ysemester(date_by_half)
#[1] "2016-01-01" "2017-07-01" "2018-01-01"
I'm familiar with the lubridate::yq() function, but I found that it only works for quarters:
lubridate::yq(date_by_half)
#[1] "2016-01-01" "2017-04-01" "2018-01-01"
Right now my workaround is to replace H2 with Q3:
lubridate::yq(stringr::str_replace(date_by_half,"H2", "Q3"))
#[1] "2016-01-01" "2017-07-01" "2018-01-01"
However, I'm wondering if there is a more elegant solution using lubridate (or some other quick and reusable method).
One-liners
These one-liners use only base R:
1) read.table/ISOdate
with(read.table(text = date_by_half), as.Date(ISOdate(V1, ifelse(V2=="H1",1,7), 1)))
## [1] "2016-01-01" "2017-07-01" "2018-01-01"
2) sub
Even shorter is:
as.Date(sub(" H2", "-7-1", sub(" H1", "-1-1", date_by_half)))
## [1] "2016-01-01" "2017-07-01" "2018-01-01"
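The question asks for a reusable ysemester(). A minimal base-R sketch of such a helper is below; the name ysemester() is hypothetical (taken from the question's wish, not from any package), and it assumes every string has exactly the form "YYYY H1" or "YYYY H2". It maps H1 to month 1 and H2 to month 7, the same arithmetic the one-liners use.

```r
# Hypothetical helper (not part of lubridate or base R).
# Assumes inputs look exactly like "YYYY H1" or "YYYY H2".
ysemester <- function(x) {
  year <- as.integer(sub("^(\\d{4}) H[12]$", "\\1", x))
  half <- as.integer(sub("^\\d{4} H([12])$", "\\1", x))
  # H1 starts in month 1, H2 in month 7
  month <- 6L * (half - 1L) + 1L
  # explicit format: strings that do not match become NA (with a coercion warning)
  as.Date(sprintf("%04d-%02d-01", year, month), format = "%Y-%m-%d")
}

date_by_half <- c("2016 H1", "2017 H2", "2018 H1")
ysemester(date_by_half)
## [1] "2016-01-01" "2017-07-01" "2018-01-01"
```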
S3
Another approach would be to create an S3 class, "half", for half-year dates. We will only implement the methods we need.
as.half <- function(x, ...) UseMethod("as.half")
as.half.character <- function(x, ...) {
year <- as.numeric(sub("\\D.*", "", x))
half <- as.numeric(sub(".*\\D", "", x))
structure(year + (half - 1)/2, class = "half")
}
as.Date.half <- function(x, ...) {
as.Date(ISOdate(as.integer(x), 12 * (x - as.integer(x)) + 1, 1))
}
# test
as.Date(as.half(date_by_half))
## [1] "2016-01-01" "2017-07-01" "2018-01-01"
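If the "half" objects are kept around rather than converted straight to Date, a format method makes them print as readable text. The sketch below repeats as.half() from the answer so it runs on its own; format.half() and print.half() are hypothetical extra methods, not part of the original answer. It relies on the class storing year + (half - 1)/2, so the fractional part is 0 for H1 and 0.5 for H2.

```r
# as.half() repeated from the answer so this block is self-contained
as.half <- function(x, ...) UseMethod("as.half")
as.half.character <- function(x, ...) {
  year <- as.numeric(sub("\\D.*", "", x))
  half <- as.numeric(sub(".*\\D", "", x))
  structure(year + (half - 1)/2, class = "half")
}

# Hypothetical extra methods: show a "half" as "YYYY Hn" text
format.half <- function(x, ...) {
  v <- unclass(x)                      # drop the class before doing arithmetic
  year <- floor(v)
  half <- round(2 * (v - year)) + 1    # 0 -> H1, 0.5 -> H2
  paste0(year, " H", half)
}
print.half <- function(x, ...) {
  print(format(x), quote = FALSE)
  invisible(x)
}

h <- as.half(c("2016 H1", "2017 H2", "2018 H1"))
h
## [1] 2016 H1 2017 H2 2018 H1
```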
You can make your own function to do the trick.
# Your data
date_by_half <- c("2016 H1", "2017 H2", "2018 H1")
# Function to do the work
year_dater <- function(dates) {
year <- substr(dates, 1, 4)
half <- substr(dates, 6, 7)
month <- ifelse(half == "H1", 1, 7)
dates <- paste0(year, "-", month, "-1")
return(dates)
}
# Running the function
dates <- year_dater(date_by_half)
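# As a date-time; the zone labels (CET/CEST here) depend on the session time zone.
# Use as.Date(dates) instead if you only need plain dates.
as.POSIXct(dates)
## [1] "2016-01-01 CET"  "2017-07-01 CEST" "2018-01-01 CET"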
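We can use the ceiling_date() function from lubridate together with yq(). yq() reads "H1" as Q1 (January 1) and "H2" as Q2 (April 1), and rounding up with unit = "halfyear" moves April 1 to July 1. Setting change_on_boundary = FALSE ensures that dates already on a half-year boundary (2018-01-01, 2017-07-01, etc.) are never rounded up.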
library(lubridate)
ceiling_date(yq(date_by_half), unit = "halfyear", change_on_boundary = FALSE)
#[1] "2016-01-01" "2017-07-01" "2018-01-01"
Related
I'm trying to parse multiple date formats based on their position in a vector of dates. At some point the data switched the format it used from y/m/d to y/d/m. This is annoying for dates like 2010/07/03, where specifying a single order in lubridate cannot tell the two formats apart.
Here is an example of the dates:
datevec <- c("2011/07/01", "2011/07/02", "2011/07/03", "2011/02/07" )
The dates are set up so that before a certain row they are in one format and after it they are in another, so I'm trying to pass an index to the function.
When I tried to parse them this way with lubridate, it only returned 3 dates:
lapply(datevec, function(x, i) ifelse( x[i] <4, parse_date_time(x, "%Y-%m-%d"), parse_date_time(x,"%Y-%d-%m" )) )
1) If we change the ifelse in the question to a plain if, then the basic idea in the question works with appropriate modifications. Note that this gives a list L, so, assuming we really want a vector, we add the last line of code.
library(lubridate)
f <- function(x, i) if (i < 4)
parse_date_time(x, "ymd") else parse_date_time(x, "ydm")
L <- Map(f, datevec, seq_along(datevec), USE.NAMES = FALSE)
do.call("c", L)
## [1] "2011-07-01 UTC" "2011-07-02 UTC" "2011-07-03 UTC" "2011-07-02 UTC"
2) Use the ifelse on the format part rather than on the date part and use as.Date instead of parse_date_time:
ix <- seq_along(datevec)
as.Date(datevec, ifelse(ix < 4, "%Y/%m/%d", "%Y/%d/%m"))
## [1] "2011-07-01" "2011-07-02" "2011-07-03" "2011-07-02"
3) Convert the first 3 using ymd and the rest using ydm and then concatenate.
c(ymd(head(datevec, 3)), ydm(tail(datevec, -3)))
## [1] "2011-07-01" "2011-07-02" "2011-07-03" "2011-07-02"
4) or with only base R:
c(as.Date(head(datevec, 3)), as.Date(tail(datevec, -3), "%Y/%d/%m"))
## [1] "2011-07-01" "2011-07-02" "2011-07-03" "2011-07-02"
5) Another approach is to convert the later dates using string manipulation so that all the dates are in the same format and then use as.Date or ymd:
ix <- seq_along(datevec)
swap <- sub("(..)/(..)$", "\\2/\\1", datevec)
as.Date(ifelse(ix < 4, datevec, swap))
## [1] "2011-07-01" "2011-07-02" "2011-07-03" "2011-07-02"
6) The code above returns objects of class Date, which is more appropriate for dates without times, but if for some reason you really need POSIXct, use as.POSIXct on the above or else use parse_date_time like this:
c(parse_date_time(head(datevec, 3), "ymd"), parse_date_time(tail(datevec, -3), "ydm"))
## [1] "2011-07-01 UTC" "2011-07-02 UTC" "2011-07-03 UTC" "2011-07-02 UTC"
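Approach 2) above can be wrapped so the switch position is a parameter instead of a hard-coded 4. The helper name parse_switched() and its switch_at, before and after arguments are hypothetical; the sketch assumes the switch happens exactly once and that you know the first position written in the second format. It works because strptime(), which as.Date() uses when a format is supplied, recycles a vector of formats element by element.

```r
# Hypothetical helper: 'switch_at' is the first position written in the 'after' format.
parse_switched <- function(x, switch_at,
                           before = "%Y/%m/%d", after = "%Y/%d/%m") {
  fmt <- ifelse(seq_along(x) < switch_at, before, after)
  as.Date(x, format = fmt)   # one format per element
}

datevec <- c("2011/07/01", "2011/07/02", "2011/07/03", "2011/02/07")
parse_switched(datevec, switch_at = 4)
## [1] "2011-07-01" "2011-07-02" "2011-07-03" "2011-07-02"
```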
I have a large data frame of over 100k rows. The date column contains dates in multiple formats such as "%m/%d/%Y", "%Y-%m", "%Y", and "%Y-%m-%d". I can convert these all to dates with parse_date_time() from lubridate.
dates <- c("05/10/1983","8/17/2014","1953-12","1975","2001-06-17")
parse_date_time(dates, orders = c("%m/%d/%Y","%Y-%m","%Y","%Y-%m-%d"))
[1] "1983-05-10 UTC" "2014-08-17 UTC" "1953-12-01 UTC" "1975-01-01 UTC" "2001-06-17 UTC"
But as you can see, this sets dates with missing day to the first of the month and dates with missing month and day to the first of the year. How can I set those to the 15th and June 15th, respectively?
Use nchar to check the dates vector and paste what is missing.
library(lubridate)
dates <- c("05/10/1983","8/17/2014","1953-12","1975","2001-06-17")
dates <- ifelse(nchar(dates) == 4, paste(dates, "06-15", sep = "-"),
ifelse(nchar(dates) == 7, paste(dates, 15, sep = "-"), dates))
dates
#[1] "05/10/1983" "8/17/2014" "1953-12-15" "1975-06-15"
#[5] "2001-06-17"
parse_date_time(dates, orders = c("%m/%d/%Y","%Y-%m","%Y","%Y-%m-%d"))
#[1] "1983-05-10 UTC" "2014-08-17 UTC" "1953-12-15 UTC"
#[4] "1975-06-15 UTC" "2001-06-17 UTC"
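Another solution would be to use an index vector, also based on nchar (starting again from the original dates):
dates <- c("05/10/1983","8/17/2014","1953-12","1975","2001-06-17")
n <- nchar(dates)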
dates[n == 4] <- paste(dates[n == 4], "06-15", sep = "-")
dates[n == 7] <- paste(dates[n == 7], "15", sep = "-")
dates
#[1] "05/10/1983" "8/17/2014" "1953-12-15" "1975-06-15"
#[5] "2001-06-17"
As you can see, the result is the same as with ifelse.
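If lubridate is not available, the same fill-in idea works in base R. The sketch below is an assumption-laden variant: fill_and_parse() is a hypothetical name, it assumes every value is one of "m/d/YYYY", "YYYY-mm", "YYYY" or "YYYY-mm-dd", and it matches the partial forms by regular expression rather than by nchar(), so a short value such as "1/1/14" cannot be mistaken for a year-month. It returns Date rather than POSIXct.

```r
# Hypothetical base-R sketch; assumes only the four input forms listed above.
fill_and_parse <- function(x) {
  x <- sub("^(\\d{4})$", "\\1-06-15", x)       # year only  -> June 15
  x <- sub("^(\\d{4}-\\d{2})$", "\\1-15", x)   # year-month -> the 15th
  out <- as.Date(x, format = "%Y-%m-%d")       # slash dates give NA here
  slash <- grepl("/", x, fixed = TRUE)
  out[slash] <- as.Date(x[slash], format = "%m/%d/%Y")
  out
}

dates <- c("05/10/1983", "8/17/2014", "1953-12", "1975", "2001-06-17")
fill_and_parse(dates)
## [1] "1983-05-10" "2014-08-17" "1953-12-15" "1975-06-15" "2001-06-17"
```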
Here's another way of doing that, based on orders:
library(lubridate)
dates <- c("05/10/1983","8/17/2014","1953-12","1975","2001-06-17")
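parseDates <- function(x, orders = c('mdY', 'dmY', 'Ymd', 'Y', 'Ym')){
# only the first guessed format is used, so call this on one date at a time
fmts <- guess_formats(x, orders = orders)
dte <- parse_date_time(x, orders = fmts[1], tz = 'UTC')
if(!grepl('m', fmts[1]) ){
# year only: January 1 -> June 15 (months() + days() also holds in leap years)
dte <- dte + months(5) + days(14)
return(dte)
}
if(!grepl('d', fmts[1]) ){
# year and month only: the 1st -> the 15th
dte <- dte + days(14)
}
return(dte)
}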
Output:
> parseDates(dates[4])
[1] "1975-06-15 UTC"
> parseDates(dates[3])
[1] "1953-12-15 UTC"
This way, for different date formats you only need to change the orders argument, while lubridate does the rest.
Hope this is helpful!
I'm using R and want to construct a vector of dates, consisting of the 1st and 15th of each month, to use as breaks along the x-axis in a plot.
There are a lot of ways to do this but I'm trying to find the most elegant, straightforward approach.
My own solution to this is to create a full vector of dates and then discard the ones I don't need by checking the day.
library(lubridate)
library(magrittr)
myDateBreaks = function(start, end, days = c(1, 15)) {
dateBreaks = seq(as.Date(start), as.Date(end), by="1 day")
dateBreaks %<>% .[day(dateBreaks) %in% days]
return(dateBreaks)
}
x <- seq(as.Date("2015-01-01"), as.Date("2015-12-31"), by = "month")
rep(x, each = 2) + rep(c(0, 14), length(x))
#[1] "2015-01-01" "2015-01-15" "2015-02-01" "2015-02-15" "2015-03-01" "2015-03-15" "2015-04-01" "2015-04-15" "2015-05-01" "2015-05-15" "2015-06-01" "2015-06-15"
#[13] "2015-07-01" "2015-07-15" "2015-08-01" "2015-08-15" "2015-09-01" "2015-09-15" "2015-10-01" "2015-10-15" "2015-11-01" "2015-11-15" "2015-12-01" "2015-12-15"
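The answer above can be generalised into a function with the same arguments as the question's myDateBreaks(). The name month_breaks() is hypothetical; the sketch assumes every value in days is between 1 and 28, so no break spills into the following month, and it trims breaks that fall outside start and end.

```r
# Hypothetical helper; assumes all 'days' are between 1 and 28.
month_breaks <- function(start, end, days = c(1, 15)) {
  start <- as.Date(start)
  end <- as.Date(end)
  # first day of every month from start's month through end
  firsts <- seq(as.Date(format(start, "%Y-%m-01")), end, by = "month")
  breaks <- rep(firsts, each = length(days)) +
    rep(days - 1, times = length(firsts))
  sort(breaks[breaks >= start & breaks <= end])
}

month_breaks("2015-01-10", "2015-03-31")
## [1] "2015-01-15" "2015-02-01" "2015-02-15" "2015-03-01" "2015-03-15"
```

Unlike the question's version, this never builds a full daily sequence, which matters little for a plot axis but avoids the extra packages.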
I created a function that coerces a vector of quarter/year strings to a vector of dates.
.quarter_to_date(c("Q1/13","Q2/14"))
[1] "2013-03-01" "2014-06-01"
This is the code of my function:
.quarter_to_date <-
function(x){
ll <- strsplit(gsub('Q([0-9])[/]([0-9]+)','\\1,\\2',x),',')
res <- lapply(ll,function(x){
m <- as.numeric(x[1])*3
m <- ifelse(nchar(m)==1,paste0('0',m),as.character(m))
as.Date(paste(x[2],m,'01',sep='-'),format='%y-%m-%d')
})
do.call(c,res)
}
My function works fine, but it looks long and a little complicated. I think this should already be done in other packages (lubridate, for example), but I can't find it. Can someone help me simplify this code, please?
1) The zoo package has a "yearqtr" class. Convert to that and then to "Date" class:
library(zoo)
x <- c("Q1/13","Q2/14")
as.Date(as.yearqtr(x, format = "Q%q/%y"))
## [1] "2013-01-01" "2014-04-01"
2) Alternately use this to get the last day of the quarter instead of the first:
as.Date(as.yearqtr(x, format = "Q%q/%y"), frac = 1)
## [1] "2013-03-31" "2014-06-30"
3) Also consider not converting to "Date" class at all and just using "yearqtr" class directly:
as.yearqtr(x, format = "Q%q/%y")
## [1] "2013 Q1" "2014 Q2"
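For comparison, a shorter base-R version of the question's function, without zoo, is sketched below. quarter_to_date() is a hypothetical name and the sketch assumes strings exactly like "Q1/13" (quarter 1 to 4, two-digit year). Like zoo's as.Date() it returns the first day of the quarter, whose month is 3q - 2; the question's own function used month 3q instead, so swap in 3L * q if that is what you need.

```r
# Hypothetical base-R sketch; assumes strings like "Q1/13".
quarter_to_date <- function(x) {
  q  <- as.integer(sub("^Q([1-4])/\\d{2}$", "\\1", x))
  yy <- sub("^Q[1-4]/(\\d{2})$", "\\1", x)
  # first month of quarter q is 3q - 2 (1, 4, 7, 10)
  as.Date(sprintf("%s-%02d-01", yy, 3L * q - 2L), format = "%y-%m-%d")
}

quarter_to_date(c("Q1/13", "Q2/14"))
## [1] "2013-01-01" "2014-04-01"
```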
How do I get a sequence of monthly dates that ends on a given month and has a given length? seq(as.Date(*), length, by="month") assumes the start date is given, not the end date, and AFAIK it's impossible to specify a negative value for by in this case.
ETA: that is, I want a sequence that spans a given period, but one whose end point is specified rather than the start point. So, something like seq(to="2000-03-01", len=3, by="month") --> 2000-01-01, 2000-02-01, 2000-03-01.
Try this:
rev(seq(as.Date("2000-03-01"), length = 3, by = "-1 month"))
## [1] "2000-01-01" "2000-02-01" "2000-03-01"
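The same trick can be packaged so the end point is an argument, which is close to the seq(to = ..., len = ...) call the question imagines. seq_to() is a hypothetical name; the sketch passes by to seq() as a string with "-1 " prepended, so it assumes a unit seq.Date() accepts ("day", "week", "month", "quarter" or "year"). Month steps are only safe from days 1 to 28, since later days can overflow into the next month.

```r
# Hypothetical helper: a length.out sequence that ends at 'to', in ascending order.
seq_to <- function(to, length.out, by = "month") {
  rev(seq(as.Date(to), length.out = length.out, by = paste("-1", by)))
}

seq_to("2000-03-01", 3)
## [1] "2000-01-01" "2000-02-01" "2000-03-01"
seq_to("2000-03-01", 3, by = "week")
## [1] "2000-02-16" "2000-02-23" "2000-03-01"
```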
library(lubridate)
# descending from 2011-03-03; wrap in rev() for ascending order
ymd('2011-03-03') - months(0:5)
Maybe you could just compute it forward, using by=month as the +1 increment, and then reverse:
R> rev(seq(as.Date("2011-01-01"), length=6, by="month"))
[1] "2011-06-01" "2011-05-01" "2011-04-01" "2011-03-01" "2011-02-01" "2011-01-01"
Here you go. Base functions only:
last.days.of.month <- function(dt) {ldt<- as.POSIXlt(dt)
ldt$mon <- ldt$mon+1
ldt$mday <- 1
return(format( ldt -1, "%Y-%m-%d"))}
last.days.of.month(as.Date(c("2010-01-06","2010-03-06", "2010-02-06")) )
# [1] "2010-01-31" "2010-03-31" "2010-02-28"
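A vectorised variant that returns Date objects instead of strings is sketched below; last_day_of_month() is a hypothetical name. It rests on one arithmetic fact: adding 31 days to the first of a month always lands somewhere in the next month, because no month is longer than 31 days and none is shorter than 28. Snapping that to the first of its month and subtracting one day gives the last day of the original month.

```r
# Hypothetical base-R sketch; works element-wise on a vector of dates.
last_day_of_month <- function(d) {
  d <- as.Date(d)
  first <- as.Date(format(d, "%Y-%m-01"))
  # first + 31 always falls in the following month
  next_first <- as.Date(format(first + 31, "%Y-%m-01"))
  next_first - 1
}

last_day_of_month(as.Date(c("2010-01-06", "2010-03-06", "2010-02-06")))
## [1] "2010-01-31" "2010-03-31" "2010-02-28"
```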
seq.ldom <- function(dt, nmonths) {ldt<- rep(as.POSIXlt(dt)[1], nmonths)
ldt$mon <- ldt$mon + 1:nmonths
ldt$mday <- 1
return(format( ldt -1, "%Y-%m-%d"))}
seq.ldom(as.Date("2010-01-06"), 5)
#[1] "2010-01-31" "2010-02-28" "2010-03-31" "2010-04-30"
#[5] "2010-05-31"
Oh, for some reason I thought you wanted the last days of the month. Sorry about the useless code. Getting the first days of the month is not hard:
seq.fdom <- function(dt, nmonths) {ldt<- rep(as.POSIXlt(dt)[1], nmonths)
ldt$mon <- ldt$mon + 0:(nmonths-1)
ldt$mday <- 1
return(format( ldt , "%Y-%m-%d"))}
seq.fdom(as.Date("2010-01-06"), 5)
#[1] "2010-01-01" "2010-02-01" "2010-03-01" "2010-04-01"
#[5] "2010-05-01"
Getting the prior months is not hard either:
seq.prior.fdom <- function(dt, nmonths) {ldt<- rep(as.POSIXlt(dt)[1], nmonths)
ldt$mon <- ldt$mon-rev(0:(nmonths-1))
ldt$mday <- 1
return(format( ldt , "%Y-%m-%d"))}
seq.prior.fdom(as.Date("2010-01-06"), 5)
#[1] "2009-09-01" "2009-10-01" "2009-11-01" "2009-12-01"
#[5] "2010-01-01"
I think the basic principle is clear (if not beaten to death with a canoe paddle).