R change quarter to timestamp

Hey, I have some data aggregated at the quarter level, and one column contains data like this:
> unique(data$fiscalyearquarter)
[1] "2012Q3" "2010Q3" "2012Q1" "2011Q4" "2012Q4" "2008Q1" "2008Q2" "2010Q4" "2010Q1"
[10] "2009Q2" "2012Q2" "2011Q3" "2013Q2" "2013Q1" "2011Q2" "2013Q4" "2009Q4" "2009Q3"
[19] "2011Q1" "2010Q2" "2013Q3" "2008Q4" "2009Q1" "2014Q1" "2008Q3" "2014Q2"
I am thinking about writing a function that turns a string into a timestamp.
Something like this: split the string into year and quarter, then convert the quarter to a month (the middle month of the quarter).
convert <- function(myinput = "2008Q2"){
  year <- substr(myinput, 1, 4)
  quarter <- substr(myinput, 6, 6)
  month <- 3 * as.numeric(quarter) - 1
  date <- as.Date(paste0(year, sprintf("%02d", month), '01'), '%Y%m%d')
  return(date)
}
I have to convert those strings to date format and then analyze it from there.
> convert("2010Q3")
[1] "2010-08-01"
Is there a better way than my hard-coded solution to analyze a time series at the quarterly level?

If x is your vector and you're okay having the date be the first day of the quarter:
library(zoo)
as.Date(as.yearqtr(x))
If you want the date to be the first day of the second month of the quarter like your example, you could hack together something like this:
as.Date(format(as.Date(as.yearqtr(x))+40, "%Y-%m-01"))
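If you would rather not depend on zoo, here is a minimal base-R sketch of the same idea. The function name quarter_to_date and its month_in_quarter argument are made up for illustration, and it assumes every string looks exactly like "2010Q3"; anything else comes back as NA.

```r
# Hypothetical helper (not part of any package): convert "YYYYQn" strings
# to the first day of a chosen month within that quarter.
quarter_to_date <- function(x, month_in_quarter = 2L) {
  x <- as.character(x)
  out <- rep(as.Date(NA), length(x))
  # Only strings of the exact form "2010Q3" are converted
  ok <- grepl("^[0-9]{4}Q[1-4]$", x)
  year <- as.integer(substr(x[ok], 1, 4))
  quarter <- as.integer(substr(x[ok], 6, 6))
  # Quarter 1 starts in month 1, quarter 2 in month 4, and so on
  month <- 3L * (quarter - 1L) + as.integer(month_in_quarter)
  out[ok] <- as.Date(sprintf("%d-%02d-01", year, month), format = "%Y-%m-%d")
  out
}

quarter_to_date(c("2010Q3", "2008Q1", "oops"))
quarter_to_date("2010Q3", month_in_quarter = 1L)
```

With month_in_quarter = 2 this should reproduce the mid-quarter dates from the question; with 1 it gives the first day of the quarter.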

Related

How to convert a character column containing invalid (nonexistent) dates to Date in R? (similar to "Converting character column to Date in R")

ymd("2011-11-31")
All formats failed to parse. No formats found.[1] NA
November 2011 has 30 days, not 31, so ymd fails to parse it.
My data has some invalid dates like this in the date column, and I would like to learn an elegant way to handle them.
Is there a package or function that turns such a value into "2011-12-01"?
Not that I know of, but you could define your own function to handle it.
Here I take the year-month portion of the date, then add the number of days and let it wrap into the next month (or even the next year) if required.
# three invalid dates, one valid date
x <- c("2011-11-31", "2000-04-31", "2010-01-10", "2011-12-32")
parse_bad_dates <- function(x) {
  # parse the first day of the month, then add the day offset
  as.Date(paste(substr(x, 1, 7), "1"), format="%Y-%m %d") +
    as.numeric(substr(x, 9, 10)) - 1
}
parse_bad_dates(x)
#[1] "2011-12-01" "2000-05-01" "2010-01-10" "2012-01-01"
A similar answer, but this one also rolls invalid months over into the following year:
library(lubridate)
d <- c("2011-11-31",'2011-13-04','2011-12-32')
parse_false_date <- function(d) {
  x <- strcapture("(\\d{4})-(\\d{2})-(\\d{2})", d,
                  data.frame(y=integer(),m=integer(),d=integer()))
  make_date(x$y)+months(x$m-1)+days(x$d-1)
}
parse_false_date(d)
#> [1] "2011-12-01" "2012-01-04" "2012-01-01"

How to parse an invalid date with lubridate?

I need to parse dates and have cases like "31/02/2018":
library(lubridate)
> dmy("31/02/2018", quiet = T)
[1] NA
This makes sense as the 31st of Feb does not exist. Is there a way to parse the string "31/02/2018" to e.g. 2018-02-28 ? So not to get an NA, but an actual date?
Thanks.
We can write a function, assuming the only problem is a day number higher than the number of days in the month and that the format is always the same.
library(lubridate)
get_correct_date <- function(example_date) {
  # Split the string on "/" to get 3 components (day, month, year)
  vecs <- as.numeric(strsplit(example_date, "\\/")[[1]])
  # Number of days in that month of that year (so leap years are handled)
  last_day_of_month <- days_in_month(make_date(vecs[3], vecs[2]))
  # If the input day is higher than the number of days in that month,
  # replace it with the last day of that month
  if (vecs[1] > last_day_of_month)
    vecs[1] <- last_day_of_month
  # Paste the components back together and parse the corrected date
  dmy(paste0(vecs, collapse = "/"))
}
get_correct_date("31/02/2018")
#[1] "2018-02-28"
get_correct_date("31/04/2018")
#[1] "2018-04-30"
get_correct_date("31/05/2018")
#[1] "2018-05-31"
With small modifications you can handle dates in a different format, or day numbers that fall before the first day of the month.
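The function above handles one string at a time. For a whole column, a vectorised variant in base R might look like the sketch below. The name clamp_to_month_end is made up for illustration, and the sketch assumes the input is always "DD/MM/YYYY"; strings that do not match, or that have an impossible month, come back as NA. It looks up the month length for the given year, so 29 February is kept in leap years.

```r
# Hypothetical helper: clamp an overflowing day to the last day of its month.
clamp_to_month_end <- function(x) {
  parts <- utils::strcapture(
    "^([0-9]{1,2})/([0-9]{1,2})/([0-9]{4})$", x,
    proto = data.frame(day = integer(), month = integer(), year = integer())
  )
  # First day of the stated month (NA if the month itself is invalid)
  first <- as.Date(sprintf("%04d-%02d-01", parts$year, parts$month),
                   format = "%Y-%m-%d")
  # first + 31 always lands in the next month; step back from its first day
  next_first <- as.Date(format(first + 31, "%Y-%m-01"), format = "%Y-%m-%d")
  last_day <- as.integer(format(next_first - 1, "%d"))
  day <- pmin(parts$day, last_day)
  day[!is.na(day) & day < 1L] <- NA  # day 0 is treated as unparseable
  first + (day - 1L)
}

clamp_to_month_end(c("31/02/2018", "29/02/2020", "31/04/2018", "15/05/2018"))
```

Valid dates pass through unchanged, so the function can be applied to the whole column without first picking out the bad rows.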

How to format a Date as "YYYY-Mon" with Lubridate?

I would like to create a vector of dates between two specified moments in time with step 1 month, as described in this thread (Create a Vector of All Days Between Two Dates), to be then converted into factors for data visualization.
However, I'd like to have the dates in the YYYY-Mon format, i.e. 2010-Feb. So far I have only managed to get the dates in the standard format 2010-02-01, using code like this:
require(lubridate)
first <- ymd_hms("2010-02-07 15:00:00 UTC")
start <- ymd(floor_date(first, unit="month"))
last <- ymd_hms("2017-10-29 20:00:00 UTC")
end <- ymd(ceiling_date(last, unit="month"))
> start
[1] "2010-02-01"
> end
[1] "2017-11-01"
How can I change the format to YYYY-Mon?
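You can use format() (the %>% pipe comes from magrittr, which lubridate does not attach for you):
library(magrittr)
start %>% format('%Y-%b')
To create the vector, use seq():
seq(start, end, by = 'month') %>% format('%Y-%b')
Note: use a capital 'B' for the full month name: '%Y-%B'.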
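Since the labels are meant to become factors for plotting, one thing to watch is that factor() sorts its levels alphabetically by default, which would put "2010-Apr" before "2010-Feb". A minimal base-R sketch that keeps the chronological order is below; the variable names are only illustrative, and the month abbreviations depend on your locale.

```r
# Dates taken from the question; plain as.Date() is enough here
start <- as.Date("2010-02-01")
end <- as.Date("2017-11-01")

month_seq <- seq(start, end, by = "month")
labels <- format(month_seq, "%Y-%b")

# Fix the level order to the order of appearance (chronological),
# instead of the default alphabetical sort
month_factor <- factor(labels, levels = unique(labels))

head(levels(month_factor))
```

Passing levels = unique(labels) works because the sequence is already sorted by date.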

Separate Date into week and year

Currently my data frame has dates displayed in the 'Date' column as 01/01/2007 etc. I would like to convert these into a week/year value, i.e. 01/2007. Any ideas?
I have been trying things like this and getting nowhere...
enviro$Week <- strptime(enviro$Date, format= "%W/%Y")
You have to first convert to date, then you can convert back to the week of the year using format, for example:
### Converts character to date
test.date <- as.Date("10/10/2014", format="%m/%d/%Y")
### Extracts only Week of the year and year
format(test.date, format="Week number %W of %Y")
[1] "Week number 40 of 2014"
### Or if you prefer
format(test.date, format="%W/%Y")
[1] "40/2014"
So, in your case, you would do something like this:
enviro$Week <- format(as.Date(enviro$Date, format="%m/%d/%Y"), format= "%W/%Y")
But remember that the as.Date(enviro$Date, format="%m/%d/%Y") part is only necessary if your data is not already of class Date. If it is character, make sure the format string matches the actual layout of your dates (for example "%d/%m/%Y" if the day comes first).
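Put together on a small made-up data frame, the whole thing could look like the sketch below. The three dates are invented for illustration and are assumed to be day/month/year; swap the format string if yours are month/day/year.

```r
# Hypothetical stand-in for the real 'enviro' data frame
enviro <- data.frame(Date = c("01/01/2007", "08/01/2007", "31/12/2007"),
                     stringsAsFactors = FALSE)

# Step 1: character -> Date (assumed day/month/year here)
dates <- as.Date(enviro$Date, format = "%d/%m/%Y")

# Step 2: Date -> "week/year" string; %W counts weeks starting on Monday
enviro$Week <- format(dates, "%W/%Y")

enviro
```

Note that with %W, days before the first Monday of a year are typically reported as week 00, so some years will show a "00/YYYY" value.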
What is the class of enviro$Date? If it is of class Date there is probably a better way of doing this, otherwise you can try
v <- strsplit(as.character(enviro$Date), split = "/")
weeks <- sapply(v, "[", 2)
years <- sapply(v, "[", 3)
enviro$Week <- paste(weeks, years, sep = "/")

Create a proper date variable from an existing date variable in R

I have a data frame 'rta' with a date variable (date of death) with data entered in multiple formats like DD/MM/YY, D/M/YY, DD/M/YY, D/MM/YY, DD/MM, D/MM, D/M, DD/M.
rta$date.of.death<-c('12/12/08' ,'1/10/08','4/3/08','24/5/08','23/4','11/11','1/12')
Luckily all the dates belong to the year 2008.
I want to convert this variable to a uniform DD/MM/YYYY format, for example 12/12/2008. How can I do this?
You could use this quick'n'dirty way:
rta <- data.frame(date.of.death=c('12/12/08' ,'1/10/08', '4/3/08',
                                  '24/5/08','23/4','11/11','1/12'),
                  stringsAsFactors=FALSE)
# append '/08' to the dates without year
noYear <- grep('.+/.+/.+',rta$date.of.death,invert=TRUE)
rta$date.of.death[noYear] <- paste(rta$date.of.death[noYear],'08',sep='/')
# convert the strings into POSIXct dates
dates <- as.POSIXct(rta$date.of.death, format='%d/%m/%y')
# turn the dates into strings having format: DD/MM/YYYY
rta$date.of.death <- format(dates,format='%d/%m/%Y')
> rta$date.of.death
[1] "12/12/2008" "01/10/2008" "04/03/2008" "24/05/2008" "23/04/2008" "11/11/2008" "01/12/2008"
Note:
this code assumes that no date has a four-digit year e.g. 01/01/2008
Try this:
as.Date(paste0(rta$date.of.death, "/08"), "%d/%m/%y")
giving
[1] "2008-12-12" "2008-10-01" "2008-03-04" "2008-05-24" "2008-04-23"
[6] "2008-11-11" "2008-12-01"
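This one-liner works even for the entries that already have a year, because as.Date() only reads as much of the string as the format needs and ignores any trailing characters (so the extra "/08" in "12/12/08/08" is dropped). If you want the DD/MM/YYYY strings the question asked for rather than Date objects, a sketch of the extra step is below; dod is just an illustrative stand-in for rta$date.of.death.

```r
# Stand-in for rta$date.of.death, copied from the question
dod <- c("12/12/08", "1/10/08", "4/3/08", "24/5/08", "23/4", "11/11", "1/12")

# Append "/08" everywhere; entries that already had a year get a
# redundant suffix, which as.Date() ignores as trailing text
parsed <- as.Date(paste0(dod, "/08"), format = "%d/%m/%y")

# Back to character, in uniform DD/MM/YYYY form
uniform <- format(parsed, "%d/%m/%Y")
uniform
```

This relies on every date belonging to 2008, as stated in the question.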
