Is there a way to transform a date like
"2016-01-8" into "20160101q", which means the first half of January 2016, or
"20160127" into "20160102q", which means the second half of January 2016? Thank you in advance.
Here is a solution that uses the data.table and lubridate packages.
It uses lubridate::days_in_month() to determine the number of days in the month of each date. This is necessary because the midpoint depends on the month length: February (normally) has 28 days, so day 15 of February --> 02q, while January has 31 days, so day 15 of January --> 01q.
The logic for calculating the q-period is:
If day_number / number_of_days_in_month > 0.5 --> q_period = 02q,
else q_period = 01q.
Then paste0() is used to build the text for the q_date column, and sprintf() adds a leading zero to single-digit month numbers.
library(data.table)
library(lubridate)
#sample data
data <- data.table( date = as.Date( c("2019-12-30", "2020-01-15", "2020-02-15", "2020-02-14") ) )
# date
# 1: 2019-12-30
# 2: 2020-01-15
# 3: 2020-02-15
# 4: 2020-02-14
#if the day / #days of month > 0.5, date is in q2, else q1
data[ lubridate::mday(date) / lubridate::days_in_month(date) > 0.5,
q_date := paste0( lubridate::year(date), sprintf( "%02d", lubridate::month(date) ), "02q" ) ]
data[ is.na( q_date ),
q_date := paste0( lubridate::year(date), sprintf( "%02d", lubridate::month(date) ), "01q" ) ]
# date q_date
# 1: 2019-12-30 20191202q
# 2: 2020-01-15 20200101q
# 3: 2020-02-15 20200202q
# 4: 2020-02-14 20200201q
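The same rule can be sketched in base R without any packages. The helper names `days_in_month()` and `half_month_label()` below are hypothetical, chosen only for this illustration, and the input is assumed to be something `as.Date()` can parse.

```r
# Hypothetical helper: number of days in the month of each date
days_in_month <- function(d) {
  y <- as.integer(format(d, "%Y"))
  m <- as.integer(format(d, "%m"))
  # First day of the following month, rolling over to January after December
  next_y <- ifelse(m == 12L, y + 1L, y)
  next_m <- ifelse(m == 12L, 1L, m + 1L)
  first_next <- as.Date(sprintf("%04d-%02d-01", next_y, next_m))
  # The day before that is the last day of the current month
  as.integer(format(first_next - 1, "%d"))
}

# Hypothetical helper: "YYYYMM01q" for the first half of the month, "YYYYMM02q" for the second
half_month_label <- function(d) {
  d <- as.Date(d)
  day <- as.integer(format(d, "%d"))
  half <- ifelse(day / days_in_month(d) > 0.5, "02q", "01q")
  paste0(format(d, "%Y%m"), half)
}

half_month_label(c("2019-12-30", "2020-01-15", "2020-02-15", "2020-02-14"))
```

On the four sample dates this should reproduce the q_date column shown above.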
You can try mutate and paste0. First decompose the date into day, month and year. Then create a variable that says whether the date falls in the first or second half of the month, and paste together the year, the month and that variable, which contains "01q" or "02q" depending on the period.
date<- c("2016-01-8",
"2016-01-27")
id <- c(1,2)
x <- as.data.frame(cbind(id, date))
library(tidyverse)
library(lubridate)
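x = x %>%
mutate(date = ymd(date)) %>%
mutate(year = year(date), month = month(date), day = day(date))
x$half <- "01q"
# index with x$day; a bare `day` would refer to lubridate's function, not the column
x$half[x$day > 15] <- "02q"
# zero-pad the month so that January gives "01" rather than "1"
paste0(x$year, sprintf("%02d", x$month), x$half)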
Been a little stuck on this for a couple days.
Let's say I have a cohort of 2 people.
Person 1 was in cohort from 01/01/2000 to 01/03/2001.
Person 2 was in cohort from 01/01/1999 to 31/12/2001.
This means person 1 was in the cohort for all of 2000 and 25% of 2001.
Person 2 was in the cohort for all of 1999, all of 2000, and all of 2001.
Adding this together means that, in total, the cohort contributed 1 year of person-time in 1999,
2 years of person-time in 2000, and 1.25 years of person-time in 2001.
Does anyone know of any R functions that might help with dividing up/summing time elapsed between dates like this? I could write it all from scratch, but I'd like to use existing functions if they're out there, and Google has got me nowhere.
Thanks!
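Using data.table and lubridate (Data is assumed to be a data.table with columns Person, Start and End, the last two of class Date):
library(data.table)
library(lubridate)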
Data <- Data[, .(Start, Start2 = seq(Start, End, by="year"), End), by=.(Person)]
Data[, End2 := Start2+years(1)-days(1)]
Data[year(Start2) != year(Start), Start := Start2]
Data[year(End2) != year(End), End := End2]
Data[, c("Year", "Contribution") := list(year(Start), (month(End)-month(Start)+1)/12)]
Data <- Data[, .(Contribution = sum(Contribution)), by=.(Year)][order(Year)]
Which gives:
> Data
Year Contribution
1: 1999 1.00
2: 2000 2.00
3: 2001 1.25
This is a possible generalized tidyverse approach also using lubridate. This creates rows for each year and appropriate time intervals for each person-year. The intersection between the calendar year and person-year interval will be the contribution summed up in the end. Note that Jan 1 to Mar 1 here would be considered 2 months or 1/6 of a year contribution (not 25%).
# load the packages first: dmy() below comes from lubridate
library(lubridate)
library(tidyverse)
df <- data.frame(
person = c("Person 1", "Person 2"),
start = c("01/01/2000", "01/01/1999"),
end = c("01/03/2001", "31/12/2001")
)
df$start <- dmy(df$start)
df$end <- dmy(df$end)
df %>%
mutate(date_int = interval(start, end),
year = map2(year(start), year(end), seq)) %>%
unnest(year) %>%
mutate(
year_int = interval(
as.Date(paste0(year, '-01-01')), as.Date(paste0(year, '-12-31'))
),
year_sect = intersect(date_int, year_int)
) %>%
group_by(year) %>%
summarise(contribute = signif(sum(as.numeric(year_sect, "years")), 2))
Output
year contribute
<int> <dbl>
1 1999 1
2 2000 2
3 2001 1.2
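For comparison, here is a minimal base R sketch that counts days of overlap with each calendar year. It assumes both the start and the end date count as days in the cohort, which is a convention chosen for this example and not something stated in the question. The function name `person_years()` is hypothetical.

```r
# Hypothetical helper: person-years contributed to each calendar year,
# counting days and treating both start and end dates as inclusive
person_years <- function(start, end) {
  s <- as.numeric(as.Date(start))   # days since 1970-01-01
  e <- as.numeric(as.Date(end))
  years <- seq(min(as.integer(format(as.Date(start), "%Y"))),
               max(as.integer(format(as.Date(end), "%Y"))))
  contrib <- vapply(years, function(y) {
    ys <- as.numeric(as.Date(sprintf("%d-01-01", y)))
    ye <- as.numeric(as.Date(sprintf("%d-12-31", y)))
    # Days each person's interval shares with this calendar year (never negative)
    overlap <- pmax(pmin(e, ye) - pmax(s, ys) + 1, 0)
    # Divide by the length of the year (365 or 366 days)
    sum(overlap) / (ye - ys + 1)
  }, numeric(1))
  data.frame(year = years, contribution = contrib)
}

res <- person_years(
  start = as.Date(c("2000-01-01", "1999-01-01")),
  end   = as.Date(c("2001-03-01", "2001-12-31"))
)
res
```

Under this convention 2001 comes out at 425/365, roughly 1.16: Person 1 contributes the 60 days from 1 January to 1 March, which is closer to two months than to the 25% assumed in the question.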
I want to calculate log returns for a stock in R. The issue is that my financial year is from April 1 to March 31. I have tried using packages tidyquant and tidyverse. The code I have tried is as follows:
library(tidyquant)
RIL<- tq_get("RELIANCE.NS") # download the stock price data of Reliance Industries Limited listed on NSE of India. The data is from January 2011 to May 2021.
library(tidyverse)
RIL1<- RIL %>% mutate(CalYear = year(date),
Month = month(date),
FinYear = if_else(Month<4,CalYear,CalYear+1)) # This creates a new variable called FinYear, which correctly shows the financial year. If the month is >3 (ie March), the financial year is calendar year +1.
RIL_Returns<- RIL1 %>%
group_by(FinYear) %>%
tq_transmute(select = adjusted,
mutate_fun = periodReturn,
period = "yearly",
type = "log") #This part of the code has the problem.
From this code, I get two log-return values for each year, which can't be right. I want a table with columns FinYear and Log_Returns, where Log_Returns is defined as ln(adjusted close price on the last trading day of the given FinYear / adjusted close price on the first trading day of the given FinYear). How can I do this?
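Perhaps this is not the most elegant solution, but I think it works. I obtained the first and last trading day of each year manually and computed the log returns from them. Note that the loop below groups by calendar year (the %Y part of the date), not by FinYear.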
# Get data
library("tibble")
library("dplyr") # mutate() and if_else()
library("tidyquant")
RIL<- tq_get("RELIANCE.NS")
RIL1<- RIL %>% mutate(CalYear = year(date),
Month = month(date),
FinYear = if_else(Month<4,CalYear,CalYear+1))
# Get minimum and max dates in each year
start_dates = c()
end_dates = c()
for(year in format(min(RIL1$date),"%Y"):format(max(RIL1$date),"%Y")){
start_dates =
c(start_dates,
min(RIL1$date[format(RIL1$date, "%Y") == format(as.Date(ISOdate(year, 1, 1)),"%Y")])
)
end_dates =
c(end_dates,
max(RIL1$date[format(RIL1$date, "%Y") == format(as.Date(ISOdate(year, 1, 1)),"%Y")])
)
}
# Get filtered data
RIL2 <- RIL1[(RIL1$date %in% start_dates | RIL1$date %in% end_dates),]
# Get log returns, even indexes represent end of each year rows
end_adjusted = RIL2$adjusted[1:length(RIL2$adjusted) %% 2 == 0]
beginning_adjusted = RIL2$adjusted[1:length(RIL2$adjusted) %% 2 != 0]
log_returns = log(end_adjusted/beginning_adjusted)
# Put log returns and years in a tibble.
result = tibble(log_returns ,format(RIL2$date[1:length(RIL2$date) %% 2 == 0], "%Y"))
# Result
result
Outputs
# A tibble: 11 x 2
log_returns `format(RIL2$date[1:length(RIL2$date)%%2 == 0],…
<dbl> <chr>
1 -0.412 2011
2 0.185 2012
3 0.0739 2013
4 0.0117 2014
5 0.145 2015
6 0.0743 2016
7 0.537 2017
8 0.215 2018
9 0.306 2019
10 0.287 2020
11 0.0973 2021
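The table above is indexed by calendar year. If returns per financial year (1 April to 31 March) are wanted, one possible base R sketch is to split the data on FinYear and take the first and last row of each group. The prices below are made-up values for illustration, not real RELIANCE.NS data, and `fin_returns` is a hypothetical name.

```r
# Toy prices (hypothetical values, used only to illustrate the grouping)
prices <- data.frame(
  date = as.Date(c("2019-04-01", "2019-12-31", "2020-03-31",
                   "2020-04-01", "2020-09-15", "2021-03-31")),
  adjusted = c(100, 120, 110, 105, 150, 210)
)

# Financial year as defined in the question: January-March keep the calendar year,
# April-December belong to the financial year ending in the next calendar year
cal_year <- as.integer(format(prices$date, "%Y"))
month    <- as.integer(format(prices$date, "%m"))
prices$FinYear <- ifelse(month < 4, cal_year, cal_year + 1L)

# Sort by date so that the first/last row of each group is the first/last trading day
prices <- prices[order(prices$date), ]
fin_returns <- do.call(rbind, lapply(split(prices, prices$FinYear), function(g) {
  data.frame(FinYear = g$FinYear[1],
             Log_Returns = log(g$adjusted[nrow(g)] / g$adjusted[1]))
}))
fin_returns
```

With the toy data, FinYear 2020 runs from 100 to 110 and FinYear 2021 from 105 to 210, so the two log returns are log(1.1) and log(2). The first and last financial years in real data will usually be incomplete, so their values cover only part of a year.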
I have the following df with the Date column having hourly marks for an entire year:
Date TD RN D.RN Press Temp G.Temp. Rad
1 2018-01-01 00:00:00 154.0535 9.035156 1.416667 950.7833 7.000000 60.16667 11.27000
2 2018-01-01 01:00:00 154.5793 9.663900 1.896667 951.2000 6.766667 59.16667 11.23000
3 2018-01-01 01:59:59 154.5793 7.523438 2.591667 951.0000 6.066667 65.16667 11.23500
4 2018-01-01 02:59:59 154.0535 7.994792 2.993333 951.1833 5.733333 64.00000 11.16833
5 2018-01-01 03:59:59 154.4041 6.797526 3.150000 951.4833 5.766667 57.83333 11.13500
6 2018-01-01 04:59:59 155.1051 12.009766 3.823333 951.0833 5.216667 61.33333 11.22167
I want to add a factor column 'Quarters' that indicates each quarter according to the 'Date'.
As far as I understand I can do that by:
Radiation$Quarter<-cut(Radiation$Date, breaks = "quarters", labels = c("Q1", "Q2", "Q3", "Q4"))
But I also want to add a factor column 'Day/Night' which indicates whether it's day or night, having:
Day → 8am - 8pm
Night → 8pm - 8am
It seems like with the cut() function there's no way to indicate time ranges.
You can use an ifelse/case_when statement after extracting hour from time.
library(dplyr)
library(lubridate)
df %>%
mutate(hour = hour(Date),
label = case_when(hour >= 8 & hour <= 19 ~ 'Day',
TRUE ~ 'Night'))
In base R :
df$hour = as.integer(format(df$Date, '%H'))
transform(df, label = ifelse(hour >= 8 & hour <= 19, 'Day', 'Night'))
We can also do
library(dplyr)
library(lubridate)
df %>%
mutate(hour = hour(Date),
label = case_when(between(hour, 8, 19) ~ "Day", TRUE ~ "Night"))
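Since the question asks for factor columns, here is a small base R sketch that builds both the day/night factor and the quarter factor. The sample timestamps are hypothetical, the time zone is assumed to be UTC so that the extracted hour is unambiguous, and the column is named `DayNight` because `Day/Night` is not a syntactic R name.

```r
# Hypothetical timestamps around the 8am and 8pm boundaries (UTC assumed)
Radiation <- data.frame(
  Date = as.POSIXct(c("2018-01-01 07:59:59", "2018-01-01 08:00:00",
                      "2018-07-15 19:59:59", "2018-10-01 20:00:00"), tz = "UTC")
)

# Hour of day (0-23) in the time zone stored on the column
hour <- as.integer(format(Radiation$Date, "%H"))

# Day is 08:00:00 up to, but not including, 20:00:00; everything else is Night
Radiation$DayNight <- factor(ifelse(hour >= 8 & hour < 20, "Day", "Night"),
                             levels = c("Day", "Night"))

# quarters() returns "Q1".."Q4" for date-time objects
Radiation$Quarter <- factor(quarters(Radiation$Date),
                            levels = c("Q1", "Q2", "Q3", "Q4"))
Radiation
```

Fixing the factor levels explicitly keeps them stable even when a subset of the data contains only day or only night rows.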
I have an R script that I run monthly. I'd like to subset my data frame to only show data within a 6 month time period, but each month I'd like the time period to move forward one month.
Original data frame from Sept.:
ID Name Date
1 John 1/1/2020
2 Adam 5/2/2020
3 Kate 9/30/2020
4 Jill 10/15/2020
After subsetting for only dates from May 1, 2020 - Sept. 30, 2020:
ID Name Date
2 Adam 5/2/2020
3 Kate 9/30/2020
The next month when I run my script, I'd like the dates it's subsetting to move forward by one month, so June 1, 2020 - Oct. 31, 2020:
ID Name Date
3 Kate 9/30/2020
4 Jill 10/15/2020
Right now, I'm changing this part of my script manually each month, ie:
subset(df, Date >= '2020-05-01' & Date <= '2020-09-30')
Is there a way to make this automatic, so that I don't have to manually move forward the date one month every time?
We can use between after converting the 'Date' to Date class
library(dplyr)
library(lubridate)
start <- as.Date("2020-05-01")
end <- as.Date("2020-09-30")
df1 %>%
mutate(Date = mdy(Date)) %>%
filter(between(Date, start, end))
# ID Name Date
#1 2 Adam 2020-05-02
#2 3 Kate 2020-09-30
In the next month, we can change the 'start', 'end' by adding 1 month
start <- start %m+% months(1)
end <- ceiling_date(end %m+% months(1), 'months') - days(1)
start
#[1] "2020-06-01"
end
#[1] "2020-10-31"
Using base R, with no package dependencies.
Data:
dt <- read.table(text = 'ID Name Date
1 John 1/1/2020
2 Adam 3/2/2021
3 Kate 12/30/2020
4 Jill 5/15/2021', header = TRUE, stringsAsFactors = FALSE)
Code:
date_format <- "%m/%d/%Y"
dt$Date <- as.Date(dt$Date, format = date_format)
today <- Sys.Date()
six_month <- today+(6*30)
start <- as.Date(paste(format(today, "%m"), "01",
format(today, "%Y"), sep = "/"),
format = date_format)
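# Last day of the target month: first day of the following month minus one day
# (pasting a fixed "31" gives NA for months with fewer than 31 days)
end <- seq(as.Date(format(six_month, "%Y-%m-01")), by = "month", length.out = 2)[2] - 1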
dt[with(dt, Date >= start & Date <= end), ]
# ID Name Date
# 2 2 Adam 2021-03-02
# 3 3 Kate 2020-12-30
# 4 4 Jill 2021-05-15
This is a very simple solution:
library(lubridate)
t <- today() #automatic
t <- as.Date('2020-11-26') # manual (you can change it as you like)
start <- floor_date(t %m-% months(6), unit="months")
end <- floor_date(t %m-% months(1), unit="months")-1
subset(df, Date >= start & Date <= end)
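To avoid editing any dates by hand, the window can be derived from the run date. The sketch below is base R only. The name `month_window()` and its `n_months` parameter are hypothetical, and it assumes the window should end on the last day of the month in which the script runs, as in the May 1 to Sept. 30 example from the question.

```r
# Hypothetical helper: first and last day of an n-month window ending in the month of `ref`
month_window <- function(ref = Sys.Date(), n_months = 5L) {
  ref <- as.Date(ref)
  n_months <- as.integer(n_months)
  # Put months on one integer axis so that year boundaries need no special case
  idx <- as.integer(format(ref, "%Y")) * 12L + as.integer(format(ref, "%m")) - 1L
  first_of <- function(i) as.Date(sprintf("%04d-%02d-01", i %/% 12L, i %% 12L + 1L))
  list(start = first_of(idx - (n_months - 1L)),
       end   = first_of(idx + 1L) - 1)  # day before the first of the next month
}

# Example data from the question
df <- data.frame(
  ID = 1:4,
  Name = c("John", "Adam", "Kate", "Jill"),
  Date = as.Date(c("1/1/2020", "5/2/2020", "9/30/2020", "10/15/2020"),
                 format = "%m/%d/%Y")
)

w <- month_window(as.Date("2020-09-15"))  # in the monthly script, call month_window()
result <- df[df$Date >= w$start & df$Date <= w$end, ]
result
```

For a run in September 2020 this keeps Adam and Kate, and for a run in October 2020 the window moves to June 1 through Oct. 31. Change `n_months` if a different window length is wanted.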
I want to be able to create a water year column for a time series. The US water year is from Oct-Sept and is considered the year it ends on. For example the 2014 water year is from October 1, 2013 - September 30, 2014.
This is the US water year, but not the only water year. Therefore I want to enter in a start month and have a water year calculated for the date.
For example if my data looks like
date
2008-01-01 00:00:00
2008-02-01 00:00:00
2008-03-01 00:00:00
2008-04-01 00:00:00
.
.
.
2008-12-01 00:00:00
I want my function to work something like:
wtr_yr <- function(data, start_month) {
does stuff
}
Then my output would be
wtr_yr(data, 2)
date wtr_yr
2008-01-01 00:00:00 2008
2008-02-01 00:00:00 2009
2008-03-01 00:00:00 2009
2008-04-01 00:00:00 2009
.
.
.
2009-01-01 00:00:00 2009
2009-02-01 00:00:00 2010
2009-03-01 00:00:00 2010
2009-04-01 00:00:00 2010
I started by breaking the date up into separate columns, but I don't think that is the best way to go about it. Any advice?
Thanks in advance!
We can use POSIXlt to come up with an answer.
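wtr_yr <- function(dates, start_month=10) {
# start_month is 1-12; the default of 10 (October) matches the US water year
# Convert dates into POSIXlt
dates.posix = as.POSIXlt(dates)
# Year offset: $mon is zero-based (0 = January), hence start_month - 1
offset = ifelse(dates.posix$mon >= start_month - 1, 1, 0)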
# Water year
adj.year = dates.posix$year + 1900 + offset
# Return the water year
adj.year
}
Let's now use this function in an example.
# Sample input vector
dates = c("2008-01-01 00:00:00",
"2008-02-01 00:00:00",
"2008-03-01 00:00:00",
"2008-04-01 00:00:00",
"2009-01-01 00:00:00",
"2009-02-01 00:00:00",
"2009-03-01 00:00:00",
"2009-04-01 00:00:00")
# Display the function output
wtr_yr(dates, 2)
# Combine the input and output vectors in a dataframe
df = data.frame(dates, wtr_yr=wtr_yr(dates, 2))
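As a cross-check, here is a self-contained variant of the same idea with the expected values written out. The name `water_year()` is hypothetical. Unlike the function above, this sketch treats `start_month = 1` as the plain calendar year, which is an assumption about what a January start should mean.

```r
# Hypothetical variant: water year for a vector of dates or date-time strings
water_year <- function(dates, start_month = 10) {
  stopifnot(start_month >= 1, start_month <= 12)
  lt <- as.POSIXlt(dates, tz = "UTC")
  year  <- lt$year + 1900   # $year counts from 1900
  month <- lt$mon + 1       # $mon is zero-based
  # Dates on or after the start month belong to the water year that ends next year;
  # with start_month = 1 the water year is simply the calendar year
  year + as.integer(start_month > 1 & month >= start_month)
}

dates <- c("2008-01-01 00:00:00", "2008-02-01 00:00:00",
           "2008-03-01 00:00:00", "2008-04-01 00:00:00",
           "2009-01-01 00:00:00", "2009-02-01 00:00:00")
data.frame(date = dates, wtr_yr = water_year(dates, 2))

# US water year: 1 October 2013 to 30 September 2014 is water year 2014
water_year(c("2013-09-30", "2013-10-01", "2014-09-30"))
```

With a start month of 2, the sample dates map to 2008, 2009, 2009, 2009, 2009, 2010, which matches the output requested in the question.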
I had a similar problem a while back, but with fiscal years that started in October. I found this function, which also computes the quarters within the year. For one part, I only wanted it to output the fiscal year, so I edited a small part of the function to do that (the quarter q is still computed but no longer returned). There is surely a cleaner and more efficient way of doing it, but this should work for smaller data sets. Here is the edited function:
getYearQuarter <- function(x,
firstMonth=7,
fy.prefix='FY',
quarter.prefix='Q',
sep='-',
level.range=c(min(x), max(x)) ) {
if(level.range[1] > min(x) | level.range[2] < max(x)) {
warning(paste0('The range of x is greater than level.range. Values ',
'outside level.range will be returned as NA.'))
}
quarterString <- function(d) {
year <- as.integer(format(d, format='%Y'))
month <- as.integer(format(d, format='%m'))
y <- ifelse(firstMonth > 1 & month >= firstMonth, year+1, year)
q <- cut( (month - firstMonth) %% 12, breaks=c(-Inf,2,5,8,Inf),
labels=paste0(quarter.prefix, 1:4))
return(paste0(fy.prefix, substring(y,3,4)))
}
vals <- quarterString(x)
levels <- unique(quarterString(seq(
as.Date(format(level.range[1], '%Y-%m-01')),
as.Date(format(level.range[2], '%Y-%m-28')), by='month')))
return(factor(vals, levels=levels, ordered=TRUE))
}
Your input vector should be of class Date; then specify the start month. Assuming you have a data frame (df) with a 'date' column as in your question, this should do the trick.
df$wtr_yr <- getYearQuarter(df$date, firstMonth=10)
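If the quarter within the fiscal or water year is also wanted, the modular arithmetic from the function above can be used on its own. This is a minimal sketch, not the original getYearQuarter() code. The name `fiscal_label()` and the "FYyy-Qn" output format are assumptions made for this example.

```r
# Hypothetical helper: "FYyy-Qn" label for a year that starts in `first_month`
fiscal_label <- function(x, first_month = 10) {
  x <- as.Date(x)
  year  <- as.integer(format(x, "%Y"))
  month <- as.integer(format(x, "%m"))
  # Months on or after the start month belong to the fiscal year ending next calendar year
  fy <- ifelse(first_month > 1 & month >= first_month, year + 1L, year)
  # Months elapsed since the start month (0-11), grouped in threes to give quarters 1-4
  q <- ((month - first_month) %% 12) %/% 3 + 1
  sprintf("FY%02d-Q%d", fy %% 100, q)
}

fiscal_label(c("2013-10-01", "2014-01-15", "2014-09-30", "2014-10-01"))
```

With an October start, 1 October 2013 is the first day of FY14-Q1 and 30 September 2014 is the last day of FY14-Q4.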
You can also add a water-year column with the water_year function from the "lfstat" package:
https://www.rdocumentation.org/packages/lfstat/versions/0.9.4/topics/water_year