extracting "yearmon" from character format and calculating age in R - r

I have two columns of data:
DoB: yyyy/mm
Reported date: yyyy/mm/dd
Both are in character format.
I'd like to calculate an age, by subtracting DoB from Reported Date, without adding a fictional day to the DoB, so that the age comes out as 28.5 (meaning 28 and a half years old).
Please can someone help me with the coding, I'm struggling!
Many thanks from an R newbie.

library(lubridate)
a <- "2010/02"
b <- "2014/12/25"
c <- ymd(b) - ymd(paste0(a, "/01")) # I don't think this can be done without adding a fictional day
c <- as(c/365.25, "numeric")
What would you want the age to be if the dates are:
DoB: 2015/01
Reported date: 2015/01/30

As suggested, lubridate is a great package for working with dates. You probably want some version using difftime. You also can still use ymd for the yyyy/mm by setting truncated=1 meaning the field can be missing.
df <- data.frame(DoB = c("1987/08", "1994/04"),
Report_Date = c("2015/03/05","2014/07/04"))
library(lubridate)
df$age_years <- with(df,
as.numeric(
difftime(ymd(Report_Date),
ymd(DoB, truncated=1)
)/365.25))
df
DoB Report_Date age_years
1 1987/08 2015/03/05 27.59206023
2 1994/04 2014/07/04 20.25735797
Unfortunately difftime doesn't have a 'years' unit so you also will need to divide the 'days' output that you get back.

Use the "yearmon" class in zoo. It represents time as years + fraction (where fraction is in the set 0, 1/12, ..., 11/12) and so does not require that fictitious days be added:
library(zoo)
as.yearmon("2012/01/10", "%Y/%m/%d") - as.yearmon("1983/07", "%Y/%m")
giving:
[1] 28.5

Related

R function for finding difference of dates with exceptions?

I was wondering if there was a function for finding the difference between and issue date and a maturity date, but with 2 maturity date. For example, I want to prioritize the dates in maturity date source 1 and subtract it from the issue date to find the difference. Then, if my dataset is missing dates from maturity date source 1, such as in lines 5 & 6, I want to use dates from maturity date source 2 to fill in the rest. I have tried the code below, but am unsure how to incorporate the data from maturity date source 2 without changing everything else. I have attached a picture for reference. Thank you in advance.
df$Maturity_Date_source_1 <- as.Date(c(df$Maturity_Date_source_1))
df$Issue_Date <- as.Date(c(df$Issue_Date))
df$difference <- (df$Maturity_Date_source_1 - df$Issue_Date) / 365.25
df$difference <- as.numeric(c(df$difference))
An option would be to coalesce the columns and then do the difference
library(dplyr)
df %>%
mutate(difference = as.numeric((coalesce(Maturity_Date_source_1,
Maturity_Date_source_2) - Issue_Date)/365.25))

How do I subtract Date column given as a character in R?

I want to add a column which is a subtraction of Store_Entry_Time from Store_Exit_Time.
For example the result for row 1 should be (2014-12-02 18:49:05.402863 - 2014-12-02 16:56:32.394052) = 1 hour 53 minutes approximately.( I want this result in just hours).
I entered class(Store_Entry_Time) and it says "character".
How do I obtain the subtracting and put it into new column as "Time Spent"?
You can use ymd_hms from lubridate to convert the column into POSIXct format and then use difftime to caluclate the difference in time.
library(dplyr)
df <- df %>%
mutate(across(c(Store_Entry_Time, Store_Exit_Time), lubridate::ymd_hms),
Time_Spent = as.numeric(difftime(Store_Exit_Time,
Store_Entry_Time, units = 'hours')))
For a base R option here, we can try using as.POSIXct:
df$Time_Spent <- as.numeric(as.POSIXct(df$Store_Exit_Time) -
as.POSIXct(df$Store_Entry_Time)
The above column would give the difference in time, measured in hours.
Example:
Store_Exit_Time <- "2014-12-02 18:49:05.402863"
Store_Entry_Time <- "2014-12-02 16:56:32.394052"
Time_Spent <- as.numeric(as.POSIXct(Store_Exit_Time) - as.POSIXct(Store_Entry_Time))
Time_Spent
[1] 1.875836

Subset a dataframe based on numerical values of a string inside a variable

I have a data frame which is a time series of meteorological measurement with monthly resolution from 1961 till 2018. I am interested in the variable that measures the monthly average temperature since I need the multi-annual average temperature for the summers.
To do this I must filter from the "DateVaraible" column the fifth and sixth digit, which are the month.
The values in time column are formatted like this
"19610701". So I need the 07(Juli) after 1961.
I start coding for 1 month for other purposes, so I did not try anything worth to mention. I guess that .grepl could do the work, but I do not know how the "matching" operator works.
So I started with this code that works.
summersmonth<- Df[DateVariable %like% "19610101" I DateVariable %like% "19610201"]
I am expecting a code like this
summermonths <- Df[DateVariable %like% "**06**" I DateVariable%like% "**07**..]
So that all entries with month digit from 06 to 09 are saved in the new dataframe summermonths.
Thanks in advance for any reply or feedback regarding my question.
Update
Thank to your answers I got the first part, which is to convert the variable in a as.date with the format "month"(Class=char)
Now I need to select months from Juni to September .
A horrible way to get the result I wanted is to do several subset and a rbind afterward.
Sommer1<-subset(Df, MonthVar == "Mai")
Sommer2<-subset(Df, MonthVar == "Juli")
Sommer3<-subset(Df, MonthVar == "September")
SummerTotal<-rbind(Sommer1,Sommer2,Sommer3)
I would be very glad to see this written in a tidy way.
Update 2 - Solution
Here is the tidy way, as here Using multiple criteria in subset function and logical operators
Veg_Seas<-subset(Df, subset = MonthVar %in% c("Mai","Juni","Juli","August","September"))
You can convert your date variable as date (format) and take the month:
allmonths <- month(as.Date(Df$DateVariable, format="%Y%m%d"))
Note that of your column has been originally imported as factor you need to convert it to character first:
allmonths <- month(as.Date(as.character(Df$DateVariable), format="%Y%m%d"))
Then you can check whether it is a summermonth:
summersmonth <- Df[allmonths %in% 6:9, ]
Example:
as.Date("20190702", format="%Y%m%d")
[1] "2019-07-02"
month(as.Date("20190702", format="%Y%m%d"))
[1] 7
We can use anydate from anytime to convert to Date class and then extract the month
library(anytime)
month(anydate(as.character(Df$DateVariable)))

Converting variables in form of "2015M01" to date format in R?

I have a date frame df that simply looks like this:
month values
2012M01 99904
2012M02 99616
2012M03 99530
2012M04 99500
2012M05 99380
2012M06 99103
2013M01 98533
2013M02 97600
2013M03 96431
2013M04 95369
2013M05 94527
2013M06 93783
with month that was written in form of "M01", "M02"... and so on.
Now I want to convert this column to date format, is there a way to do it in R with lubridate?
I also want to select columns that contain one certain month from each year, like only March columns from all these years, what is the best way to do it?
The short answer is that dates require a year, month and day, so you cannot convert directly to a date format. You have 2 options.
Option 1: convert to a year-month format using zoo::as.yearmon.
library(zoo)
df$yearmon <- as.yearmon(df$month, "%YM%m")
# you can get e.g. month from that
months(df$yearmon[1])
# [1] "January"
Option 2: convert to a date by assuming that the day is always the first day of the month.
df$date <- as.Date(paste(df$month, "01", sep = "-"), "%YM%m-%d")
For selection (and I think you mean select rows, not columns), you already have everything you need. For example, to select only March 2013:
library(dplyr)
df %>% filter(month == "2013M03")
Something like this will get it:
raw <- "2012M01"
dt <- strptime(raw,format = "%YM%m")
dt will be in a Posix format. The strptime function will assign a '1' as the default day of month to make it a complete date.

How to find decimal representation of years in R?

Since I need reasonably accurate representations of years in decimal format (~ 4-5 digits of accuracy would work) I turned to the lubridate package. This is what I have tried:
refDate <- as.Date("2016-01-10")
endDate <- as.Date("2020-12-31")
daysInLeapYear <- 366
daysInRegYear <- 365
leapYearFractStart <- 0
leapYearRegStart <- 0
daysInterval <- as.interval(difftime(endDate, refDate, unit = "d"), start = refDate)
periodObject <- as.period(daysInterval)
if(leap_year(refDate)) {
leapYearFractStart <- (as.numeric(days_in_month(refDate))-as.numeric(format(refDate, "%d")))/daysInLeapYear
}
if(!leap_year(refDate)) {
leapYearRegStart <- (as.numeric(days_in_month(refDate))-as.numeric(format(refDate, "%d")))/daysInRegYear
}
returnData <- periodObject#year+(periodObject#month/12)+leapYearFractStart+leapYearRegStart
It is safe to assume that the end date is always at the end of a month, hence no leap year check at the end. Relying on lubridate for proper year/month counting I am adjusting for leap-years only for the start date.
I recon this gets me to within 3 digits of accuracy only! In addition, it looks a bit crude.
Is there a more complete and accurate procedure to determine decimal representation of years in an interval?
It's very unclear what you're trying to do exactly here, which makes accuracy difficult to talk about.
lubridate has a function decimal_date which turns dates into decimals. But since 3 decimal places gives you 1000 possible positions within a year, when we only have 365/366 days, there are between 2 and 3 viable values that fall within a day. Accuracy depends on when in the day you want the result to fall.
> decimal_date(as.POSIXlt("2016-01-10 00:00:01"))
[1] 2016.025
> decimal_date(as.POSIXlt("2016-01-10 12:00:00"))
[1] 2016.026
> decimal_date(as.POSIXlt("2016-01-10 23:59:59"))
[1] 2016.027
In other words, going beyond 3 decimal places is only really important if you're interested in the time of day.
This solution uses only base R. We get the beginning of the year using cut(..., "year") and the number of days in the year by differencing it with the beginning of the next year obtained using cut(..., "year") on an arbitrary date in the following year. Finally use those quantities to get the fraction and add it to the year.
d <- as.Date(c("2015-01-31", "2016-01-01", "2016-01-10", "2016-12-31")) # sample input
year_begin <- as.Date(cut(d, "year"))
days_in_year <- as.numeric( as.Date(cut(year_begin + 366, "year")) - year_begin )
as.numeric(format(d, "%Y")) + as.numeric(d - year_begin) / days_in_year
## [1] 2015.082 2016.000 2016.025 2016.997
Alternately, using as.POSIXlt this variation crams it into one line:
with(unclass(as.POSIXlt(d)),1900+year+yday/as.numeric(as.Date(cut(d-yday+366,"y"))-d+yday))
## [1] 2015.082 2016.000 2016.025 2016.997

Resources