Convert date of birth to age - r

I want to convert date of birth to age using the following code
df$age <- round(as.numeric(Sys.Date()-as.Date(df$DOB),format="%d/%m/%y")/365)
The format of DOB is, for example, 11-10-1969.
In the data frame I see an age of 2012 (instead of 52).
I really don't know what I've done wrong. Can someone help me?
Thank you in advance!

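"11-10-1969" could be month-day-year or day-month-year, so it is an ambiguous date format. To get it converted properly you need to pass the format argument to as.Date().
In your code the format argument sits outside the as.Date() call, so it is handed to as.numeric() and ignored; as.Date() then falls back to its default "%Y-%m-%d" and reads the year as 11, which is why the age comes out at around 2012. Note also that a 4-digit year needs a capital Y in the format string: "%d-%m-%Y" (or "%d/%m/%Y" if the separator is /). Sys.Date() is already a Date object, so it needs no format argument at all.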
> as.numeric(Sys.Date() - as.Date("11-10-1969", format="%d-%m-%Y")) / 365.25
#> [1] 52.56674
EDIT: use 365.25 to approximate leap years per Henry's suggestion in comment
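Dividing by 365.25 is only an approximation and can be off by one close to a birthday. As a minimal sketch, assuming you want the age in completed years, you could compare the calendar fields directly. The helper name age_years is hypothetical, not part of base R:

```r
# Hypothetical helper: age in completed years at a reference date
age_years <- function(dob, ref = Sys.Date()) {
  dob <- as.POSIXlt(dob)
  ref <- as.POSIXlt(ref)
  age <- ref$year - dob$year
  # Has the birthday already occurred in the reference year?
  had_birthday <- (ref$mon > dob$mon) |
    (ref$mon == dob$mon & ref$mday >= dob$mday)
  age - ifelse(had_birthday, 0L, 1L)
}

dob <- as.Date("11-10-1969", format = "%d-%m-%Y")
age_years(dob, ref = as.Date("2022-05-05"))
```

The function is vectorised, so it can be applied to a whole column such as df$DOB once that column has been parsed with the right format.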

Related

Convert YYYY-YY to Year(date)

I have a data frame with year column as financial year
Year
2001-02
2002-03
2003-04
How can I convert this to as.Date keeping either the whole thing or just the second year i.e 2002,2003,2004. On converting with %Y, I inevitably get 2001-08-08, 2002-08-08, 2003-08-08 etc.
Thanks
library(lubridate)
Year <- c('2001-02', '2002-03', '2003-04')
year(as.Date(gsub('[0-9]{2}-', '', Year), format = '%Y'))
1) ISOdate Clarifying the question, since it refers to yearend and Date we assume that the input is the fiscal Year shown in the question (plus we have added the "1999-00" edge case) as well as the month and day of the yearend. We assume that the output desired is the yearend as a Date object. (If that is not the intended question and you just want the fiscal yearend year as a number then see Note at the end.)
Returning to the assumed problem, let us suppose, for example, that March 31st is the yearend. Below we extract the first 4 characters of Year using substring, convert that to numeric and add 1. Then we pass that along with month and day to ISOdate and finally convert that to Date. No regular expressions or packages are used.
# test inputs
month <- 3
day <- 31
Year <- c("1999-00", "2001-02", "2002-03", "2003-04")
# yearends
as.Date(ISOdate( as.numeric(substring(Year, 1, 4))+1, month, day))
## [1] "2000-03-31" "2002-03-31" "2003-03-31" "2004-03-31"
2) string manipulation An alternative solution using the same inputs is the following. It is similar except that we use sub with a regular expression that matches the minus and the following two characters, substituting a zero-length string for them, then convert to numeric and add 1. Then we build a string in a format acceptable to as.Date using sprintf and finally apply as.Date. No packages are used.
as.Date(sprintf("%d-%d-%d", as.numeric(sub("-..", "", Year))+1, month, day))
## [1] "2000-03-31" "2002-03-31" "2003-03-31" "2004-03-31"
Note: If you only wanted the fiscal yearend year as a number then it would be just this:
as.numeric(substring(Year, 1, 4)) + 1

as.Date produces unexpected result in a sequence of week-based dates

I am working on the transformation of week based dates to month based dates.
When checking my work, I found the following problem in my data which is the result of a simple call to as.Date()
as.Date("2016-50-4", format = "%Y-%U-%u")
as.Date("2016-50-5", format = "%Y-%U-%u")
as.Date("2016-50-6", format = "%Y-%U-%u")
as.Date("2016-50-7", format = "%Y-%U-%u") # this is the problem
The previous code yields correct date for the first 3 lines:
"2016-12-15"
"2016-12-16"
"2016-12-17"
The last line of code however, goes back 1 week:
"2016-12-11"
Can anybody explain what is happening here?
Working with week of the year can become very tricky. You may try to convert the dates using the ISOweek package:
# create date strings in the format given by the OP
wd <- c("2016-50-4","2016-50-5","2016-50-6","2016-50-7", "2016-51-1", "2016-52-7")
# convert to "normal" dates
ISOweek::ISOweek2date(stringr::str_replace(wd, "-", "-W"))
The result
#[1] "2016-12-15" "2016-12-16" "2016-12-17" "2016-12-18" "2016-12-19" "2017-01-01"
is of class Date.
Note that the ISO week-based date format is yyyy-Www-d with a capital W preceding the week number. This is required to distinguish it from the standard month-based date format yyyy-mm-dd.
So, in order to convert the date strings provided by the OP using ISOweek2date() it is necessary to insert a W after the first hyphen which is accomplished by replacing the first - by -W in each string.
Also note that ISO weeks start on Monday and the days of the week are numbered 1 to 7. The year which belongs to an ISO week may differ from the calendar year. This can be seen from the sample dates above where the week-based date 2016-W52-7 is converted to 2017-01-01.
About the ISOweek package
Back in 2011, the %G, %g, %u, and %V format specifications weren't available to strptime() in the Windows version of R. This was annoying as I had to prepare weekly reports including week-on-week comparisons. I spent hours to find a solution for dealing with ISO weeks, ISO weekdays, and ISO years. Finally, I ended up creating the ISOweek package and publishing it on CRAN. Today, the package still has its merits as the aforementioned formats are ignored on input (see ?strptime for details).
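If you would rather not depend on a package, the ISO rule itself is short: week 1 is the week containing January 4th, and weeks start on Monday. The following is only a sketch under that rule, taking strings in the OP's "yyyy-ww-d" layout; the function name iso_week_to_date is made up for illustration, and the input is not validated:

```r
# Hypothetical base-R helper: "yyyy-ww-d" (ISO week, ISO weekday) -> Date
iso_week_to_date <- function(x) {
  parts <- strsplit(x, "-", fixed = TRUE)
  year <- as.integer(sapply(parts, `[`, 1))
  week <- as.integer(sapply(parts, `[`, 2))
  day  <- as.integer(sapply(parts, `[`, 3))
  # January 4th always falls in ISO week 1
  jan4 <- as.Date(sprintf("%d-01-04", year))
  wday <- as.POSIXlt(jan4)$wday           # 0 = Sunday ... 6 = Saturday
  iso_wday <- ifelse(wday == 0, 7, wday)  # 1 = Monday ... 7 = Sunday
  week1_monday <- jan4 - (iso_wday - 1)
  week1_monday + (week - 1) * 7 + (day - 1)
}

iso_week_to_date(c("2016-50-7", "2016-51-1", "2016-52-7"))
```

For the sample strings this should agree with ISOweek2date(), including the last one, which lands in the following calendar year.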
As @lmo said in the comments, %u stands for the weekday as a decimal number (1–7, with Monday as 1) and %U stands for the week of the year as a decimal number (00–53) using Sunday as the first day of the week. Thus, as.Date("2016-50-7", format = "%Y-%U-%u") will result in "2016-12-11".
However, if that should give "2016-12-18", then you need a week format that also has Monday as the starting day. From the documentation in ?strptime you would expect the format "%Y-%V-%u" to give the correct output, where %V stands for the week of the year as a decimal number (01–53) with Monday as the first day.
Unfortunately, it doesn't:
> as.Date("2016-50-7", format = "%Y-%V-%u")
[1] "2016-01-18"
However, at the end of the explanation of %V it says "Accepted but ignored on input", meaning that it won't work.
You can circumvent this behavior as follows to get the correct dates:
# create a vector of dates
d <- c("2016-50-4","2016-50-5","2016-50-6","2016-50-7", "2016-51-1")
# convert to the correct dates
as.Date(paste0(substr(d,1,8), as.integer(substring(d,9))-1), "%Y-%U-%w") + 1
which gives:
[1] "2016-12-15" "2016-12-16" "2016-12-17" "2016-12-18" "2016-12-19"
The issue is because for %u, 1 is Monday and 7 is Sunday of the week. The problem is further complicated by the fact that %U assumes week begins on Sunday.
For the given input and expected behavior of format = "%Y-%U-%u", the output of line 4 is consistent with the output of previous 3 lines.
That is, if you want to use format = "%Y-%U-%u", you should pre-process your input. In this case, the fourth line would have to be as.Date("2016-51-7", format = "%Y-%U-%u") as revealed by
format(as.Date("2016-12-18"), "%Y-%U-%u")
# "2016-51-7"
Instead, you are currently passing "2016-50-7".
A better way of doing it might be to use the approach suggested in Uwe Block's answer. Since you are happy with "2016-50-4" being transformed to "2016-12-15", I suspect that in your raw data Monday is counted as 1 too. You could also create a custom function that changes the value of %U to count the week number as if the week begins on Monday, so that the output is as you expected.
# Function to change the value of %U so that the week begins on Monday
# (handles a single string, not a vector)
pre_process = function(x, delim = "-"){
    y = unlist(strsplit(x, delim))
    # If the last day of the year is 7 (Sunday for %u),
    # add 1 to the week to make it the week 00 of the next year
    # I think there might be a better solution for this
    if (y[2] == "53" && y[3] == "7"){
        x = paste(as.integer(y[1]) + 1, "00", y[3], sep = delim)
    } else if (y[3] == "7"){
        # If the day is 7 (Sunday for %u), add 1 to the week
        x = paste(y[1], as.integer(y[2]) + 1, y[3], sep = delim)
    }
    return(x)
}
And usage would be
as.Date(pre_process("2016-50-7"), format = "%Y-%U-%u")
# [1] "2016-12-18"
I'm not quite sure how to handle when the year ends on a Sunday.

How to convert ordinal date day-month-year format using R

I have log files where the date is mentioned in the ordinal date format.
wikipedia page for ordinal date
i.e. 14273 means the 273rd day of 2014, so 14273 is 30-Sep-2014.
Is there a function in R to convert an ordinal date (14273) to (30-Sep-2014)?
I tried the date package but didn't come across a function that would do this.
Try as.Date with the indicated format:
as.Date(sprintf("%05d", 14273), format = "%y%j")
## [1] "2014-09-30"
Notes
For more information see ?strptime [link]
The 273 part is sometimes referred to as the day of the year (as opposed to the day of the month) or the day number or the julian day relative to the beginning of the year.
If the input were a character string of the form yyjjj (rather than numeric) then as.Date(x, format = "%y%j") will do.
Update Have updated to also handle years with one digit as per comments.
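To tie the notes above together, here is a small sketch that accepts either numeric or character input and returns a Date. The wrapper name ordinal_to_date is hypothetical; it relies only on sprintf() and as.Date() with "%y%j" as used above:

```r
# Hypothetical wrapper around as.Date(..., format = "%y%j")
ordinal_to_date <- function(x) {
  # Numeric input loses leading zeros, so pad back to 5 digits (yyjjj)
  if (is.numeric(x)) x <- sprintf("%05d", x)
  as.Date(x, format = "%y%j")
}

d <- ordinal_to_date(c(14273, 9001, 7031))
d
# Day-month-year output; month abbreviations depend on the locale
format(d, "%d-%b-%Y")
```

Keep in mind that %y maps two-digit years 00 to 68 onto 20xx and 69 to 99 onto 19xx, which may or may not match the convention of your log files.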
Data example
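x <- as.character(c("14273", "09001", "07031", "01033"))
Data conversion
x1 <- substr(x, start = 1, stop = 2)  # two-digit year
x2 <- substr(x, start = 3, stop = 5)  # day of the year
# Caveat: strptime() assumes the current year here, so when run in a
# leap year, days after Feb 28 may map to a different month-day
x3 <- format(strptime(x2, format = "%j"), format = "%m-%d")
date <- as.Date(paste(x3, x1, sep = "-"), format = "%m-%d-%y")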
You can use lubridate package as follows:
library(lubridate)
# Create a template date object
date <- as.POSIXlt("2009-02-10")
# Update the year and the day of the year
update(date, year = 2014, yday = 273)
## [1] "2014-09-30 JST"

From MMDD to day of the year in R

I have this .txt file:
http://pastebin.com/raw.php?i=0fdswDxF
First column (Date) shows date in month/day
So 0601 is the 1st of June
When I load this into R and I show the data, it removes the first 0 in the data.
So when loaded it looks like:
601
602
etc
For 1st of June, 2nd of June
For the months 10,11,12, it remains unchanged.
How do I change it back to 0601 etc.?
What I am trying to do is to change these days into the day of the year, for instance,
1st of January (0101) would be 1, and 31st of December would be 365.
There is no leap year to be considered.
I have the code to change this, if my data was shown as 0601 etc, but not as 601 etc.
copperNew$Date = as.numeric(as.POSIXct(strptime(paste0("2013",copperNew$Date), format="%Y%m%d")) -
as.POSIXct("2012-12-31"), units = "days")
Where Date of course is from the file linked above.
Please ask if you do not consider the description to be good enough.
You are overcomplicating the process. Use colClasses in read.table so that Date is read as character (which keeps the leading zero), then convert to POSIXlt and extract the day of the year.
copperNew <- read.table("http://pastebin.com/raw.php?i=0fdswDxF", header=TRUE,
    colClasses=c("character", "integer", rep("numeric", 3)))
# No year in the format, so the current year is assumed
tmp <- as.POSIXlt( copperNew$Date, format='%m%d' )
# yday is 0-based (0 on January 1st), so add 1
copperNew$Yday <- tmp$yday + 1
The as.Date function is able to parse a string without a year (it assumes the current year), and strftime then computes the day of the year for you.
d<-as.Date("0201", format = "%m%d")
strftime(d, format="%j")
#[1] "032"
First you parse your string and obtain Date object which represents your date (notice that it will add current year, so if you want to count days for some specific year add it to your string: as.Date("1988-0201", format = "%Y-%m%d")).
The strftime function will convert your Date to a POSIXlt object and return the day of the year as a string. If you want the result to be a numeric value, you can do it like this: as.numeric(strftime(d, format = "%j")) (thanks, Gavin Simpson).
Convert it to POSIXlt using a year that is not a leap-year, then access the yday element and add 1 (because yday is 0 on January 1st).
strptime(paste0("2011","0201"),"%Y%m%d")$yday+1
# [1] 32
From start-to-finish:
x <- read.table("http://pastebin.com/raw.php?i=0fdswDxF",
colClasses=c("character",rep("numeric",5)), header=TRUE)
x$Date <- strptime(paste0("2011",x$Date),"%Y%m%d")$yday+1
In which language?
If it's something like C#, Java or Javascript, I'd follow these steps:
1-) parse a pair of integers from that column;
2-) create a datetime variable whose day and month are taken from the integers from step one. Set the year to some fixed value, or to the current year.
3-) create another datetime variable, whose date is the 1st of January of the same year as the one in step 2.
The number of the day is the difference in days between the datetime variables, + 1 day.
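Since the question is about R, the same idea (difference in days from the 1st of January, plus 1) can be sketched there as well. This is only an illustration: the helper name mmdd_to_doy and the default year 2013 are assumptions, chosen because 2013 is not a leap year. The sprintf() call also shows one way to restore the leading zero that was lost on import:

```r
# Hypothetical helper: MMDD (numeric or character) -> day of the year
mmdd_to_doy <- function(mmdd, year = 2013) {
  # Restore the leading zero if the column was read as a number (601 -> "0601")
  if (is.numeric(mmdd)) mmdd <- sprintf("%04d", mmdd)
  d <- as.Date(paste0(year, mmdd), format = "%Y%m%d")
  jan1 <- as.Date(paste0(year, "-01-01"))
  # Difference in days from January 1st, plus 1
  as.integer(d - jan1) + 1L
}

mmdd_to_doy(c(101, 601, 1231))
```

Working with Date rather than POSIXct objects sidesteps time zones and daylight saving shifts, so the differences are whole days.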
This one worked for me:
copperNew <- read.table("http://pastebin.com/raw.php?i=0fdswDxF",
                        header=TRUE, sep=" ", colClasses=c("character",
                                                           "integer",
                                                           rep("numeric", 3)))
copperNew$diff = difftime(as.POSIXct(strptime(paste0("2013",copperNew$Date),
                                              format="%Y%m%d", tz="GMT")),
                          as.POSIXct("2012-12-31", tz="GMT"), units="days")
I had to specify the timezone (tz argument in as.POSIXct), otherwise I got two different timezones for the vectors I am subtracting and therefore non-integer days.

Bucketing data into weekly, bi-weekly, monthly and quarterly data in R

I have a data frame with two columns. Date, Gender
I want to change the Date column to the start of the week for that observation. For example if Jun-28-2011 is a Tuesday, I'd like to change it to Jun-27-2011. Basically I want to re-label Date fields such that two data points that are in the same week have the same Date.
I also want to be able to do this bi-weekly, monthly and, especially, quarterly.
Update:
Let's use this as a dataset.
datset <- data.frame(date = as.Date("2011-06-28")+c(1:100))
One slick way to do this that I just learned recently is to use the lubridate package:
library(lubridate)
datset <- data.frame(date = as.Date("2011-06-28")+c(1:100))
#Add 1, since floor_date appears to round down to Sundays
floor_date(datset$date,"week") + 1
I'm not sure about how to do bi-weekly binning, but monthly and quarterly are easily handled with the respective base functions:
quarters(datset$date)
months(datset$date)
EDIT: Interestingly, floor_date from lubridate does not appear to be able to round down to the nearest quarter, but the function of the same name in ggplot2 does.
Look at ?strftime. In particular, the following formats:
%b: Abbreviated month name in the current locale. (Also matches full name on input.)
%B: Full month name in the current locale. (Also matches abbreviated name on input.)
%m: Month as decimal number (01–12).
%W: Week of the year as decimal number (00–53) using Monday as the first day of week (and typically with the first Monday of the year as day 1 of week 1). The UK convention.
eg:
> strftime("2011-07-28","Month: %B, Week: %W")
[1] "Month: July, Week: 30"
> paste("Quarter:",ceiling(as.integer(strftime("2011-07-28","%m"))/3))
[1] "Quarter: 3"
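If the goal is to relabel each date with the start of its bucket rather than to print a label, plain Date arithmetic may be enough. The helpers below are a sketch, not an established API: the names are made up, weeks are assumed to start on Monday, and the bi-weekly buckets are anchored on an arbitrary Monday (1970-01-05), which you would likely want to replace with your own reference date.

```r
# Hypothetical helpers returning the first day of each bucket as a Date
week_start <- function(d) {
  wday <- as.POSIXlt(d)$wday   # 0 = Sunday ... 6 = Saturday
  d - (wday + 6) %% 7          # step back to the preceding Monday
}
biweek_start <- function(d, origin = as.Date("1970-01-05")) {
  w <- week_start(d)
  # Whole 14-day blocks elapsed since the anchor Monday
  origin + (as.numeric(w - origin) %/% 14) * 14
}
month_start <- function(d) as.Date(format(d, "%Y-%m-01"))
quarter_start <- function(d) {
  lt <- as.POSIXlt(d)
  first_month <- (lt$mon %/% 3) * 3 + 1   # 1, 4, 7 or 10
  as.Date(sprintf("%d-%02d-01", lt$year + 1900, first_month))
}

datset <- data.frame(date = as.Date("2011-06-28") + c(1:100))
datset$week    <- week_start(datset$date)
datset$biweek  <- biweek_start(datset$date)
datset$month   <- month_start(datset$date)
datset$quarter <- quarter_start(datset$date)
head(datset)
```

Two observations that fall in the same bucket then share the same value in the corresponding column, so the columns can be used directly for grouping or aggregation.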

Resources