Extract year from date - r

How can I remove the first elements from a variable, especially if the variable contains special characters? For example, I have the following column:
Date
01/01/2009
01/01/2010
01/01/2011
01/01/2012
I need to have a new column like the following:
Date
2009
2010
2011
2012

As discussed in the comments, this can be achieved by converting the entry into Date format and extracting the year, for instance like this:
format(as.Date(df1$Date, format="%d/%m/%Y"),"%Y")
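As a minimal sketch of that approach applied to a whole column: the data frame df1 is rebuilt here from the question's sample values, and the Year column name is only an assumption for illustration.

```r
# Sample data rebuilt from the question; df1 and the Year column are illustrative names
df1 <- data.frame(Date = c("01/01/2009", "01/01/2010", "01/01/2011", "01/01/2012"),
                  stringsAsFactors = FALSE)

# Parse as day/month/year, then format the Date back to a four-digit year
parsed <- as.Date(df1$Date, format = "%d/%m/%Y")
df1$Year <- as.integer(format(parsed, "%Y"))

print(df1)
```

Wrapping the result in as.integer() is a design choice: format() returns character strings, and an integer year is usually easier to sort or summarise.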

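library(lubridate)
b <- c("01/01/2009", "01/01/2010")  # example input; replace with your own column
a <- mdy(b)                         # use dmy(b) instead if the day comes first
year(a)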
https://cran.r-project.org/web/packages/lubridate/vignettes/lubridate.html
http://vita.had.co.nz/papers/lubridate.pdf

When you convert your variable to Date:
date <- as.Date('10/30/2018','%m/%d/%Y')
you can then cut out the elements you want and make new variables, like year:
year <- as.numeric(format(date,'%Y'))
or month:
month <- as.numeric(format(date,'%m'))

If all your dates are the same width, you can put the dates in a vector and use substring:
Date
a <- c("01/01/2009", "01/01/2010" , "01/01/2011")
substring(a,7,10) #This takes string and only keeps the characters beginning in position 7 to position 10
output
[1] "2009" "2010" "2011"
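If the strings are not all the same width (for example "1/1/2009" next to "01/01/2010"), fixed positions no longer line up. A hedged alternative, assuming the year is always the last slash-separated field, is to strip everything up to the final slash:

```r
# Mixed-width input; the values are made up for illustration
a <- c("1/1/2009", "01/01/2010", "12/31/2011")

# The greedy ".*" consumes everything up to the LAST "/", leaving only the year
yr <- sub("^.*/", "", a)

print(yr)
```

Like substring, this returns character strings and does no validation, so malformed entries pass through silently.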

This is more advice than a specific answer, but my suggestion is to convert dates to date variables immediately, rather than keeping them as strings. This way you can use date (and time) functions on them, rather than trying to use very troublesome workarounds.
As pointed out, the lubridate package has nice extraction functions.
For some projects, I have found that piecing dates out from the start is helpful:
create year, month, day (of month) and day (of week) variables to start with.
This can simplify summaries, tables and graphs, because the extraction code is separate from the summary/table/graph code, and because if you need to change it, you don't have to roll out those changes in multiple spots.
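A minimal base R sketch of this "piece it out once" idea follows. The data frame visits and its column names are hypothetical, and the components are taken from POSIXlt so the result does not depend on the locale.

```r
# Hypothetical data frame with a proper Date column
visits <- data.frame(date = as.Date(c("2009-01-01", "2010-01-01",
                                      "2011-01-01", "2012-01-01")))

# POSIXlt exposes the date components as list elements
lt <- as.POSIXlt(visits$date)

visits$year  <- as.integer(lt$year) + 1900L  # years are stored as years since 1900
visits$month <- as.integer(lt$mon) + 1L      # months are stored as 0-11
visits$mday  <- as.integer(lt$mday)          # day of the month, 1-31
visits$wday  <- as.integer(lt$wday)          # day of the week, 0 = Sunday

print(visits)
```

Later summaries, tables and graphs can then refer to visits$year or visits$month directly.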

If you are using the date package, this can be done fairly easily.
library(date)
Date <- c("01/01/2009", "01/01/2010", "01/01/2011", "01/01/2012")
Date <- as.date(Date)
Date
# [1] 1Jan2009 1Jan2010 1Jan2011 1Jan2012
date.mdy(Date)$year
# [1] 2009 2010 2011 2012
## be aware that these are now integers and thus different methods may be invoked:
str(date.mdy(Date)$year)
# int [1:4] 2009 2010 2011 2012
summary(Date)
# First Last
# "1Jan2009" "1Jan2012"
summary(date.mdy(Date)$year)
# Min. 1st Qu. Median Mean 3rd Qu. Max.
# 2009 2010 2010 2010 2011 2012

For some time now, you can also rely solely on the data.table package and its IDate class plus associated functions (see ?as.IDate).
require(data.table)
a <- c("01/01/2009", "01/01/2010" , "01/01/2011")
year(as.IDate(a, '%d/%m/%Y')) # all data.table functions

Related

Character 2 digit year conversion to year only

Using R
Got large clinical health data set to play with, but dates are weird
Most problematic is 2digityear/halfyear, as in 98/2, meaning at some point in 1998 after July 1
I have split the column up into 2 character columns, e.g. 98 and 2 but now need to convert the 2 digit year character string into an actual year.
I tried as.Date(data$variable,format="%Y") but not only did I get a conversion to 0098 as the year rather than 1998, I also got today's month and day arbitrarily added (the actual data has no month or day).
as in 0098-06-11
How do I get just 1998 instead?
It is not elegant, but a combination of lubridate and as.Date gets you there:
library(lubridate)
data <- data.frame(variable = c(95, 96, 97,98,99), date=c(1,2,3,4,5))
data$variableUpdated <- year(as.Date(as.character(data$variable), format="%y"))
And with base R only:
data$variableUpdated <- format(as.Date(as.character(data$variable), format="%y"),"%Y")
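Since the data has no month or day, another option is to skip as.Date entirely and apply the century rule by hand. The sketch below assumes the raw values look like "98/2" as described in the question and uses the same cutoff that %y is documented to use in ?strptime (00 to 68 become 20xx, 69 to 99 become 19xx). The variable names are illustrative.

```r
# Raw values in the "2-digit year / half-year" layout from the question (made-up sample)
raw <- c("98/2", "99/1", "02/2")

# Split each value on the slash and pull out the two pieces
parts <- strsplit(raw, "/", fixed = TRUE)
yy    <- as.integer(vapply(parts, `[`, character(1), 1))
half  <- as.integer(vapply(parts, `[`, character(1), 2))

# Same pivot as %y: 69-99 -> 19xx, 00-68 -> 20xx (adjust if your data needs another cutoff)
year <- ifelse(yy >= 69L, 1900L + yy, 2000L + yy)

print(data.frame(raw, year, half))
```

Because no Date object is created, nothing from today's date leaks into the result.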

R: Creating two date variables from a complete date

I have date recorded as: Month/Day/Year or MM/DD/YYYY
I would like to write code that creates two new variables from that information.
I would like a year variable alone
I would like to create a quarter variable
The Quarter Variables would not be influenced by year. I would want this variable to apply to all years.
Quarter 1 would be January 1 - March 31
Quarter 2 would be April 1 - June 30
Quarter 3 would be July 1 - September 30
Quarter 4 would be October 1 - December 31
Any assistance would be greatly appreciated. I cannot seem to get the nuance of how to do these functions in R.
Thanks,
Jared
Assuming that the date variable is of class POSIX** you could do:
#example date
date <- as.POSIXlt( "05/12/2015", format='%m/%d/%Y')
data.table already has a function that returns the year from a date, namely year:
library(data.table)
> year(date)
[1] 2015
As for the quarter it can easily be created from the function below (uses data.table::month that returns the number of a month):
quarter <- function(x) {
rep(c('quarter 1','quarter 2','quarter 3','quarter 4'), each=3)[month(x)]
}
> quarter(date)
[1] "quarter 2"
Using only the base packages:
Try parsing your dates with the strptime function, so that all dates are in Year-Month-Day format. This format constrains each element of the date to have the same character length and the same position. See the strptime documentation for the appropriate format argument.
date.vec<-c("1/1/1999","2/2/1999") # quote the dates, otherwise R evaluates them as divisions
fmt.date.vec<-strptime(date.vec, "%m/%d/%Y")
With the dates in this format it is easy to extract the year, month, and day using the substring function
Year<-substring(fmt.date.vec,1,4)
Month<-substring(fmt.date.vec,6,7)
Day<-substring(fmt.date.vec,9,10)
With this information you can now generate your Quarter vector any number of ways. For example if a data.frame "df" has a Month column:
df$Quarter<-"Quarter_1"
# assign into the column directly, which also works when no rows match
df$Quarter[df$Month %in% c("04","05","06")]<-"Quarter_2"
df$Quarter[df$Month %in% c("07","08","09")]<-"Quarter_3"
df$Quarter[df$Month %in% c("10","11","12")]<-"Quarter_4"
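For comparison, here is a short base R sketch that derives both variables from Date objects without any string positions. The sample dates are made up; quarters() is a base R function that returns labels such as "Q2", and the integer version uses plain month arithmetic.

```r
# Made-up sample in the question's MM/DD/YYYY layout
dates <- as.Date(c("05/12/2015", "11/30/2014", "01/15/2016"), format = "%m/%d/%Y")

# Year as an integer
yr <- as.integer(format(dates, "%Y"))

# Quarter as an integer: months 1-3 -> 1, 4-6 -> 2, 7-9 -> 3, 10-12 -> 4
mon <- as.integer(format(dates, "%m"))
qtr <- (mon - 1L) %/% 3L + 1L

# Quarter as a label, independent of the year
qtr_label <- quarters(dates)

print(data.frame(dates, yr, qtr, qtr_label))
```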

From MMDD to day of the year in R

I have this .txt file:
http://pastebin.com/raw.php?i=0fdswDxF
First column (Date) shows date in month/day
So 0601 is the 1st of June
When I load this into R and I show the data, it removes the first 0 in the data.
So when loaded it looks like:
601
602
etc
For 1st of June, 2nd of June
For the months 10,11,12, it remains unchanged.
How do I change it back to 0601 etc.?
What I am trying to do is to change these days into the day of the year, for instance,
1st of January (0101) would be 1, and 31st of December would be 365.
There is no leap year to be considered.
I have the code to change this, if my data was shown as 0601 etc, but not as 601 etc.
copperNew$Date = as.numeric(as.POSIXct(strptime(paste0("2013",copperNew$Date), format="%Y%m%d")) -
as.POSIXct("2012-12-31"), units = "days")
Where Date of course is from the file linked above.
Please ask if you do not consider the description to be good enough.
You can use colClasses in the read.table function to keep the leading zeros, then convert to POSIXlt and extract the day of the year. You are overcomplicating the process.
copperNew <- read.table("http://pastebin.com/raw.php?i=0fdswDxF", header=TRUE,
colClasses=c("character", "integer", rep("numeric", 3)))
tmp <- as.POSIXlt( copperNew$Date, format='%m%d' )
copperNew$Yday <- tmp$yday + 1 # yday is 0-based, so add 1 to make January 1st day 1
The as.Date function can parse a string without a year (it assumes the current year), and strftime with "%j" then returns the day of the year.
d<-as.Date("0201", format = "%m%d")
strftime(d, format="%j")
#[1] "032"
First you parse your string and obtain Date object which represents your date (notice that it will add current year, so if you want to count days for some specific year add it to your string: as.Date("1988-0201", format = "%Y-%m%d")).
Function strftime will convert your Date to a POSIXlt object and return the day of the year. If you want the result to be a numeric value, you can do it like this: as.numeric(strftime(d, format = "%j")) (thanks, Gavin Simpson).
Convert it to POSIXlt using a year that is not a leap-year, then access the yday element and add 1 (because yday is 0 on January 1st).
strptime(paste0("2011","0201"),"%Y%m%d")$yday+1
# [1] 32
From start-to-finish:
x <- read.table("http://pastebin.com/raw.php?i=0fdswDxF",
colClasses=c("character",rep("numeric",5)), header=TRUE)
x$Date <- strptime(paste0("2011",x$Date),"%Y%m%d")$yday+1
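If the file has already been read and the leading zeros are gone, they can be restored with sprintf before parsing. This is a small sketch on made-up values rather than the pastebin file; the year 2013 is an arbitrary non-leap year chosen so that December 31 comes out as 365.

```r
# Values as they look after read.table dropped the leading zero (made-up sample)
mmdd_num <- c(101L, 601L, 1231L)

# Pad back to four characters: 601 -> "0601"
mmdd <- sprintf("%04d", mmdd_num)

# Attach a fixed non-leap year, parse, and ask for the day of the year (%j)
d   <- as.Date(paste0("2013", mmdd), format = "%Y%m%d")
doy <- as.integer(format(d, "%j"))

print(data.frame(mmdd, doy))
```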
In which language?
If it's something like C#, Java or Javascript, I'd follow these steps:
1-) parse a pair of integers from that column;
2-) create a datetime variable whose day and month are taken from the integers from step one. Set the year to some fixed value, or to the current year.
3-) create another datetime variable, whose date is the 1st of January of the same year as the one in step 2.
The number of the day is the difference in days between the datetime variables, + 1 day.
This one worked for me:
copperNew <- read.table("http://pastebin.com/raw.php?i=0fdswDxF",
header=TRUE, sep=" ", colClasses=c("character",
"integer",
rep("numeric", 3)))
copperNew$diff = difftime(as.POSIXct(strptime(paste0("2013",copperNew$Date),
format="%Y%m%d", tz="GMT")),
as.POSIXct("2012-12-31", tz="GMT"), units="days")
I had to specify the timezone (tz argument in as.POSIXct), otherwise I got two different timezones for the vectors I am subtracting and therefore non-integer days.
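One way to sidestep the time zone issue, sketched here on made-up values, is to do the subtraction with the Date class. Date objects carry no time of day or time zone, so the difference is always a whole number of days.

```r
# Made-up MMDD strings with leading zeros intact
mmdd <- c("0101", "0315", "0601")

# Date has no time zone, so no tz argument is needed
d <- as.Date(paste0("2013", mmdd), format = "%Y%m%d")

# Days elapsed since the last day of the previous year = day of the year
doy <- as.numeric(d - as.Date("2012-12-31"))

print(doy)
```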

Calculating days per month between interval of two dates

I have a set of events that each have a start and end date, but they take place over the scope of a number of months. I would like to create a table that shows the number of days in each month for this event.
I have the following example.
event_start_date <- as.Date("23/10/2012", "%d/%m/%Y")
event_end_date <- as.Date("07/02/2013", "%d/%m/%Y")
I would expect to get a table out as the following:
Oct-12 8
Nov-12 30
Dec-12 31
Jan-13 31
Feb-13 7
Does anybody know about a smart and elegant way of doing this or is creating a system of loops the only viable method?
Jochem
This is not necessarily efficient because it creates a sequence of days, but it does the job:
> library(zoo)
> table(as.yearmon(seq(event_start_date, event_end_date, "day")))
Oct 2012 Nov 2012 Dec 2012 Jan 2013 Feb 2013
9 30 31 31 7
If your time span is so large that this method is slow, you'll have to create a sequence of firsts of the months between your two (truncated) dates, take the diff, and do a little extra work for the end points.
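The month-based alternative can be sketched in base R as follows. This is one possible reading of that suggestion, not the answerer's own code: it builds the first day of each month in the range, clips each month to the event, and counts both end points as event days (so October gives 9, matching the zoo output above). The helper first_of_month is a hypothetical name.

```r
event_start_date <- as.Date("23/10/2012", "%d/%m/%Y")
event_end_date   <- as.Date("07/02/2013", "%d/%m/%Y")

# Hypothetical helper: truncate a date to the first day of its month
first_of_month <- function(d) as.Date(format(d, "%Y-%m-01"))

# First day of every month touched by the event
month_starts <- seq(first_of_month(event_start_date),
                    first_of_month(event_end_date), by = "month")

# First day of the month after each of those months
next_starts <- seq(month_starts[1], by = "month",
                   length.out = length(month_starts) + 1)[-1]

# Clip each month to the event; work on day counts to keep the arithmetic simple
from <- pmax(as.numeric(month_starts), as.numeric(event_start_date))
to   <- pmin(as.numeric(next_starts) - 1, as.numeric(event_end_date))

days <- to - from + 1            # inclusive of both end points
names(days) <- format(month_starts, "%Y-%m")

print(days)
```

Only one element per month is created, so the cost grows with the number of months rather than the number of days.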
As DjSol already pointed out in his comment, you can just subtract two dates to get the number of days:
event_start_date <- as.Date("23/10/2012", "%d/%m/%Y")
event_end_date <- as.Date("07/02/2013", "%d/%m/%Y")
as.numeric(event_end_date - event_start_date)
Is that what you want? I have the feeling that you might have more of a problem to get the start and end date in such a format so you can easily subtract them because you mention a loop. If so, however, I guess we need more details on how your actual data looks.

Format dates to Month-Year while keeping class Date

I feel like there's a pretty simple way to do this, but I'm not finding it easily...
I am working with R to extract data from a dataset and then summarize it by a number of different characteristics. One of them is the month in which an event is scheduled / has occurred. We have the exact date of the event in the database, something like this:
person_id date_visit
1 2012-05-03
2 2012-08-13
3 2012-12-12
...
I would like to use the table() function to generate a summary table that would look something like this:
Month Freq
Jan 12 1
Feb 12 2
Mar 12 1
Apr 12 3
...
My issue is this. I've read the data in and used as.Date() to convert character strings to dates. I can use format.Date() to get the dates formatted as Jan 12, Mar 12, etc. But when you use format.Date(), you end up with character strings again. This means when you apply table() to them, they come out in alphabetical order (my current set is Aug 12, Jul 12, Jun 12, Mar 12, and so forth).
I know that in SAS, you could use a format to change the appearance of a date, while preserving it as a date (so you could still do date operators on it). Can the same thing be done using R?
My plan is to build a nice data frame through a number of steps, and then (after making sure that all the dates are converted to strings, for compatibility reasons) use xtable() to make a nice LaTeX output.
Here's my code at present.
load("temp.RData")
ds$date_visit <- as.Date(ds$date_visit,format="%Y-%m-%d")
table(format.Date(ds$date_visit,format="%b %Y"))
ETA: I'd prefer to just do it in Base R if I can, but if I have to I can always use an additional package.
You could use the yearmon class from the zoo package
require("zoo")
ds <- data.frame(person_id=1:3, date_visit=c("2012-05-03", "2012-08-13", "2012-12-12"))
ds$date_visit <- as.yearmon(ds$date_visit)
ds
person_id date_visit
1 1 May 2012
2 2 Aug 2012
3 3 Dec 2012
month.abb is a constant vector in R and can be used to sort on the first three letters of the names in the table.
ds <- data.frame(person_id=1:3, date_visit=as.Date(c("2012-05-03", "2012-08-13", "2012-12-12")))
table(format( ds$date_visit, format="%b %Y"))
tbl <- table(format( ds$date_visit, format="%b %Y"))
tbl[order( match(substr(names(tbl), 1,3), month.abb) )]
May 2012 Aug 2012 Dec 2012
1 1 1
With additional years you would see the "May"s all together so this would be needed:
tbl[order( substr(names(tbl), 5,8), match(substr(names(tbl), 1,3), month.abb) )]
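Another base R option, sketched here with made-up visit dates, is to turn the formatted labels into a factor whose levels are already in chronological order. table() follows factor level order, so no re-sorting of the result is needed. The labels are still not Dates; the factor only fixes their ordering.

```r
# Made-up visit dates, deliberately out of order and spanning two years
visits <- as.Date(c("2012-12-12", "2012-05-03", "2012-08-13", "2013-05-20"))

# Month-year labels (month abbreviations depend on the current locale)
labels <- format(visits, "%b %Y")

# Levels taken from the labels after sorting by the underlying dates
lev <- unique(labels[order(visits)])

# table() respects factor level order instead of sorting alphabetically
tbl <- table(factor(labels, levels = lev))

print(tbl)
```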

Resources