Character 2 digit year conversion to year only - r

Using R
I have a large clinical health data set to work with, but the dates are in odd formats.
The most problematic is 2-digit year/half-year, as in 98/2, meaning some point in 1998 after July 1.
I have split the column into two character columns, e.g. 98 and 2, but now need to convert the 2-digit year string into an actual year.
I tried as.Date(data$variable,format="%Y"), but not only did I get 0098 as the year rather than 1998, I also got today's month and day arbitrarily added (the actual data has no month or day),
as in 0098-06-11.
How do I get just 1998 instead?

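Not elegant, but a combination of lubridate and as.Date gets you there. The %y format reads a 2-digit year (00 to 68 become 20xx, 69 to 99 become 19xx), and year() then drops the month and day that as.Date fills in from today's date.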
library(lubridate)
data <- data.frame(variable = c(95, 96, 97,98,99), date=c(1,2,3,4,5))
data$variableUpdated <- year(as.Date(as.character(data$variable), format="%y"))
And with base R only (note that format returns a character string, not a number):
data$variableUpdated <- format(as.Date(as.character(data$variable), format="%y"),"%Y")
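One caveat with the %y route is that as.Date fills in the missing month and day from the current date, so the intermediate Date depends on when the code runs. A minimal sketch of an alternative that avoids Date objects altogether, assuming the original strings look like "98/2" and that the same pivot as %y is wanted (00 to 68 become 20xx, 69 to 99 become 19xx). The vector raw and the pivot are assumptions for illustration; for clinical data a different cutoff may be more appropriate.

```r
# Hypothetical sample values in the "yy/h" form described in the question
raw <- c("98/2", "99/1", "01/2")

# Split on "/" and pull out the two pieces as integers
parts <- strsplit(raw, "/", fixed = TRUE)
yy    <- as.integer(vapply(parts, `[`, character(1), 1))
half  <- as.integer(vapply(parts, `[`, character(1), 2))

# Assumed pivot (same rule as %y): 00-68 -> 20xx, 69-99 -> 19xx
year <- ifelse(yy <= 68L, 2000L + yy, 1900L + yy)

result <- data.frame(raw, year, half)
print(result)
```

This keeps the year as a plain integer column and the half-year as a separate integer, which is usually enough for grouping or sorting.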

Related

Using lubridate with multiple date formats

I have a column of dates that was stored in the format 8/7/2001, 10/21/1990, etc. Two values are just four-digit years. I converted the entire column to class Date using the following code.
lubridate::parse_date_time(eventDate, orders = c('mdy', 'Y'))
It works great, except the values that were just years are converted to yyyy-01-01 and I want them to just be yyyy. Is there a way to keep lubridate from adding on any information that wasn't already there?
Edit: Code to create data frame
id = (1:5)
eventDate = c("10/7/2001", "1989", NA, "5/5/2016", "9/18/2011")
df <- data.frame(id, eventDate)
I do not think it is possible to convert your values to Dates and keep the "yyyy" values intact. By transforming your "yyyy" values into "yyyy-01-01", lubridate is doing the right thing: dates have an order, and if other values in your column have days and months defined, all the values need to have those components too.
For example, suppose I create the data.frame below and ask R to order the table by the dates column. Does the value in the first row ("2020") come before the value in the second row ("2020-02-28"), or after it? The value "2020" could mean any day in that year, so how should R treat it? By filling in the first day of the year, lubridate defines the missing components and avoids that ambiguity.
dates <- c("2020", "2020-02-28", "2020-02-20", "2020-01-10", "2020-05-12")
id <- 1:5
df <- data.frame(
id,
dates
)
  id      dates
1  1       2020
2  2 2020-02-28
3  3 2020-02-20
4  4 2020-01-10
5  5 2020-05-12
So if you want to keep the "yyyy" values intact, they probably should not sit in your eventDate column alongside values with a different structure ("mm/dd/yyyy"). If it really is necessary to keep them intact, I think it is best to keep the eventDate column as character and store the parsed dates in another column, like this:
df$as_dates <- lubridate::parse_date_time(df$eventDate, orders = c('mdy', 'Y'))
  id eventDate   as_dates
1  1 10/7/2001 2001-10-07
2  2      1989 1989-01-01
3  3      <NA>       <NA>
4  4  5/5/2016 2016-05-05
5  5 9/18/2011 2011-09-18
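If the distinction between "a full date" and "just a year" matters later, one option is to record it explicitly next to the parsed date. A minimal base R sketch under that assumption; the column name year_only is made up for illustration, and base as.Date is swapped in for lubridate so the example has no package dependency:

```r
# Data frame from the question
df <- data.frame(
  id = 1:5,
  eventDate = c("10/7/2001", "1989", NA, "5/5/2016", "9/18/2011"),
  stringsAsFactors = FALSE
)

# TRUE where the value is exactly four digits; NA input gives FALSE
df$year_only <- grepl("^[0-9]{4}$", df$eventDate)

# Parse full m/d/Y dates; year-only and missing values become NA here
df$as_dates <- as.Date(df$eventDate, format = "%m/%d/%Y")

# Fill year-only rows with 1 January of that year (a placeholder, flagged above)
df$as_dates[df$year_only] <-
  as.Date(paste0(df$eventDate[df$year_only], "-01-01"))

print(df)
```

The original strings stay untouched in eventDate, the Date column sorts properly, and the flag says which dates carry a placeholder month and day.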

Year column to time series [duplicate]

OK, this should be really simple but I'm not 'getting it.' I have a data frame with a column "Year" that I want to convert to a time series, but the format is tripping me up. How do I convert the "Year" value to a date, with the actual date being the end of each respective year (e.g. 2015 -> December 31st 2015)?
  Year Production
1 1900   38400000
2 1901   43400000
3 1902   49000000
4 1903   44100000
5 1904   49800000
Goal is to get this to a time series data frame. (e.g. xts)
It is not quite the same as a previous question that converted a vector of years to dates. "Convert four digit year values to date type". Goal is to index the data by date, converting it to xts or similar object.
Edited:
This was the final solution:
df <- xts(x = df_original, order.by = as.Date(paste0(df_original[,1], "-12-31")))
whereby the "[,1]" indicates the first column of the original data frame.
If you want each full date to be 31 December, you could use paste along with as.Date to cast to a date:
df$date <- as.Date(paste0(df$Year, "-12-31"))
In addition to Tim Biegeleisen's answer, I will just add another way
df$final_date <- as.Date(ISOdate(df$Year, 12, 31))
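For reference, a small self-contained check that the two approaches give the same dates on the sample data from the question. This is only a sketch in base R; the xts step is left out so that it runs without extra packages.

```r
# Sample data from the question
df <- data.frame(
  Year = 1900:1904,
  Production = c(38400000, 43400000, 49000000, 44100000, 49800000)
)

# Approach 1: build an ISO date string and parse it
via_paste <- as.Date(paste0(df$Year, "-12-31"))

# Approach 2: ISOdate returns a POSIXct (noon GMT), converted to Date
via_iso <- as.Date(ISOdate(df$Year, 12, 31))

# Both should agree element by element
same <- all(via_paste == via_iso)

df$date <- via_paste
print(df)
```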

R: Creating two date variables from a complete date

I have date recorded as: Month/Day/Year or MM/DD/YYYY
I would like to write code that creates two new variables from that information.
I would like a year variable alone
I would like to create a quarter variable
The Quarter Variables would not be influenced by year. I would want this variable to apply to all years.
Quarter 1 would be January 1 - March 31
Quarter 2 would be April 1 - June 30
Quarter 3 would be July 1 - September 30
Quarter 4 would be October 1 - December 31
Any assistance would be greatly appreciated. I cannot seem to get the nuance of how to do these functions in R.
Thanks,
Jared
Assuming that the date variable is of class POSIX** you could do:
#example date
date <- as.POSIXlt( "05/12/2015", format='%m/%d/%Y')
data.table already has a function that returns the year from a date, namely year:
library(data.table)
> year(date)
[1] 2015
As for the quarter, it can easily be computed with the function below (it uses data.table::month, which returns the month number):
quarter <- function(x) {
  rep(c('quarter 1','quarter 2','quarter 3','quarter 4'), each=3)[month(x)]
}
> quarter(date)
[1] "quarter 2"
Using only the base packages:
Try parsing your dates with the strptime function, so that all dates are in Year-Month-Day form. This form fixes each element of the date at the same character length and the same position. See the strptime documentation for the appropriate format argument.
date.vec<-c("1/1/1999","2/2/1999")
fmt.date.vec<-strptime(date.vec, "%m/%d/%Y")
With the dates in this format it is easy to extract the year, month, and day using the substring function
Year<-substring(fmt.date.vec,1,4)
Month<-substring(fmt.date.vec,6,7)
Day<-substring(fmt.date.vec,9,10)
With this information you can generate your Quarter vector in any number of ways. For example, if a data.frame "df" has a Month column:
df$Quarter<-"Quarter_1"
df$Quarter[df$Month %in% c("04","05","06")]<-"Quarter_2"
df$Quarter[df$Month %in% c("07","08","09")]<-"Quarter_3"
df$Quarter[df$Month %in% c("10","11","12")]<-"Quarter_4"
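The year and quarter can also be derived numerically, without substring positions or a lookup of month strings. A minimal base R sketch, assuming the input is character in MM/DD/YYYY form; the sample values are made up for illustration:

```r
# Hypothetical dates in MM/DD/YYYY form
raw <- c("05/12/2015", "11/30/2014", "01/01/1999")
dates <- as.Date(raw, format = "%m/%d/%Y")

# Year and month as integers
yr  <- as.integer(format(dates, "%Y"))
mon <- as.integer(format(dates, "%m"))

# Months 1-3 -> 1, 4-6 -> 2, 7-9 -> 3, 10-12 -> 4
qtr <- (mon - 1L) %/% 3L + 1L

# Base R also has quarters(), which returns labels such as "Q2"
qtr_label <- quarters(dates)

print(data.frame(raw, yr, qtr, qtr_label))
```

Integer division keeps the quarter independent of the year, which matches the requirement in the question.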

Post-Process a Stata %tw date in R

The %tw format in Stata has the form: 1960w1 which has no equivalent in R.
Therefore %tw dates must be post-processed.
Importing a .dta file into R, the date is an integer like 1304 (instead of 1985w5) or 1426 (instead of 1987w23). If it was a simple time series you could set a starting date as follows:
ts(df, start= c(1985,5), frequency=52)
Another possibility would be:
as.Date(Camp$date, format= "%Yw%W" , origin = "1985w5")
But if each row is not a single date, then you must convert it.
The package ISOweek is based on ISO-8601 with the form "1985-W05" and does not process the Stata %tw.
The lubridate package does not handle this format either. Its week() function returns the number of complete seven-day periods that have occurred between the date and January 1st, plus one (see the week function documentation).
In Stata, week 1 of any year starts on 1 January, whatever day of the week that is (see the Stata documentation on dates).
In R, the %W format counts weeks of the year with Monday as the first day of the week.
From the strptime documentation, %V is
the Week of the year as decimal number (00--53) as defined in ISO 8601. If the week (starting on Monday) containing 1 January has four or more days in the new year, then it is considered week 1. Otherwise, it is the last week of the previous year, and the next week is week 1. (Accepted but ignored on input.)
Larmarange noted on Github that Haven doesn't interpret dates properly:
months, week, quarter and halfyear are specific format from Stata,
respectively %tm, %tw, %tq and %th. I'm not sure that there are
corresponding formats available in R. So far they are imported as
integers.
Is there a way to convert Stata %tw to a date format R understands?
Here is a Stata file with dates
This won't be an answer in terms of R code, but it is commentary on Stata weeks that can't be fitted into a comment.
Strictly, dates in Stata are not defined by the display formats that make them intelligible to people. A date in Stata is always a numeric variable or scalar or macro defined with origin the first instance in 1960. Thus it is at best a shorthand to talk about %tw dates, etc. We can use display to see the effects of different date display formats:
. di %td 0
01jan1960
. di %tw 0
1960w1
. di %tq 0
1960q1
. di %td 42
12feb1960
. di %tw 42
1960w43
. di %tq 42
1970q3
A subtle point made explicit above is that changing the display format will not change what is stored, i.e. the numeric value.
Otherwise put, dates in Stata are not distinct data types; they are just integers made intelligible as dates by a pertinent display format.
The question presupposes that it was correct to describe some weekly dates in terms of Stata weeks. This seems unlikely, as I know no instance in which a body outside StataCorp uses the week rules of Stata, not only that week 1 always starts on 1 January, but also that week 52 always includes either 8 or 9 days and hence that there is never a week 53 in a calendar year.
So, you need to go upstream and find out what the data should have been. Failing some explanation, my best advice is to map the 52 weeks of each year to the days that start them, namely days 1(7)358 of each calendar year.
Stata weeks won't map one-to-one to any other scheme for defining weeks.
More in this article on Stata weeks
It's not completely clear what the question is but the year and week corresponding to 1304 are:
wk <- 1304
1960 + wk %/% 52
## [1] 1985
wk %% 52 + 1
## [1] 5
so assuming that the first week of the year is week 1 and starts on Jan 1st, the beginning of the above week is this date:
as.Date(paste(1960 + wk %/% 52, 1, 1, sep = "-")) + 7 * (wk %% 52)
## [1] "1985-01-29"
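The same arithmetic can be wrapped in a vectorized helper so that a whole imported column is converted at once. A sketch under the assumptions stated above (52 weeks per year, week 1 starting on 1 January, origin 1960w1); the function name stata_tw_to_date is hypothetical and not part of haven or any other package:

```r
# Convert Stata weekly integers (0 = 1960w1) to the date that starts each week.
# Assumes Stata's rule: exactly 52 weeks per year, week 1 begins on 1 January.
stata_tw_to_date <- function(tw) {
  yr      <- 1960 + tw %/% 52   # calendar year
  week0   <- tw %% 52           # zero-based week within the year
  jan1    <- as.Date(paste0(yr, "-01-01"), format = "%Y-%m-%d")
  jan1 + 7 * week0
}

# Values mentioned in the question: 1304 is 1985w5, 1426 is 1987w23
stata_tw_to_date(c(0, 1304, 1426))
```

Because Stata's week 52 absorbs the last 8 or 9 days of the year, the result is only the first day of each Stata week; it does not correspond to ISO weeks or to R's %W or %U weeks.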

Producing Ordered Columns of Integers in R for odd-numbered ranges

Total newb R question, but here it is: let's say I want to create a data frame with two columns, one with all years in a range, and the other with every month in each year. When I'm done, I should have this:
year month
1990 1
1990 2
1990 3
Et cetera. This seems like a pretty obvious job for cbind, to turn a range into a column, and repeat, to produce 12 instances of each year. This works great, but only for an even number of years in the range. So, for instance:
df <- data.frame(cbind(year=rep(c(1990:2000), 12)))
Works fine. And so does this:
df <- data.frame(cbind(year=rep(c(1990:2000), 12), month=c(1:12)))
But this produces overt nonsense:
df <- data.frame(cbind(year=rep(c(1990:2001), 12), month=c(1:12)))
The first line of code produces 12 instances of each year in the range, just as you'd expect; the second line produces the desired result. The third line produces 12 instances of each year, where each year only gets one month number. Thus:
year month
1990 1
1990 1
1990 1
Is there a way around this that doesn't require always adding a year and trimming it off later?
You are looking for expand.grid
df <- expand.grid(year = 1990:2001, month = 1:12)
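The cbind version fails for 1990:2001 because of recycling: with 12 years and 12 months both vectors cycle with the same period, so each year is always paired with the same month. With 11 years the two cycles drift against each other and every combination happens to appear. A short sketch of two ways to build the full grid in year-then-month order; note that expand.grid varies its first argument fastest, so month is listed first here and the columns are reordered afterwards:

```r
years <- 1990:2001

# Option 1: expand.grid, with month varying fastest, then reorder columns
df <- expand.grid(month = 1:12, year = years)[, c("year", "month")]

# Option 2: explicit rep(), each year 12 times and the months repeated per year
df2 <- data.frame(
  year  = rep(years, each = 12),
  month = rep(1:12, times = length(years))
)

head(df, 3)
```

Both produce one row per year and month combination, regardless of how many years are in the range.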
