Converting a string to date format in R

Just started learning R and I would like to convert a string of a total of 10 characters (yymmdd and some random numbers) to date format.
Example:
Numbers
1. 6010111234
2. 7012245675
3. 9201015678
4. 0404125689
Desired outcome:
Numbers Dates
1. 6010111234 1960-10-11
2. 7012245675 1970-12-24
3. 9201015678 1992-01-01
4. 0404125689 2004-04-12
I can do this easily in Excel with the DATE, LEFT and RIGHT functions:
DATE(LEFT(Numbers,2),RIGHT(LEFT(Numbers,4),2), RIGHT(LEFT(Numbers,6),2))
I have also tried using as.Date(substr(df$Numbers, 1, 6), format = "%y%m%d").
However, the results are not what I wanted: I get 4-5 digit numbers instead of dates.
Can anyone help? Thanks!

If you don't like how as.Date(..., format = '%y%m%d') splits two-digit years between the 20th and 21st centuries (on input, 00-68 become 20xx and 69-99 become 19xx, so 60 would be read as 2060), you can write your own variant:
nums <- c('6010111234', '7012245675', '9201015678', '0404125689')
breakpoint <- '30'
dplyr::if_else(substr(nums, 1, 2) >= breakpoint ,
as.Date(paste0('19', substr(nums, 1, 6)), '%Y%m%d'),
as.Date(paste0('20', substr(nums, 1, 6)), '%Y%m%d')
)
#"1960-10-11" "1970-12-24" "1992-01-01" "2004-04-12"
dplyr::if_else is used because base ifelse() drops the Date class and returns the underlying numbers; see e.g. this question
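For comparison, here is a minimal base R sketch of the same idea without dplyr. The helper name parse_yymmdd and its pivot argument are made up for illustration, not part of any package. ifelse() is safe in this spot because it is applied to the century strings ("19"/"20") before any Date is created.

```r
# Hypothetical helper: parse the leading yymmdd of each string,
# putting two-digit years >= pivot in the 1900s and the rest in the 2000s.
parse_yymmdd <- function(x, pivot = 30L) {
  yy <- as.integer(substr(x, 1, 2))
  century <- ifelse(yy >= pivot, "19", "20")  # character vector, no Date involved yet
  as.Date(paste0(century, substr(x, 1, 6)), format = "%Y%m%d")
}

nums <- c("6010111234", "7012245675", "9201015678", "0404125689")

# Default %y behaviour: 60 is read as 2060
default_dates <- as.Date(substr(nums, 1, 6), format = "%y%m%d")

# Custom pivot: 60 is read as 1960
pivot_dates <- parse_yymmdd(nums)
format(pivot_dates)
# "1960-10-11" "1970-12-24" "1992-01-01" "2004-04-12"
```

If the result is stored in a data frame, assign it directly (df$Dates <- parse_yymmdd(df$Numbers)) so the Date class is kept.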

Related

Find four digit numbers and convert them to calendar date in R

I have a dataframe column that contains a mixture of date formats, for example 30/06/2020, 07/2020 and 2020. I would like to convert the four-digit numbers into a date (e.g. 2020 -> XX/XX/2020). The column contains different years, not just 2020, so I would prefer a generic expression if possible.
A supplementary question:
When I read the data from an Excel file, I get five-digit numbers instead of dates. From what I have read, these numbers are the days elapsed since 1900. Hence the actual column contains five-digit serial numbers, four-digit numbers that represent a year, and the other date formats.
I have dealt with that issue, but not in an optimal way. Is there a generic way to handle all these formats together? Sorry for the long post.
K
Thank you all for your ideas. You are right, I need to be more specific next time. I focused on solving the problem and, to be honest, I believe I did it.
Regarding the data, a simple illustration might be the following:
date
08/2003
12/06/2002
38054
2004
...
...
...
First, I found which elements of the dataframe column (RHO_DataBase$date) are expressed as a year (e.g. 2003) and converted them to a date (e.g. 15/05/2003):
#Step 1
counter1 <- which( (!is.na(as.numeric(RHO_DataBase$date))) & (as.numeric(RHO_DataBase$date)<2030) )
for (i in counter1) {
RHO_DataBase$date[i] <- paste ("15/05/",sep="",RHO_DataBase$date[i])
}
Then, I found which elements are expressed as numeric values (days since 30/12/1899) and converted them to day/month/year format:
#Step 2
counter2 <- which(!is.na(as.numeric(RHO_DataBase$date)))
for (i in counter2) {
RHO_DataBase$date[i] <- format(as.Date(as.numeric(RHO_DataBase$date[i]), origin = "1899-12-30"),'%d/%m/%Y')
}
Then, I found the elements of the column that are still in the remaining format, in this case only month/year, and changed them to day/month/year using paste:
# Step 3:
counter3<-which(is.na(as.Date( RHO_DataBase$date, "%d/%m/%Y") ) )
for (i in counter3) {
RHO_DataBase$date[i] <- paste ("01/",sep="",RHO_DataBase$date[i])
}
Cheers,
K
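The three steps above could also be written without loops by classifying each entry with a regular expression. This is only a sketch: the function name normalize_mixed_dates is hypothetical, it assumes the column holds exactly the four shapes shown in the illustration, and it keeps the same placeholder choices as the post (15 May for a bare year, the 1st for month/year). Anything that matches none of the patterns is left as NA.

```r
# Hypothetical helper: convert a character column with mixed formats to Date.
normalize_mixed_dates <- function(x) {
  x <- trimws(as.character(x))
  out <- structure(rep(NA_real_, length(x)), class = "Date")

  is_year   <- grepl("^[0-9]{4}$", x)                        # e.g. 2004
  is_serial <- grepl("^[0-9]{5}$", x)                        # e.g. 38054 (Excel serial)
  is_my     <- grepl("^[0-9]{1,2}/[0-9]{4}$", x)             # e.g. 08/2003
  is_dmy    <- grepl("^[0-9]{1,2}/[0-9]{1,2}/[0-9]{4}$", x)  # e.g. 12/06/2002

  out[is_year]   <- as.Date(paste0("15/05/", x[is_year]), format = "%d/%m/%Y")
  out[is_serial] <- as.Date(as.numeric(x[is_serial]), origin = "1899-12-30")
  out[is_my]     <- as.Date(paste0("01/", x[is_my]), format = "%d/%m/%Y")
  out[is_dmy]    <- as.Date(x[is_dmy], format = "%d/%m/%Y")
  out
}

date_col <- c("08/2003", "12/06/2002", "38054", "2004")
parsed <- normalize_mixed_dates(date_col)
format(parsed)
# "2003-08-01" "2002-06-12" "2004-03-08" "2004-05-15"
```

Use format(parsed, "%d/%m/%Y") if the day/month/year text form is needed again.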

making 2 digit month and day columns in R

I'm very new to R, so this might seem straightforward. But I have a data frame with an original date column that has values that look like this: 4-02-91, 5-29-93 (i.e. m-d-y). I am trying to separate this column into 3, where months, days, and years are separate. Then I need to combine them again to this format 19910402, 19930529 - I need it this way in order to compare it to another dataset with similar dates.
Here is what I've been trying to do:
library(lubridate)  # provides year(), month() and day()
# Make DATE an actual date column (%y, not %Y, because the years have two digits)
dataframe$DATE <- as.Date(dataframe$DATE, format = "%m-%d-%y")
# This changes the original date column into something that looks like this: 1991-04-02, 1993-05-29
# Separate DATE into multiple columns
dataframe$year <- year(dataframe$DATE)
dataframe$month <- month(dataframe$DATE)
dataframe$day <- day(dataframe$DATE)
# Combine dates again to get string
dataframe$raster_date <- paste(dataframe$year, dataframe$month, dataframe$day, sep = "")
The last step looks great except where the months or days are single digits. It's coming out as 199142 and 1993529 instead of 19910402 and 19930529. How do I insert zeros when the month and day values are 1 digit?
Here, we can use sprintf instead of paste. The year, month and day functions from lubridate return numeric values, and numbers do not keep leading zeros. The %02d format in sprintf pads the month and day back to two digits:
sprintf("%04d%02d%02d", dataframe$year, dataframe$month, dataframe$day)
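If the separate year, month and day columns are not needed for anything else, a shorter route is to let format() build the string straight from the Date column. A small sketch using the two example values from the question (the vector names are made up for illustration):

```r
# Two-digit years, so %y is used when parsing
dates_raw <- c("4-02-91", "5-29-93")
dates <- as.Date(dates_raw, format = "%m-%d-%y")

# %m and %d are always zero-padded on output
raster_date <- format(dates, "%Y%m%d")
raster_date
# "19910402" "19930529"
```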

Mixed Date formats in R data frame

How do you work with a column of mixed date formats, for example 8/2/2020 and 2/7/2020, where all the dates are in February?
I have tried zoo::as.Date(mixeddatescolumn, "%d/%m/%Y"). The first one is parsed correctly but the second is wrong.
I have also tried the solutions in
"Fixing mixed date formats in data frame?", but that question seems different from what I am handling.
It is really tricky, even for a human, to know whether a date like '8/2/2020' is 8th February or 2nd August. However, we can leverage the fact that you know all these dates are in February: remove the "2" part that represents the month, rearrange the date into one standard format, and then convert it to an actual Date object.
x <- c('8/2/2020','2/7/2020')
lubridate::mdy(paste0('2/', sub('2/', '', x, fixed = TRUE)))
#[1] "2020-02-08" "2020-02-07"
Or the same in base R:
as.Date(paste0('2/', sub('2/', '', x, fixed = TRUE)), "%m/%d/%Y")
Since we know that every date is in February, search for /2/ or /02/. If it is found, the middle number is the month; otherwise, the first number is the month. In either case, set the format accordingly and use as.Date. No packages are used.
dates <- c("8/2/2020", "2/7/2020", "2/28/2000", "28/2/2000") # test data
as.Date(dates, ifelse(grepl("/0?2/", dates), "%d/%m/%Y", "%m/%d/%Y"))
## [1] "2020-02-08" "2020-02-07" "2000-02-28" "2000-02-28"
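A variation on the same idea, in case the known month is not always February: parse each string both ways and keep whichever reading lands in the expected month. This is a sketch only; parse_known_month and its month argument are hypothetical names, and entries where neither reading gives the expected month come back as NA instead of being guessed.

```r
# Hypothetical helper: resolve d/m/Y vs m/d/Y using a month known in advance.
parse_known_month <- function(x, month = 2L) {
  dmy <- as.Date(x, format = "%d/%m/%Y")
  mdy <- as.Date(x, format = "%m/%d/%Y")

  # TRUE where the reading is a valid date in the expected month
  dmy_ok <- !is.na(dmy) & as.integer(format(dmy, "%m")) == month
  mdy_ok <- !is.na(mdy) & as.integer(format(mdy, "%m")) == month

  out <- dmy
  out[!dmy_ok] <- mdy[!dmy_ok]      # fall back to month-first reading
  out[!dmy_ok & !mdy_ok] <- NA      # neither reading fits: leave missing
  out
}

dates <- c("8/2/2020", "2/7/2020", "2/28/2000", "28/2/2000", "3/4/2020")
resolved <- parse_known_month(dates)
format(resolved)
# "2020-02-08" "2020-02-07" "2000-02-28" "2000-02-28" NA
```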

Convert YYYY-YY to Year(date)

I have a data frame with year column as financial year
Year
2001-02
2002-03
2003-04
How can I convert this with as.Date, keeping either the whole thing or just the second year, i.e. 2002, 2003, 2004? On converting with %Y, I inevitably get 2001-08-08, 2002-08-08, 2003-08-08 etc.
Thanks
library(lubridate)
Year <- c('2001-02', '2002-03', '2003-04')
year(as.Date(gsub('[0-9]{2}-', '', Year), format = '%Y'))
1) ISOdate To clarify the question: since it refers to yearend and Date, we assume that the input is the fiscal Year shown in the question (to which we have added the "1999-00" edge case) together with the month and day of the yearend, and that the desired output is the yearend as a Date object. (If that is not the intended question and you just want the fiscal yearend year as a number, see the Note at the end.)
Returning to the assumed problem, suppose, for example, that March 31st is the yearend. Below we extract the first 4 characters of Year using substring, convert them to numeric and add 1. Then we pass the result along with month and day to ISOdate and finally convert that to Date. No regular expressions or packages are used.
# test inputs
month <- 3
day <- 31
Year <- c("1999-00", "2001-02", "2002-03", "2003-04")
# yearends
as.Date(ISOdate( as.numeric(substring(Year, 1, 4))+1, month, day))
## [1] "2000-03-31" "2002-03-31" "2003-03-31" "2004-03-31"
2) string manipulation An alternative solution using the same inputs is the following. It is similar, except that we use sub with a regular expression that matches the minus sign and the two characters after it, substituting a zero-length string for them, then convert the result to numeric and add 1. We then use sprintf to build a string in a format acceptable to as.Date and finally apply as.Date. No packages are used.
as.Date(sprintf("%d-%d-%d", as.numeric(sub("-..", "", Year))+1, month, day))
## [1] "2000-03-31" "2002-03-31" "2003-03-31" "2004-03-31"
Note: If you only wanted the fiscal yearend year as a number then it would be just this:
as.numeric(substring(Year, 1, 4)) + 1
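Adding 1 to the start year, rather than pasting the two-digit suffix onto the century, is what makes "1999-00" come out as 2000. A small sketch that wraps this up and also checks that the suffix agrees with the start year; the function name fiscal_end_year is made up for illustration and it assumes every entry has the YYYY-YY shape:

```r
# Hypothetical helper: end year of a fiscal year written as "YYYY-YY".
fiscal_end_year <- function(fy) {
  start  <- as.integer(substr(fy, 1, 4))
  suffix <- as.integer(substr(fy, 6, 7))
  end    <- start + 1L
  # Sanity check: the two-digit suffix should match the last two digits of start + 1
  stopifnot(all(end %% 100L == suffix))
  end
}

Year <- c("1999-00", "2001-02", "2002-03", "2003-04")
end_years <- fiscal_end_year(Year)
end_years
# 2000 2002 2003 2004
```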

Convert a date vector to ranks

I have the date vector:
d <- c("30/5/15", "6/6/15", "23/5/15")
I would like to convert it to 2, 3, 1, giving the smallest rank to the oldest date and the largest to the newest.
I tried rank(d), but it seems to rank on the day only and in reverse order; it returns 3, 1, 2.
Convert to Date class, then numeric, then rank:
d <- c("30/5/15", "6/6/15", "23/5/15")
rank(as.numeric(as.Date(d, "%d/%m/%y")))
#[1] 2 3 1
Suggestions from comments:
Drop as.numeric, since rank can handle dates directly, although it might be preferable to be explicit.
Use the lubridate package: library(lubridate); rank(dmy(d))
Convert the data to Date format and then rank it. Internally, a Date is stored as a number, so rank can work on it directly.
d <- c("30/5/15", "6/6/15", "23/5/15")
rank(as.Date(d,'%d/%m/%y'))
