How do I rearrange dates in R?

Here is just one of the dates I have (out of more than 6,000):
02/01/15
This represents January 2nd, 2015, and I would like it to look like this instead:
2015/01/02
I read this thread: Changing date format to "%d/%m/%Y"
But unless my R is not working properly, none of the answers there give the correct format. Instead I get this output:
0002/01/15

You can do:
format(as.Date("02/01/15", format = "%d/%m/%y"), "%Y/%m/%d")
[1] "2015/01/02"
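Because as.Date() and format() are vectorized, the same call handles a whole column at once. A minimal sketch, assuming the 6,000 dates sit in a character vector (the name `dates` and the sample values are made up for illustration):

```r
# Hypothetical sample; replace with your own column, e.g. df$date
dates <- c("02/01/15", "15/03/15", "31/12/14")

# %y reads a two-digit year; %Y would read "15" as the year 15
parsed <- as.Date(dates, format = "%d/%m/%y")

# Write the dates back out as year/month/day strings
out <- format(parsed, "%Y/%m/%d")
print(out)
# [1] "2015/01/02" "2015/03/15" "2014/12/31"
```

The 0002/01/15 output in the question is consistent with the first field ("02") having been read as the year, which is what happens when the input format starts with %Y.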

The lubridate package is very useful for date and time manipulation:
library(lubridate)
x <- dmy("02/01/15")    # parse as day/month/two-digit year
format(x, "%Y/%m/%d")   # "2015/01/02"

Related

Cannot convert date from Jan 11, 2002 to 2002-01-11 in R

I'm trying to change the date format in R.
I have a data frame and one of the columns contains dates (as strings) in given format:
Jan 11, 2002
but I would like to change the format to (also as string):
2002-01-11
I have tried many things, but nothing seems to work.
My best shot was to convert it to a Date object and then convert it back to a string in a different format.
Here is a piece of my code:
df$date = strftime(as.Date(df$date, format="%b %d, %Y"), "%Y-%m-%d")
I tried other approaches, but the result is always either NA or the string in its 'old' format.
I think there is something wrong with the first format, "%b %d, %Y", because when I tried the same thing with a different input, e.g. 11/01/2002 ("%d/%m/%Y"), everything worked just fine.
I'm pretty new to R, so any help would be appreciated.
There are a number of packages that can help (as well as base R functions, see the first comment). Here is my favourite solution, which does not require you to specify the input format:
> library(anytime) # for function anydate()
> input <- "Jan 11, 2002"
> d <- anydate(input)
> d
[1] "2002-01-11"
An accurate answer would need a reproducible example.
First, confirm that the source dates are all in the same format; otherwise some rows will fail to parse.
Try the following:
# Using lubridate: mdy() here, but pick the parser that matches the order in your data (mdy, dmy, ymd, etc.)
library(dplyr)   # provides %>% and mutate()
df <- df %>% mutate(date = lubridate::mdy(date))
Please do look into reproducible examples to get the best answers.
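Check the format string. You've written:
df$date = strftime(as.Date(df$date, format="%b %d, %Y"), "%Y-%m-%d")
'%b' matches an abbreviated month name such as Jan, so it is the right code for this input. '%m' expects a numeric month (01-12), so this variant would return NA here:
df$date = strftime(as.Date(df$date, format="%m %d, %Y"), "%Y-%m-%d")
If '%b' still gives NA, the month abbreviations probably do not match your LC_TIME locale.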
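For reference, a minimal base R sketch of the round trip from "Jan 11, 2002" to "2002-01-11". It assumes the NA results come from a non-English LC_TIME locale, which the question does not confirm; the sample vector `x` is made up.

```r
x <- c("Jan 11, 2002", "Dec 25, 1999")   # made-up sample values

# %b depends on LC_TIME; switch to the C locale (English month names)
old_locale <- Sys.getlocale("LC_TIME")
invisible(Sys.setlocale("LC_TIME", "C"))

# Parse the abbreviated month name, then write out as ISO-style strings
out <- format(as.Date(x, format = "%b %d, %Y"), "%Y-%m-%d")

# Restore the previous locale
invisible(Sys.setlocale("LC_TIME", old_locale))

print(out)
# [1] "2002-01-11" "1999-12-25"
```

If the column is a factor rather than character, wrapping it in as.character() before as.Date() avoids surprises.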

How to read in excel file when Date and Time in the same column in R

I am trying to read an excel file into R. Among other fields, the excel file has two "date" fields, each containing both the date and time stamp in the SAME field.
Example:
StartDate 9/14/2019 10:18:59 AM
EndDate 9/18/2019 2:27:14 AM
When I used read_excel to read in the file, these two columns came out very strangely in the data frame: as day counts with decimals, such as 43712.429849537039, which I assumed were days since Jan-01-1970 (the origin date that popped up when I typed lubridate::origin).
data %<>%
mutate(StartDate = as.Date(StartDate, origin = "1970-01-01 UTC"))
So I tried converting these back using as.Date, but that gives totally wrong dates (everything ends up in the year 2089, for example 2089-09-05).
Any help with this would be really appreciated! There must be a simpler way to read in a date-time column directly?!
You can use the lubridate package, it is excellent:
library(tidyverse)
df <- data.frame(StartDate = c("9/14/2019 10:18:59 AM", "9/14/2019 3:18:59 PM"),
                 EndDate   = c("9/18/2019 2:27:14 AM", "9/18/2019 1:27:14 PM"))
df <- df %>%
  mutate(StartDate = lubridate::mdy_hms(StartDate),
         EndDate   = lubridate::mdy_hms(EndDate))
It turns out that Excel has a different "origin date" from R. Excel counts days from 1900, whereas R counts days from 1970-01-01.
When I used read_excel to read the file into a data frame, R kept Excel's day counts, which is why I got a weird date when I converted using the 1970 origin. As soon as I used as.Date with Excel's 1900-based origin, my dates parsed correctly!
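A minimal sketch of that conversion for the serial number quoted in the question. In R, Excel's 1900 date system is conventionally written as origin = "1899-12-30", which absorbs Excel's off-by-one day counting and its fictitious 29 February 1900. Treating the time as UTC is an assumption, since Excel serials carry no time zone.

```r
serial <- 43712.429849537039   # value quoted in the question

# Date only: Excel's 1900 date system corresponds to this origin in R
d <- as.Date(floor(serial), origin = "1899-12-30")

# Date and time: 25569 days separate 1899-12-30 from 1970-01-01,
# so shift to the Unix epoch and convert days to whole seconds
secs <- round((serial - 25569) * 86400)
dt <- as.POSIXct(secs, origin = "1970-01-01", tz = "UTC")

print(d)
# [1] "2019-09-04"
print(format(dt, "%Y-%m-%d %H:%M:%S"))
# [1] "2019-09-04 10:18:59"
```

Using the 1970 origin on the same number lands about 70 years too late, which matches the 2089 dates in the question.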

R How to change format of date from MM/DD/YYYY to YYYYMMDD and YY-Month (Abbreviated Month)?

I have a column called data$Month. The data in that column is formatted like 11/30/2018. I would like to change the format to 20181130 and 18-Nov and use that in other columns.
I've tried this.
data$Month2 <- format(as.Date(data$Month, "%m/%d/%Y"), "%Y%M%D")
data$Month3 <- format(as.Date(data$Month, "%m/%d/%Y"), "%Y-%m")
data$tranId=paste("EXAMPLE","_",data$Month2)
data$postingperiod=data$Month3
data$Month2<-NULL
data$Month3<-NULL
But I get data that looks like Example_ 20180011/30/18 and 2017-11, respectively. I also feel the code could be simplified: I'm going to be running this in a loop and would like to use as few function calls as possible. I'm sure it's a simple solution, but I would really appreciate any help.
Make sure your dates are formatted correctly i.e. they are not factors -
x <- as.Date("11/30/2018", format = "%m/%d/%Y")
paste0("Example_", format.Date(x, "%Y%m%d"))
[1] "Example_20181130"
format.Date(x, "%y-%b")
[1] "18-Nov"
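In the question's attempt, %M means minutes and %D is shorthand for %m/%d/%y, which explains the 20180011/30/18 output. A minimal sketch that parses the column once and builds both target columns directly, without temporary columns; the two sample rows are made up, and the column names follow the question.

```r
data <- data.frame(Month = c("11/30/2018", "01/15/2019"),
                   stringsAsFactors = FALSE)   # made-up sample rows

d <- as.Date(data$Month, format = "%m/%d/%Y")  # parse once, reuse below

# paste0() avoids the space that paste() inserts after the underscore
data$tranId <- paste0("EXAMPLE_", format(d, "%Y%m%d"))
data$postingperiod <- format(d, "%y-%b")       # month name follows LC_TIME

print(data$tranId)
# [1] "EXAMPLE_20181130" "EXAMPLE_20190115"
```

In an English locale postingperiod comes out as 18-Nov and 19-Jan; other locales will produce their own month abbreviations.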

How to deal with irregular date formats or date error in R?

I am reading a CSV file in R and it has a date column. I am using
as.Date(dat$date, format ="%d-%m-%Y")
But I am getting dates like
0012-02-14
with the year 2012 shown as 0012. How do I deal with this error?
I also tried the lubridate package, but with no results.
col1    col2       policydate
112345  Renew      02/28/2012
156566  Not Renew  03/25/2010
895414  Renew      10/01/2006
Something like this.
Use this code:
as.Date(dat$date, format ="%m/%d/%Y")
[1] "2012-02-28" "2010-03-25" "2006-10-01"
Your problem was the use of - instead of / and the reversed order of day and month in the format string.
You need to set the format to match your dates, which are month/day/year.
data <- data.frame(Date=c("02/28/2012","03/25/2010"))
data$Date <- as.Date(data$Date, format ="%m/%d/%Y")
Result
"2012-02-28" "2010-03-25"
You can always follow this style guide to format your date.
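When a column really does mix several layouts, one option is to try each format in turn and keep the first one that parses. A minimal sketch; `parse_mixed` is a hypothetical helper, not part of base R or lubridate, and the sample values are made up.

```r
# Hypothetical helper: try each format in order, fill only the gaps
parse_mixed <- function(x, formats) {
  out <- rep(as.Date(NA), length(x))
  for (f in formats) {
    todo <- is.na(out)                        # rows not parsed yet
    out[todo] <- as.Date(x[todo], format = f)
  }
  out
}

raw <- c("02/28/2012", "14-02-2012", "garbage")
res <- parse_mixed(raw, c("%m/%d/%Y", "%d-%m-%Y"))
print(res)
# [1] "2012-02-28" "2012-02-14" NA
```

The order of the formats matters for ambiguous values such as 03/04/2012, so list the most likely layout first and inspect the rows that remain NA.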

Converting integer format date to double format of date

I have dates in the following format in a data frame:
Jan-85
Apr-99
1-Nov
Feb-96
When I check typeof(df$col), I get "integer".
When I look at the column in Excel, it is in m/d/yyyy format. I was trying to convert it to a date in R, but all my efforts yielded NA.
I tried the parse_date_time function. I tried as.Date along with as.character. I tried as.POSIXct, but everything gives me NA.
My attempts were as follows, and every one of them failed:
as.Date.numeric(df$col,"m%d%Y")
transform(df$col, as.Date(as.character(df$col), "%m%d%Y"))
as.Date(df$col,"m%d%Y")
as.POSIXct.numeric(as.character(loan_new$issue_d), format="%Y%m%d")
as.POSIXct.date(as.character(df$col), format="%Y%m%d")
mdy(df$col)
parse_date_time(df$col,c("mdy"))
How can I convert this to a date? I used the lubridate package for parse_date_time and mdy.
The dput output is below:
Label <- factor(c("Apr-08",
"Apr-09", "Apr-10", "Apr-11", "Aug-07", "Aug-08", "Aug-09", "Aug-10",
"Aug-11", "Dec-07", "Dec-08", "Dec-09", "Dec-10", "Dec-11", "Feb-08",
"Feb-09", "Feb-10", "Feb-11", "Jan-08", "Jan-09", "Jan-10", "Jan-11",
"Jul-07", "Jul-08", "Jul-09", "Jul-10", "Jul-11", "Jun-07", "Jun-08",
"Jun-09", "Jun-10", "Jun-11", "Mar-08", "Mar-09", "Mar-10", "Mar-11",
"May-08", "May-09", "May-10", "May-11", "Nov-07", "Nov-08", "Nov-09",
"Nov-10", "Nov-11", "Oct-07", "Oct-08", "Oct-09", "Oct-10", "Oct-11",
"Sep-07", "Sep-08", "Sep-09", "Sep-10", "Sep-11"))
NA is typically what you get when you misspecify the format, which is what happened here. That said, if your data really looks like the first example you gave, it cannot simply be converted to a date: you have two different formats, one month-year and the other day-month.
If your updated date (i.e. Dec-11) is the correct format, then you use the format argument of as.Date like this:
date <- "Dec-11"
as.Date(date, format = "%b-%d")
# [1] "2017-12-11"
Or on your example data:
as.Date(Label, format = "%b-%d")
# [1] "2017-04-08" "2017-04-09" "2017-04-10" "2017-04-11" "2017-08-07" "2017-08-08"
# [7] "2017-08-09" "2017-08-10" "2017-08-11" "2017-12-07" "2017-12-08" "2017-12-09"
If you want to convert something like Jan-85, you have to decide which day of the month that date should have. Say we just take the first of each month, then you can do:
x <- "Jan-85"
xd <- paste0("1-",x)
as.Date(xd, "%d-%b-%y")
# [1] "1985-01-01"
More information on the format codes can be found on ?strptime
Note that when the format contains no year, R fills in the current year (2017 in the output above). It has to, otherwise it cannot construct a date. When the day of the month is missing (as in Jan-85), direct conversion is impossible because the underlying POSIX routines do not have all the information they need, which is why a day is pasted on first.
Also keep in mind that this only works when your locale is set to English. Otherwise there is a good chance your OS won't recognize the month abbreviations correctly. To set it, do e.g.:
Sys.setlocale(category = "LC_TIME", locale = "English_United Kingdom")
You can later set it back to the original one if you must, or restart your R session to reset the locale settings.
Note: Please check carefully which locale names are valid for your OS. The above example works on Windows, but is not guaranteed to work on Linux or Mac.
Why you see integer
These string values show up as integer because R automatically converted character vectors to factors when reading in a data frame (the default before R 4.0.0). typeof() returns "integer" because that is the internal representation of a factor.
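Putting the pieces together, a minimal sketch for the factor in the dput output. It assumes the labels mean month-year (Apr-08 as April 2008) and that the first of the month is an acceptable day; both are assumptions, not something the question confirms.

```r
Label <- factor(c("Apr-08", "Aug-07", "Dec-11"))   # subset of the dput output
print(typeof(Label))
# [1] "integer"   (factors store integer codes)

# %b depends on LC_TIME; the C locale uses English month abbreviations
old_locale <- Sys.getlocale("LC_TIME")
invisible(Sys.setlocale("LC_TIME", "C"))

# Convert the factor to character first, then paste on a day of the month
dates <- as.Date(paste0("01-", as.character(Label)), format = "%d-%b-%y")

invisible(Sys.setlocale("LC_TIME", old_locale))
print(dates)
# [1] "2008-04-01" "2007-08-01" "2011-12-01"
```

Skipping as.character() and passing the factor's integer codes to a numeric method is what produces the NA or nonsense results seen in the question's attempts.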
