I have date format in following format in a data frame:
Jan-85
Apr-99
1-Nov
Feb-96
When I see the typeof(df$col) I get the answer as "integer".
Actually when I see the format in excel it is in m/d/yyyy format. I was trying to convert this to date format in R. All my efforts yielded NA.
I tried parse_date_time function. I tried as.date along with as.character. I tried as.POSIXct but everything is giving me NA.
My trials were as follows and everything was a failure:
as.Date.numeric(df$col,"m%d%Y")
transform(df$col, as.Date(as.character(df$col), "%m%d%Y"))
as.Date(df$col,"m%d%Y")
as.POSIXct.numeric(as.character(loan_new$issue_d), format="%Y%m%d")
as.POSIXct.date(as.character(df$col), format="%Y%m%d")
mdy(df$col)
parse_date_time(df$col,c("mdy"))
How can I convert this to date format? I have used lubridate package for parse_date_time and mdy package.
dput output is below
Label <- factor(c("Apr-08",
"Apr-09", "Apr-10", "Apr-11", "Aug-07", "Aug-08", "Aug-09", "Aug-10",
"Aug-11", "Dec-07", "Dec-08", "Dec-09", "Dec-10", "Dec-11", "Feb-08",
"Feb-09", "Feb-10", "Feb-11", "Jan-08", "Jan-09", "Jan-10", "Jan-11",
"Jul-07", "Jul-08", "Jul-09", "Jul-10", "Jul-11", "Jun-07", "Jun-08",
"Jun-09", "Jun-10", "Jun-11", "Mar-08", "Mar-09", "Mar-10", "Mar-11",
"May-08", "May-09", "May-10", "May-11", "Nov-07", "Nov-08", "Nov-09",
"Nov-10", "Nov-11", "Oct-07", "Oct-08", "Oct-09", "Oct-10", "Oct-11",
"Sep-07", "Sep-08", "Sep-09", "Sep-10", "Sep-11"))
NA is typically what you get when you misspecify the format. Which is what you do. That said, if your data is really looking like the first example you gave, it's impossible to simply convert this to a date. You have two different formats, one being month-year and the other day-month.
If your updated date (i.e. Dec-11) is the correct format, then you use the format argument of as.Date like this:
date <- "Dec-11"
as.Date(date, format = "%b-%d")
# [1] "2017-12-11"
Or on your example data:
as.Date(Label, format = "%b-%d")
# [1] "2017-04-08" "2017-04-09" "2017-04-10" "2017-04-11" "2017-08-07" "2017-08-08"
# [7] "2017-08-09" "2017-08-10" "2017-08-11" "2017-12-07" "2017-12-08" "2017-12-09"
If you want to convert something like Jan-85, you have to decide which day of the month that date should have. Say we just take the first of each month, then you can do:
x <- "Jan-85"
xd <- paste0("1-",x)
as.Date(xd, "%d-%b-%y")
# [1] "1985-01-01"
More information on the format codes can be found on ?strptime
Note that R will automatically add this year as the year. It has to, otherwise it can't specify the date. In case you do not have a day of the month (eg like Jan-85), conversion to a date is impossible because the underlying POSIX algorithms don't have all necessary information.
Also keep in mind that this only works when your locale is set to english. Otherwise you have a big chance your OS won't recognize the month abbreviations correctly. To do so, do eg:
Sys.setlocale(category = "LC_TIME", locale = "English_United Kingdom")
You can later set it back to the original one if you must, or restart your R session to reset the locale settings.
note: Please check carefully which locale notations are valid for your OS. The above example works on Windows, but is not guaranteed on either Linux or Mac.
Why you see integer
The fact that these string values are of integer type, is due to the fact that R automatically convert character vectors to factors when reading in a data frame. So typeof() returns integer because that's the internal representation of a factor.
Related
I'm making a 'Make your own graph tutorial' using Rmarkdown,
I have a dataset with a date variable (format: day-month-year), using dmy(variable) I can correctly class the variable as a 'Date' variable (format:year-month-day).
Users have to filter based on this date, and seeing as Dutch people use the day-month-year format I want to allow them to filter based on this format. (I could simply tell them to filter in the year-month-day format but this complicates the exercise.)
Goal of question: getting R to recognize a variable as date, but using a day-month-year format
date <- "28-02-2020"
date <- as.Date(date, "%d-%m-%Y")
class(date)
again: date now gives "2020-02-28" - I want "28-02-2020" (with class: Date)
As far as I understand, no, it is not possible.
Internally, R stores dates as the number of days since January 1, 1970, and prints them out in format "YYYY-mm-dd". Given that people use different formats around the world, it would be really messy if it would be otherwise.
As a side note, if you need to plot your data, ggplot2 gives you the option to properly format your dates, without having to transform them into characters or factors.
As a side note 2, base r lets you adjust the format of the date, and this depends on the system locale that are available, which you can set (this is different on different systems, I am using ubuntu):
Sys.setlocale("LC_TIME", "en_US.utf8")
format(Sys.Date(), format = "%Y-%b-%d")
[1] "2020-May-01"
Sys.setlocale("LC_TIME", "de_CH.utf8")
format(Sys.Date(), format = "%Y-%b-%d")
[1] "2020-Mai-01"
See more info here:
How to change the locale of R?
How to set the default language of date in R
I have a data frame called RequisitionHistory2 with a variable called RequisitionDateTime and the levels are factors which look like 4/30/2019 14:16 I would like to split this into RequisitionDate and RequisitionTime in a datetime format.
I tried this code, but this still does not solve my issue with needing to split these into their own columns. The code also did not work as I got the error below.
mutate(When = as.POSIXct(RequisitionHistory2, format="%m/%d/%. %H:%M %p"))
Error in as.POSIXct.default(RequisitionHistory2, format = "%m/%d/%. %H:%M %p") : do not know how to convert 'RequisitionHistory2' to class “POSIXct”
I would like to have the variable RequisitionDateTime split into RequisitionDate and another variable RequisitionTime in the dataframe RequisitionHistory2. Any help is greatly appreciated!
Do not convert factors to datetime directly. You will need to convert it to a character first and then use a datetime function.
as.Date(as.character("10/25/2018"), format = "%m/%d/%Y")
would work for your date example.
library(lubridate)
mutate(df,When = mdy_hm(RequisitionHistory2))
If your datetime is in 4/30/2019 14:16 format
Note that as.POSIXct() works only on datetimes already in ISO 8601 format. I wrote a blog post about this and I think would be helpful for you to check out:
https://jackylam.io/tutorial/uber-data/
The anytime package ON CRAN directly converts from many formats, including factor and ordered to dates and datetime objects. It also heuristically tries a number of viable formats so that you do not need a format string. See the README at GitHub for an introduction, there is also a vignette
Your example works:
R> library(anytime)
R> anytime(as.factor("4/30/2019 14:16"))
[1] "2019-04-30 14:16:00 CDT"
R> anytime(as.factor("4/3/2019 14:16:17"), useR=TRUE)
[1] "2019-04-03 14:16:17 CDT"
R>
However, the underlying (Boost C++) parser does not like single digit days or month so you may need to flip back to R's parser via useR=TRUE as I did on the second example.
I read a .xlsx file containing Date columns into R and converted it into dataframe.
Some date columns are being read correctly but most of the others are getting converted to "43116" format.
Any attempt to convert it into Date using as.Date(, origin= <>, format=<>) is returning NA.
I have tried all possible solutions like using 'stringAsFactors = FALSE', POSIT thing and checking the excel file for date formats but nothing worked.
Please help.
It is difficult to recreate the problem if no data is provided, but if you want to convert the number 43129 or the character "43129" to a R date you should do the following:
a <- 43129
b <- '43129'
format(as.Date(a, origin = "1899-12-30"), '%Y-%m-%d')
[1] "2018-01-29"
format(as.Date(as.integer(b), origin = "1899-12-30"), '%Y-%m-%d')
[1] "2018-01-29"
I used the format yyyy-mm-dd, but any other date format could be used if you format it properly.
Hope it helps!
I am relatively new to R and I have a dataset in which I am trying to convert a date and time into a numeric value. The date and time are in the format 01JUN17:00:00:00 under a variable called pickup_datetime. I have tried using the code
cab_small_sample$pickup_datetime <- as.numeric(as.Date(cab_small_sample$pickup_datetime, format = '%d%b%y'))
but this way doesn't incorporate time, I tried to add the time format to the format section of code but still did not work. Is there an R function that will convert the data into a numeric value>
R has two main time classes: "Date" and "POSIXct". POSIXct is a datetime class and you can get all the gory details at: ? DateTimeClasses. The help page for the formats used at the time of data input, however, are at ?striptime.
cab_small_sample <- data.frame(pickup_datetime = "01JUN17:00:00:00")
cab_small_sample$pickup_dt <- as.numeric(as.POSIXct(cab_small_sample$pickup_datetime,
format = '%d%b%y:%H:%M:%S'))
cab_small_sample
# pickup_datetime pickup_dt
#1 01JUN17:00:00:00 1496300400 # seconds since 1970-01-01
I find that a "destructive reassignment of values" is generally a bad idea so as a "my (best?) practice rule" I don't assign to the same column until I'm sure I have the code working properly. (And I always leave an untouched copy somewhere safe.)
lubridate is an extremely handy package for dealing with dates. It includes a variety of functions which do the date/time parsing for you, as long as you can provide the order of components. In this case, since your data is in day-month-year-hms form, you can use the dmy_hms function.
library(lubridate)
cab_small_sample <- dplyr::tibble(
pickup_datetime = c("01JUN17:00:00:00", "01JUN17:11:00:00"))
cab_small_sample$pickup_POSIX <- dmy_hms(cab_small_sample$pickup_datetime)
I have a dataset called EPL2011_12. I would like to make new a dataset by subsetting the original by date. The dates are in the column named Date The dates are in DD-MM-YY format.
I have tried
EPL2011_12FirstHalf <- subset(EPL2011_12, Date > 13-01-12)
and
EPL2011_12FirstHalf <- subset(EPL2011_12, Date > "13-01-12")
but get this error message each time.
Warning message:
In Ops.factor(Date, 13- 1 - 12) : > not meaningful for factors
I guess that means R is treating like text instead of a number and that why it won't work?
Well, it's clearly not a number since it has dashes in it. The error message and the two comments tell you that it is a factor but the commentators are apparently waiting and letting the message sink in. Dirk is suggesting that you do this:
EPL2011_12$Date2 <- as.Date( as.character(EPL2011_12$Date), "%d-%m-%y")
After that you can do this:
EPL2011_12FirstHalf <- subset(EPL2011_12, Date2 > as.Date("2012-01-13") )
R date functions assume the format is either "YYYY-MM-DD" or "YYYY/MM/DD". You do need to compare like classes: date to date, or character to character. And if you were comparing character-to-character, then it's only going to be successful if the dates are in the YYYYMMDD format (with identical delimiters if any delimiters are used).
The first thing you should do with date variables is confirm that R reads it as a Date. To do this, for the variable (i.e. vector/column) called Date, in the data frame called EPL2011_12, input
class(EPL2011_12$Date)
The output should read [1] "Date". If it doesn't, you should format it as a date by inputting
EPL2011_12$Date <- as.Date(EPL2011_12$Date, "%d-%m-%y")
Note that the hyphens in the date format ("%d-%m-%y") above can also be slashes ("%d/%m/%y"). Confirm that R sees it as a Date. If it doesn't, try a different formatting command
EPL2011_12$Date <- format(EPL2011_12$Date, format="%d/%m/%y")
Once you have it in Date format, you can use the subset command, or you can use brackets
WhateverYouWant <- EPL2011_12[EPL2011_12$Date > as.Date("2014-12-15"),]