I have an imported CSV in R which contains a column of dates and times - this is imported into R as character. The format is "30/03/2020 08:59". I want to convert these strings into a format that allows me to work on them. For simplicity I have made a dataframe which has a single column of these dates (854) in this format.
I'm trying to use the parse_date_time function from lubridate.
It works fine when I reference a single value, e.g.
b=parse_date_time(consults_dates[3,1],orders="dmy HM")
gives b=2020-03-30 09:08:00
However, when I try to perform this on the entire(consults_dates), I get an error, e.g.
c= parse_date_time(consults_dates,orders="dmy HM") gives error:
Warning message:
All formats failed to parse. No formats found.
Apologies - if this is blatantly a simple question, day 1 of R after years of Matlab.
You need to pass the column to parse_date_time function and not the entire dataframe.
library(lubridate)
consults_dates$colum_name <- parse_date_time(consults_dates$colum_name, "dmy HM")
However, if you have only one format in the column you can use dmy_hm
consults_dates$colum_name <- dmy_hm(consults_dates$colum_name)
In base R, we can use :
consults_dates$colum_name <- as.POSIXct(consults_dates$colum_name,
format = "%d/%m/%Y %H:%M", tz = "UTC")
Related
I know this question is asked a lot, but I only come to you because I tried everything (including the tips from similar questions that I managed to understand).
I have a rather big CSV file (> 16 000 rows), with, among others, a "Date" column, containing dates in the following format "01/01/1999".
However, when loading the file, the column is not recognised as a date, but as a Factor with read.csv2, or a character with fread (data.table package). I loaded the lubridate library.
In both cases, I tried to convert the column to a date format, using all methods I knew (column = Date, data = test):
as.Date(test$Date, format = "%d/%m/%Y", tz = "")
Or
strptime(test$Date, format = "%d/%m/%y", tz = "")
Or
as_date(test$Date)
And also the function dmy from lubridate, and
as.POSIXct(test$Date, "%d/%m/%y", tz = "").
I also tried changing the format: ymd instead of dmy, "-" instead of "/".
I even tried to change the character class to numeric (when loaded with fread), and the factor class to numeric (when loaded with read.csv2).
Despite all of this, the columns stay in their factor / character classes.
Does someone know what I missed?
Just use the anydate() function from the anytime package:
R> library(anytime)
R> var <- as.factor(c("01/01/1999", "01/02/1999"))
R> anydate(var)
[1] "1999-01-01" "1999-01-02"
R>
R> class(anydate(var))
[1] "Date"
R>
R> class(var)
[1] "factor"
R>
R>
It will read just about any input time, and convert it without requiring a format and this works as long as the represented is somewhat standard (i.e. we do not work with two-digit years etc).
(Otherwise you can of course also use the base R functions after first converting from factor back to character via as.character(). But anytime() and anydate() do that, and much more, for you too.)
If you are using read.csv2, try
read.csv2(..., stringsAsFactors=F)
and then continue with as.Date
I have a data frame called RequisitionHistory2 with a variable called RequisitionDateTime and the levels are factors which look like 4/30/2019 14:16 I would like to split this into RequisitionDate and RequisitionTime in a datetime format.
I tried this code, but this still does not solve my issue with needing to split these into their own columns. The code also did not work as I got the error below.
mutate(When = as.POSIXct(RequisitionHistory2, format="%m/%d/%. %H:%M %p"))
Error in as.POSIXct.default(RequisitionHistory2, format = "%m/%d/%. %H:%M %p") : do not know how to convert 'RequisitionHistory2' to class “POSIXct”
I would like to have the variable RequisitionDateTime split into RequisitionDate and another variable RequisitionTime in the dataframe RequisitionHistory2. Any help is greatly appreciated!
Do not convert factors to datetime directly. You will need to convert it to a character first and then use a datetime function.
as.Date(as.character("10/25/2018"), format = "%m/%d/%Y")
would work for your date example.
library(lubridate)
mutate(df,When = mdy_hm(RequisitionHistory2))
If your datetime is in 4/30/2019 14:16 format
Note that as.POSIXct() works only on datetimes already in ISO 8601 format. I wrote a blog post about this and I think would be helpful for you to check out:
https://jackylam.io/tutorial/uber-data/
The anytime package ON CRAN directly converts from many formats, including factor and ordered to dates and datetime objects. It also heuristically tries a number of viable formats so that you do not need a format string. See the README at GitHub for an introduction, there is also a vignette
Your example works:
R> library(anytime)
R> anytime(as.factor("4/30/2019 14:16"))
[1] "2019-04-30 14:16:00 CDT"
R> anytime(as.factor("4/3/2019 14:16:17"), useR=TRUE)
[1] "2019-04-03 14:16:17 CDT"
R>
However, the underlying (Boost C++) parser does not like single digit days or month so you may need to flip back to R's parser via useR=TRUE as I did on the second example.
I have a dataset called EPL2011_12. I would like to make new a dataset by subsetting the original by date. The dates are in the column named Date The dates are in DD-MM-YY format.
I have tried
EPL2011_12FirstHalf <- subset(EPL2011_12, Date > 13-01-12)
and
EPL2011_12FirstHalf <- subset(EPL2011_12, Date > "13-01-12")
but get this error message each time.
Warning message:
In Ops.factor(Date, 13- 1 - 12) : > not meaningful for factors
I guess that means R is treating like text instead of a number and that why it won't work?
Well, it's clearly not a number since it has dashes in it. The error message and the two comments tell you that it is a factor but the commentators are apparently waiting and letting the message sink in. Dirk is suggesting that you do this:
EPL2011_12$Date2 <- as.Date( as.character(EPL2011_12$Date), "%d-%m-%y")
After that you can do this:
EPL2011_12FirstHalf <- subset(EPL2011_12, Date2 > as.Date("2012-01-13") )
R date functions assume the format is either "YYYY-MM-DD" or "YYYY/MM/DD". You do need to compare like classes: date to date, or character to character. And if you were comparing character-to-character, then it's only going to be successful if the dates are in the YYYYMMDD format (with identical delimiters if any delimiters are used).
The first thing you should do with date variables is confirm that R reads it as a Date. To do this, for the variable (i.e. vector/column) called Date, in the data frame called EPL2011_12, input
class(EPL2011_12$Date)
The output should read [1] "Date". If it doesn't, you should format it as a date by inputting
EPL2011_12$Date <- as.Date(EPL2011_12$Date, "%d-%m-%y")
Note that the hyphens in the date format ("%d-%m-%y") above can also be slashes ("%d/%m/%y"). Confirm that R sees it as a Date. If it doesn't, try a different formatting command
EPL2011_12$Date <- format(EPL2011_12$Date, format="%d/%m/%y")
Once you have it in Date format, you can use the subset command, or you can use brackets
WhateverYouWant <- EPL2011_12[EPL2011_12$Date > as.Date("2014-12-15"),]
I have date format in following format in a data frame:
Jan-85
Apr-99
1-Nov
Feb-96
When I see the typeof(df$col) I get the answer as "integer".
Actually when I see the format in excel it is in m/d/yyyy format. I was trying to convert this to date format in R. All my efforts yielded NA.
I tried parse_date_time function. I tried as.date along with as.character. I tried as.POSIXct but everything is giving me NA.
My trials were as follows and everything was a failure:
as.Date.numeric(df$col,"m%d%Y")
transform(df$col, as.Date(as.character(df$col), "%m%d%Y"))
as.Date(df$col,"m%d%Y")
as.POSIXct.numeric(as.character(loan_new$issue_d), format="%Y%m%d")
as.POSIXct.date(as.character(df$col), format="%Y%m%d")
mdy(df$col)
parse_date_time(df$col,c("mdy"))
How can I convert this to date format? I have used lubridate package for parse_date_time and mdy package.
dput output is below
Label <- factor(c("Apr-08",
"Apr-09", "Apr-10", "Apr-11", "Aug-07", "Aug-08", "Aug-09", "Aug-10",
"Aug-11", "Dec-07", "Dec-08", "Dec-09", "Dec-10", "Dec-11", "Feb-08",
"Feb-09", "Feb-10", "Feb-11", "Jan-08", "Jan-09", "Jan-10", "Jan-11",
"Jul-07", "Jul-08", "Jul-09", "Jul-10", "Jul-11", "Jun-07", "Jun-08",
"Jun-09", "Jun-10", "Jun-11", "Mar-08", "Mar-09", "Mar-10", "Mar-11",
"May-08", "May-09", "May-10", "May-11", "Nov-07", "Nov-08", "Nov-09",
"Nov-10", "Nov-11", "Oct-07", "Oct-08", "Oct-09", "Oct-10", "Oct-11",
"Sep-07", "Sep-08", "Sep-09", "Sep-10", "Sep-11"))
NA is typically what you get when you misspecify the format. Which is what you do. That said, if your data is really looking like the first example you gave, it's impossible to simply convert this to a date. You have two different formats, one being month-year and the other day-month.
If your updated date (i.e. Dec-11) is the correct format, then you use the format argument of as.Date like this:
date <- "Dec-11"
as.Date(date, format = "%b-%d")
# [1] "2017-12-11"
Or on your example data:
as.Date(Label, format = "%b-%d")
# [1] "2017-04-08" "2017-04-09" "2017-04-10" "2017-04-11" "2017-08-07" "2017-08-08"
# [7] "2017-08-09" "2017-08-10" "2017-08-11" "2017-12-07" "2017-12-08" "2017-12-09"
If you want to convert something like Jan-85, you have to decide which day of the month that date should have. Say we just take the first of each month, then you can do:
x <- "Jan-85"
xd <- paste0("1-",x)
as.Date(xd, "%d-%b-%y")
# [1] "1985-01-01"
More information on the format codes can be found on ?strptime
Note that R will automatically add this year as the year. It has to, otherwise it can't specify the date. In case you do not have a day of the month (eg like Jan-85), conversion to a date is impossible because the underlying POSIX algorithms don't have all necessary information.
Also keep in mind that this only works when your locale is set to english. Otherwise you have a big chance your OS won't recognize the month abbreviations correctly. To do so, do eg:
Sys.setlocale(category = "LC_TIME", locale = "English_United Kingdom")
You can later set it back to the original one if you must, or restart your R session to reset the locale settings.
note: Please check carefully which locale notations are valid for your OS. The above example works on Windows, but is not guaranteed on either Linux or Mac.
Why you see integer
The fact that these string values are of integer type, is due to the fact that R automatically convert character vectors to factors when reading in a data frame. So typeof() returns integer because that's the internal representation of a factor.
I'm getting an error using smartbind to append two datasets. First, I'm pretty sure the error I'm getting:
> Error in as.vector(x, mode) : invalid 'mode' argument
is coming from the date variable in both datasets. The date variable in it's raw format is such: month/day/year. I transformed the variable after importing the data using as.Date and format
> rs.month$xdeeddt <- as.Date(rs.month$xdeeddt, "%m/%d/%Y")
> rs.month$deed.year <- as.numeric(format(rs.month$xdeeddt, format = "%Y"))
> rs.month$deed.day <- as.numeric(format(rs.month$xdeeddt, format = "%d"))
> rs.month$deed.month <- as.numeric(format(rs.month$xdeeddt, format = "%m"))
The resulting date variable is as such:
> [1] "2014-03-01" "2014-03-13" "2014-01-09" "2013-10-09"
The transformation for the date was applied to both datasets (the format of the raw data was identical for both datasets). When I try to use smartbind, from the gtools package, to append the two datasets it returns with the error above. I removed the date, month, day, and year variables from both datasets and was able to append the datasets successfully with smartbind.
Any suggestions on how I can append the datasets with the date variables.....?
I came here after googling for the same error message during a smartbind of two data frames. The discussion above, while not so conclusive about a solution, definitely helped me move through this error.
Both my data frames contain POSIXct date objects. Those are just a numeric vector of UNIXy seconds-since-epoch, along with a couple of attributes that provide the structure needed to interpret the vector as a date object. The solution is simply to strip the attributes from that variable, perform the smartbind, and then restore the attributes:
these.atts <- attributes(df1$date)
attributes(df1$date) <- NULL
attributes(df2$date) <- NULL
df1 <- smartbind(df1,df2)
attributes(df1$date) <- these.atts
I hope this helps someone, sometime.
-Andy