Converting to Date Time Using as.POSIXct - r

I have a column "DateTime".
Example value: 2016-12-05-16.25.54.875000
When I import this, R reads it as a Factor.
Now, when I sort the dataset by decreasing "DateTime", the maximum DateTime is 23 June 2017. When I use DateTime = as.POSIXct(DateTime), it changes to 22 June 2017. How is this happening?
P.S. I am running this R script in Power BI.

So some comments first. When you read strings in R, unless you specify otherwise they are imported as factors. You can use the Option
Trying what @Disco Superfly has suggested works if you define the data as a string in R:
> a <- "2016-12-05-16.25.54.875000"
> as.POSIXct(a, format="%Y-%m-%d-%H.%M.%S")
[1] "2016-12-05 16:25:54 CET"
> as.POSIXct(a)
[1] "2016-12-05 CET"
It is not clear what you mean when you say the data is being changed. Can you give a reproducible example?
To summarize, if your dates are strings, then what others have already suggested works perfectly. I suppose you are trying to do more than what you are explaining, and therefore I don't understand exactly what you are asking.
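A minimal sketch of the full conversion, assuming the column arrived as a factor and that the timestamps should be read as UTC. The tz value and the second sample value are assumptions for illustration; if tz is left out, R uses the session time zone, so the same string can map to a different instant on a different machine. On input, %OS also reads the fractional seconds.

```r
# Hypothetical stand-in for the imported column: a factor of timestamp strings.
DateTime <- factor(c("2016-12-05-16.25.54.875000", "2017-06-23-01.10.00.000000"))

# Convert factor -> character first, then parse with an explicit format.
# %OS reads seconds including the fractional part; tz = "UTC" is an assumption.
parsed <- as.POSIXct(as.character(DateTime),
                     format = "%Y-%m-%d-%H.%M.%OS", tz = "UTC")
format(parsed, "%Y-%m-%d %H:%M:%S")

# Without a format, only the leading date part is recognized and the time is dropped.
date_only <- as.POSIXct(as.character(DateTime), tz = "UTC")
format(date_only, "%Y-%m-%d %H:%M:%S")
```

Sorting the parsed column then orders the values chronologically rather than by factor level.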

Related

Convert factors to datetime, <fctr> 10/25/2018 (M, D, Y)

I have a data frame called RequisitionHistory2 with a variable called RequisitionDateTime. It is a factor whose levels look like 4/30/2019 14:16. I would like to split this into RequisitionDate and RequisitionTime in a datetime format.
I tried this code, but this still does not solve my issue with needing to split these into their own columns. The code also did not work as I got the error below.
mutate(When = as.POSIXct(RequisitionHistory2, format="%m/%d/%. %H:%M %p"))
Error in as.POSIXct.default(RequisitionHistory2, format = "%m/%d/%. %H:%M %p") : do not know how to convert 'RequisitionHistory2' to class “POSIXct”
I would like to have the variable RequisitionDateTime split into RequisitionDate and another variable RequisitionTime in the dataframe RequisitionHistory2. Any help is greatly appreciated!
Do not convert factors to datetime directly. You will need to convert them to character first and then use a datetime function.
as.Date(as.character("10/25/2018"), format = "%m/%d/%Y")
would work for your date example.
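If your datetime is in 4/30/2019 14:16 format, lubridate can parse it:
library(dplyr)
library(lubridate)
# pass the column (not the data frame) to the parser, converting the factor to character first
RequisitionHistory2 <- mutate(RequisitionHistory2, When = mdy_hm(as.character(RequisitionDateTime)))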
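The question also asks for separate date and time columns. A base R sketch of one way to do that, assuming a small made-up data frame and UTC as the time zone (both are assumptions, not the asker's real data). Base R has no time-only class, so the time is kept as a character string here.

```r
# Hypothetical miniature of the data frame from the question.
RequisitionHistory2 <- data.frame(
  RequisitionDateTime = factor(c("4/30/2019 14:16", "10/25/2018 9:05"))
)

# factor -> character -> POSIXct; tz = "UTC" is an assumption.
dt <- as.POSIXct(as.character(RequisitionHistory2$RequisitionDateTime),
                 format = "%m/%d/%Y %H:%M", tz = "UTC")

RequisitionHistory2$RequisitionDate <- as.Date(dt, tz = "UTC")  # Date class
RequisitionHistory2$RequisitionTime <- format(dt, "%H:%M")      # character "HH:MM"

RequisitionHistory2
```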
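Note that without a format argument, as.POSIXct() only parses strings that are already in a standard form such as 2019-04-30 14:16 or 2019/04/30 14:16; for anything else you have to supply format. I wrote a blog post about this that I think would be helpful for you to check out: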
https://jackylam.io/tutorial/uber-data/
The anytime package on CRAN converts directly from many formats, including factor and ordered, to date and datetime objects. It also heuristically tries a number of viable formats so that you do not need a format string. See the README on GitHub for an introduction; there is also a vignette.
Your example works:
R> library(anytime)
R> anytime(as.factor("4/30/2019 14:16"))
[1] "2019-04-30 14:16:00 CDT"
R> anytime(as.factor("4/3/2019 14:16:17"), useR=TRUE)
[1] "2019-04-03 14:16:17 CDT"
R>
However, the underlying (Boost C++) parser does not like single-digit days or months, so you may need to flip back to R's parser via useR=TRUE as I did in the second example.

How to convert date and time into a numeric value in R

I am relatively new to R and I have a dataset in which I am trying to convert a date and time into a numeric value. The date and time are in the format 01JUN17:00:00:00 under a variable called pickup_datetime. I have tried using the code
cab_small_sample$pickup_datetime <- as.numeric(as.Date(cab_small_sample$pickup_datetime, format = '%d%b%y'))
but this way doesn't incorporate the time. I tried to add the time format to the format argument, but it still did not work. Is there an R function that will convert the data into a numeric value?
R has two main time classes: "Date" and "POSIXct". POSIXct is a datetime class, and you can get all the gory details at ?DateTimeClasses. The help page for the formats used at the time of data input, however, is at ?strptime.
cab_small_sample <- data.frame(pickup_datetime = "01JUN17:00:00:00")
cab_small_sample$pickup_dt <- as.numeric(as.POSIXct(cab_small_sample$pickup_datetime,
format = '%d%b%y:%H:%M:%S'))
cab_small_sample
# pickup_datetime pickup_dt
#1 01JUN17:00:00:00 1496300400 # seconds since 1970-01-01
I find that a "destructive reassignment of values" is generally a bad idea so as a "my (best?) practice rule" I don't assign to the same column until I'm sure I have the code working properly. (And I always leave an untouched copy somewhere safe.)
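The numeric value depends on the time zone used for parsing. The 1496300400 shown above appears to come from a session seven hours behind UTC. A small sketch that pins the zone to UTC (an assumption; use whatever zone the data was recorded in) and converts the number back again:

```r
pickup <- "01JUN17:00:00:00"

# %b matches the month abbreviation (locale-dependent; English assumed here).
as_utc <- as.POSIXct(pickup, format = "%d%b%y:%H:%M:%S", tz = "UTC")
secs <- as.numeric(as_utc)   # seconds since 1970-01-01 00:00:00 UTC
secs

# Round trip: turn the number back into a datetime.
back <- as.POSIXct(secs, origin = "1970-01-01", tz = "UTC")
format(back, "%Y-%m-%d %H:%M:%S")
```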
lubridate is an extremely handy package for dealing with dates. It includes a variety of functions which do the date/time parsing for you, as long as you can provide the order of components. In this case, since your data is in day-month-year-hms form, you can use the dmy_hms function.
library(lubridate)
cab_small_sample <- dplyr::tibble(
pickup_datetime = c("01JUN17:00:00:00", "01JUN17:11:00:00"))
cab_small_sample$pickup_POSIX <- dmy_hms(cab_small_sample$pickup_datetime)

How to Convert Date in "01MAR1978:00:00:00" string format to Date Format in SparkR?

I have dates in the following formats:
08MAR1978:00:00:00
10FEB1973:00:00:00
15AUG1982:00:00:00
I would like to convert them to:
1978-03-08
1973-02-10
1982-08-15
I have tried the following in SparkR:
period_uts <- unix_timestamp(all.new$DATE_OF_BIRTH, '%d%b%Y:%H:%M:%S')
period_ts <- cast(period_uts, 'timestamp')
period_dt <- cast(period_ts, 'date')
df <- withColumn(all.new, 'p_dt', period_dt)
But when I do this, all the dates get changed into "NA".
Can anyone please provide some insights on how I can convert dates in %d%B%Y:%H:%M:%S format to dates in SparkR?
Thanks!
I don't think you need SparkR to solve this question.
What you have:
DoB <- c("08MAR1978:00:00:00", "10FEB1973:00:00:00", "15AUG1982:00:00:00")
If you want to get 1978-03-08 etc. you could just use as.Date in combination with the date format you already found yourself:
as.Date(DoB, format="%d%B%Y:%H:%M:%S")
# [1] "1978-03-08" "1973-02-10" "1982-08-15"
as.Date will ensure that R knows how to interpret your string as a date.
Note, however, that in general the way dates are displayed to you (i.e. 1978-03-08) doesn't really matter. The reason is that 'under the hood' R now understands your date, so all date-related operations will be performed appropriately.
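A short base R illustration of that point, using the same three strings. Once the values are of class Date, sorting and subtraction work on the dates themselves, and format() only changes how they are printed. An English locale is assumed for the month abbreviations.

```r
DoB <- c("08MAR1978:00:00:00", "10FEB1973:00:00:00", "15AUG1982:00:00:00")
d <- as.Date(DoB, format = "%d%b%Y:%H:%M:%S")

class(d)               # the Date class
sort(d)                # chronological order, not alphabetical
d[3] - d[1]            # difference in days (a difftime)
format(d, "%d/%m/%Y")  # changes the display only, not the stored value
```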
I figured out how to do it:
all.new = all.new %>% withColumn("Date_of_Birth_Fixed", to_date(.$DATE_OF_BIRTH, "ddMMMyyyy"))
This works in Spark 2.2.x

Converting integer format date to double format of date

I have date format in following format in a data frame:
Jan-85
Apr-99
1-Nov
Feb-96
When I check typeof(df$col) I get "integer".
Actually, when I look at the column in Excel it is in m/d/yyyy format. I was trying to convert this to date format in R. All my efforts yielded NA.
I tried the parse_date_time function. I tried as.Date along with as.character. I tried as.POSIXct, but everything gives me NA.
My trials were as follows and everything was a failure:
as.Date.numeric(df$col,"m%d%Y")
transform(df$col, as.Date(as.character(df$col), "%m%d%Y"))
as.Date(df$col,"m%d%Y")
as.POSIXct.numeric(as.character(loan_new$issue_d), format="%Y%m%d")
as.POSIXct.date(as.character(df$col), format="%Y%m%d")
mdy(df$col)
parse_date_time(df$col,c("mdy"))
How can I convert this to date format? I have used the lubridate package for parse_date_time and mdy.
dput output is below
Label <- factor(c("Apr-08",
"Apr-09", "Apr-10", "Apr-11", "Aug-07", "Aug-08", "Aug-09", "Aug-10",
"Aug-11", "Dec-07", "Dec-08", "Dec-09", "Dec-10", "Dec-11", "Feb-08",
"Feb-09", "Feb-10", "Feb-11", "Jan-08", "Jan-09", "Jan-10", "Jan-11",
"Jul-07", "Jul-08", "Jul-09", "Jul-10", "Jul-11", "Jun-07", "Jun-08",
"Jun-09", "Jun-10", "Jun-11", "Mar-08", "Mar-09", "Mar-10", "Mar-11",
"May-08", "May-09", "May-10", "May-11", "Nov-07", "Nov-08", "Nov-09",
"Nov-10", "Nov-11", "Oct-07", "Oct-08", "Oct-09", "Oct-10", "Oct-11",
"Sep-07", "Sep-08", "Sep-09", "Sep-10", "Sep-11"))
NA is typically what you get when you misspecify the format, which is what happened here. That said, if your data really looks like the first example you gave, it's impossible to simply convert this to a date. You have two different formats, one being month-year and the other day-month.
If your updated data (i.e. Dec-11) is in the correct format, then you use the format argument of as.Date like this:
date <- "Dec-11"
as.Date(date, format = "%b-%d")
# [1] "2017-12-11"
Or on your example data:
as.Date(Label, format = "%b-%d")
# [1] "2017-04-08" "2017-04-09" "2017-04-10" "2017-04-11" "2017-08-07" "2017-08-08"
# [7] "2017-08-09" "2017-08-10" "2017-08-11" "2017-12-07" "2017-12-08" "2017-12-09"
If you want to convert something like Jan-85, you have to decide which day of the month that date should have. Say we just take the first of each month, then you can do:
x <- "Jan-85"
xd <- paste0("1-",x)
as.Date(xd, "%d-%b-%y")
# [1] "1985-01-01"
More information on the format codes can be found at ?strptime.
Note that R will automatically fill in the current year. It has to, otherwise it can't specify the date. If you do not have a day of the month (e.g. Jan-85), conversion to a date is impossible as is, because the underlying POSIX algorithms don't have all the necessary information.
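If the labels are in fact month-year rather than month-day (an assumption; the question does not settle it), the same first-of-the-month trick can be applied to the whole factor at once. A sketch on a few sample labels, with an English locale assumed:

```r
# A few labels in the style of the question, kept as a factor on purpose.
Label <- factor(c("Apr-08", "Dec-11", "Jan-85"))

# Prepend an assumed day of 01, then parse day-month-two-digit-year.
first_of_month <- as.Date(paste0("01-", as.character(Label)), format = "%d-%b-%y")
format(first_of_month)
```

With %y, two-digit years 00 to 68 are read as 20xx and 69 to 99 as 19xx, so 85 becomes 1985 and 08 becomes 2008.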
Also keep in mind that this only works when your locale is set to English. Otherwise there is a good chance your OS won't recognize the month abbreviations correctly. To set it, do e.g.:
Sys.setlocale(category = "LC_TIME", locale = "English_United Kingdom")
You can later set it back to the original one if you must, or restart your R session to reset the locale settings.
Note: please check carefully which locale names are valid for your OS. The above example works on Windows, but is not guaranteed to work on either Linux or Mac.
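A sketch of a more portable variant, assuming the "C" locale is acceptable for parsing (it uses English month names and is available on every platform). It saves the current LC_TIME setting and restores it afterwards:

```r
old_locale <- Sys.getlocale("LC_TIME")  # remember the current setting
Sys.setlocale("LC_TIME", "C")           # "C" uses English month names

parsed <- as.Date("11-Dec-2017", format = "%d-%b-%Y")

Sys.setlocale("LC_TIME", old_locale)    # restore the original locale
format(parsed)
```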
Why you see integer
The fact that these string values are of integer type is due to R automatically converting character vectors to factors when reading in a data frame (the default before R 4.0.0). So typeof() returns "integer" because that's the internal representation of a factor.
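This is easy to check on a small made-up factor (the values below are illustrative only):

```r
col <- factor(c("Jan-85", "Apr-99", "Jan-85"))

typeof(col)        # "integer": the underlying storage
class(col)         # "factor"
as.integer(col)    # level codes, not dates
as.character(col)  # the original strings, which is what a date parser needs
```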

R datetime format issues

I am currently trying to determine the time and date on the observations in my dataset.
The date/timestamp is as follows:
1458024601.18659
1458024660.818
The observations are recorded every minute.
I am trying to convert the above date/timestamp into something more understandable/interpretable.
Could you please help me with this issue?
Many thanks.
Looks like seconds, but seconds starting from when? Typically, 1970-01-01:
> x = 1458024601.18659
> as.POSIXct(x, origin="1970-01-01")
[1] "2016-03-15 06:50:01 GMT"
So if you are expecting that timestamp to be that time, we've got the origin right.
If you are expecting a date in 1946, then origin="1900-01-01" is probably what you want.
Since, according to your most recent post, the data is stored as a factor class, some further manipulations are required.
To convert the factor column into the required numeric class, this modification of @Spacedman's answer should work:
as.POSIXct(as.numeric(as.character(all_prices$timestamp)), origin="1970-01-01")
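The as.character() step matters: calling as.numeric() directly on a factor returns the level codes, not the numbers that were printed. A self-contained sketch using the two timestamps from the question, with UTC as an assumed time zone:

```r
# Hypothetical miniature of all_prices with the timestamp stored as a factor.
all_prices <- data.frame(
  timestamp = factor(c("1458024601.18659", "1458024660.818"))
)

as.numeric(all_prices$timestamp)  # level codes (1, 2), not the timestamps

secs <- as.numeric(as.character(all_prices$timestamp))
when <- as.POSIXct(secs, origin = "1970-01-01", tz = "UTC")
format(when, "%Y-%m-%d %H:%M:%S")
```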
Your solution is perfect, except that I have another issue :(
I tried to run this code on the data.frame that I have. Unfortunately, I keep getting the following error after running the code.
dates <- as.POSIXct(all_prices$timestamp, origin="2016-03-15")
Error in as.POSIXlt.character(as.character(x), ...) :
character string is not in a standard unambiguous format
Data "all_prices" is a data.frame.
class(all_prices)
[1] "data.frame"
data "all_prices$timestamp" is a factor.
class(all_prices$timestamp)
[1] "factor"

Resources