Change character to date with different format (Win and Mac) - r

I have several date variables in a data.frame.
They look for example like this:
[1] "10/14/18 17:55:28" "10/15/18 19:27:56"
[3] "11/04/18 15:47:46" "Thu Feb 7 14:51:55 2019"
[5] "Thu Feb 7 17:14:15 2019" "Thu Feb 7 15:46:09 2019"
[7] "Thu Feb 7 11:42:27 2019" "Thu Feb 7 13:24:16 2019"
[9] "Thu Feb 7 18:02:29 2019" "Mon Oct 15 08:48:43 2018"
[11] "10/17/18 17:08:38" "12/08/18 08:08:11"
[13] "10/11/18 21:25:30" "10/14/18 19:15:30"
[15] "10/16/18 11:18:01" "10/16/18 18:19:27"
[17] "Tue Oct 16 19:49:24 2018" "Wed Oct 17 21:36:32 2018"
[19] "Sat Oct 13 11:22:35 2018" "Fri Dec 7 17:12:33 2018"
At the moment this is a character variable. I want to convert it with as.Date so that I can subtract the variables from each other.
I already found this:
as.Date( DATE$Sess1, format = "%m/%d/%y")
I would prefer to keep not only the date but also the time.
The real problem is that they include Apple and Windows format which makes it even more complicated.
I would prefer dplyr solutions ;)

You can use lubridate's parse_date_time and list every format that the data might contain.
x <- c("10/14/18 17:55:28" , "10/15/18 19:27:56" ,
"11/04/18 15:47:46" , "Thu Feb 7 14:51:55 2019",
"Thu Feb 7 17:14:15 2019", "Thu Feb 7 15:46:09 2019")
lubridate::parse_date_time(x,c('mdyT', 'amdTY'))
#[1] "2018-10-14 17:55:28 UTC" "2018-10-15 19:27:56 UTC" "2018-11-04 15:47:46 UTC"
#[4] "2019-02-07 14:51:55 UTC" "2019-02-07 17:14:15 UTC" "2019-02-07 15:46:09 UTC"
See ?parse_date_time for details on the available formats.
To get the dates, you can wrap as.Date around it.
as.Date(lubridate::parse_date_time(x,c('mdyT', 'amdTY')))
#[1] "2018-10-14" "2018-10-15" "2018-11-04" "2019-02-07" "2019-02-07" "2019-02-07"
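Since the question asks for a dplyr approach and for subtracting the variables, here is a minimal sketch of how the same orders could be applied to several columns at once. The second column Sess2 and the gap_hours name are made up for illustration, and across() assumes dplyr 1.0 or later.

```r
library(dplyr)
library(lubridate)

# Toy data: Sess2 is a hypothetical second date column.
DATE <- data.frame(
  Sess1 = c("10/14/18 17:55:28", "Thu Feb 7 14:51:55 2019"),
  Sess2 = c("10/15/18 19:27:56", "Thu Feb 7 17:14:15 2019"),
  stringsAsFactors = FALSE
)

# Same orders as in the answer above.
orders <- c("mdyT", "amdTY")

DATE <- DATE %>%
  # Parse both character columns into POSIXct date-times.
  mutate(across(c(Sess1, Sess2), ~ parse_date_time(.x, orders = orders))) %>%
  # Subtract the two date-times; difftime() makes the unit explicit.
  mutate(gap_hours = as.numeric(difftime(Sess2, Sess1, units = "hours")))

DATE
```

Keeping the columns as POSIXct (rather than wrapping them in as.Date) preserves the time of day, so the difference is not rounded to whole days.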

To keep the time, use a date-time class such as POSIXlt or POSIXct instead of Date, and extend the format string to include the time (e.g. format = "%m/%d/%y %H:%M:%S"). See https://astrostatistics.psu.edu/su07/R/html/base/html/strptime.html for more details on these codes.
as.POSIXlt(DATE$Sess1, format = "%m/%d/%y %H:%M:%S")
As for handling the different formats: since the strings don't all follow one layout, I suggest keeping a vector of possible formats and trying each in turn until one works.
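A minimal base R sketch of that "try each format in turn" idea follows. The helper name parse_mixed is hypothetical, and the weekday and month abbreviations are assumed to match an English locale.

```r
# Hypothetical helper: try each format in turn and fill in whatever is still NA.
parse_mixed <- function(x, formats, tz = "UTC") {
  # Start with an all-NA POSIXct vector of the right length.
  out <- as.POSIXct(rep(NA_real_, length(x)), origin = "1970-01-01", tz = tz)
  for (fmt in formats) {
    todo <- is.na(out)
    if (!any(todo)) break
    # Only the still-unparsed elements are tried against this format.
    out[todo] <- as.POSIXct(x[todo], format = fmt, tz = tz)
  }
  out
}

x <- c("10/14/18 17:55:28", "Thu Feb 7 14:51:55 2019", "not a date")
formats <- c("%m/%d/%y %H:%M:%S", "%a %b %d %H:%M:%S %Y")

parsed <- parse_mixed(x, formats)
format(parsed, "%Y-%m-%d %H:%M:%S")
```

Strings that match none of the formats stay NA, so they are easy to find afterwards. The order of the formats matters: the first one that parses a string wins.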

If you're using the tidyverse, use {lubridate} to do the conversion. There are two different date/time formats in your example, so you'll need to parse the column twice, once per format.
lubridate::as_datetime(DATE$Sess1, format = "%a %b %e %H:%M:%S %Y")
and then for all the NA results...
lubridate::as_datetime(DATE$Sess1, format = "%m/%d/%y %H:%M:%S")
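One way the two passes could be merged into a single column is dplyr::coalesce(), which takes the first non-NA value at each position. This is only a sketch: it swaps in base as.POSIXct for as_datetime, the column name Sess1_dt is made up, and an English locale is assumed for the weekday and month names.

```r
library(dplyr)

# Toy data with one value in each of the two formats.
DATE <- data.frame(
  Sess1 = c("10/14/18 17:55:28", "Thu Feb 7 14:51:55 2019"),
  stringsAsFactors = FALSE
)

DATE <- DATE %>%
  mutate(
    # Each call returns NA where its format does not match;
    # coalesce() keeps the first non-NA result per row.
    Sess1_dt = coalesce(
      as.POSIXct(Sess1, format = "%a %b %d %H:%M:%S %Y", tz = "UTC"),
      as.POSIXct(Sess1, format = "%m/%d/%y %H:%M:%S", tz = "UTC")
    )
  )

format(DATE$Sess1_dt, "%Y-%m-%d %H:%M:%S")
```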

Related

Extract time stamps from string and convert to R POSIXct object

Currently, my dataset has a time variable (factor) in the following format:
weekday month day hour min seconds +0000 year
I don't know what the "+0000" field is but all observations have this. For example:
"Tues Feb 02 11:05:21 +0000 2018"
"Mon Jun 12 06:21:50 +0000 2017"
"Wed Aug 01 11:24:08 +0000 2018"
I want to convert these values to POSIXlt or POSIXct objects (year-month-day hour:min:sec) and make them numeric. Currently, using as.numeric(as.character(time-variable)) outputs incorrect values.
Thank you for the great responses! I really appreciate it.
I'm not sure how to reproduce the factor-to-character conversion, but starting from a character string, this code should work:
t <- unlist(strsplit(as.character("Tues Feb 02 11:05:21 +0000 2018")," "))
strptime(paste(t[6],t[2],t[3], t[4]),format='%Y %b %d %H:%M:%S')
PS: More on date formats and conversion: https://www.stat.berkeley.edu/~s133/dates.html
For this problem you can get by without using lubridate. First, to extract individual dates we can use regmatches and gregexpr:
date_char <- 'Tue Feb 02 11:05:21 +0000 2018 Mon Jun 12 06:21:50 +0000 2017'
ptrn <- '([[:alpha:]]{3} [[:alpha:]]{3} [[:digit:]]{2} [[:digit:]]{2}\\:[[:digit:]]{2}\\:[[:digit:]]{2} \\+[[:digit:]]{4} [[:digit:]]{4})'
date_vec <- unlist( regmatches(date_char, gregexpr(ptrn, date_char)))
> date_vec
[1] "Tue Feb 02 11:05:21 +0000 2018" "Mon Jun 12 06:21:50 +0000 2017"
You can learn more about regular expressions here.
In the above example, the +0000 field is the UTC offset in hours and minutes, e.g. it would be -0500 for the EST time zone. To convert to an R date-time object:
> as.POSIXct(date_vec, format = '%a %b %d %H:%M:%S %z %Y', tz = 'UTC')
[1] "2018-02-02 11:05:21 UTC" "2017-06-12 06:21:50 UTC"
which is the desired output. The format codes are documented in ?strptime, or you can use lubridate::guess_formats(). If you don't specify the tz, you'll get the output in your system's time zone (e.g. for me that would be EST). Since the offset is specified in the format, R correctly carries out the conversion.
To get numeric values, the following works:
> as.numeric(as.POSIXct(date_vec, format = '%a %b %d %H:%M:%S %z %Y', tz = 'UTC'))
[1] 1517569521 1497248510
Note: this is based on uniform string structure. In the OP there was Tues instead of Tue which wouldn't work. The above example is based on the three-letter abbreviation which is the standard reporting format.
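If non-standard weekday spellings such as Tues are the only irregularity, one possible workaround is to drop the weekday before parsing, since the rest of the string already determines the date. This is a sketch that assumes the weekday is always the first space-delimited token and that month abbreviations match an English locale.

```r
x <- c("Tues Feb 02 11:05:21 +0000 2018", "Mon Jun 12 06:21:50 +0000 2017")

# Remove the leading weekday token, whatever its spelling.
no_wday <- sub("^[[:alpha:]]+ ", "", x)

# Parse the remainder; %z consumes the "+0000" UTC offset.
parsed <- as.POSIXct(no_wday, format = "%b %d %H:%M:%S %z %Y", tz = "UTC")

# Seconds since 1970-01-01 00:00:00 UTC.
as.numeric(parsed)
```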
If however, your data is a mix of different formats, you'd have to extract individual time strings (customized regexes, of course), then use lubridate::guess_formats() to get the formats and then use those to carry out the conversion.
Hope this is helpful!!

Difficult Date Time Conversion in R

I'm trying to separate this date/time string in R but have not been successful.
Here is an example of the strings:
"Thu Sep 28 02:11:51 +0000 2017"
"Mon Oct 02 19:22:35 +0000 2017"
What is the best way to make this tidy? I've realized this is far beyond my skills.
Try something like this:
as.POSIXct(gsub("\\+0000", '', "Thu Sep 28 02:11:51 +0000 2017"), format = "%a %b %d %H:%M:%S %Y")
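which gives "2017-09-28 02:11:51 EDT". The time zone label depends on your system: with the +0000 offset stripped and no tz given, the timestamp is read as local time, so pass tz = "UTC" if you want to keep it in UTC.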
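If "tidy" means separate date and time columns, the strings could also be parsed with the offset kept (%z) and then split. This is a minimal base R sketch; the column names are made up for illustration and an English locale is assumed for the weekday and month names.

```r
x <- c("Thu Sep 28 02:11:51 +0000 2017", "Mon Oct 02 19:22:35 +0000 2017")

# Parse with %z so the +0000 offset is honoured instead of stripped.
parsed <- as.POSIXct(x, format = "%a %b %d %H:%M:%S %z %Y", tz = "UTC")

# One row per timestamp, with the pieces in separate columns.
tidy <- data.frame(
  datetime = parsed,
  date     = as.Date(parsed, tz = "UTC"),
  time     = format(parsed, "%H:%M:%S"),
  stringsAsFactors = FALSE
)
tidy
```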

How to convert a character timestamp into a date-time object in R

There is a timestamp variable (i.e. UTC) in my data frame which is a character string, and the date-time format is as follows:
Fri Aug 10 04:42:47 +0000 2012
How can I convert it into a date-time object in R? I tried the following, but it gives me NAs.
data1$datetime <- as.POSIXct(as.numeric(data1$UTC),origin="1970-01-01",tz="GMT")
This works for your example. See ?strptime for the format codes.
as.POSIXct("Fri Aug 10 04:42:47 +0000 2012",format="%a %b %d %H:%M:%S %z %Y",tz="GMT")
[1] "2012-08-10 04:42:47 GMT"
You can also use parse_date_time from lubridate, which saves you the typing of spaces and % signs:
date_string = "Fri Aug 10 04:42:47 +0000 2012"
library(lubridate)
parse_date_time(date_string, "abdHMSzY", tz = "GMT")
# [1] "2012-08-10 04:42:47 GMT"

converting multiple date formats into one in r

I am working with a messy Excel file with multiple date formats:
2016-10-17T12:38:41Z
Mon Oct 17 08:03:08 GMT 2016
10-Sep-15
13-Oct-09
18-Oct-2016 05:42:26 UTC
I want to convert all of the above to yyyy-mm-dd format. I am using the following code for the conversion, but a lot of values come out as NA.
as.Date(parse_date_time(df$date,c('mdy', 'ymd_hms','a b d HMS y','d b y HMS')))
How can I convert all of them together? I have read other threads on similar cases, but nothing seems to work for mine.
Please help
If I add 'dmy' to the list then at least all of the cases in your example are successfully parsed:
z <- c("2016-10-17T12:38:41Z", "Mon Oct 17 08:03:08 GMT 2016",
"10-Sep-15", "13-Oct-09", "18-Oct-2016 05:42:26 UTC")
library(lubridate)
parse_date_time(z,c('mdy', 'dmy', 'ymd_HMS','a b d HMS y','d b y HMS'))
## [1] "2016-10-17 12:38:41 UTC" "2016-10-17 08:03:08 UTC"
## [3] "2015-09-10 00:00:00 UTC" "2009-10-13 00:00:00 UTC"
## [5] "2016-10-18 05:42:26 UTC"
Your big problem will be the third and fourth elements: are these actually meant to be 'ymd' and 'dmy' respectively? I'm not sure how any logic will let you auto-detect these differences ... out of context, "15 Sep 2010" and "10 September 2015" both seem perfectly reasonable possibilities ...
For what it's worth I also tried the new anytime package - it only handled the first and last element.
Removing the times first makes it possible to specify only three alternatives in orders to parse the sample data in the question. This interprets 10-Sep-15 and 13-Oct-09 as dmy but if you want them interpreted as ymd then uncomment the commented out line:
orders <- c("dmy", "mdy", "ymd")
# orders <- c("ymd", "dmy", "mdy")
as.Date(parse_date_time(gsub("..:..:..", " ", x), orders = orders))
giving:
[1] "2016-10-17" "2016-10-17" "2015-09-10" "2009-10-13" "2016-10-18"
or if the commented out line is uncommented then:
[1] "2016-10-17" "2016-10-17" "2010-09-15" "2013-10-09" "2016-10-18"
Note: The input is:
x <- c("2016-10-17T12:38:41Z ", "Mon Oct 17 08:03:08 GMT 2016", "10-Sep-15",
"13-Oct-09", "18-Oct-2016 05:42:26 UTC")

Converting char to date time

In a data.frame, I have a date time stamp in the form:
head(x$time)
[1] "Thu Oct 11 22:18:02 2012" "Thu Oct 11 22:50:15 2012" "Thu Oct 11 22:54:17 2012"
[4] "Thu Oct 11 22:43:13 2012" "Thu Oct 11 22:41:18 2012" "Thu Oct 11 22:15:19 2012"
Every time I try to convert it with as.Date, lubridate, or zoo I get NAs or errors.
What is the way to convert this time to a readable form?
I've tried:
Time<-strptime(x$time,format="&m/%d/%Y %H:$M")
x$minute<-parse_date_time(x$time)
x$minute<-mdy(x$time)
x$minute<-as.Date(x$time,"%m/%d/%Y %H:%M:%S")
x$minute<-as.time(x$time)
x$minute<-as.POSIXct(x$time,format="%H:%M")
x$minute<-minute(x$time)
What you really want is strptime(). Try something like:
strptime(x$time, "%a %b %d %H:%M:%S %Y")
As an example of the interesting things you can do with strptime(), consider the following:
thedate <- "I came to your house at 11:45 on January 21, 2012."
strptime(thedate, "I came to your house at %H:%M on %B %d, %Y.")
# [1] "2012-01-21 11:45:00"
Another option is to use lubridate::parse_date_time():
library(lubridate)
parse_date_time(x$time, "%a %b %d %H:%M:%S %Y")
Or more simply:
parse_date_time(x$time, "abdHMSY")
From the docs:
It differs from base::strptime() in two respects. First, it allows specification of the order in which the formats occur without the need to include separators and % prefix. Such a formatting argument is referred to as "order". Second, it allows the user to specify several format-orders to handle heterogeneous date-time character representations.
The docs contain all the formats (the "abdHMSY" etc.) recognized by lubridate.
