Converting char to date time - r

In a data.frame, I have a date time stamp in the form:
head(x$time)
[1] "Thu Oct 11 22:18:02 2012" "Thu Oct 11 22:50:15 2012" "Thu Oct 11 22:54:17 2012"
[4] "Thu Oct 11 22:43:13 2012" "Thu Oct 11 22:41:18 2012" "Thu Oct 11 22:15:19 2012"
Every time I try to convert it with as.Date, lubridate, or zoo, I get NAs or errors.
What is the way to convert this time to a readable form?
I've tried:
Time<-strptime(x$time,format="&m/%d/%Y %H:$M")
x$minute<-parse_date_time(x$time)
x$minute<-mdy(x$time)
x$minute<-as.Date(x$time,"%m/%d/%Y %H:%M:%S")
x$minute<-as.time(x$time)
x$minute<-as.POSIXct(x$time,format="%H:%M")
x$minute<-minute(x$time)

What you really want is strptime(). Try something like:
strptime(x$time, "%a %b %d %H:%M:%S %Y")
As an example of the interesting things you can do with strptime(), consider the following:
thedate <- "I came to your house at 11:45 on January 21, 2012."
strptime(thedate, "I came to your house at %H:%M on %B %d, %Y.")
# [1] "2012-01-21 11:45:00"
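One thing to keep in mind is that strptime() returns a POSIXlt object, which is a list underneath and can be awkward as a data.frame column. Here is a minimal sketch using as.POSIXct() with the same format string. It assumes the strings use English day and month names and that UTC is an acceptable time zone (neither is stated in the question), and the two-row data frame is made up from the sample values:

```r
# Sample values copied from the question; the data frame itself is made up.
x <- data.frame(
  time = c("Thu Oct 11 22:18:02 2012", "Thu Oct 11 22:50:15 2012"),
  stringsAsFactors = FALSE
)

# %a and %b are locale-dependent, so switch to the "C" locale (English names)
# while parsing and restore the previous setting afterwards.
old_locale <- Sys.getlocale("LC_TIME")
invisible(Sys.setlocale("LC_TIME", "C"))
x$parsed <- as.POSIXct(x$time, format = "%a %b %d %H:%M:%S %Y", tz = "UTC")
invisible(Sys.setlocale("LC_TIME", old_locale))

# Extract the minute as an integer (assumption: this is what x$minute was for).
x$minute <- as.integer(format(x$parsed, "%M"))
x$minute
# [1] 18 50
```

POSIXct stores each value as a single number of seconds, so the column sorts and subtracts like any other numeric vector.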

Another option is to use lubridate::parse_date_time():
library(lubridate)
parse_date_time(x$time, "%a %b %d %H:%M:%S %Y")
Or more simply:
parse_date_time(x$time, "abdHMSY")
From the docs:
It differs from base::strptime() in two respects. First, it allows specification of the order in which the formats occur without the need to include separators and % prefix. Such a formatting argument is referred to as "order". Second, it allows the user to specify several format-orders to handle heterogeneous date-time character representations.
The docs contain all the formats (the "abdHMSY" etc.) recognized by lubridate.

Related

Change character to date with different format (Win and Mac)

I have several date variables in a data.frame.
They look for example like this:
[1] "10/14/18 17:55:28" "10/15/18 19:27:56"
[3] "11/04/18 15:47:46" "Thu Feb 7 14:51:55 2019"
[5] "Thu Feb 7 17:14:15 2019" "Thu Feb 7 15:46:09 2019"
[7] "Thu Feb 7 11:42:27 2019" "Thu Feb 7 13:24:16 2019"
[9] "Thu Feb 7 18:02:29 2019" "Mon Oct 15 08:48:43 2018"
[11] "10/17/18 17:08:38" "12/08/18 08:08:11"
[13] "10/11/18 21:25:30" "10/14/18 19:15:30"
[15] "10/16/18 11:18:01" "10/16/18 18:19:27"
[17] "Tue Oct 16 19:49:24 2018" "Wed Oct 17 21:36:32 2018"
[19] "Sat Oct 13 11:22:35 2018" "Fri Dec 7 17:12:33 2018"
At the moment this is a character variable. I want to convert it with as.Date so that I can subtract the variables from each other.
I already found this:
as.Date( DATE$Sess1, format = "%m/%d/%y")
I would prefer to keep not only the date but also the time.
The real problem is that the values include both Apple and Windows formats, which makes it even more complicated.
I would prefer dplyr solutions ;)
You can use lubridate's parse_date_time and pass it all the formats the values could take.
x <- c("10/14/18 17:55:28" , "10/15/18 19:27:56" ,
"11/04/18 15:47:46" , "Thu Feb 7 14:51:55 2019",
"Thu Feb 7 17:14:15 2019", "Thu Feb 7 15:46:09 2019")
lubridate::parse_date_time(x,c('mdyT', 'amdTY'))
#[1] "2018-10-14 17:55:28 UTC" "2018-10-15 19:27:56 UTC" "2018-11-04 15:47:46 UTC"
#[4] "2019-02-07 14:51:55 UTC" "2019-02-07 17:14:15 UTC" "2019-02-07 15:46:09 UTC"
See ?parse_date_time for details on the supported formats.
To get the dates, you can wrap as.Date around it.
as.Date(lubridate::parse_date_time(x,c('mdyT', 'amdTY')))
#[1] "2018-10-14" "2018-10-15" "2018-11-04" "2019-02-07" "2019-02-07" "2019-02-07"
For keeping the time, it's best to use a date-time class such as POSIXlt or POSIXct rather than Date. You can also extend the format string to include the time (e.g. format = "%m/%d/%y %H:%M:%S"); see https://astrostatistics.psu.edu/su07/R/html/base/html/strptime.html for more details on these codes.
as.POSIXlt(DATE$Sess1, format = "%m/%d/%y %H:%M:%S")
As for handling different formats, because the ones you have aren't unambiguous on their own, I suggest having a vector of possible formats, then trying each in turn until one works.
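A minimal base R sketch of that idea follows. The helper name parse_mixed(), the variable sess1, and the choice of UTC are assumptions made for illustration; the two format strings are the ones that match the sample data:

```r
# Hypothetical helper: try each format in turn and fill in whatever is still NA.
parse_mixed <- function(x, formats, tz = "UTC") {
  # Start with an all-NA POSIXct vector of the right length.
  out <- as.POSIXct(rep(NA_real_, length(x)), origin = "1970-01-01", tz = tz)
  for (fmt in formats) {
    todo <- is.na(out)
    if (!any(todo)) break
    out[todo] <- as.POSIXct(x[todo], format = fmt, tz = tz)
  }
  out
}

old_locale <- Sys.getlocale("LC_TIME")
invisible(Sys.setlocale("LC_TIME", "C"))  # English day/month names for %a, %b

sess1 <- c("10/14/18 17:55:28", "Thu Feb 7 14:51:55 2019")
parsed <- parse_mixed(sess1, c("%m/%d/%y %H:%M:%S", "%a %b %d %H:%M:%S %Y"))

invisible(Sys.setlocale("LC_TIME", old_locale))
format(parsed, "%Y-%m-%d %H:%M:%S")
# [1] "2018-10-14 17:55:28" "2019-02-07 14:51:55"
```

The order of the formats matters when a string could match more than one of them: the first match wins, so put the stricter format first.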
If you're using the tidyverse, use {lubridate} to reformat. There are two different date/time formats in your example, so you'll need to format them twice.
lubridate::as_datetime(DATE$Sess1, format = "%a %b %e %H:%M:%S %Y")
and then for all the NA results...
lubridate::as_datetime(DATE$Sess1, format = "%m/%d/%y %H:%M:%S")
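Since the goal was to subtract the variables from each other, here is a short sketch of that last step once both values are parsed. The names sess1 and sess2 and the UTC time zone are assumptions; the two timestamps are taken from the sample data:

```r
fmt <- "%m/%d/%y %H:%M:%S"  # purely numeric format, so no locale issue here
sess1 <- as.POSIXct("10/14/18 17:55:28", format = fmt, tz = "UTC")
sess2 <- as.POSIXct("10/15/18 19:27:56", format = fmt, tz = "UTC")

# difftime() makes the unit explicit; plain sess2 - sess1 picks one automatically.
elapsed <- difftime(sess2, sess1, units = "secs")
as.numeric(elapsed)
# [1] 91948

# The same difference expressed in hours.
as.numeric(elapsed, units = "hours")
```

Fixing the unit explicitly avoids surprises when some differences are minutes apart and others are days apart.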

as.Date won't convert full written months properly [duplicate]

When I try to parse a timestamp in the following format: "Thu Nov 8 15:41:45 2012", only NA is returned.
I am using Mac OS X, R 2.15.2 and Rstudio 0.97.237. The language of my OS is Dutch: I presume this has something to do with it.
When I try strptime, NA is returned:
var <- "Thu Nov 8 15:41:45 2012"
strptime(var, "%a %b %d %H:%M:%S %Y")
# [1] NA
Neither does as.POSIXct work:
as.POSIXct(var, "%a %b %d %H:%M:%S %Y")
# [1] NA
I also tried as.Date on the string above but without %H:%M:%S components:
as.Date("Thu Nov 8 2012", "%a %b %d %Y")
# [1] NA
Any ideas what I could be doing wrong?
I think it is exactly as you guessed, strptime fails to parse your date-time string because of your locales. Your string contains both abbreviated weekday (%a) and abbreviated month name (%b). These time specifications are described in ?strptime:
Details
%a: Abbreviated weekday name in the current locale on this
platform
%b: Abbreviated month name in the current locale on this platform.
"Note that abbreviated names are platform-specific (although the
standards specify that in the C locale they must be the first three
letters of the capitalized English name:"
"Knowing what the abbreviations are is essential if you wish to use
%a, %b or %h as part of an input format: see the examples for
how to check."
See also
[...] locales to query or set a locale.
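One way to check which abbreviations your session expects (a small sketch, not necessarily the exact example from the help page) is to format known dates with %b and %a:

```r
# The LC_TIME locale currently in effect.
Sys.getlocale("LC_TIME")

# Month and weekday abbreviations as R produces them in that locale.
month_abbr <- format(ISOdate(2000, 1:12, 1), "%b")
wday_abbr  <- format(ISOdate(2000, 1, 2:8), "%a")  # 2 Jan 2000 was a Sunday
month_abbr
wday_abbr
```

If the strings you want to parse do not use these abbreviations, %a and %b will not match them.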
The issue of locales is relevant also for as.POSIXct, as.POSIXlt and as.Date.
From ?as.POSIXct:
Details
If format is specified, remember that some of the format
specifications are locale-specific, and you may need to set the
LC_TIME category appropriately via Sys.setlocale. This most often
affects the use of %b, %B (month names) and %p (AM/PM).
From ?as.Date:
Details
Locale-specific conversions to and from character strings are used
where appropriate and available. This affects the names of the days
and months.
Thus, if weekdays and month names in the string differ from those in the current locale, strptime, as.POSIXct and as.Date fail to parse the string correctly and NA is returned.
However, you may solve this issue by changing the locales:
# First save your current locale
loc <- Sys.getlocale("LC_TIME")
# Set correct locale for the strings to be parsed
# (in this particular case: English)
# so that weekdays (e.g "Thu") and abbreviated month (e.g "Nov") are recognized
Sys.setlocale("LC_TIME", "en_GB.UTF-8")
# or
Sys.setlocale("LC_TIME", "C")
#Then proceed as you intended
x <- "Thu Nov 8 15:41:45 2012"
strptime(x, "%a %b %d %H:%M:%S %Y")
# [1] "2012-11-08 15:41:45"
# Then set back to your old locale
Sys.setlocale("LC_TIME", loc)
With my personal locale I can reproduce your error:
Sys.setlocale("LC_TIME", loc)
# [1] "fr_FR.UTF-8"
strptime(var,"%a %b %d %H:%M:%S %Y")
# [1] NA
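The save, set, and restore steps can also be wrapped in a function so that the old locale comes back even if parsing fails. This is only a sketch: parse_in_locale() is a made-up name, and the defaults ("C" locale, UTC) are assumptions:

```r
# Hypothetical helper: parse x under a temporary LC_TIME locale.
parse_in_locale <- function(x, format, locale = "C", tz = "UTC") {
  old_locale <- Sys.getlocale("LC_TIME")
  # on.exit() restores the previous locale when the function returns or errors.
  on.exit(Sys.setlocale("LC_TIME", old_locale), add = TRUE)
  Sys.setlocale("LC_TIME", locale)
  as.POSIXct(x, format = format, tz = tz)
}

parsed <- parse_in_locale("Thu Nov 8 15:41:45 2012", "%a %b %d %H:%M:%S %Y")
format(parsed, "%Y-%m-%d %H:%M:%S")
# [1] "2012-11-08 15:41:45"
```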
I was just working on the same problem and found this solution much cleaner: there is no need to change any system settings manually, because the lubridate parsers take a locale argument and handle the switch for you:
library(lubridate)
date <- c("23. juni 2014", "1. november 2014", "8. marts 2014", "16. juni 2014", "12. december 2014", "13. august 2014")
# Locale names are platform-specific (e.g. "da_DK.UTF-8" on many Linux/macOS systems)
dmy(date, locale = "Danish")
# [1] "2014-06-23" "2014-11-01" "2014-03-08" "2014-06-16" "2014-12-12" "2014-08-13"

Extract time stamps from string and convert to R POSIXct object

Currently, my dataset has a time variable (factor) in the following format:
weekday month day hour min seconds +0000 year
I don't know what the "+0000" field is but all observations have this. For example:
"Tues Feb 02 11:05:21 +0000 2018"
"Mon Jun 12 06:21:50 +0000 2017"
"Wed Aug 01 11:24:08 +0000 2018"
I want to convert these values to POSIXlt or POSIXct objects (year-month-day hour:min:sec) and make them numeric. Currently, using as.numeric(as.character(time-variable)) outputs incorrect values.
Thank you for the great responses! I really appreciate it.
I'm not sure how your factor was created, but once it is converted to character with as.character(), this should work:
parts <- unlist(strsplit(as.character("Tues Feb 02 11:05:21 +0000 2018"), " "))
# Keep year, month, day and time; drop the weekday and the +0000 offset
strptime(paste(parts[6], parts[2], parts[3], parts[4]), format = '%Y %b %d %H:%M:%S')
PS: More on date formats and conversion: https://www.stat.berkeley.edu/~s133/dates.html
For this problem you can get by without using lubridate. First, to extract individual dates we can use regmatches and gregexpr:
date_char <- 'Tue Feb 02 11:05:21 +0000 2018 Mon Jun 12 06:21:50 +0000 2017'
ptrn <- '([[:alpha:]]{3} [[:alpha:]]{3} [[:digit:]]{2} [[:digit:]]{2}\\:[[:digit:]]{2}\\:[[:digit:]]{2} \\+[[:digit:]]{4} [[:digit:]]{4})'
date_vec <- unlist( regmatches(date_char, gregexpr(ptrn, date_char)))
> date_vec
[1] "Tue Feb 02 11:05:21 +0000 2018" "Mon Jun 12 06:21:50 +0000 2017"
You can learn more about regular expressions in ?regex.
In the above example, the +0000 field is the offset from UTC in hours and minutes (+hhmm or -hhmm); e.g. it would be -0500 for the EST time zone. To convert to an R date-time object:
> as.POSIXct(date_vec, format = '%a %b %d %H:%M:%S %z %Y', tz = 'UTC')
[1] "2018-02-02 11:05:21 UTC" "2017-06-12 06:21:50 UTC"
which is the desired output. The format codes are documented in ?strptime, or you can use lubridate::guess_formats(). If you don't specify tz, the output is shown in your system's time zone (for me that would be EST). Since the offset is specified in the format, R carries out the conversion correctly.
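To illustrate the offset handling, here is a small sketch with two made-up strings that describe the same instant, one written with +0000 and one with -0500. The weekday is left out to keep the format short:

```r
old_locale <- Sys.getlocale("LC_TIME")
invisible(Sys.setlocale("LC_TIME", "C"))  # English month names for %b

# The same instant written with two different UTC offsets (made-up strings).
stamps <- c("Feb 02 11:05:21 +0000 2018", "Feb 02 06:05:21 -0500 2018")
parsed <- as.POSIXct(stamps, format = "%b %d %H:%M:%S %z %Y", tz = "UTC")

invisible(Sys.setlocale("LC_TIME", old_locale))
format(parsed, "%Y-%m-%d %H:%M:%S")
# [1] "2018-02-02 11:05:21" "2018-02-02 11:05:21"
```

Both strings end up as the same UTC time because %z shifts the parsed clock time by the stated offset.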
To get numeric values, the following works:
> as.numeric(as.POSIXct(date_vec, format = '%a %b %d %H:%M:%S %z %Y', tz = 'UTC'))
[1] 1517569521 1497248510
Note: this is based on uniform string structure. In the OP there was Tues instead of Tue which wouldn't work. The above example is based on the three-letter abbreviation which is the standard reporting format.
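If the data really contains non-standard weekday names such as Tues, one possible workaround (a sketch, assuming the weekday is always the first word and carries no extra information) is to strip it before parsing. The example also starts from a factor, since calling as.numeric() directly on a factor returns its internal level codes rather than anything time-related:

```r
old_locale <- Sys.getlocale("LC_TIME")
invisible(Sys.setlocale("LC_TIME", "C"))  # English month names for %b

# Made-up factor holding two of the sample values from the question.
time_fct <- factor(c("Tues Feb 02 11:05:21 +0000 2018",
                     "Mon Jun 12 06:21:50 +0000 2017"))

time_chr <- as.character(time_fct)              # factor -> character first
no_wday  <- sub("^[[:alpha:]]+ ", "", time_chr) # drop the leading weekday word
parsed   <- as.POSIXct(no_wday, format = "%b %d %H:%M:%S %z %Y", tz = "UTC")

invisible(Sys.setlocale("LC_TIME", old_locale))
as.numeric(parsed)  # seconds since 1970-01-01 00:00:00 UTC
# [1] 1517569521 1497248510
```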
If however, your data is a mix of different formats, you'd have to extract individual time strings (customized regexes, of course), then use lubridate::guess_formats() to get the formats and then use those to carry out the conversion.
Hope this is helpful!!

date objects in month day format

I was wondering if there is a way for R to turn this format into any date object. The format is 'month [space] day'. For example: Jan 1 or Jul 29 or Jul 30. I just want those examples to be read as a date object so I can manipulate them.
Yes, use as.Date, but you also have to specify a year:
x <- c("Jan 1", "Jul 29", "Jul 30")
as.Date(paste("2012", x), format="%Y %b %d")
[1] "2012-01-01" "2012-07-29" "2012-07-30"
See ?as.Date for more help on Date objects, and ?strptime for help on the formatting codes.
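The year you paste in matters for "Feb 29", and once the values are Date objects they can be subtracted directly. A small sketch, assuming English month abbreviations; the years 2012 and 2013 are arbitrary choices for illustration:

```r
old_locale <- Sys.getlocale("LC_TIME")
invisible(Sys.setlocale("LC_TIME", "C"))  # English month names for %b

x <- c("Jan 1", "Feb 29")
leap     <- as.Date(paste("2012", x), format = "%Y %b %d")
non_leap <- as.Date(paste("2013", x), format = "%Y %b %d")

invisible(Sys.setlocale("LC_TIME", old_locale))

leap      # both dates exist in 2012
non_leap  # the second element is NA: 2013 has no Feb 29

# Date arithmetic: days between the two 2012 dates.
as.numeric(diff(leap))
# [1] 59
```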

Resources