Dealing with a date-time string that includes the day of the week (R)

I have a date-time string that includes the day of the week and some metadata:
d <- "Fri, 14 Jul 2000 06:59:00 -0700 (PDT)"
I need to convert it into a date-time object for further analysis (I have a whole column of these in a data.table). So far I have handled this by using regular expressions to strip the metadata from the string. Is there a better approach?
What I have is:
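# Inspect what each pattern matches
m <- regexpr("^\\w+,\\s+", d, perl=TRUE)
regmatches(d, m)
m <- regexpr("\\s-?\\d+\\s\\(\\w+\\)$", d, perl=TRUE)
regmatches(d, m)
# Strip the leading weekday and the trailing offset and zone name
ds <- sub("^\\w+,\\s+", "", d)
ds <- sub("\\s-?\\d+\\s\\(\\w+\\)$", "", ds)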
Now I can convert this to date-time objects of class Date, POSIXlt, or POSIXct for use in analysis:
dd <- strptime(ds, format="%d %b %Y %H:%M:%S")   # POSIXlt
dd <- as.Date(ds, format="%d %b %Y %H:%M:%S")    # Date (the time of day is dropped)
dd <- as.POSIXct(ds, format="%d %b %Y %H:%M:%S") # POSIXct
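A possibly simpler base-R route skips the regex step entirely. R's strptime can match the weekday with %a, reads a numeric offset such as -0700 with %z on input, and ignores any trailing characters such as " (PDT)" once the format is consumed. The sketch below assumes an English or C LC_TIME locale (so that %a and %b match "Fri" and "Jul") and reports the result in UTC so the offset is actually applied.

```r
d <- "Fri, 14 Jul 2000 06:59:00 -0700 (PDT)"

# %a matches the weekday, %z reads the "-0700" offset; the trailing " (PDT)"
# is ignored because strptime stops once the format has been consumed.
# Assumes an English or C LC_TIME locale for the day and month names.
dt <- as.POSIXct(d, format = "%a, %d %b %Y %H:%M:%S %z", tz = "UTC")
format(dt, "%Y-%m-%d %H:%M:%S %Z")  # "2000-07-14 13:59:00 UTC"
```

Because as.POSIXct is vectorized, the same call should work on a whole data.table column of such strings.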

I wrote the anytime package to help with (among other things) these silly format strings: it heuristically tries a number of them (focusing on sane ones).
The input you have here qualifies (and is in fact a pretty common form):
R> anytime("Fri, 14 Jul 2000 06:59:00 -0700 (PDT)")
[1] "2000-07-14 06:59:00 CDT"
R>
We do not currently try to capture the time zone offset at the end, so you have to deal with that after the fact. The display is in CDT, which is my local time zone.
There is some more information about anytime on its webpage.

Assuming the format of the string is constant across your data:
time = trimws(unlist(strsplit(d, "[,-]"))[2])
#[1] "14 Jul 2000 06:59:00"
tz = unlist(strsplit(d, "[,-]"))[3]
tz = gsub("[^A-Z]", "", tz)
#[1] "PDT"
> as.Date(time, format = "%d %b %Y")
[1] "2000-07-14"
> as.POSIXct(time, format = "%d %b %Y %H:%M:%S") # specify the time zone with tz
[1] "2000-07-14 06:59:00 IST"
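If the zone abbreviation varies from row to row, one option is to map each abbreviation to an Olson time zone name and parse every element in its own zone, since abbreviations such as PDT are generally not usable as tz values. The tz_map table below is a hypothetical example you would extend for your own data; the sketch also assumes an English or C LC_TIME locale for %b.

```r
d <- c("Fri, 14 Jul 2000 06:59:00 -0700 (PDT)",
       "Fri, 14 Jul 2000 09:59:00 -0400 (EDT)")

# Hypothetical lookup table: abbreviation -> Olson zone name (extend as needed)
tz_map <- c(PDT = "America/Los_Angeles", EDT = "America/New_York")

# Pull the abbreviation out of the trailing "(...)"
abbr <- sub(".*\\((\\w+)\\)$", "\\1", d, perl = TRUE)
# Keep only "14 Jul 2000 06:59:00": drop the weekday and the offset/zone suffix
stamp <- sub("^\\w+,\\s+", "",
             sub("\\s[+-]\\d{4}\\s\\(\\w+\\)$", "", d, perl = TRUE),
             perl = TRUE)

# as.POSIXct() accepts a single tz, so parse element by element and keep
# the underlying seconds since the epoch
secs <- mapply(function(s, z) {
  as.numeric(as.POSIXct(s, format = "%d %b %Y %H:%M:%S", tz = z))
}, stamp, tz_map[abbr], USE.NAMES = FALSE)

# Reassemble a single POSIXct vector, displayed in UTC
utc <- as.POSIXct(secs, origin = "1970-01-01", tz = "UTC")
format(utc, "%Y-%m-%d %H:%M:%S %Z")
```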

Related

Timestamp conversion in R and calculating Time Difference between 2 Columns of different DFs

I need to calculate the time difference in minutes, hours, days, etc. between two date-time columns in two data frames. Details below:
df1 <- data.frame(Name = c("Aks", "Bob", "Caty", "David"),
                  timestamp = c("Mon Apr 1 14:23:09 1980", "Sun Jun 12 12:10:21 1975", "Fri Jan 5 18:45:10 1985", "Thu Feb 19 02:26:19 1990"))
df2 <- data.frame(Name = c("Aks", "Bob", "Caty", "David"),
                  timestamp = c("Apr-01-1980 14:28:00", "Jun-12-1975 12:45:10", "Jan-05-1985 17:50:30", "Feb-19-1990 02:28:00"))
I am having trouble converting df1$timestamp and df2$timestamp: as.POSIXct and as.Date do not work, and I get the error "non-numeric argument to binary operator".
I need the time difference in minutes, hours, or days.
One approach is to use strptime and specify the appropriate directives in the date-time format:
df1$timestamp2 <- strptime(df1$timestamp, "%a %b %d %H:%M:%S %Y")
df2$timestamp2 <- strptime(df2$timestamp, "%b-%d-%Y %H:%M:%S")
In this case, you have:
%a abbreviated weekday name
%b abbreviated month name
%d day of the month
%H hour, 24-hour clock
%M minute
%S second
%Y year including century
Then you can use difftime to get the difference, and specify the units (in this case, difference expressed in hours):
difftime(df1$timestamp2, df2$timestamp2, units = "hours")
Output
Time differences in hours
[1] -0.08083333 -0.58027778 0.91111111 -0.02805556
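If you want plain numbers in several units at once (for example, to store as columns), one option is to wrap each difftime result in as.numeric(). This sketch uses as.POSIXct instead of strptime, since strptime returns POSIXlt, which is awkward to keep in a data frame. The tz = "UTC" setting is an assumption to keep daylight-saving shifts out of the differences, and the parsing assumes an English or C LC_TIME locale.

```r
df1 <- data.frame(Name = c("Aks", "Bob", "Caty", "David"),
                  timestamp = c("Mon Apr 1 14:23:09 1980", "Sun Jun 12 12:10:21 1975",
                                "Fri Jan 5 18:45:10 1985", "Thu Feb 19 02:26:19 1990"))
df2 <- data.frame(Name = c("Aks", "Bob", "Caty", "David"),
                  timestamp = c("Apr-01-1980 14:28:00", "Jun-12-1975 12:45:10",
                                "Jan-05-1985 17:50:30", "Feb-19-1990 02:28:00"))

# Parse both columns to POSIXct in UTC (assumes an English or C LC_TIME locale)
t1 <- as.POSIXct(df1$timestamp, format = "%a %b %d %H:%M:%S %Y", tz = "UTC")
t2 <- as.POSIXct(df2$timestamp, format = "%b-%d-%Y %H:%M:%S", tz = "UTC")

# Rows are matched by position here; merge() on Name first if the order can differ
diffs <- data.frame(
  Name  = df1$Name,
  mins  = as.numeric(difftime(t1, t2, units = "mins")),
  hours = as.numeric(difftime(t1, t2, units = "hours")),
  days  = as.numeric(difftime(t1, t2, units = "days"))
)
diffs
```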
If your locale settings prevent %a and %b from matching the English day and month names, try:
# Store current locale
orig_locale <- Sys.getlocale("LC_TIME")
Sys.setlocale("LC_TIME", "C")
# Convert to posix-timestamp
df1$timestamp <- as.POSIXct(df1$timestamp, format = "%a %b %d %H:%M:%S %Y")
df2$timestamp <- as.POSIXct(df2$timestamp, format = "%b-%d-%Y %H:%M:%S")
# Restore locale
Sys.setlocale("LC_TIME", orig_locale)
# Calculate difference
df2$timestamp - df1$timestamp
# Time differences in mins
# [1] 4.850000 34.816667 -54.666667 1.683333
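The save/set/restore pattern above can be wrapped so the locale is restored even if parsing fails partway through. A minimal sketch using on.exit(); the helper name with_c_time_locale is hypothetical, not part of base R:

```r
# Hypothetical helper: evaluate an expression with LC_TIME temporarily set to "C",
# restoring the previous setting even if the expression throws an error
with_c_time_locale <- function(expr) {
  old <- Sys.getlocale("LC_TIME")
  on.exit(Sys.setlocale("LC_TIME", old), add = TRUE)
  Sys.setlocale("LC_TIME", "C")
  expr  # lazy evaluation: the argument is only evaluated here, under the C locale
}

ts <- with_c_time_locale(
  as.POSIXct("Mon Apr 1 14:23:09 1980", format = "%a %b %d %H:%M:%S %Y", tz = "UTC")
)
format(ts, "%Y-%m-%d %H:%M:%S")
```

This relies on R's lazy evaluation of function arguments: the parsing expression is not run until `expr` is forced, which happens after the locale has been switched.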

How to convert a string into a date-time object

The data is character and I want it to be date-time. I have the format cheat sheet, but none of the formats on it matches this unusual date format. Any suggestions?
x <- 'Fri Dec 11 12:10:51 PST 2020'
You can use the anytime package
> library(anytime)
> anytime("Fri Dec 11 12:10:51 PST 2020")
[1] "2020-12-11 12:10:51 CST"
>
> class(anytime("Fri Dec 11 12:10:51 PST 2020"))
[1] "POSIXct" "POSIXt"
>
It has three key advantages:
it can guess the format (as here)
it converts all sorts of input formats (including character, factor, ...)
it is pretty fast (as the parser is C++ from Boost)
It is pretty standard for most methods to ignore the time zone abbreviation, so PST became my local time, i.e. Central.
In base R, you could do:
x <- 'Fri Dec 11 12:10:51 PST 2020'
as.POSIXct(x, format = '%a %b %d %T PST %Y')
See ?strptime for detailed format specifications.
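The base-R call above matches "PST" as literal text, so the result is placed in your session's local time zone, much like the anytime behaviour described earlier. A sketch of attaching the intended zone through tz, assuming PST in your data refers to US Pacific time (America/Los_Angeles):

```r
x <- 'Fri Dec 11 12:10:51 PST 2020'

# Match "PST" literally, but tell R which zone the wall-clock time belongs to.
# "America/Los_Angeles" is an assumption about what PST means in your data.
p <- as.POSIXct(x, format = '%a %b %d %T PST %Y', tz = "America/Los_Angeles")

format(p, usetz = TRUE)              # wall-clock time in Los Angeles
format(p, tz = "UTC", usetz = TRUE)  # same instant in UTC: "2020-12-11 20:10:51 UTC"
```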

Character string with timezone convert to date

I am trying to convert a vector of dates that I read from a csv file using read.table. They were read as character strings, and I am trying to convert them to a date vector using as_date.
The vector has elements of this form:
dateString
"Wed Dec 11 00:00:00 ICT 2013"
When I try to convert it with this command:
as.Date(dateString,"%a %b %e %H:%M:%S %Z %Y")
Error in strptime(x, format, tz = "GMT") :
use of %Z for input is not supported
What is the right format to use in strptime or in as.Date?
Just use the anytime() function from the anytime package:
R> anytime::anytime("Wed Dec 11 00:00:00 ICT 2013")
[1] "2013-12-11 CST"
R>
There is also a utctime() variant that does not impose your local time zone, and much more. By now there have been a number of questions about this here, so it is worth searching.
And if you want a date, it works the same way:
R> anytime::anydate("Wed Dec 11 00:00:00 ICT 2013")
[1] "2013-12-11"
R>
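Without anytime, one base-R workaround is to delete the zone abbreviation before parsing, since %Z is not accepted on input. This sketch assumes the abbreviation is a run of two to five capital letters between the time and the year, and that ICT means Indochina Time (Asia/Bangkok, UTC+7); both are assumptions about your data.

```r
dateString <- "Wed Dec 11 00:00:00 ICT 2013"

# strptime() cannot read %Z on input, so remove the abbreviation first
clean <- sub("\\s[A-Z]{2,5}\\s", " ", dateString, perl = TRUE)  # "Wed Dec 11 00:00:00 2013"

# Date only: the time of day is dropped
d <- as.Date(clean, format = "%a %b %d %H:%M:%S %Y")

# Full date-time, assuming ICT means Asia/Bangkok
p <- as.POSIXct(clean, format = "%a %b %d %H:%M:%S %Y", tz = "Asia/Bangkok")
format(p, tz = "UTC", usetz = TRUE)  # "2013-12-10 17:00:00 UTC"
```

Parsing the Date directly from the string, rather than calling as.Date() on the POSIXct, avoids the result depending on which time zone the conversion uses.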

NA returned while using strptime

I have a data frame with Date and Time columns. I am trying to combine these two columns, but strptime returns NA. I want to understand why this happens.
x <- data.frame(date = "1/2/2007", time = "00:00:02")
# Returns NA: %b expects a month name such as "Jan", and %y a two-digit year
y <- strptime(paste(x$date,x$time,sep = " "), format = "%b/%d/%y %H:%M:%S")
We need %m and %Y in place of %b and %y: %b is the abbreviated month name in the current locale, and %y is the year without century (00–99).
strptime(paste(x$date,x$time,sep = " "), "%m/%d/%Y %H:%M:%S")
#[1] "2007-01-02 00:00:02 IST"
To understand the format codes, check ?strptime.
Or we can use mdy_hms from lubridate:
library(lubridate)
with(x, mdy_hms(paste(date, time)))
#[1] "2007-01-02 00:00:02 UTC"
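Because strptime and as.POSIXct return NA rather than an error when a string does not match the format, a quick is.na() check is a cheap way to find the rows that failed. A small sketch with a deliberately invalid second row (the extra row is made up for illustration):

```r
x <- data.frame(date = c("1/2/2007", "13/2/2007"),
                time = c("00:00:02", "00:00:03"))

parsed <- as.POSIXct(paste(x$date, x$time), format = "%m/%d/%Y %H:%M:%S", tz = "UTC")

# Parsing failures come back as NA, so check for them explicitly
bad <- which(is.na(parsed))
x[bad, ]  # row 2: "13" is not a valid month
```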

R timeDate drops time in coercion

I have dates of the format: "2/9/2016 21:16"
When I attempt to coerce them to a timeDate, I receive the result: [1] [2016-02-03]
I would prefer not to write my own string manipulation (I can, and already have), but there has to be a better way. I have a data frame and I am attempting to do the following:
restData2 <- restData %>%
mutate(year = year(as.timeDate(Date)),
month = month(as.timeDate(Date)),
day = day(as.timeDate(Date)),
timeCategory = converToTimeCategory(Date)
)
Note that day is not a function in timeDate either. Day of week and day of year exist, but I need day of month.
The data is basic transaction data in a data frame.
David, you are confused. R distinguishes between the internal representation and the formatted display, for all types.
And there is (once again) no need for timeDate, lubridate, or any other wrapper:
R> intxt <- c("2/9/2016 21:16", "2/11/2016 22:23")
R> parsed <- as.POSIXct(intxt, format="%d/%m/%Y %H:%M")
R> parsed
[1] "2016-09-02 21:16:00 CDT" "2016-11-02 22:23:00 CDT"
R> format(parsed, "%d %b %Y at %H:%M")
[1] "02 Sep 2016 at 21:16" "02 Nov 2016 at 22:23"
R>
Here we parse the strings into the standard POSIXct type by specifying a format, which can be day-month or month-day; here I picked the former.
Given the parsed object, I first show the default display, and then a custom format string.
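For the year, month, and day-of-month values the question asks for, no extra package is needed either. A sketch that reads the calendar fields from POSIXlt, assuming the same day-month order as above and using UTC only to keep the example independent of the session's time zone:

```r
intxt <- c("2/9/2016 21:16", "2/11/2016 22:23")
parsed <- as.POSIXct(intxt, format = "%d/%m/%Y %H:%M", tz = "UTC")

# POSIXlt exposes the calendar fields as list components:
# year counts from 1900, mon is 0-based, mday is the day of the month
lt <- as.POSIXlt(parsed)
parts <- data.frame(year  = lt$year + 1900,
                    month = lt$mon + 1,
                    day   = lt$mday,
                    hour  = lt$hour)
parts
```

Inside a dplyr pipeline, as.integer(format(parsed, "%d")) and similar format() calls are an alternative way to get the same fields.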
Lastly, if you must, you can also convert to timeDate:
R> library(timeDate)
R> as.timeDate(parsed)
GMT
[1] [2016-09-03 02:16:00] [2016-11-03 03:23:00]
R>
Note the time zone adjustment from my local (Central) time.
