I have the following dataframe and am trying to calculate the difference in minutes between the date columns and store it in a new one.
Reportnumber OpenedDate
00001 22/1/2016 5:52:12 PM
00002 20/1/2016 4:15:06 PM
00003 18/1/2016 1:09:46 PM
00004 15/1/2016 10:47:40 AM
00005 15/1/2016 10:32:37 AM
00006 14/1/2016 2:13:48 PM
00007 14/1/2016 11:12:29 AM
00008 14/1/2016 10:17:30 AM
00009 12/1/2016 2:25:03 PM
Before using difftime to get the difference, I'm trying to convert the time to a 24-hour format and strip the AM/PM. I'm doing the following:
dataset$convertedDate <- as.POSIXct('dataset$OpenedDate', format="%d/%b/%Y %H:%M:%s")
I don't get an error in the console but the dataset$convertedDate vector isn't updated.
Is this the right way to approach the problem?
Update:
Get ready for a facepalm.
Look closely at the call you are making:
dataset$convertedDate <- as.POSIXct('dataset$OpenedDate', format="%d/%b/%Y %H:%M:%s")
You are passing in 'dataset$OpenedDate' instead of dataset$OpenedDate. In other words, you are actually passing in a text string to as.POSIXct()! I verified that passing in a string to as.POSIXct() indeed returns NA, which is what you are seeing.
Your format string also had several problems: %b (abbreviated month name) should be %m (numeric month), %H (24-hour) should be %I (12-hour), %s should be %S, and the AM/PM specifier (%p) was missing. Try the following, which assumes the timezone is UTC (change it to fit your needs):
as.POSIXct(df$OpenedDate, format="%d/%m/%Y %I:%M:%S %p", tz="UTC")
Output:
[1] "2016-01-22 17:52:12 UTC" "2016-01-20 16:15:06 UTC"
Data:
df <- data.frame(Reportnumber = c('00001', '00002'),
                 OpenedDate = c('22/1/2016 5:52:12 PM', '20/1/2016 4:15:06 PM'),
                 ClosedDate = c('25/1/2016 1:35:05 PM', '20/1/2016 4:30:06 PM'))
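Building on the data above, a hedged sketch of the minutes calculation the question is after (MinutesOpen is a made-up column name; difftime's units = "mins" does the conversion):

```r
# df as in the Data section above
df <- data.frame(Reportnumber = c('00001', '00002'),
                 OpenedDate = c('22/1/2016 5:52:12 PM', '20/1/2016 4:15:06 PM'),
                 ClosedDate = c('25/1/2016 1:35:05 PM', '20/1/2016 4:30:06 PM'))

# Parse both columns with the same 12-hour format, then take the difference
opened <- as.POSIXct(df$OpenedDate, format = "%d/%m/%Y %I:%M:%S %p", tz = "UTC")
closed <- as.POSIXct(df$ClosedDate, format = "%d/%m/%Y %I:%M:%S %p", tz = "UTC")
df$MinutesOpen <- as.numeric(difftime(closed, opened, units = "mins"))
```

Report 00002, for example, comes out at exactly 15 minutes.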
Related
I have a spreadsheet that has the date and 12-hour time in one column and another column that specifies AM/PM. How do I combine these columns so I can use them as a POSIXct/POSIXlt/POSIXt object?
The spreadsheet has the time column as
DAY/MONTH/YEAR HOUR:MINUTE
while the hour is in a 12-hour format from a roster of check-in times. The other column just says AM or PM. I am trying to combine these columns, convert them to 24-hour time, and use the result as a POSIXt object.
Example of what I see:
Timesheet        AM-PM
8/10/2022 9:00   AM
8/10/2022 9:01   AM
And this continues until 5:00 PM (same day)
What I have tried so far:
Timesheet %>%
  unite("timestamp_24", c("timestamp_12", "am_pm"), na.rm = FALSE) %>%
  mutate(timestamp = as.POSIXct(timestamp_24, format = "%d-%m-%Y %H:%M"))
This does not work as when they are combined it gives:
Timestamp_24
DAY/MONTH/YEAR HOUR:MINUTE_AM
and I think this is the crux of the issue because then as.POSIXct can't read it.
Here's my solution. The approach is simply to extract the hour, add 12 if it is PM, then parse with as.POSIXct (you need to use / rather than - in the format argument if your dataframe is as it appears in your example).
I've done that with stringr::str_replace() which allows you to set a function for the replace argument.
Timesheet %>%
mutate(
time_24hr = stringr::str_replace(
time,
"\\d+(?=:..$)",
function(x) {
hr <- as.numeric(x) %% 12
ifelse(am_pm == "PM", hr + 12, hr)
}
),
time_24hr = as.POSIXct(time_24hr, format = "%d/%m/%Y %H:%M")
)
This is the result:
time am_pm time_24hr
1 8/10/2022 9:00 AM 2022-10-08 09:00:00
2 8/10/2022 9:01 PM 2022-10-08 21:01:00
3 8/10/2022 12:01 PM 2022-10-08 12:01:00
4 8/10/2022 12:01 AM 2022-10-08 00:01:00
EDIT: I realized this didn't work for hours 11 and 12, since the regex was only extracting the first digit before the colon, and it also mishandled 12:xx times. Both are fixed above; the added test cases show they now work.
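An alternative sketch, assuming columns named time and am_pm as in the result above: paste the AM/PM flag onto the timestamp string and let %I/%p do the 12-to-24-hour conversion directly.

```r
# %p parsing is locale-dependent; an AM/PM-aware locale such as "C" is assumed
Sys.setlocale("LC_TIME", "C")

Timesheet <- data.frame(time = c("8/10/2022 9:00", "8/10/2022 12:01"),
                        am_pm = c("AM", "PM"))

# Paste the flag onto the timestamp and let %I/%p do the 12-to-24-hour work
Timesheet$timestamp <- as.POSIXct(paste(Timesheet$time, Timesheet$am_pm),
                                  format = "%d/%m/%Y %I:%M %p", tz = "UTC")
```

With tidyr::unite the same string can be built in-pipeline by passing sep = " " instead of the default "_".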
How can I convert a timestamp to a local date and time?
I have tried the following options for this specific timestamp: 1594598065352:
x <- as.POSIXct(as.numeric(as.character('1594598065352'))/1000, origin="1970-01-01", tz="UTC")
x
"2020-07-12 23:54:25 UTC"
x <- as.POSIXct(as.numeric(as.character('1594598065352'))/1000, origin="1970-01-01", tz="DST")
x
"2020-07-12 23:54:25 DST"
x <- as.POSIXct(as.numeric(as.character('1594598065352'))/1000, origin="1970-01-01", tz="GMT")
x
"2020-07-12 23:54:25 GMT"
I get the same result in all options:
2020-07-12 23:54:25
According to this timestamp converter page, I should get this below in my local time zone:
Monday, 13 July 2020 01:54:25.352 GMT+02:00 DST
Any ideas of how I can get this right in R?
You get the same output in all three cases because the first and third (UTC and GMT) name the same zone, while "DST" in the second is not a valid tz value.
If you don't pass a tz value, as.POSIXct defaults to your local time zone:
as.POSIXct(as.numeric('1594598065352')/1000, origin="1970-01-01")
Alternatively, you can run OlsonNames() in your console to get the list of valid time zones in R. 'Etc/GMT-2' seems to be the one for you (note that the signs of the Etc/GMT zones are reversed, so 'Etc/GMT-2' means UTC+02:00):
as.POSIXct(as.numeric('1594598065352')/1000,origin="1970-01-01", tz = 'Etc/GMT-2')
#[1] "2020-07-13 01:54:25 +02"
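A quick sketch to confirm that tz only changes how the instant is printed, not the instant itself:

```r
utc  <- as.POSIXct(1594598065352 / 1000, origin = "1970-01-01", tz = "UTC")
gmt2 <- as.POSIXct(1594598065352 / 1000, origin = "1970-01-01", tz = "Etc/GMT-2")

as.numeric(utc) == as.numeric(gmt2)  # TRUE: same moment in time
format(gmt2, "%H:%M:%S")             # "01:54:25", two hours ahead of UTC
```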
I have a date column in a dataframe. I have read this df into R using openxlsx. The column is 'seen' as a character vector when I use typeof(df$date).
The column contains date information in several formats and I am looking to get this into the one format.
#Example
date <- c("43469.494444444441", "12/31/2019 1:41 PM", "12/01/2019 16:00:00")
#What I want -updated
fixed <- c("2019-04-01", "2019-12-31", "2019-12-01")
I have tried many workarounds, including openxlsx::convertToDate, lubridate::parse_date_time, and lubridate::date_decimal.
openxlsx::convertToDate works best so far, but it only accepts one format and coerces the others to NA.
update
I realized I actually had one of the above output dates wrong.
Value 43469.494444444441 should convert to 2019-04-01.
Here is one way to do this in two steps: convert the Excel serial dates separately from all the other dates. If you have more date formats, they can be added to parse_date_time.
temp <- lubridate::parse_date_time(date, c('mdY IMp', 'mdY HMS'))
temp[is.na(temp)] <- as.Date(as.numeric(date[is.na(temp)]), origin = "1899-12-30")
temp
#[1] "2019-01-04 11:51:59 UTC" "2019-12-31 13:41:00 UTC" "2019-12-01 16:00:00 UTC"
as.Date(temp)
#[1] "2019-01-04" "2019-12-31" "2019-12-01"
You could use a helper function to normalize the dates, which might be slightly faster than lubridate.
Note that MS Excel uses different date origins depending on platform (1899-12-30 on Windows, 1904-01-01 on older Mac versions), so if the data are imported from different platforms you may need to handle that with dummy variables.
normDate <- Vectorize(function(x) {
if (!is.na(suppressWarnings(as.numeric(x)))) # Win excel
as.Date(as.numeric(x), origin="1899-12-30")
else if (grepl("A|P", x))
as.Date(x, format="%m/%d/%Y %I:%M %p")
else
as.Date(x, format="%m/%d/%Y %R")
})
For additional date formats just add another else if. Format specifications can be found with ?strptime.
Then just use as.Date() with usual origin.
res <- as.Date(normDate(date), origin="1970-01-01")
# 43469.494444444441 12/31/2019 1:41 PM 12/01/2019 16:00:00
# "2019-01-04" "2019-12-31" "2019-12-01"
class(res)
# [1] "Date"
Edit: To achieve a specific output format, use format, e.g.
format(res, "%Y-%d-%m")
# 43469.494444444441 12/31/2019 1:41 PM 12/01/2019 16:00:00
# "2019-04-01" "2019-31-12" "2019-01-12"
format(res, "%Y/%d/%m")
# 43469.494444444441 12/31/2019 1:41 PM 12/01/2019 16:00:00
# "2019/04/01" "2019/31/12" "2019/01/12"
To lookup the codes type ?strptime.
I want to read this date, 1199145600000, which is saved in JSON, into R and convert the numerical representation of the date to a string. But when I type:
as.Date(1199145600000,origin = "1904-01-01")
I get the following:
"-5877641-06-23"
When I should be getting this date
1 Jan 2008 00:00:00 GMT
I tried library(lubridate) still with no success.
Any help will be most appreciated.
Your time value is in milliseconds, so you'll need to divide it by 1000 to get seconds. Then you can use as.POSIXct() to convert it to a datetime:
as.POSIXct(1199145600000/1000, origin = "1970-01-01", tz = "GMT")
# [1] "2008-01-01 GMT"
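For context on the original error: as.Date() treats a bare number as days since the origin, so 1199145600000 days lands far outside the representable range and produces the nonsense year -5877641. Going through seconds with as.POSIXct() first, the calendar date then falls out with as.Date():

```r
# Milliseconds -> seconds -> datetime, then drop the time-of-day part
x <- as.POSIXct(1199145600000 / 1000, origin = "1970-01-01", tz = "GMT")
as.Date(x, tz = "GMT")  # if only the calendar date is needed
```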
I am working with csv timestamp data given in the form %j%Y %H:%M with no leading zeroes. Here are some timestamp examples:
112005 22:00
1292005 6:00
R is reading the first line at the 112th day of the 005th year. How can I make R correctly parse this information?
Code I'm using which doesn't work:
train$TIMESTAMP <- strptime(train$TIMESTAMP, format='%j%Y %H:%M', tz='GMT')
train$hour <- as.numeric(format(train$TIMESTAMP, '%H'))
I don't think there's any simple way to decipher where the day stops and the year starts. Maybe you could split it at something that looks like a relevant year (20XX):
gsub("^(\\d{1,3})(20\\d{2})","\\1 \\2",train$TIMESTAMP)
#[1] "11 2005 22:00" "129 2005 6:00"
and do:
strptime(gsub("^(\\d{1,3})(20\\d{2})","\\1 \\2",train$TIMESTAMP), "%j %Y %H:%M")
#[1] "2005-01-11 22:00:00 EST" "2005-05-09 06:00:00 EST"
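A sketch applying the fix end-to-end, including the hour extraction from the question (tz is pinned to GMT here so the printed result doesn't depend on the local zone):

```r
ts <- c("112005 22:00", "1292005 6:00")

# Split the day-of-year from the 20XX year, then parse with %j %Y
parsed <- strptime(gsub("^(\\d{1,3})(20\\d{2})", "\\1 \\2", ts),
                   "%j %Y %H:%M", tz = "GMT")
hour <- as.numeric(format(parsed, "%H"))
```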