I have a file that formats time stamps like 25/03/2011 9:15:00 p.m.
How can I parse this text to a Date-Time class with either strptime or as.POSIXct?
Here is what almost works:
> as.POSIXct("25/03/2011 9:15:00", format="%d/%m/%Y %I:%M:%S", tz="UTC")
[1] "2011-03-25 09:15:00 UTC"
Here is what is not working, but I'd like to have working:
> as.POSIXct("25/03/2011 9:15:00 p.m.", format="%d/%m/%Y %I:%M:%S %p", tz="UTC")
[1] NA
I'm using R version 2.13.2 (2011-09-30) on MS Windows. My working locale is "C":
Sys.setlocale("LC_TIME", "C")
It appears the AM/PM indicator can't include punctuation. Try it after removing the punctuation:
td <- "25/03/2011 9:15:00 p.m."
tdClean <- gsub("(.)\\.?[Mm]\\.?","\\1m",td)
as.POSIXct(tdClean, format="%d/%m/%Y %I:%M:%S %p", tz="UTC")
# [1] "2011-03-25 21:15:00 UTC"
Just came across this, as another option you can use stringr package.
library(stringr)
data$date2 <- str_sub(data$date, end = -4)
# this removes the punctuation but holds onto the A/P values
data$date2 <- str_c(data$date2, 'm')
# adds the required m
Related
I have a date column in a dataframe. I have read this df into R using openxlsx. The column is 'seen' as a character vector when I use typeof(df$date).
The column contains date information in several formats and I am looking to get this into the one format.
#Example
date <- c("43469.494444444441", "12/31/2019 1:41 PM", "12/01/2019 16:00:00")
#What I want -updated
fixed <- c("2019-04-01", "2019-12-31", "2019-12-01")
I have tried many work arounds including openxlsx::ConvertToDate, lubridate::parse_date_time, lubridate::date_decimal
openxlsx::ConvertToDateso far works best but it will only take 1 format and coerce NAs for the others
update
I realized I actually had one of the above output dates wrong.
Value 43469.494444444441 should convert to 2019-04-01.
Here is one way to do this in two-step. Change excel dates separately and all other dates differently. If you have some more formats of dates that can be added in parse_date_time.
temp <- lubridate::parse_date_time(date, c('mdY IMp', 'mdY HMS'))
temp[is.na(temp)] <- as.Date(as.numeric(date[is.na(temp)]), origin = "1899-12-30")
temp
#[1] "2019-01-04 11:51:59 UTC" "2019-12-31 13:41:00 UTC" "2019-12-01 16:00:00 UTC"
as.Date(temp)
#[1] "2019-01-04" "2019-12-31" "2019-12-01"
You could use a helper function to normalize the dates which might be slightly faster than lubridate.
There are weird origins in MS Excel that depend on platform. So if the data are imported from different platforms, you may want to work woth dummy variables.
normDate <- Vectorize(function(x) {
if (!is.na(suppressWarnings(as.numeric(x)))) # Win excel
as.Date(as.numeric(x), origin="1899-12-30")
else if (grepl("A|P", x))
as.Date(x, format="%m/%d/%Y %I:%M %p")
else
as.Date(x, format="%m/%d/%Y %R")
})
For additional date formats just add another else if. Format specifications can be found with ?strptime.
Then just use as.Date() with usual origin.
res <- as.Date(normDate(date), origin="1970-01-01")
# 43469.494444444441 12/31/2019 1:41 PM 12/01/2019 16:00:00
# "2019-01-04" "2019-12-31" "2019-12-01"
class(res)
# [1] "Date"
Edit: To achieve a specific output format, use format, e.g.
format(res, "%Y-%d-%m")
# 43469.494444444441 12/31/2019 1:41 PM 12/01/2019 16:00:00
# "2019-04-01" "2019-31-12" "2019-01-12"
format(res, "%Y/%d/%m")
# 43469.494444444441 12/31/2019 1:41 PM 12/01/2019 16:00:00
# "2019/04/01" "2019/31/12" "2019/01/12"
To lookup the codes type ?strptime.
I am trying to parse a column into two variables, "date" and "time" in R. I have installed the lubridate library.
The current csv file has the following timestamp format: yyyyMMdd hh:mm a (e.g. '20170423 12:26 AM') and imports the column as character.
I'm trying this but its not working on my current variable 'Tran_Date' (below code doesn't work):
transactions_file <- as_date('Tran_Date', "%Y%m%d %H:%M %p")
I like the base R solution like this,
Tran_Date <- as.POSIXct("20170423 12:26 AM", format = "%Y%m%d %I:%M %p")
Tran_Date
#> [1] "2017-04-23 00:26:00 CEST"
transactions_file <- data.frame(
date = format(Tran_Date,"%m/%d/%Y"),
time = format(Tran_Date,"%H:%M")) # possibly add %p if you use %I
transactions_file
#> date time
#> 1 04/23/2017 00:26
with lubridate,
# install.packages(c("tidyverse"), dependencies = TRUE)
library(lubridate)
Tran_Date <- ymd_hm("20170423 12:26 AM")
then you could recycle the above or use some combination of day(Tran_Date) cbind paste with month(Tran_Date) and similar with paste(hour(Tran_Date), minute(Tran_Date), sep = ":") or most likely something smarter.
I have several variables that exist in the following format:
/Date(1353020400000+0100)/
I want to convert this format to ddmmyyyy. I found this solution for the same problem using php, but I don't know anything about php, so I'm unable to convert that solution to what I need, which is a solution that I can use in R.
Any suggestions?
Thanks.
If the format is milliseconds since the epoch then anytime() or as.POSIXct() can help you:
R> anytime(1353020400000/1000)
[1] "2012-11-15 17:00:00 CST"
R> anytime(1353020400.000)
[1] "2012-11-15 17:00:00 CST"
R>
anytime() converts to local time, which is Chicago for me. You would have to deal with the UTC offset separately.
Base R can do it too, but you need the dreaded origin:
R> as.POSIXct(1353020400.000, origin="1970-01-01")
[1] "2012-11-15 17:00:00 CST"
R>
As far as I can tell from the linked question, this is milliseconds since the epoch:
x <- "/Date(1353020400000+0100)/"
spl <- strsplit(x, "[()+]")
as.POSIXct(as.numeric(sapply(spl,`[[`,2)) / 1000, origin="1970-01-01", tz="UTC")
#[1] "2012-11-15 23:00:00 UTC"
If you want to pick up the timezone difference as well, here's an attempt:
x <- "/Date(1353020400000+0100)/"
spl <- strsplit(x, "(?=[+-])|[()]", perl=TRUE)
tzo <- sapply(spl, function(x) paste(x[3:4],collapse="") )
dt <- as.POSIXct(as.numeric(sapply(spl,`[[`,2)) / 1000, origin="1970-01-01", tz="UTC")
as.POSIXct(paste(format(dt), tzo), tz="UTC", format = '%F %T %z')
#[1] "2012-11-15 22:00:00 UTC"
The package lubridate can come to the rescue as follows:
as.Date("1970-01-01") + lubridate::milliseconds(1353020400000)
Read: Number of milliseconds since epoch (= 1. January 1970, UTC + 0)
A parsing function can now be made using regular expressions:
parse.myDate <- function(text) {
num <- as.numeric(stringr::str_extract(text, "(?<=/Date\\()\\d+"))
as.Date("1970-01-01") + lubridate::milliseconds(num)
}
finally, format the Date with
format(theDate, "%d/%m/%Y %H:%M")
If you also need the time zone information, you can use this instead:
parse.myDate <- function(text) {
parts <- stringr::str_match(text, "^/Date\\((\\d+)([+-])(\\d{4})\\)/$")
as.POSIXct(as.numeric(parts[,2])/1000, origin = "1970-01-01", tz = paste0("Etc/GMT", parts[,3], as.integer(parts[,4])/100))
}
I have two columns of differently formatted date strings that I need to make the same format,
the first is in the form:
vt_dev_date = "6/20/2016 7:45"
the second is in the form
vt_other = "2016-06-14 20:21:29.0"
If could get them both in the same form down to the minute that would be great. I have tried
strptime(vt_dev_date,format = "%Y-%m-%d %H:%M")
strptime(vt_other,"%Y-%m-%d %H:%M")
and for the second one, it works and I get
"2016-06-14 20:21:00 EDT"
But for the first string, it seems that because the month and hour are not padded with zeros, none of the formating tricks will work, becuase if I try
test_string <- "06/20/2016 07:45"
strptime(test_string,format = "%m/%d/%Y %H:%M")
[1] "2016-06-20 07:45:00 EDT"
It works, but I dont think going through every row in the column and padding each date is a great option. Any help would be appreciated.
Thanks,
josh
How about using lubridate , as follows :
library(lubridate)
x <- c("6/20/2016 7:45","2016-06-14 20:21:29.0")
> x
[1] "6/20/2016 7:45" "2016-06-14 20:21:29.0"
> parse_date_time(x, orders = c("mdy hm", "ymd hms"))
[1] "2016-06-20 07:45:00 UTC" "2016-06-14 20:21:29 UTC"
>
I have a vector of character strings looking like this. I want to convert them to dates. The characters for time-zone is posing trouble.
> a
[1] "07/17/2014 5:01:22 PM EDT" "7/17/2014 2:01:05 PM PDT" "07/17/2014 4:00:48 PM CDT" "07/17/2014 3:05:16 PM MDT"
If I use: strptime(a, "%d/%m/%Y %I:%M:%S %p %Z") I get [1] NA
If i omit the "%Z" for time-zone, and use this:
strptime(a, "%m/%d/%Y %I:%M:%S %p", tz = "EST5EDT") I get
[1] "2014-07-17 17:01:22 EDT"
Since my strings contain various time zones - PDT, CDT, EDT, MDT , I can't default all time zones to EST5EDT. One way to overcome is split the vector into different vectors for each time-zone, remove the letters PDT / EDT etc. and apply the right timezone with strptime - "EST5EDT" , "CST6CDT" etc. Is there any other way to solve this?
If the date is always the first part of the elements of the character vector and it is always followed by the time, splitting the elements by the whitespaces is a possibility. If only the date is needed:
dates <- sapply(a, function(x) strsplit(x, split = " ")[[1]][1])
dates <- as.Date(as.character(dates), format = "%m/%d/%Y")
[1] "2014-07-17" "2014-07-17" "2014-07-17" "2014-07-17"
If also the time is needed:
datetime <- sapply(a, function(x) paste(strsplit(x, split = " ")[[1]][1:3],
collapse = " "))
datetime <- strptime(as.character(datetime), format = "%m/%d/%Y %I:%M:%S %p")
[1] "2014-07-17 17:01:22 CEST" "2014-07-17 14:01:05 CEST"
You can set a different timezone using the tz argument here.