I have a timestamp column that I am converting to POSIXct. The problem is that the column contains two different formats, so if I convert using the more common format, the values in the other format become NA.
MC$Date
12/1/15 22:00
12/1/15 23:00
12/2/15
12/2/15 1:00
12/2/15 2:00
I use the following code to convert to a POSIXct:
MC$Date <- as.POSIXct(MC$Date, tz='MST', format = '%m/%d/%Y %H:%M')
The results:
MC$Date
15-12-01 22:00:00
15-12-01 23:00:00
NA
15-12-02 01:00:00
15-12-02 02:00:00
I have tried using a logical vector to identify the problem rows and then correct them, but can't find an easy solution.
The lubridate package was designed to deal with situations like this.
dt <- c(
"12/1/15 22:00",
"12/1/15 23:00",
"12/2/15",
"12/2/15 1:00",
"12/2/15 2:00"
)
dt
[1] "12/1/15 22:00" "12/1/15 23:00" "12/2/15" "12/2/15 1:00" "12/2/15 2:00"
lubridate::mdy_hm(dt, truncated = 2)
[1] "2015-12-01 22:00:00 UTC" "2015-12-01 23:00:00 UTC" "2015-12-02 00:00:00 UTC"
[4] "2015-12-02 01:00:00 UTC" "2015-12-02 02:00:00 UTC"
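The truncated parameter indicates how many trailing components of the format may be missing. With truncated = 2, both the minutes and the hours can be absent, so a bare date parses as midnight.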
You may add the tz parameter to specify which time zone to parse the date with if UTC is not suitable.
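As a minimal sketch of the tz suggestion above, assuming the lubridate package is installed and that "MST" is the zone the timestamps were recorded in (the variable name `parsed` is only illustrative):

```r
library(lubridate)

dt <- c("12/1/15 22:00", "12/2/15", "12/2/15 1:00")

# truncated = 2 lets the minutes and hours be missing, so "12/2/15"
# parses as midnight; tz sets the zone the wall-clock times are read in.
parsed <- mdy_hm(dt, truncated = 2, tz = "MST")

format(parsed, "%Y-%m-%d %H:%M")
# [1] "2015-12-01 22:00" "2015-12-02 00:00" "2015-12-02 01:00"
```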
I think the logical vector approach could work, perhaps in tandem with a temporary vector that holds the parsed dates without clobbering the unparsed ones. Note that a two-digit year needs %y rather than %Y. Something like this:
dates <- as.POSIXct(MC$Date, tz='MST', format = '%m/%d/%y %H:%M')
# re-parse only the rows that failed the first format, using the date-only format
dates[is.na(dates)] <- as.POSIXct(MC$Date[is.na(dates)], tz='MST', format = '%m/%d/%y')
MC$Date <- dates
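The same fill-the-NAs idea can be generalized to any number of formats. The following is a sketch in base R, not a definitive implementation; `parse_multi` is a hypothetical helper name, and the sample vector stands in for MC$Date.

```r
# Hypothetical helper: parse with the first format, then re-try the
# remaining NAs with each later format in turn.
parse_multi <- function(x, formats, tz = "UTC") {
  out <- as.POSIXct(x, tz = tz, format = formats[1])
  for (fmt in formats[-1]) {
    todo <- is.na(out)                       # rows still unparsed
    out[todo] <- as.POSIXct(x[todo], tz = tz, format = fmt)
  }
  out
}

dt <- c("12/1/15 22:00", "12/2/15", "12/2/15 1:00")
parsed <- parse_multi(dt, c("%m/%d/%y %H:%M", "%m/%d/%y"), tz = "MST")

format(parsed, "%Y-%m-%d %H:%M")
# [1] "2015-12-01 22:00" "2015-12-02 00:00" "2015-12-02 01:00"
```

List the most specific format first: the date-only format also matches strings that carry a time, because trailing characters are ignored, and it would silently drop the time.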
Since all of your datetimes are separated with a space between date and time, you could use strsplit to extract only the date part.
extractDate <- function(x){ strsplit(x, split = " " )[[1]][1] }
MC$Date <- sapply( MC$Date, extractDate )
Then go ahead and convert any way you like, without worrying about the time part getting in the way.
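A vectorized variant of the same idea, sketched in base R with a sample vector standing in for MC$Date. It uses sub instead of strsplit, and it assumes you only need the date, since the time of day is discarded.

```r
dt <- c("12/1/15 22:00", "12/2/15", "12/2/15 1:00")

# Drop everything from the first space onward, leaving only the date part.
date_part <- sub(" .*$", "", dt)

# %y reads the two-digit year "15" as 2015.
dates <- as.Date(date_part, format = "%m/%d/%y")
dates
# [1] "2015-12-01" "2015-12-02" "2015-12-02"
```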
Related
I would like to convert a string to a time. I have a time field where each string has only four digits and a letter (A or P). There is no colon between the digits to show that it is a time. I would like to convert the string, which is in 12-hour format, to a 24-hour time so I can drop the A and P.
Here is an example:
time = c("1110A", "1120P", "0420P", "0245P")
I'm looking for a time class that looks like this:
Answer= c('11:10', '23:20', '16:20', '14:45')
Any help would be greatly appreciated.
You can use the function strptime to create dates from strings after making one small change to your strings.
time <- c("1110A", "1120P", "0420P", "02:45P")
time <- gsub(":", "", time)
time <- strptime(x = paste0(time, "m"), format = "%I%M%p")
paste0 appends the "m" so that the strings end in a full AM/PM marker that strptime can parse with the format we've given it. %I is the hour on a 12-hour clock (01-12), %M is the minute, and %p parses AM/PM.
Once it's parsed as a date, you can use format for pretty printing, or use the normal operators on it like +, -, diff, etc....
strptime gives you a lot of flexibility when parsing dates, but sometimes you have to try a few things when dates are not in a standard format.
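A short sketch of the pretty-printing and arithmetic mentioned above. It assumes an English or C locale, because %p is locale-dependent, and it pins tz = "UTC" (my choice, not part of the original answer) so daylight saving cannot affect the difference.

```r
time <- c("1110A", "1120P", "0420P", "0245P")

# Append "M" so each string ends in AM/PM, then parse as 12-hour time.
parsed <- strptime(paste0(time, "M"), format = "%I%M%p", tz = "UTC")

# Pretty-print as 24-hour times.
format(parsed, "%H:%M")
# [1] "11:10" "23:20" "16:20" "14:45"

# Arithmetic works on the parsed values: minutes between the first two.
gap <- difftime(parsed[2], parsed[1], units = "mins")
as.numeric(gap)
# [1] 730
```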
We could also use the lubridate functions to parse the strings after pasting a date in front of them
library(lubridate)
library(glue)
ymd_hm(glue("2018-01-01 {time}M"))
#[1] "2018-01-01 11:10:00 UTC" "2018-01-01 23:20:00 UTC"
#[3] "2018-01-01 16:20:00 UTC" "2018-01-01 14:45:00 UTC"
In your question, you say that you want to be able to subtract these times, so I think it makes the most sense to convert them to POSIXct objects. If you want a specific day/month/year, prepend it to your string as below; if you leave the date out, today's date is assumed:
date2 = as.POSIXct(paste0("01-01-2018 ", time, "m"), format = "%m-%d-%Y %I%M%p")
date2
#[1] "2018-01-01 11:10:00 EST" "2018-01-01 23:20:00 EST" "2018-01-01 16:20:00 EST" "2018-01-01 14:45:00 EST"
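If only the 24-hour string is needed, the conversion can also be done with plain string arithmetic, which avoids any dependence on the locale's AM/PM handling. This is a sketch; `to_24h` is a hypothetical helper, and it assumes every input is exactly four digits followed by A or P.

```r
# Hypothetical helper: "hhmmA"/"hhmmP" -> "HH:MM" on a 24-hour clock.
to_24h <- function(x) {
  hour   <- as.integer(substr(x, 1, 2)) %% 12L   # 12 becomes 0
  minute <- as.integer(substr(x, 3, 4))
  is_pm  <- toupper(substr(x, 5, 5)) == "P"
  sprintf("%02d:%02d", hour + ifelse(is_pm, 12L, 0L), minute)
}

to_24h(c("1110A", "1120P", "0420P", "0245P", "1205A"))
# [1] "11:10" "23:20" "16:20" "14:45" "00:05"
```

The modulo step handles the two edge cases of the 12-hour clock: 12 AM maps to hour 0 and 12 PM maps to hour 12.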
I have a date column in a dataframe. I have read this df into R using openxlsx. The column is 'seen' as a character vector when I use typeof(df$date).
The column contains date information in several formats and I am looking to get this into the one format.
#Example
date <- c("43469.494444444441", "12/31/2019 1:41 PM", "12/01/2019 16:00:00")
#What I want -updated
fixed <- c("2019-04-01", "2019-12-31", "2019-12-01")
I have tried many workarounds, including openxlsx::convertToDate, lubridate::parse_date_time, and lubridate::date_decimal.
openxlsx::convertToDate works best so far, but it only handles one format and coerces the others to NA.
update
I realized I actually had one of the above output dates wrong.
Value 43469.494444444441 should convert to 2019-04-01.
Here is one way to do this in two steps: parse the text dates first, then convert the remaining Excel serial dates separately. If you have more date formats, add them to the parse_date_time call.
temp <- lubridate::parse_date_time(date, c('mdY IMp', 'mdY HMS'))
temp[is.na(temp)] <- as.Date(as.numeric(date[is.na(temp)]), origin = "1899-12-30")
temp
#[1] "2019-01-04 11:51:59 UTC" "2019-12-31 13:41:00 UTC" "2019-12-01 16:00:00 UTC"
as.Date(temp)
#[1] "2019-01-04" "2019-12-31" "2019-12-01"
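To make the serial-number step explicit: on Windows Excel the integer part counts days since 1899-12-30 and the fractional part is the fraction of a day. A minimal sketch in base R follows; the variable names are illustrative, and rounding to whole seconds is my assumption about what is wanted.

```r
serial <- 43469.494444444441

# Integer part: days since the Windows Excel origin.
day <- as.Date(floor(serial), origin = "1899-12-30")
format(day)
# [1] "2019-01-04"

# Whole value: convert days to seconds, rounding away floating-point noise.
secs  <- round(serial * 86400)
stamp <- as.POSIXct(secs, origin = "1899-12-30", tz = "UTC")
format(stamp, "%Y-%m-%d %H:%M:%S")
# [1] "2019-01-04 11:52:00"
```

Without the rounding, the time prints as 11:51:59 because the stored fraction falls a hair short of 11:52.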
You could use a helper function to normalize the dates which might be slightly faster than lubridate.
There are weird origins in MS Excel that depend on the platform. So if the data are imported from different platforms, you may want to work with dummy variables.
normDate <- Vectorize(function(x) {
if (!is.na(suppressWarnings(as.numeric(x)))) # Win excel
as.Date(as.numeric(x), origin="1899-12-30")
else if (grepl("A|P", x))
as.Date(x, format="%m/%d/%Y %I:%M %p")
else
as.Date(x, format="%m/%d/%Y %R")
})
For additional date formats just add another else if. Format specifications can be found with ?strptime.
Then just use as.Date() with usual origin.
res <- as.Date(normDate(date), origin="1970-01-01")
# 43469.494444444441 12/31/2019 1:41 PM 12/01/2019 16:00:00
# "2019-01-04" "2019-12-31" "2019-12-01"
class(res)
# [1] "Date"
Edit: To achieve a specific output format, use format, e.g.
format(res, "%Y-%d-%m")
# 43469.494444444441 12/31/2019 1:41 PM 12/01/2019 16:00:00
# "2019-04-01" "2019-31-12" "2019-01-12"
format(res, "%Y/%d/%m")
# 43469.494444444441 12/31/2019 1:41 PM 12/01/2019 16:00:00
# "2019/04/01" "2019/31/12" "2019/01/12"
To lookup the codes type ?strptime.
I have searched but I could not find out how to convert a date from a character string formatted as follows:
date <- "07-21-2015-09:30AM"
I wanted to use as.Date, but I have not managed to. All I get is the following:
as.Date(date, format="%m-%d-%y-%hAM")
NA
as.Date(dates, format="%m-%d-%y-%h")
NA
If we need the 'date' and 'time', one option is as.POSIXct
as.POSIXct(date, format='%m-%d-%Y-%I:%M%p')
#[1] "2015-07-21 09:30:00 EDT"
You can also use the lubridate package like this:
library('lubridate')
date <- "07-21-2015-09:30AM"
mdy_hm(date)
# "2015-07-21 09:30:00 UTC"
I like strptime for this:
xx <- strptime(date, format="%m-%d-%Y-%I:%M%p")  # %I (12-hour clock) is the hour code that pairs with %p
xx
#[1] "2015-07-21 09:30:00 EDT"
And in case you need to see the date in the same format as entered, you can call the related strftime. It doesn't change how the variable is stored; it only returns a character string in the requested format.
strftime(xx, format="%m-%d-%Y-%I:%M%p")
#[1] "07-21-2015-09:30AM"
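The original attempts return NA because %y reads only a two-digit year (so "20" is consumed and the next literal "-" fails to match "15") and %h is an abbreviated month name, not an hour. A small sketch of the working codes, assuming an English or C locale for %p and using tz = "UTC" purely for a reproducible printout:

```r
date <- "07-21-2015-09:30AM"

# Date only: %Y is the four-digit year; trailing characters are ignored.
d <- as.Date(date, format = "%m-%d-%Y")
format(d)
# [1] "2015-07-21"

# Date and time: %I (12-hour clock) together with %p handles PM correctly.
pm <- as.POSIXct("07-21-2015-09:30PM", format = "%m-%d-%Y-%I:%M%p", tz = "UTC")
format(pm, "%Y-%m-%d %H:%M")
# [1] "2015-07-21 21:30"
```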
I have a column in which time is in the following format: 0815A. I need it to be converted into a time format.
I have tried POSIXct but there are some errors.
We can use as.POSIXct specifying the correct format
as.POSIXct(paste0(v1, "M"), format = '%I%M%p')
#[1] "2016-07-27 08:15:00 IST" "2016-07-27 21:20:00 IST"
data
v1 <- c("0815A", "0920P")
My day starts at 2016-03-02 00:00:00. Not 2016-03-02 00:00:01.
How do I get the beginning of the day in POSIXct in local time?
My confusion probably comes from the fact that R sees this as the end of 2016-03-01? Given that R uses ISO 8601?
For example if I try to find the beginning of the day using Sys.Date():
as.POSIXct(Sys.Date(), tz = "CET")
"2016-03-01 01:00:00 CET"
Which is not correct - but are there other ways?
I know I can hack my way out using a simple
as.POSIXct(paste(Sys.Date(), "00:00:00", sep = " "), tz = "CET")
But there has to be a more correct way to do this? Base R preferred.
It's a single command---but you want as.POSIXlt():
R> as.POSIXlt(Sys.Date())
[1] "2016-03-02 UTC"
R> format(as.POSIXlt(Sys.Date()), "%Y-%m-%d %H:%M:%S")
[1] "2016-03-02 00:00:00"
R>
Only when the conversion to POSIXct happens does the timezone offset to UTC (six hours for me) enter:
R> as.POSIXct(Sys.Date())
[1] "2016-03-01 18:00:00 CST"
R>
Needless to say by wrapping both you get the desired type and value:
R> as.POSIXct(as.POSIXlt(Sys.Date()))
[1] "2016-03-02 UTC"
R>
Filed under once again no need for lubridate or other non-Base R packages.
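If the goal is midnight in a specific zone rather than in UTC, one base R option is to go through the date's character form, so the wall-clock time is interpreted in the zone you name. This is a sketch; `start_of_day` is a hypothetical helper, and the fixed date replaces Sys.Date() only to make the output reproducible.

```r
# Hypothetical helper: midnight at the start of date `d` in time zone `tz`.
start_of_day <- function(d, tz = "CET") {
  as.POSIXct(format(d), tz = tz)
}

d <- as.Date("2016-03-02")
x <- start_of_day(d)
format(x, "%Y-%m-%d %H:%M:%S")
# [1] "2016-03-02 00:00:00"

# Converting the Date directly yields midnight UTC, which is 01:00 in CET.
format(as.POSIXct(d), "%H:%M", tz = "CET")
# [1] "01:00"
```

The second call shows where the one-hour shift in the question comes from: a Date carries no time zone, so the direct conversion places it at midnight UTC.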
Notwithstanding that you understandably prefer base R, a "smart way," for a certain meaning of "smart," would be:
library(lubridate)
x <- floor_date(Sys.Date(),"day")
> format(x,"%Y-%m-%d-%H-%M-%S")
[1] "2016-03-02-00-00-00"
From ?floor_date:
floor_date takes a date-time object and rounds it down to the nearest
integer value of the specified time unit.
Pretty handy.
Your example is a bit unclear.
You are talking about a 1 minute difference for the day start, but your example shows a 1 hour difference due to the timezone.
You can try
?POSIXct
to get the functionality explained.
Using Sys.Date() within as.POSIXct somehow overrides your timezone setting.
as.POSIXct(Sys.Date(), tz="EET")
"2016-03-01 01:00:00 CET"
While entering a string gives you
as.POSIXct("2016-03-01 00:00:00", tz="EET")
"2016-03-01 EET"
It looks like 00:00:00 is actually the beginning of the day. You can conclude it from the results of the following 2 inequalities
as.POSIXct("2016-03-02 00:00:02 CET")>as.POSIXct("2016-03-02 00:00:01 CET")
TRUE
as.POSIXct("2016-03-02 00:00:01 CET")>as.POSIXct("2016-03-02 00:00:00 CET")
TRUE
So somehow this is a timezone issue. Notice that 00:00:00 is automatically removed from the as.POSIXct result.
as.POSIXct("2016-03-02 00:00:00 CET")
"2016-03-02 CET"