I have a variable that needs to be converted to military time. This variable is very messy because it lacks consistency in the format.
Here is an example of what might be found in the variable.
x <- c("0.9305555555555547", "15:20 Found", "10:00:00 AM Found", "0.125", "Found 1525")
So far I had some success in getting everything in a more consistent format with RegEx:
x <- str_extract(x, "[0-9]+.+[0-9]|[0-9][0-9][:][0-9][0-9]")
x <- str_remove(x, "[:]")
x <- str_remove(x, "[:][0-9][0-9]$")
As you can see I get: "0.9305555555555547", "1520", "1000", "0.125", "1525"
The problem is the decimals need to be multiplied by 2400 to get back to military time but I also don't want to multiply integers (since those are already in military time).
x is essentially a variable in a dataframe.
I was thinking about using if/else logic but I don't know how to implement that.
To clarify, I want:
Input: "0.9305555555555547", "15:20 Found", "10:00:00 AM Found", "0.125", "Found 1525"
Output: "2233", "1520", "1000", "0300", "1525"
After the pre-processing you did with your regexes, you can implement if/else logic here by using str_detect()
x <- ifelse(str_detect(x, "\\."),
as.integer(as.numeric(x) * 2400),
as.integer(x)) %>%
sprintf("%04d", .)
This will return your desired output as a character vector.
Then you could do something like this to parse it to POSIXct:
x <- as.POSIXct(x,
format = "%H%M",
origin = "1970-01-01",
tz = "UTC")
We should extend the regex so that it keeps the AM/PM indicators; otherwise 12-hour times cannot be told apart from 24-hour ones. Then we handle decimal time, 12-hour (AM/PM) time, and 24-hour time in separate subsets and return the result.
milt <- function(x) {
u <- trimws(gsub('\\D*(\\d*\\W?[AP]?M?)\\D*', '\\1', x))
u[grep('\\.', u)] <- sprintf('%04d', round(as.double(u[grep('\\.', u)])*2400))
u[grep('[AP]M', u)] <- strftime(strptime(u[grep('[AP]M', u)], '%I:%M:%S %p'), '%H%M')
u[grep(':', u)] <- gsub(':', '', u[grep(':', u)] )
return(u)
}
milt(x)
# [1] "2233" "1520" "1000" "2200" "0300" "1525" "0000" "1020"
Data:
x <- c("0.9305555555555547", "15:20 Found", "10:00:00 AM Found",
"10:00:00 PM Found", "0.125", "Found 1525", "0000", "10:20")
I followed your logic literally and got the exact same result.
Numeric conversion
Multiply by 2400 and back to character
for loop to detect dots in the character and delete after that
for loop to put 0 in front of numbers with less than 4 characters
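library(stringr)  # str_extract(), str_remove(), str_sub()
library(stringi)  # stri_locate_first_regex()
x <- c("0.9305555555555547", "15:20 Found", "10:00:00 AM Found", "0.125", "Found 1525")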
x <- str_extract(x, "[0-9]+.+[0-9]|[0-9][0-9][:][0-9][0-9]")
x <- str_remove(x, "[:]")
x <- str_remove(x, "[:][0-9][0-9]$")
x <- as.numeric(x)
x <- as.character(ifelse(x<1,x*2400,x))
for(i in 1:length(x)){
ii <- stri_locate_first_regex(x[i],"\\.")[1]
if(!is.na(ii)){
x[i] <- str_sub(x[i],1,ii-1)
}
}
for(i in 1:length(x)){
while (nchar(x[i])<4) {
x[i] <- paste0("0",x[i])
}
}
x
[1] "2233" "1520" "1000" "0300" "1525"
Edit: I realized this:
the decimals need to be multiplied by 2400 to get back to military time
Isn't quite right. "2400" isn't a decimal number (technically it's sexagesimal, base 60), so decimal multiplication won't give a correct result. I've changed my code accordingly.
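As a rough base-R sketch of that point (the helper name frac_to_hhmm is made up for illustration, and it assumes the decimals are fractions of a 24-hour day): convert to whole minutes first, then split into hours and minutes.

```r
# Hypothetical helper: fraction of a 24-hour day -> "HHMM" string.
frac_to_hhmm <- function(frac) {
  total_min <- as.integer(round(frac * 24 * 60))  # minutes since midnight
  sprintf("%02d%02d", total_min %/% 60L, total_min %% 60L)
}

frac_to_hhmm(c(0.9305555555555547, 0.125))
# [1] "2220" "0300"
```

Note that the first value comes out as 2220 rather than 2233, which is the difference between multiplying by 2400 and converting hours and minutes separately.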
Rather than using the same regex on everything right out the gate, I would first determine the format of each element of x, then process the element accordingly.
I like the hms library for working with times and use it below, but you could also use the base POSIXct or POSIXlt class.
library(tidyverse)
library(hms)
x <- c("0.9305555555555547", "15:20 Found", "10:00:00 AM Found", "0.125", "Found 1525")
# ----
# define functions for parsing each possible format in `x`
parse_decimal <- function(x) {
hrs <- as.numeric(str_extract(x, "^0\\.\\d+")) * 24
min <- (hrs %% 1) * 60
hrs <- floor(hrs)
hms(hours = hrs, minutes = min)
}
parse_timestring <- function(x) {
out <- str_remove_all(x, "\\D")
hr_digits <- if_else(str_length(out) == 3, 1, 2)
hrs <- as.numeric(str_sub(out, end = hr_digits))
hrs <- if_else(str_detect(str_to_upper(x), "P\\.?M"), hrs + 12, hrs)
min <- as.numeric(str_sub(out, start = hr_digits + 1))
hms(hours = hrs, minutes = min)
}
# ----
# test each element of x, and pass to appropriate parsing Fx
time <- case_when(
str_detect(x, "^[01]$|^0\\.\\d") ~ parse_decimal(x),
str_detect(x, "\\d{1,2}:?\\d{2}") ~ parse_timestring(x),
TRUE ~ NA_real_
)
time
# 22:20:00.000000
# 15:20:00.000000
# 10:00:00.000000
# 03:00:00.000000
# 15:25:00.000000
Related
I need to turn my time variable (all data formatted in HH:MM:SS) into a numeric value giving the decimal hour.
For example 07:05:00 turns into 7.083 using base R (I'm in a secure lab so can't access packages).
Is there a way to do this using base R code?
You can easily write your own parser:
parse_time <- function(x) {
res <- do.call(rbind, strsplit(x, ":", TRUE))
mode(res) <- "numeric"
c(res %*% (1/c(1, 60, 3600)))
}
parse_time(c("07:05:00","07:05:30"))
#[1] 7.083333 7.091667
This was how I managed it in the end:
Test$Hours <- as.character(Test$Offence.Start.Time)
Test$Hours <- sapply(strsplit(Test$Hours,":"),
function(x) {
x <- as.numeric(x)
x[1]+x[2]/60})
I'm reading in data from another platform where a combination of the strings listed below is used for expressing timestamps:
* = current time
t = current day (00:00)
mo = month
d = days
h = hours
m = minutes
For example, *-3d is the current time minus 3 days, and t-3h is three hours before the start of today (00:00), i.e. 21:00 yesterday.
I'd like to be able to ingest these equations into R and get the corresponding POSIXct value. I'm trying using regex in the below function but lose the numeric multiplier for each string:
strTimeConverter <- function(z){
ret <- stringi::stri_replace_all_regex(
str = z,
pattern = c('^\\*',
'^t',
'([[:digit:]]{1,})mo',
'([[:digit:]]{1,})d',
'([[:digit:]]{1,})h',
'([[:digit:]]{1,})m'),
replacement = c('Sys.time()',
'Sys.Date()',
'*lubridate::months(1)',
'*lubridate::days(1)',
'*lubridate::hours(1)',
'*lubridate::minutes(1)'),
vectorize_all = F
)
return(ret)
# return(eval(expr = parse(text = ret)))
}
> strTimeConverter('*-5mo+3d+4h+2m')
[1] "Sys.time()-*lubridate::months(1)+*lubridate::days(1)+*lubridate::hours(1)+*lubridate::minutes(1)"
> strTimeConverter('t-5mo+3d+4h+2m')
[1] "Sys.Date()-*lubridate::months(1)+*lubridate::days(1)+*lubridate::hours(1)+*lubridate::minutes(1)"
Expected output:
# *-5mo+3d+4h+2m
"Sys.time()-5*lubridate::months(1)+3*lubridate::days(1)+4*lubridate::hours(1)+2*lubridate::minutes(1)"
# t-5mo+3d+4h+2m
"Sys.Date()-5*lubridate::months(1)+3*lubridate::days(1)+4*lubridate::hours(1)+2*lubridate::minutes(1)"
I assumed that wrapping the [[:digit:]]{1,} in parentheses () would preserve the digits, but clearly that's not working. I defined the patterns this way because otherwise the code replaces repeat occurrences, e.g. * gets converted to Sys.time() but then the m in Sys.time() gets replaced with *lubridate::minutes(1).
I plan on converting the (expected) output to R date-time using eval(parse(text = ...)) - currently commented out in the function.
I'm open to using other packages or approaches.
Update
After tinkering around for a bit, I found the below version works - I'm replacing strings in the order such that newly replaced characters are not replaced again:
strTimeConverter <- function(z){
ret <- stringi::stri_replace_all_regex(
str = z,
pattern = c('y', 'd', 'h', 'mo', 'm', '^t', '^\\*'),
replacement = c('*years(1)',
'*days(1)',
'*hours(1)',
'*days(30)',
'*minutes(1)',
'Sys.Date()',
'Sys.time()'),
vectorize_all = F
)
ret <- gsub(pattern = '\\*', replacement = '*lubridate::', x = ret)
rdate <- (eval(expr = parse(text = ret)))
attr(rdate, 'tzone') <- 'UTC'
return(rdate)
}
sample_string <- '*-5mo+3d+4h+2m'
strTimeConverter(sample_string)
This works but is not very elegant, and it will likely fail once I'm forced to incorporate other expressions (e.g. yd for day of the year, such as 124).
You can use backreferences in the replacements like this:
library(stringr)
x <- c("*-5mo+3d+4h+2m", "t-5mo+3d+4h+2m")
repl <- c('^\\*' = 'Sys.time()', '^t' = 'Sys.Date()', '(\\d+)mo' = '\\1*lubridate::months(1)', '(\\d+)d' = '\\1*lubridate::days(1)', '(\\d+)h' = '\\1*lubridate::hours(1)', '(\\d+)m' = '\\1*lubridate::minutes(1)')
stringr::str_replace_all(x, repl)
## => [1] "Sys.time()-5*lubridate::months(1)+3*lubridate::days(1)+4*lubridate::hours(1)+2*lubridate::minutes(1)"
## [2] "Sys.Date()-5*lubridate::months(1)+3*lubridate::days(1)+4*lubridate::hours(1)+2*lubridate::minutes(1)"
See the R demo online.
See, for example, '(\\d+)mo' = '\\1*lubridate::months(1)'. Here, (\d+)mo matches and captures into Group 1 one or more digits, and mo is just matched. Then, when the match is found, \1 in \1*lubridate::months(1) inserts the contents of Group 1 into the resulting string.
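The same capture-and-reinsert idea can be tried in isolation with base gsub(), which also uses the \\1 syntax. This is only a small sketch of a single replacement step, not the full conversion:

```r
x <- "*-5mo+3d+4h+2m"

# (\\d+) captures the digits; \\1 writes them back in front of the replacement text
step <- gsub("(\\d+)mo", "\\1*lubridate::months(1)", x)
step
# [1] "*-5*lubridate::months(1)+3d+4h+2m"
```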
Note that it might make the replacements safer if you cap the time period match with a word boundary (\b) on the right:
repl <- c('^\\*' = 'Sys.time()', '^t' = 'Sys.Date()', '(\\d+)mo\\b' = '\\1*lubridate::months(1)', '(\\d+)d\\b' = '\\1*lubridate::days(1)', '(\\d+)h\\b' = '\\1*lubridate::hours(1)', '(\\d+)m\\b' = '\\1*lubridate::minutes(1)')
It won't work if the time spans are glued one to another without any non-word delimiters, but you have + in your example strings, so it is safe here.
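To see what the word boundary does and does not allow, here is a quick check with base grepl() (perl = TRUE is used here just to be explicit about the regex flavour; the test strings are made up):

```r
pat <- "(\\d+)m\\b"

res <- c(
  delimited = grepl(pat, "4h+2m",  perl = TRUE),  # "2m" ends the string, so a boundary follows
  month     = grepl(pat, "5mo+3d", perl = TRUE),  # "m" is followed by "o", no boundary
  glued     = grepl(pat, "2m3d",   perl = TRUE)   # "m" is followed by a digit, no boundary
)
res
# delimited     month     glued
#      TRUE     FALSE     FALSE
```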
Actually, you can make it work with the function you used, too. Just make sure the backreferences have the $n syntax:
x <- c("*-5mo+3d+4h+2m", "t-5mo+3d+4h+2m")
pattern = c('^\\*', '^t', '(\\d+)mo', '(\\d+)d', '(\\d+)h', '(\\d+)m')
replacement = c('Sys.time()', 'Sys.Date()', '$1*lubridate::months(1)', '$1*lubridate::days(1)', '$1*lubridate::hours(1)', '$1*lubridate::minutes(1)')
stringi::stri_replace_all_regex(x, pattern, replacement, vectorize_all=FALSE)
Output:
[1] "Sys.time()-5*lubridate::months(1)+3*lubridate::days(1)+4*lubridate::hours(1)+2*lubridate::minutes(1)"
[2] "Sys.Date()-5*lubridate::months(1)+3*lubridate::days(1)+4*lubridate::hours(1)+2*lubridate::minutes(1)"
Another option to produce the time directly, would be the following:
library(lubridate)  # months(), days(), hours(), minutes()
strTimeConvert <- function(base = Sys.time(), delta = "-5mo+3d+4h+2m") {
  # pull the signed count for each unit out of `delta`
  mo <- gsub(".*([+-]\\d+)mo.*", "\\1", delta)
  ds <- gsub(".*([+-]\\d+)d.*", "\\1", delta)
  hs <- gsub(".*([+-]\\d+)h.*", "\\1", delta)
  ms <- gsub(".*([+-]\\d+)m.*", "\\1", delta)
  out <- base + months(as.numeric(mo)) + days(as.numeric(ds)) +
    hours(as.numeric(hs)) + minutes(as.numeric(ms))
  out
}
strTimeConvert()
# [1] "2020-07-21 20:32:19 EDT"
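One caveat with the per-unit gsub() calls: if a unit is missing from the delta string, gsub() returns the string unchanged and as.numeric() then gives NA. A possible way around that, sketched in base R with a made-up helper name (parse_offsets), is to pull out all signed offsets in one pass and only use the units that are actually present:

```r
# Hypothetical helper: split "-5mo+3d+4h+2m" into signed amounts named by unit.
parse_offsets <- function(delta) {
  # "mo" is listed before "m" so that months are not read as minutes
  tokens <- regmatches(delta, gregexpr("[+-]\\d+(mo|d|h|m)", delta, perl = TRUE))[[1]]
  amount <- as.integer(sub("[a-z]+$", "", tokens))  # strip the unit, keep the sign
  unit   <- sub("^[+-]\\d+", "", tokens)            # strip the signed number
  setNames(amount, unit)
}

parse_offsets("-5mo+3d+4h+2m")
# mo  d  h  m
# -5  3  4  2
parse_offsets("+3d")
# d
# 3
```

The named vector can then be fed to whichever period functions you prefer.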
D <- "06.12.1948" # which is dd.mm.yyyy
as.Date(D, "%d.%m.%y") # convert to date
[1] "2019-12-06" # ????
what is it that I am missing?
Sys.getlocale(category = "LC_ALL")
[1] "LC_COLLATE=German_Austria.1252;LC_CTYPE=German_Austria.1252;LC_MONETARY=German_Austria.1252;LC_NUMERIC=C;LC_TIME=German_Austria.1252"
The format is case-sensitive. "%y" is a two-digit year, so here it reads only the "19" of "1948" and returns 2019; use "%Y" for a four-digit year:
as.Date(D, "%d.%m.%Y")
[1] "1948-12-06"
The help topic ?strptime has details:
‘%y’ Year without century (00-99). On input, values 00 to 68 are
prefixed by 20 and 69 to 99 by 19 - that is the behaviour
specified by the 2004 and 2008 POSIX standards, but they do
also say ‘it is expected that in a future version the default
century inferred from a 2-digit year will change’.
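A small sketch of the rule quoted above, using made-up two-digit inputs next to the four-digit one from the question:

```r
full  <- format(as.Date("06.12.1948", "%d.%m.%Y"))  # four-digit year
short <- format(as.Date("06.12.48",   "%d.%m.%y"))  # 48 is in 00-68, so 20 is prefixed
late  <- format(as.Date("06.12.69",   "%d.%m.%y"))  # 69 is in 69-99, so 19 is prefixed

c(full, short, late)
# [1] "1948-12-06" "2048-12-06" "1969-12-06"
```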
To avoid remembering formats of the date we can use packaged solutions.
1) With lubridate
lubridate::dmy(D)
#[1] "1948-12-06"
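2) Using anytime (note that the result shown below reads the string month-first, as 12 June, rather than as dd.mm.yyyy)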
anytime::anydate(D)
#[1] "1948-06-12"
This might be helpful for someone. I found this function in the tutorial "Handling date-times in R" by Cole Beck. The function identifies the format of your data.
# FUNCTION guessDateFormat
# x: vector of character dates/datetimes
# returnDates: return actual dates rather than the format
# Convert character datetime to POSIXlt datetime, or at least guess the
# format such that you could convert to datetime
guessDateFormat <- function(x, returnDates = FALSE, tzone = "") {
x1 <- x
# replace blanks with NA and remove
x1[x1 == ""] <- NA
x1 <- x1[!is.na(x1)]
if (length(x1) == 0)
return(NA)
# if it's already a time variable, set it to character
if ("POSIXt" %in% class(x1[1])) {
x1 <- as.character(x1)
}
dateTimes <- do.call(rbind, strsplit(x1, " "))
# treat literal "NA" strings in any column as missing
dateTimes[dateTimes == "NA"] <- NA
# assume the time part can be found with a colon
timePart <- which(apply(dateTimes, MARGIN = 2, FUN = function(i) {
any(grepl(":", i))
}))
# everything not in the timePart should be in the datePart
datePart <- setdiff(seq(ncol(dateTimes)), timePart)
# should have 0 or 1 timeParts and exactly one dateParts
if (length(timePart) > 1 || length(datePart) != 1)
stop("cannot parse your time variable")
timeFormat <- NA
if (length(timePart)) {
# find maximum number of colons in the timePart column
ncolons <- max(nchar(gsub("[^:]", "", na.omit(dateTimes[, timePart]))))
if (ncolons == 1) {
timeFormat <- "%H:%M"
} else if (ncolons == 2) {
timeFormat <- "%H:%M:%S"
} else stop("timePart should have 1 or 2 colons")
}
# remove all non-numeric values
dates <- gsub("[^0-9]", "", na.omit(dateTimes[, datePart]))
# sep is any non-numeric value found, hopefully / or -
sep <- unique(na.omit(substr(gsub("[0-9]", "", dateTimes[, datePart]), 1, 1)))
if (length(sep) > 1)
stop("too many separators in datePart")
# maximum number of characters found in the date part
dlen <- max(nchar(dates))
dateFormat <- NA
# when six, expect the century to be omitted
if (dlen == 6) {
if (sum(is.na(as.Date(dates, format = "%y%m%d"))) == 0) {
dateFormat <- paste("%y", "%m", "%d", sep = sep)
} else if (sum(is.na(as.Date(dates, format = "%m%d%y"))) == 0) {
dateFormat <- paste("%m", "%d", "%y", sep = sep)
} else stop("datePart format [six characters] is inconsistent")
}else if (dlen == 8) {
if (sum(is.na(as.Date(dates, format = "%Y%m%d"))) == 0) {
dateFormat <- paste("%Y", "%m", "%d", sep = sep)
} else if (sum(is.na(as.Date(dates, format = "%m%d%Y"))) == 0) {
dateFormat <- paste("%m", "%d", "%Y", sep = sep)
} else stop("datePart format [eight characters] is inconsistent")
} else {
stop(sprintf("datePart has unusual length: %s", dlen))
}
if (is.na(timeFormat)) {
format <- dateFormat
} else if (timePart == 1) {
format <- paste(timeFormat, dateFormat)
} else if (timePart == 2) {
format <- paste(dateFormat, timeFormat)
} else stop("cannot parse your time variable")
if (returnDates)
return(as.POSIXlt(x, format = format, tz = tzone))
format
}
# generate some dates
mydates <- format(as.POSIXct(sample(31536000, 20), origin = "2011-01-01", tz = "UTC"), "%m/%d/%Y %H:%M")
mydates
## [1] "02/07/2011 06:51" "11/21/2011 17:03" "09/17/2011 22:42" "02/16/2011 13:45"
## [5] "12/14/2011 19:11" "09/08/2011 09:22" "12/06/2011 14:06" "02/02/2011 11:00"
## [9] "03/27/2011 06:12" "01/05/2011 15:09" "04/15/2011 04:17" "10/20/2011 14:20"
## [13] "11/13/2011 21:46" "02/26/2011 03:24" "12/29/2011 11:02" "03/17/2011 02:24"
## [17] "02/27/2011 13:51" "06/27/2011 08:36" "03/14/2011 10:54" "01/28/2011 14:14"
guessDateFormat(mydates)
[1] "%m/%d/%Y %H:%M"
Lubridate is the best option for this in my opinion. The following will work fine.
`data %>% mutate(date_variable = as.Date(dmy(date_variable)))`
Interestingly though I found dmy() to behave weirdly when as.Date() and dmy() were called in separate steps
I have this input: 2020-03-11, and I want to return 03-11. How can I do that in R? For example, if I only want the month, I can get it this way: Month=lubridate::month(data)
Here are a few ways. All are vectorized, i.e. x can be a vector of such strings. No packages are used.
x <- "2020-03-11" # input
substring(x, 6)
sub("\\d+-", "", x)
trimws(trimws(x, "left", "\\d"), "left", "-")
format(as.Date(x), "%m-%d")
A different approach is to create an S3 subclass of Date that represents a Date but displays just the month and day, storing the full date so that it is recoverable. as.md constructs an object of this new class, as.Date.md converts it back to Date class, and format.md formats it. If we print such an object, R will look for print.md, but since we have not defined it, R will use print.Date, the print method of the superclass of md, and that method calls format, invoking format.md.
as.md <- function(x, ...) structure(as.Date(x), class = c("md", "Date"))
as.Date.md <- function(x, ...) structure(x, class = "Date")
format.md <- function(x, format = "%m-%d", ...) format(as.Date(x), format = format, ...)
y <- as.md(x)
y
## [1] "03-11"
as.Date(y) # recover the full date
##[1] "2020-03-11"
data.frame(x, y)
## x y
## 1 2020-03-11 03-11
One easy way to do this is:
library(lubridate)
x <- "2020-03-11"
month <-month(x)
day <- day(x)
paste(month,"-",day)
and here is the result:
"3 - 11"
So, basically, I used lubridate to extract day and month and then used paste function to put those two together.
Another alternative is to use the code below (no lubridate):
format(as.Date(x), "%m-%d")
here is the result:
"03-11"
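If you want to build the string from the separate month and day numbers but keep the leading zero and drop the spaces that paste() inserts, sprintf() is one option. A minimal base-R sketch (the variable names are just for illustration):

```r
x <- "2020-03-11"
parts <- as.POSIXlt(as.Date(x))

# $mon counts months from 0, so add 1; %02d pads to two digits
md <- sprintf("%02d-%02d", parts$mon + 1L, parts$mday)
md
# [1] "03-11"
```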
I am trying to write a function which takes a vector of dates as an input and returns a vector of dates -- where the output is the date of the first Tuesday of the month which matches the input date.
So 2012-11-19 --> 2012-11-06, etc.
I have had some success with a single date, but have not been able to generalise to the vector case. Could someone please help?
This is what I have so far:
firstTuesday <- function(tt){
ct <- as.POSIXct(tt)
lt <- as.POSIXlt(tt)
firstOf <- as.POSIXlt(ct - 60*60*24* (lt$mday - 1))
if (firstOf$wday > 2)
{
adjDays <- (9 - firstOf$wday)
firstTues <- as.POSIXlt(as.POSIXct(firstOf) + 60*60*24*adjDays)
}
else {
adjDays <- (2 - firstOf$wday)
firstTues <- as.POSIXlt(as.POSIXct(firstOf) + 60*60*24*adjDays)
}
return(firstTues)
}
Which works for a single date: firstTuesday(Sys.Date()) but yielded junk for vectors of dates (due to if not being a vectorised control operator, I think).
I got around my limited understanding by using indexing. The following code seems to do the trick.
firstTuesday <- function(tt){
ct <- as.POSIXct(tt)
lt <- as.POSIXlt(tt)
firstOf <- as.POSIXlt(ct - 60*60*24* (lt$mday - 1))
firstTue <- as.POSIXct(firstOf)
idx <- firstOf$wday > 2
firstTue[idx] <- as.POSIXct(firstOf[idx]) + 60*60*24*(9 - firstOf$wday[idx])
firstTue[!idx] <- as.POSIXct(firstOf[!idx]) + 60*60*24*(2 - firstOf$wday[!idx])
return(firstTue)
}
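For what it's worth, the two branches can also be expressed with ifelse(), which is vectorised where if is not. A minimal sketch on made-up wday values (0 = Sunday, as in POSIXlt):

```r
wday <- c(0, 2, 4, 6)  # Sunday, Tuesday, Thursday, Saturday

# days to add to the first of the month to reach the first Tuesday
adj_days <- ifelse(wday > 2, 9 - wday, 2 - wday)
adj_days
# [1] 2 0 5 3
```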
This uses lubridate and makes the logic a little simpler. Given a vector of dates the second function will return a vector of characters, similar to your input. You can change things around to suit your needs.
library(lubridate)
getTuesday = function(x) {
date = ymd(x)
first = floor_date(date,"month")
dow = sapply(seq(0,6),function(x) wday(first+days(x)))
firstTuesday = first + days(which(dow==3)-1)
return(firstTuesday)
}
getMultipleTuesdays = function(y) {
tmp = lapply(y, getTuesday)
tmp = lapply(tmp, as.character)
return(unlist(tmp))
}
Edit
Sample input/output
getMultipleTuesdays(c("2012-11-19","2012-11-19","2011-01-15"))
[1] "2012-11-06" "2012-11-06" "2011-01-04"
Here's a simple solution using base functions:
firstDayOfMonth <- function(dates, day="Mon", abbreviate=TRUE) {
# first 7 days of month
s <- lapply(as.Date(format(dates,"%Y-%m-01")), seq, by="day", length.out=7)
# first matching weekday of each month
d <- lapply(s, function(d) d[weekdays(d,abbreviate)==day])
# unlist converts to atomic, so use do.call(c,...) instead
do.call(c, d)
}
Well, maybe the do.call at the end isn't so simple... but it's a handy piece of knowledge. :)
R> d <- as.Date(c("2012-11-19","2012-11-19","2011-01-15"))
R> firstDayOfMonth(d, "Tuesday", FALSE)
[1] "2012-11-06" "2012-11-06" "2011-01-04"
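A possible variation that avoids building seven days per input is plain Date arithmetic with a modulo. This is only a sketch: the function name first_weekday is made up, and it relies on format(x, "%w") giving the weekday as a number with 0 = Sunday.

```r
# Hypothetical helper: first occurrence of a weekday (0 = Sunday, ..., 2 = Tuesday)
# in the month of each input date.
first_weekday <- function(dates, wday = 2) {
  first <- as.Date(format(dates, "%Y-%m-01"))     # first of the month
  first_wday <- as.integer(format(first, "%w"))   # its weekday number
  first + (wday - first_wday) %% 7                # days forward to the target weekday
}

d <- as.Date(c("2012-11-19", "2012-11-19", "2011-01-15"))
first_weekday(d)
# [1] "2012-11-06" "2012-11-06" "2011-01-04"
```

Because %% in R returns a non-negative result for a positive modulus, the same expression covers both the "before Tuesday" and "after Tuesday" cases.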