Convert string to time format in R

I have strings like this:
100
200
...
2300
How can I convert these to a time format?
01:00:00
02:00:00
...
23:00:00
Do I have to pad the strings with leading zeros?
I have tried:
Data$Time <- formatC(Data$Time, digits = 6, flag = "0")
but it does not work.

We can use sprintf to zero-pad the values to four digits and append the seconds, insert a colon after the hour with sub, and then convert to a time with as.ITime:
library(data.table)
as.ITime(sub("(..)", "\\1:", sprintf("%04d:00", v1)))
#[1] "01:00:00" "02:00:00" "23:00:00"
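As @Claudio mentioned, if it is a vector of strings, replace %04d with %04s. Note that zero-padding with %s is platform-dependent (some platforms pad with blanks instead), so converting the strings with as.integer() first is the safer route.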
data
v1 <- c(100, 200, 2300)
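For comparison, here is a minimal base R sketch that avoids the regular expression and works for both numeric and character input. The helper name hhmm_to_hms and the vectors v_num and v_chr are made up for illustration; the sketch assumes every value is a whole number in HMM or HHMM form. It also shows why the formatC attempt in the question fails: padding is controlled by width, not digits.

```r
# Assumed inputs: the same values once as numbers and once as strings
v_num <- c(100, 200, 2300)
v_chr <- c("100", "200", "2300")

# Hypothetical helper: HMM/HHMM values -> "HH:MM:00" strings
hhmm_to_hms <- function(x) {
  x <- as.integer(x)                    # numbers and digit strings both become integers
  sprintf("%02d:%02d:00", x %/% 100L, x %% 100L)
}

hhmm_to_hms(v_num)
#[1] "01:00:00" "02:00:00" "23:00:00"
hhmm_to_hms(v_chr)
#[1] "01:00:00" "02:00:00" "23:00:00"

# formatC pads to a field width via `width`; `digits` does something else
padded <- formatC(v_num, width = 4, format = "d", flag = "0")
```

The resulting strings can be passed to data.table::as.ITime or chron::times if a real time class is needed.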

Related

Converting character to dates with hours and minutes

I'm having trouble converting character values into date-times (hours and minutes). I have the following code:
start <- c("2022-01-10 9:35PM","2022-01-10 10:35PM")
end <- c("2022-01-11 7:00AM","2022-01-11 8:00AM")
dat <- data.frame(start,end)
These are all in character form. I would like to:
Convert all the datetimes into date format and into 24hr format like: "2022-01-10 9:35PM" into "2022-01-10 21:35",
and "2022-01-11 7:00AM" into "2022-01-11 7:00" because I would like to calculate the difference between the dates in hrs.
I would also like to add an ID column with a specific ID. The desired data would look like this:
ID <- c(101,101)
start <- c("2022-01-10 21:35","2022-01-10 22:35")
end <- c("2022-01-11 7:00","2022-01-11 8:00")
diff <- c(9,10) # I'm not sure how the calculations would turn out to be
dat <- data.frame(ID,start,end,diff)
I would appreciate all the help there is! Thanks!!!
You can use lubridate::ymd_hm to parse both columns. floor() rounds the difference down to whole hours; drop it if you want the exact value.
library(dplyr)
library(lubridate)
dat %>%
  mutate(ID = 101,
         across(c(start, end), ymd_hm),
         diff = floor(end - start))
                start                 end  ID    diff
1 2022-01-10 21:35:00 2022-01-11 07:00:00 101 9 hours
2 2022-01-10 22:35:00 2022-01-11 08:00:00 101 9 hours
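A base R approach uses strptime. Use %I (12-hour clock) rather than %H, because %p is only honored together with %I; with %H the PM marker is ignored and 9:35PM is read as 09:35:
strptime(dat$start, "%Y-%m-%d %I:%M%p")
[1] "2022-01-10 21:35:00 CET" "2022-01-10 22:35:00 CET"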
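To get the full desired data frame in base R, a sketch along these lines may help. It assumes an English or C locale (the AM/PM marker read by %p is locale-dependent) and fixes the time zone to UTC so the result does not depend on the machine; the names fmt, start_ct, end_ct and diff_hours are illustrative.

```r
start <- c("2022-01-10 9:35PM", "2022-01-10 10:35PM")
end   <- c("2022-01-11 7:00AM", "2022-01-11 8:00AM")

# %I = hour on the 12-hour clock, %p = AM/PM marker (no space before it in this data)
fmt <- "%Y-%m-%d %I:%M%p"
start_ct <- as.POSIXct(start, format = fmt, tz = "UTC")
end_ct   <- as.POSIXct(end,   format = fmt, tz = "UTC")

# Exact difference in hours as a plain number; wrap in floor() for whole hours
diff_hours <- as.numeric(difftime(end_ct, start_ct, units = "hours"))

dat <- data.frame(
  ID    = 101,
  start = format(start_ct, "%Y-%m-%d %H:%M"),   # back to 24-hour strings
  end   = format(end_ct,   "%Y-%m-%d %H:%M"),
  diff  = diff_hours
)
```

Both rows span 9 hours and 25 minutes, so the exact difference is about 9.42 hours for each rather than the 9 and 10 guessed in the question.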

R convert yy-mm string to date format [duplicate]

I have a data frame (df) like the following:
Date Arrivals
2014-07 100
2014-08 150
2014-09 200
I know that I can convert the yearmon dates to the first date of each month as follows:
df$Date <- as.POSIXct(paste0(as.character(df[,1]),"-01"), format = "%Y-%m-%d")
However, given that my data is not available until the end of the month I want to index it to the end rather than the beginning, and I cannot figure it out. Any help appreciated.
If the Date variable is an actual yearmon class vector, from the zoo package, the as.Date.yearmon method can do what you want via its argument frac.
Using your data, and assuming that the Date was originally a character vector
library("zoo")
df <- data.frame(Date = c("2014-07", "2014-08", "2014-09"),
Arrivals = c(100, 150, 200))
I convert this to a yearmon vector:
df <- transform(df, Date2 = as.yearmon(Date))
Assuming this is what you have, then you can achieve what you want using as.Date() with frac = 1:
df <- transform(df, Date3 = as.Date(Date2, frac = 1))
which gives:
> df
     Date Arrivals    Date2      Date3
1 2014-07      100 Jul 2014 2014-07-31
2 2014-08      150 Aug 2014 2014-08-31
3 2014-09      200 Sep 2014 2014-09-30
That shows the individual steps. If you only want the final Date this is a one-liner
## assuming `Date` is a `yearmon` object
df <- transform(df, Date = as.Date(Date, frac = 1))
## or if not a `yearmon`
df <- transform(df, Date = as.Date(as.yearmon(Date), frac = 1))
The argument frac is the fraction of the month to assign to the resulting dates when converting from yearmon objects to Date objects. Hence, to get the first day of the month, rather than converting to a character and pasting on "-01" as your question showed, it's better to coerce to a Date object with frac = 0.
If the Date in your df is not a yearmon class object, then you can solve your problem by converting it to one and then using the as.Date() method as described above.
Here is a way to do it using the zoo package.
R code:
library(zoo)
df
# Date Arrivals
# 1 2014-07 100
# 2 2014-08 150
# 3 2014-09 200
df$Date <- as.Date(as.yearmon(df$Date), frac = 1)
# output
# Date Arrivals
# 1 2014-07-31 100
# 2 2014-08-31 150
# 3 2014-09-30 200
Using lubridate, you can add a month and subtract a day to get the last day of the month:
library(lubridate)
ymd(paste0(df$Date, '-01')) + months(1) - days(1)
# [1] "2014-07-31" "2014-08-31" "2014-09-30"
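If neither zoo nor lubridate is available, the same "first day of the next month minus one day" idea can be sketched in base R. The helper name month_end is made up for illustration, and the sketch assumes the input is a character vector in "YYYY-MM" form.

```r
# Hypothetical helper: "YYYY-MM" strings -> last day of that month as Date
month_end <- function(ym) {
  first <- as.Date(paste0(ym, "-01"))
  # step each first-of-month forward by one calendar month
  next_first <- do.call(c, lapply(first, function(d) {
    seq(d, by = "month", length.out = 2)[2]
  }))
  next_first - 1            # the day before the next month starts
}

month_end(c("2014-07", "2014-08", "2014-09"))
# [1] "2014-07-31" "2014-08-31" "2014-09-30"
```

Stepping from the first of the month keeps seq() away from the end-of-month edge cases, and leap years and December roll over correctly.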

Parse time data with subseconds

I have some time data
01:09:00
00:14:00
00:00:00
11:47:00
10:34:00
08:15:00
The data are measured in %M:%S:00 (so the first numbers are the minutes and the second numbers are the seconds). I would like to convert this into a total number of seconds. This is easy to do with lubridate, but R keeps thinking the format is %H:%M:%S.
Can lubridate calculate the total number of seconds elapsed in the format my data are in? If not, how is the best way to transform the data into an appropriate format?
I've thought about converting to character and just splicing out the minutes and seconds.
library(lubridate)
foo = function(x){
  hms(sapply(strsplit(x, ":"), function(xx) paste("01", xx[1], xx[2], sep = ":")))
}
a = "01:09:00"
b = "00:14:00"
foo(a) - foo(b)
#[1] "1M -5S"
#OR
as.period(foo(a) - foo(b), unit = "secs")
#[1] "55S"
Maybe the following will do it.
NumSeconds <- function(x){
  f <- function(y)
    sum(sapply(strsplit(y, split=":"), as.numeric) * c(60, 1, 0))
  unname(sapply(x, f))
}
x <- scan(what = "character", text = "
01:09:00
00:14:00
00:00:00
11:47:00
10:34:00
08:15:00")
NumSeconds(x)
[1] 69 14 0 707 634 495
You may use data.table::as.ITime and specify format as "%M:%S"*:
x <- c("01:09:00", "10:34:00")
as.integer(as.ITime(x, format = "%M:%S"))
# [1] 69 634
*The format argument is passed to strptime and...
Each input string is processed as far as necessary for the format specified: any trailing characters are ignored.
[...]
Note that %S does not read fractional parts on output.
Or, most likely faster, substr:
as.integer(substr(x, 1, 2)) * 60 + as.integer(substr(x, 4, 5))
# [1] 69 634
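If the third field ever carries a subsecond value instead of always being 00, a vectorized split keeps it rather than discarding it. The sketch below is only that: the helper name mmss_to_seconds and the parameter sub_per_second are made up, and treating the third field as hundredths of a second is an assumption that the question does not confirm.

```r
# Hypothetical helper: "MM:SS:xx" strings -> total seconds
# ASSUMPTION: the third field counts hundredths of a second (sub_per_second = 100)
mmss_to_seconds <- function(x, sub_per_second = 100) {
  parts <- do.call(rbind, strsplit(x, ":", fixed = TRUE))   # one row per string
  storage.mode(parts) <- "numeric"
  unname(parts[, 1] * 60 + parts[, 2] + parts[, 3] / sub_per_second)
}

mmss_to_seconds(c("01:09:00", "11:47:00", "10:34:00"))
# [1]  69 707 634
```

With the sample data every third field is 00, so the result matches the other answers.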

I want to convert a number to time in R

Say I have a number 1234, and I need to convert it to 12:34 (i.e. 12:34 pm) and eventually to the number of minutes since the start of the day (0000).
A bit of integer division and modulo should work:
x <- c(1234,830)
(x %/% 100) * 60 + x %% 100
#[1] 754 510
If you absolutely need a time representation first:
tmp <- as.POSIXct(sprintf("%04d", x), format="%H%M")
tmp - trunc(tmp, "day")
#Time differences in mins
#[1] 754 510
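The integer-division approach can be wrapped in a small helper that also rejects impossible clock values and produces the "HH:MM" string. This is a sketch under the assumption that inputs are whole numbers on the 24-hour clock; the names hhmm_to_minutes and hhmm_label are illustrative.

```r
# Hypothetical helper: HMM/HHMM numbers -> minutes since midnight
hhmm_to_minutes <- function(x) {
  x <- as.integer(x)
  h <- x %/% 100L                      # hours are everything above the last two digits
  m <- x %% 100L                       # minutes are the last two digits
  if (any(x < 0L | h > 23L | m > 59L, na.rm = TRUE)) {
    stop("input is not a valid 24-hour HHMM value")
  }
  h * 60L + m
}

# Hypothetical helper: HMM/HHMM numbers -> "HH:MM" strings
hhmm_label <- function(x) {
  x <- as.integer(x)
  sprintf("%02d:%02d", x %/% 100L, x %% 100L)
}

hhmm_to_minutes(c(1234, 830))
#[1] 754 510
hhmm_label(c(1234, 830))
#[1] "12:34" "08:30"
```

The validation matters because a value such as 1275 would otherwise silently turn into 795 minutes.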
We can do this with sub and times from chron
library(chron)
times(sub("(.{2})", "\\1:", sprintf("%04d:00", x)))
#[1] 12:34:00 08:30:00
If we need to convert to 'minute' then
library(lubridate)
minute(as.period(hms(sub("(.{2})", "\\1:", sprintf("%04d:00", x))), unit = "minute"))
#[1] 754 510
data
x <- c(1234,830)
Assuming your time integer is in 24-hour format (which it should be, otherwise you can't distinguish between am and pm):
time <- 1234
time_converted <- sub("(\\d+)(\\d{2})", "\\1:\\2", time)
> time_converted
[1] "12:34"
minutes <- as.POSIXlt(time_converted, format="%H:%M")$hour *60 + as.POSIXlt(time_converted, format="%H:%M")$min
> minutes
[1] 754
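You could use lubridate's minute() and hour() accessors (note minute(), not minutes(), which constructs a period instead of extracting the minute):
library(lubridate)
x <- as.POSIXct(x = "1234", format = "%H%M", tz = "UTC")
minute(x) + hour(x) * 60
Result:
[1] 754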

Parsing ambiguous timestamps

I have been provided a dataset with an ambiguous date format, e.g:
d_raw <- c("1102001 23:00", "1112001 0:00")
I would like to try to parse this date into a POSIXlt object in R. The source of the file assures me that the file is in chronological order, that the date format is month, then day, then year, and that there are no gaps in the time series.
Is there any way to parse this date format, using the ordering to resolve ambiguities? E.g. the first element above should parse to c("2001-01-10 23:00:00", "2001-01-11 00:00:00") rather than c("2001-01-10 23:00:00", "2001-11-01 00:00:00").
How about this (using regular expressions)
d_raw <- c("192001 16:00", "1102001 23:00", "1112001 0:00")
re <- "^(.+?)([1-9]|[1-3][0-9])(\\d{4}) (\\d{1,2}):(\\d{2})$"
m <- regexec(re, d_raw)
parts <- regmatches(d_raw, m)
lapply(parts, function(x) {
  x <- as.numeric(x[-1])
  ISOdate(x[3], x[1], x[2], x[4], x[5])
})
# [[1]]
# [1] "2001-01-09 16:00:00 GMT"
#
# [[2]]
# [1] "2001-01-10 23:00:00 GMT"
#
# [[3]]
# [1] "2001-01-11 GMT"
If you had more test cases that would be helpful just to make sure the regular expression correctly works.
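The regular expression alone always prefers the shortest month, so it does not use the chronological ordering the question mentions. One way to bring the ordering in is sketched below: enumerate every valid month/day reading of each string and keep the earliest one that is not before the previous row. This is a greedy sketch, not a proven method; it assumes unpadded month and day, a four-digit year, and a series without large gaps, and the helper names candidate_dates and parse_mdy_in_order are made up. It returns dates only; the time of day can be parsed separately since it is unambiguous.

```r
# Hypothetical helper: all valid readings of an unpadded "mdYYYY" digit string
candidate_dates <- function(s) {
  n    <- nchar(s)
  year <- as.integer(substr(s, n - 3, n))
  md   <- substr(s, 1, n - 4)                  # month and day digits, run together
  out  <- structure(numeric(0), class = "Date")
  for (k in 1:2) {                             # the month has one or two digits
    day_len <- nchar(md) - k
    if (day_len < 1 || day_len > 2) next
    m <- substr(md, 1, k)
    d <- substr(md, k + 1, nchar(md))
    if (startsWith(m, "0") || startsWith(d, "0")) next   # unpadded fields never start with 0
    cand <- as.Date(ISOdate(year, as.integer(m), as.integer(d)))  # NA if not a real date
    if (!is.na(cand)) out <- c(out, cand)
  }
  out
}

# Hypothetical helper: resolve ambiguity using the fact that rows are in time order
parse_mdy_in_order <- function(x) {
  date_part <- sub(" .*$", "", x)              # drop the time of day
  res  <- structure(rep(NA_real_, length(x)), class = "Date")
  prev <- -Inf
  for (i in seq_along(date_part)) {
    cand <- candidate_dates(date_part[i])
    ok   <- cand[as.numeric(cand) >= prev]     # readings that keep the series ordered
    if (length(ok) == 0) stop("no reading of '", x[i], "' keeps the series in order")
    res[i] <- min(ok)
    prev   <- as.numeric(res[i])
  }
  res
}

parse_mdy_in_order(c("192001 16:00", "1102001 23:00", "1112001 0:00"))
# [1] "2001-01-09" "2001-01-10" "2001-01-11"
```

The same string "1112001" resolves to 2001-11-01 when the preceding row is 2001-10-31, which is the behaviour the question asks for.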
I pity you for your horrible data vendor, so I decided to try and fix this for you.
# make up some horrid data
d_bad <- as.POSIXlt(seq(as.Date("2014-01-01"), as.Date("2014-12-31"), by=1))
d_raw <- paste0(d_bad$mon+1, d_bad$mday, d_bad$year+1900)
d_new <- d_raw
# not ambiguous when nchar is 6
d_new <- ifelse(nchar(d_new)==6,
  paste0("0", substr(d_new,1,1), "0", substr(d_new,2,nchar(d_new))), d_new)
# now not ambiguous when nchar is 7 and it doesn't begin with a "1"
d_new <- ifelse(nchar(d_new)==7 & substr(d_new,1,1) != "1",
  paste0("0",d_new), d_new)
# now guess a leading zero and parse
d_new <- ifelse(nchar(d_new)==7, paste0("0",d_new), d_new)
d_try <- as.Date(d_new, "%m%d%Y")
# now only days in October, November, and December might be wrong
bad <- cumsum(c(1L,as.integer(diff(d_try)))-1L) < 0L
# put the leading zero in the day, but remember "bad" rows have an
# extra leading zero, so make sure to skip it
d_try2 <- ifelse(bad,
  paste0(substr(d_new,2,3),"0", substr(d_new,4,nchar(d_new))), d_new)
# convert to Date, POSIXlt, whatever and do a happy dance
d_YAY <- as.Date(d_try2, "%m%d%Y")
data.frame(d_raw, d_new, d_try, bad, d_try2, d_YAY)
# d_raw d_new d_try bad d_try2 d_YAY
# 1 112014 01012014 2014-01-01 FALSE 01012014 2014-01-01
# 2 122014 01022014 2014-01-02 FALSE 01022014 2014-01-02
# 3 132014 01032014 2014-01-03 FALSE 01032014 2014-01-03
# 4 142014 01042014 2014-01-04 FALSE 01042014 2014-01-04
# 5 152014 01052014 2014-01-05 FALSE 01052014 2014-01-05
# 6 162014 01062014 2014-01-06 FALSE 01062014 2014-01-06
I only did this with Dates in order to keep the example data set small. Doing this for POSIXlt would be very similar, except you would need to change the as.Date calls to as.POSIXlt and adjust the format accordingly.
