I have a data frame df like:
ID time
a 121:24:30
b 130:30:00
The time column is a factor after importing the data.
I want to convert the values of the time column into minutes. At first, I tried:
df$time <- times(df$time)
But I got a warning message:
"out of day time entry"
I notice that the values in the hour position exceed 24 in my dataset.
What am I supposed to do now?
Thanks in advance!
You could use the lubridate package for this.
library(lubridate)
x <- hms(df$time)
(hour(x) * 60) + minute(x) + (second(x) / 60)
# [1] 7284.5 7830.0
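Equivalently, you can collapse the parsed period to seconds and divide by 60 (a one-liner sketch; the as.character() is only there because the column is a factor):
period_to_seconds(hms(as.character(df$time))) / 60
# [1] 7284.5 7830.0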
Assuming your data is saved as dat, use the following:
# convert to character
dat$time <- as.character(dat$time)
# split by ":"
times <- strsplit(dat$time, ":")
# get minutes
dat$time <- sapply(times, function(x) {
  x <- as.numeric(x)
  x[1] * 60 + x[2] + x[3] / 60
})
Another option (just for fun) is to play around with the gsubfn package:
s <- factor(c("121:24:30", "130:30:00"))
library(gsubfn)
as.numeric(gsubfn("(\\d+):(\\d+):(\\d+)",
                  ~ as.numeric(x) * 60 + as.numeric(y) + as.numeric(z) / 60,
                  as.character(s)))
## [1] 7284.5 7830.0
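For reference, the same arithmetic can be done with base R only (no extra package), splitting the strings into columns with read.table, assuming s as defined above:
parts <- read.table(text = as.character(s), sep = ":",
                    col.names = c("h", "m", "sec"))
parts$h * 60 + parts$m + parts$sec / 60
# [1] 7284.5 7830.0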
I have the following dataset:
A B
2007-11-22 2004-11-18
<NA> 2004-11-10
When the value of column A is NA, I want it to be replaced by the date in column B, with an additional 25 days added.
Here is what the outcome should look like:
A B
2007-11-22 2004-11-18
2004-12-05 2004-11-10
So far, I have tried the following ifelse formula, but with no success:
library(lubridate)
data$A<- ifelse(is.na(data$A),data$B+days(25),data$A)
Could anyone tell me what's wrong with it or give me an alternate solution? The code to build my dataset is below.
A<-c("2007-11-22 01:00:00", NA)
B<-c("2004-11-18","2004-11-10")
data<-data.frame(A,B)
data$A<-as.Date(data$A);data$B<-as.Date(data$B)
The reason for the issue can be traced back to the source code of ifelse. When you type View(ifelse), you will see some lines near the bottom of the source code, as below:
ans <- test
len <- length(ans)
ypos <- which(test)
npos <- which(!test)
if (length(ypos) > 0L)
    ans[ypos] <- rep(yes, length.out = len)[ypos]
if (length(npos) > 0L)
    ans[npos] <- rep(no, length.out = len)[npos]
ans
where test is a logical vector and ans is initialized as a copy of test. When ans[ypos] <- rep(yes, length.out = len)[ypos] runs, the class of ans is coerced to numeric rather than Date. That's why you get integers in column A after using ifelse.
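You can see the coercion in a minimal sketch with two hypothetical Date vectors:
a <- as.Date(c(NA, "2007-11-22"))
b <- as.Date(c("2004-11-10", "2004-11-18"))
ifelse(is.na(a), b, a)
# [1] 12732 13839   # plain day counts since 1970-01-01; the Date class is lost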
You can try the code below:
data$A <- as.Date(ifelse(is.na(data$A), data$B + days(25), data$A), origin = "1970-01-01")
which gives
> data
A B
1 2007-11-22 2004-11-18
2 2004-12-05 2004-11-10
Assuming the data given reproducibly in the Note at the end -- in particular, we assume both columns are of class Date -- compute a logical vector is_na indicating which entries are NA, and then fill those entries from B plus 25 days:
is_na <- is.na(data$A)
data$A[is_na] <- data$B[is_na] + 25
This would also work and has the advantage that it does not overwrite data:
transform(data, A = replace(A, is.na(A), B[is.na(A)] + 25))
Note
Lines <- "
A B
2007-11-22 2004-11-18
NA 2004-11-10"
data <- read.table(text = Lines, header = TRUE)
data[] <- lapply(data, as.Date) # convert to Date class
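Running the two assignment lines above on this data then gives the expected result (a quick check):
is_na <- is.na(data$A)
data$A[is_na] <- data$B[is_na] + 25
data
#            A          B
# 1 2007-11-22 2004-11-18
# 2 2004-12-05 2004-11-10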
Instead of ifelse, you could use coalesce:
library(tidyverse)
library(lubridate)
A <- c("2007-11-22 01:00:00", NA)
B <- c("2004-11-18","2004-11-10")
data <-data.frame(A,B)
data <- data %>%
  mutate(A = as_date(A),
         B = as_date(B),
         A = coalesce(A, B + days(25)))
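which gives the same result (a quick check):
data
#            A          B
# 1 2007-11-22 2004-11-18
# 2 2004-12-05 2004-11-10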
I've got the following time frame:
A <- c('2016-01-01', '2019-01-05')
B <- c('2017-05-05','2019-06-05')
X_Period <- interval("2015-01-01", "2019-12-31")
Y_Periods <- interval(A, B)
I'd like to find the non-overlapping periods between X_Period and Y_Periods so that the result would be:
[1]'2015-01-01'--'2015-12-31'
[2]'2017-05-06'--'2019-01-04'
[3]'2019-06-06'--'2019-12-31'
I'm trying to use setdiff, but it does not work:
setdiff(X_Period, Y_Periods)
Here is an option:
library(lubridate)
seq_X <- as.Date(seq(int_start(X_Period), int_end(X_Period), by = "1 day"))
seq_Y <- as.Date(do.call("c", sapply(Y_Periods, function(x)
  seq(int_start(x), int_end(x), by = "1 day"))))
unique_dates_X <- seq_X[!seq_X %in% seq_Y]
lst <- aggregate(
  unique_dates_X,
  by = list(cumsum(c(0, diff.Date(unique_dates_X) != 1))),
  FUN = function(x) c(min(x), max(x)),
  simplify = FALSE)$x
lapply(lst, function(x) interval(x[1], x[2]))
#[[1]]
#[1] 2015-01-01 UTC--2015-12-31 UTC
#
#[[2]]
#[1] 2017-05-06 UTC--2019-01-04 UTC
#
#[[3]]
#[1] 2019-06-06 UTC--2019-12-31 UTC
The strategy is to convert the intervals to by-day sequences (one for X_Period and one for Y_Periods); then we find all days that are part of X_Period only (and not part of Y_Periods). We then aggregate to determine the first and last date in each sub-sequence of consecutive dates. The resulting lst is a list with those start/end dates. To convert to intervals, we simply loop through the list and turn each start/end pair into an interval.
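If you need this in several places, the same steps can be bundled into a small helper (a sketch under the same assumptions; the function name is made up):
interval_gaps <- function(X_Period, Y_Periods) {
  # expand both sides to day-level sequences
  seq_X <- as.Date(seq(int_start(X_Period), int_end(X_Period), by = "1 day"))
  seq_Y <- as.Date(do.call("c", sapply(Y_Periods, function(x)
    seq(int_start(x), int_end(x), by = "1 day"))))
  # keep only the days covered by X_Period and not by Y_Periods
  unique_dates_X <- seq_X[!seq_X %in% seq_Y]
  # group runs of consecutive dates and take their first/last day
  lst <- aggregate(
    unique_dates_X,
    by = list(cumsum(c(0, diff.Date(unique_dates_X) != 1))),
    FUN = function(x) c(min(x), max(x)),
    simplify = FALSE)$x
  lapply(lst, function(x) interval(x[1], x[2]))
}
interval_gaps(X_Period, Y_Periods)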
I have a data frame like this:
df = data.frame(dt = c('0101-01-01','0023-10-20'), comment = c('BC','AD'))
The second dt is actually year -23 according to the comment column.
How can I make R recognise that the first date is BC and get the time difference between these two dates?
We convert to numeric after changing to the yearmon class, flip the sign to negative for rows having 'BC' in 'comment', and take the difference:
library(zoo)
v2 <- as.numeric(as.yearmon(df$dt))
If we want to make the fractional 'year' more precise:
v2 <- lubridate::year(df$dt) +
  (strptime(df$dt, format = "%Y-%m-%d")$yday + 1) / 365
i1 <- df$comment == "BC"
v2[i1] <- -1* v2[i1]
diff(v2)
#[1] 124.75
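Putting the yearmon route together with the question's df, as a quick end-to-end check:
library(zoo)
df <- data.frame(dt = c('0101-01-01', '0023-10-20'), comment = c('BC', 'AD'))
v2 <- as.numeric(as.yearmon(df$dt))  # year + (month - 1)/12
i1 <- df$comment == "BC"
v2[i1] <- -1 * v2[i1]                # BC years become negative
diff(v2)
# [1] 124.75  (difference in years, to month precision)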
I'm currently struggling with R and calculating the time difference in days.
I have a data frame with around 60,000 rows. In this data frame there are two columns called "Start" and "End". Both columns contain data in UNIX time format WITH milliseconds, as you can see from the last three digits.
Start <- c("1470581434000", "1470784954000", "1470811368000", "1470764345000")
End <- c("1470560601000", "1470581549000", "1470785452000", "1470764722000")
d <- data.frame(Start, End)
My desired output is an extra column called timediff where the time difference is given in days.
I tried it with timediff and strptime, which I found here, but nothing worked out.
Maybe one of you has worked with calculating time differences before.
Thanks a lot
There is a very small and fast solution:
Start_POSIX <- as.POSIXct(as.numeric(Start)/1000, origin="1970-01-01")
End_POSIX <- as.POSIXct(as.numeric(End)/1000, origin="1970-01-01")
difftime(Start_POSIX, End_POSIX)
Time differences in mins
[1] 347.216667 3390.083333 431.933333 -6.283333
or if you want another unit:
difftime(Start_POSIX, End_POSIX, units = "secs")
Time differences in secs
[1] 20833 203405 25916 -377
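Since the question asks for days, the same idea can be applied directly to the d data frame (a sketch; the as.character() calls only guard against Start/End having been read in as factors):
d$timediff <- difftime(
  as.POSIXct(as.numeric(as.character(d$Start)) / 1000, origin = "1970-01-01"),
  as.POSIXct(as.numeric(as.character(d$End)) / 1000, origin = "1970-01-01"),
  units = "days"
)
d$timediff
# Time differences in days
# [1]  0.241  2.354  0.300 -0.004   (approximately)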
You have a few steps you'll need to take:
# 1. Separate the milliseconds.
#    To do this, insert a period in front of the last three digits.
Start <- sub(pattern = "(\\d{3}$)",  # match three digits at the end of the string
             replacement = ".\\1",   # replace with a . and then the matched digits
             x = Start)
# 2. Convert to numeric
Start <- as.numeric(Start)
# 3. Convert to POSIXct
Start <- as.POSIXct(Start, origin = "1970-01-01")
For convenience, it would be good to put all three steps into a function:
# Bundle all three steps into one function
unixtime_to_posixct <- function(x) {
  x <- sub(pattern = "(\\d{3}$)",
           replacement = ".\\1",
           x = x)
  x <- as.numeric(x)
  as.POSIXct(x, origin = "1970-01-01")
}
And with that, you can get your differences in days:
# Put it all together.
library(dplyr)
library(magrittr)
Start <- c("1470581434000", "1470784954000", "1470811368000", "1470764345000")
End <- c("1470560601000", "1470581549000", "1470785452000", "1470764722000")
d <- data.frame(Start,
                End,
                stringsAsFactors = FALSE)
lapply(X = d,
       FUN = unixtime_to_posixct) %>%
  as.data.frame() %>%
  mutate(diff = difftime(Start, End, units = "days"))
I have a data.table with millions of rows, and one of the columns is a date column. I would like to add 12 months to all the dates in that column and create a new column. So I use the dplyr and lubridate packages, e.g.
library(dplyr)
library(lubridate)
new_data <- data %>% mutate(date12m = date %m+% months(12))
This works; however, it is very slow for large datasets. Am I missing something? How can this be sped up? I generally don't expect R to run for more than 10 minutes for such a simple task.
Edit:
I note that my solution is already more efficient than using as.yearmon. Thanks to Colonel Beauvel for the solution.
library(zoo)  # for as.yearmon
a <- data.frame(date = rep(today(), 1000000))
func <- function(u) {
  d <- as.Date(as.yearmon(u) + 1, frac = 1)
  if (day(u) > day(d)) return(d)
  day(d) <- day(u)
  d
}
pt <- proc.time()
a <- a %>% mutate(date12m = func(date))
data.table::timetaken(pt)
pt <- proc.time()
a <- a %>% mutate(date12m = date %m+% months(12))
data.table::timetaken(pt)
Just add 1 to the month:
x <- seq.Date(from = as.Date("2007-01-01"), to = as.Date("2014-12-12"), by = "day")
month(x) <- month(x) + 1
#> head(x)
#[1] "2007-02-01" "2007-02-02" "2007-02-03" "2007-02-04" "2007-02-05" "2007-02-06"
Edit: as per @akrun's comment, here is the solution using as.yearmon from the zoo package. The trick is to do a quick check when taking the day from the last date of the next month:
library(zoo)
func <- function(u) {
  d <- as.Date(as.yearmon(u) + 1/12, frac = 1)
  if (day(u) > day(d)) return(d)
  day(d) <- day(u)
  d
}
x <- as.Date(c("2014-01-31", "2015-02-28", "2013-03-02"))
#> as.Date(sapply(x, func))
#[1] "2014-02-28" "2015-03-28" "2013-04-02"
I am also working with big data frames in R. You can use the DescTools package; it has a function named AddMonths(date, NoOfMonths). It works quite well for me.
> a <- ymd("2011-09-9")
> b <- AddMonths(a,1)
> b
[1] "2011-10-09"